buildgrid.server.persistence.interface module

class buildgrid.server.persistence.interface.DataStoreMetrics(*args, **kwargs)

Bases: dict

leases: Dict[LeaseState, int]
jobs: Dict[OperationStage, int]

class buildgrid.server.persistence.interface.DataStoreInterface(storage: StorageABC)

Bases: ABC

Abstract class defining an interface to a data store for the scheduler.

The DataStoreInterface defines the interface used by the scheduler to manage storage of its internal state. It also provides some of the infrastructure for streaming messages triggered by changes in job and operation state, which can be used (via the buildgrid.server.scheduler.Scheduler itself) to stream progress updates back to clients.

Implementations of the interface are required to implement all the abstract methods in this class. However, there is no requirement about the internal details of those implementations; for example, it may be beneficial for certain implementations to make some of these methods a no-op.

Implementations must also provide some way to set the events that are used in stream_operation_updates, which live in the watched_jobs dictionary.
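
As an illustration of this contract, the following is a minimal, hypothetical sketch of a concrete implementation. The class name InMemoryDataStore and its internal attributes are assumptions for illustration; only two of the abstract methods are shown, and a real implementation must override every abstract method documented below:

    # Hypothetical, partial sketch of a DataStoreInterface implementation.
    # The internal attributes (_jobs, _queue) are illustrative only, and most
    # abstract methods are omitted here for brevity.
    from buildgrid.server.persistence.interface import DataStoreInterface

    class InMemoryDataStore(DataStoreInterface):

        def __init__(self, storage):
            super().__init__(storage)
            self._jobs = {}   # job name -> Job
            self._queue = []  # names of jobs waiting to be assigned

        def create_job(self, job):
            # Store the job only; making it schedulable is queue_job's role.
            self._jobs[job.name] = job  # assumes Job exposes a .name attribute

        def queue_job(self, job_name):
            if job_name not in self._queue:
                self._queue.append(job_name)

        # ... get_job_by_name, update_job, create_lease, and the remaining
        # abstract methods must be implemented as well.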

setup_grpc() None
start(*, start_job_watcher: bool = True) None
stop() None
set_instance_name(instance_name: str) None
set_action_browser_url(url: str) None
abstract activate_monitoring() None

Enable the monitoring features of the data store.

abstract deactivate_monitoring() None

Disable the monitoring features of the data store.

This method also performs any necessary cleanup of stored metrics.

abstract get_metrics() DataStoreMetrics | None

Return a dictionary of metrics for jobs, operations, and leases.

The returned dictionary is keyed by buildgrid._enums.MetricCategories values, and the values are dictionaries of counts per operation stage (or lease state, in the case of leases).
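
A hedged usage sketch, assuming data_store is a concrete implementation and that the returned dictionary uses the 'jobs' and 'leases' keys shown in DataStoreMetrics above:

    # Hypothetical: inspect the per-stage and per-state counts.
    metrics = data_store.get_metrics()
    if metrics is not None:
        for stage, count in metrics["jobs"].items():
            print(f"jobs in stage {stage}: {count}")
        for state, count in metrics["leases"].items():
            print(f"leases in state {state}: {count}")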

abstract create_job(job: Job) None

Add a new job to the data store.

NOTE: This method only stores the job in the data store. To enqueue the job and make it available for scheduling, the queue_job method should also be called.

Parameters:

job (buildgrid.server.job.Job) – The job to be stored.
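
For example, a caller creating a new execution would typically pair this with queue_job (how the Job object is constructed is outside this interface and assumed here):

    # Hypothetical flow: persist the job, then make it schedulable.
    # "job" is assumed to be an already-constructed buildgrid.server.job.Job.
    data_store.create_job(job)      # store only; not yet visible to the queue
    data_store.queue_job(job.name)  # assumes Job exposes a .name attribute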

abstract queue_job(job_name: str) None

Add an existing job to the queue of jobs waiting to be assigned.

This method adds a job with the given name to the queue of jobs. If the job is already in the queue, then this method ensures that the position of the job in the queue is correct.

abstract store_response(job: Job, commit_changes: bool = True) Dict[str, Any] | None

Store the job’s ExecuteResponse in the data store.

This method stores the response message for the job in the data store, in order to allow it to be retrieved when getting jobs in the future.

This is kept separate from update_job because implementations will likely always need a special case for handling persistence of the response message.

Parameters:

job (buildgrid.server.job.Job) – The job to store the response message of.

abstract get_job_by_action(action_digest: Digest, *, max_execution_timeout: int | None = None) Job | None

Return the job corresponding to an Action digest.

This method looks for a job object corresponding to the given Action digest in the data store. If a job is found it is returned, otherwise None is returned.

Parameters:
  • action_digest (Digest) – The digest of the Action to find the corresponding job for.

  • max_execution_timeout (int, Optional) – The max execution timeout.

Returns:

The job with the given Action digest, if it exists. Otherwise None.

Return type:

buildgrid.server.job.Job or None
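
A typical use is deduplicating execution requests by checking for an existing job before creating a new one. A hedged sketch, in which the make_job helper, the action_digest variable, and the timeout value are assumptions:

    # Hypothetical deduplication check keyed on the Action digest.
    job = data_store.get_job_by_action(action_digest, max_execution_timeout=7200)
    if job is None:
        job = make_job(action_digest)   # hypothetical helper building a Job
        data_store.create_job(job)
        data_store.queue_job(job.name)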

abstract get_job_by_name(name: str, *, max_execution_timeout: int | None = None) Job | None

Return the job with the given name.

This method looks for a job with the specified name in the data store. If there is a matching Job it is returned, otherwise this returns None.

Parameters:
  • name (str) – The name of the job to return.

  • max_execution_timeout (int, Optional) – The max execution timeout.

Returns:

The job with the given name, if it exists. Otherwise None.

Return type:

buildgrid.server.job.Job or None

abstract get_job_by_operation(operation_name: str, *, max_execution_timeout: int | None = None) Job | None

Return the Job for a given Operation.

This method takes an Operation name, and returns the Job which corresponds to that Operation. If the Operation isn’t found, or if the data store doesn’t contain a corresponding job, this returns None.

Parameters:
  • operation_name (str) – Name of the Operation whose corresponding Job is to be returned.

  • max_execution_timeout (int, Optional) – The max execution timeout.

Returns:

The job related to the given operation, if it exists. Otherwise None.

Return type:

buildgrid.server.job.Job or None

abstract get_all_jobs() List[Job]

Return a list of all incomplete jobs in the data store.

This method returns a list of all incomplete jobs in the data store.

Returns:

List of all incomplete jobs in the data store.

Return type:

list

abstract get_jobs_by_stage(operation_stage: OperationStage) List[Job]

Return a list of jobs in the given stage.

This method returns a list of all jobs in a specific operation stage.

Parameters:

operation_stage (OperationStage) – The stage that the returned list of jobs should all be in.

Returns:

List of all jobs in the specified operation stage.

Return type:

list

abstract update_job(job_name: str, changes: Dict[str, Any], *, skip_notify: bool = False) None

Update a job in the data store.

This method takes a job name and a dictionary of changes to apply to the job in the data store, and updates the job with those changes. The dictionary should be keyed by the attribute names which need to be updated, with the values being the new values for the attributes.

Parameters:
  • job_name (str) – The name of the job that is being updated.

  • changes (dict) – The dictionary of changes to apply.

  • skip_notify (bool) – Whether to skip notifying watchers about the job changes.
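
A hedged example of the changes dictionary; the attribute names used below are illustrative, since the set of persisted Job attributes is implementation-defined:

    # OperationStage is assumed importable from buildgrid._enums, as
    # MetricCategories is above; "stage" and "cancelled" are example keys.
    from buildgrid._enums import OperationStage

    data_store.update_job(
        job_name,
        {"stage": OperationStage.COMPLETED.value, "cancelled": False},
        skip_notify=False,  # still notify watchers about this change
    )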

abstract delete_job(job_name: str) None

Delete a job from the data store.

This method removes a job from the data store.

Parameters:

job_name (str) – The name of the job to be removed.

watch_job(job: Job, operation_name: str, peer: str) None

Start watching a job and operation for changes.

If the given job is already being watched, then this method finds (or adds) the operation in the job’s entry in watched_jobs, and adds the peer to the list of peers for that operation.

Otherwise, it creates a whole new entry in watched_jobs for the given job, operation, and peer.

This method runs in a thread spawned by gRPC handling a connected peer.

Parameters:
  • job (buildgrid.server.job.Job) – The job to watch.

  • operation_name (str) – The name of the specific operation to watch.

  • peer (str) – The peer that is requesting to watch the job.

stream_operation_updates(operation_name: str, context: RpcContext, keepalive_timeout: int | None = None) Generator[Tuple[Exception | None, Operation], None, None]

Stream update messages for a given operation.

This is a generator which yields tuples of the form

(error, operation)

where error is None unless the job is cancelled, in which case error is a buildgrid._exceptions.CancelledError.

This method runs in a thread spawned by gRPC handling a connected peer, and should spend most of its time blocked waiting on an event which is set by either the thread which watches the data store for job updates or the main thread handling the gRPC termination callback.

Iteration finishes either when the provided gRPC context becomes inactive, or when the job owning the operation being watched is deleted from the data store.

Parameters:
  • operation_name (str) – The name of the operation to stream updates for.

  • context (grpc.ServicerContext) – The RPC context for the peer that is requesting a stream of events.

  • keepalive_timeout (int) – The maximum time to wait before sending the current status.
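
A hedged sketch of a streaming RPC handler consuming this generator; the helper shown here and its error handling are illustrative, not part of BuildGrid:

    import grpc

    def stream_to_client(data_store, operation_name, context):
        # Hypothetical helper used from a streaming gRPC servicer method.
        # Yields Operation messages until the context becomes inactive or the
        # watched job is deleted; aborts the RPC if the job is cancelled.
        for error, operation in data_store.stream_operation_updates(
            operation_name, context, keepalive_timeout=30
        ):
            if error is not None:
                context.abort(grpc.StatusCode.CANCELLED, str(error))
            yield operation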

stop_watching_operation(job: Job, operation_name: str, peer: str) None

Remove the given peer from the list of peers watching the given job.

If the given job is being watched, this method triggers a JobEventType.STOP for it to cause the waiting threads to check whether their context is still active. It then removes the given peer from the list of peers watching the given operation name. If this leaves no peers then the entire entry for the operation in the tracked job is removed.

If this process leaves the job with no operations being watched, the job itself is removed from the watched_jobs dictionary, and it will no longer be checked for updates.

This runs in the main thread as part of the RPC termination callback for Execute and WaitExecution requests.

Parameters:
  • job (buildgrid.server.job.Job) – The job to stop watching.

  • operation_name (str) – The name of the specific operation to stop watching.

  • peer (str) – The peer that is requesting to stop watching the job.

abstract create_operation(operation_name: str, job_name: str, request_metadata: RequestMetadata | None = None, client_identity: ClientIdentityEntry | None = None) None

Add a new operation to the data store.

Parameters:
  • operation_name (str) – The name of the Operation to create in the data store.

  • job_name (str) – The name of the Job representing the execution of this operation.

  • request_metadata (Optional[RequestMetadata]) – The request metadata of the operation.

  • client_identity (Optional[ClientIdentityEntry]) – The identity of the client that created this operation.

abstract get_operations_by_stage(operation_stage: OperationStage) Set[str]

Return a set of Job names in a specific operation stage.

Find the operations in a given stage and return a set containing the names of the Jobs related to those operations.

Parameters:

operation_stage (OperationStage) – The stage that the operations should be in.

Returns:

Set of all job names with operations in the specified state.

Return type:

set

abstract list_operations(operation_filters: List[OperationFilter], page_size: int | None = None, page_token: str | None = None, max_execution_timeout: int | None = None) Tuple[List[Operation], str]

Return all operations matching the filter.

Returns:

A page of matching operations in the data store, together with a token which, if nonempty, should be submitted by the requester to retrieve the next page of results.

Return type:

tuple
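
A hedged example of paging through every matching operation; the empty filter list and the page size are arbitrary choices for illustration:

    # Hypothetical pagination loop; an empty page_token means no more pages.
    operations = []
    page_token = None
    while True:
        page, page_token = data_store.list_operations(
            [], page_size=100, page_token=page_token
        )
        operations.extend(page)
        if not page_token:
            break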

abstract update_operation(operation_name: str, changes: Dict[str, Any]) None

Update an operation in the data store.

This method takes an operation name and a dictionary of changes to apply to the operation in the data store, and updates the operation with those changes. The dictionary should be keyed by the attribute names which need to be updated, with the values being the new values for the attributes.

Parameters:
  • operation_name (str) – The name of the operation that is being updated.

  • changes (dict) – The dictionary of changes to be applied.

abstract delete_operation(operation_name: str) None

Delete an operation from the data store.

This method removes an operation from the data store.

Parameters:

operation_name (str) – The name of the operation to be removed.

abstract create_lease(lease: Lease) None

Add a new lease to the data store.

Parameters:

lease (Lease) – The Lease protobuf object representing the lease to be added to the data store.

abstract get_leases_by_state(lease_state: LeaseState) Set[str]

Return the set of IDs of leases in a given state.

Parameters:

lease_state (LeaseState) – The state that the leases should be in.

Returns:

Set of strings containing IDs of leases in the given state.

Return type:

set

abstract update_lease(job_name: str, changes: Dict[str, Any]) None

Update a lease in the data store.

This method takes a job name and a dictionary of changes to apply to the lease for that job in the data store, and updates the lease with those changes. The dictionary should be keyed by the attribute names which need to be updated, with the values being the new values for the attributes.

The job name is used as leases have no unique identifier; instead there should always be at most one active lease for the job. It is the responsibility of data store implementations to ensure this.

Parameters:
  • job_name (str) – The name of the job whose lease is being updated.

  • changes (dict) – The dictionary of changes to be applied.

abstract assign_n_leases(*, capability_hash: str, lease_count: int, assignment_callback: Callable[[List[Job]], Dict[str, Job]]) None

Attempt to assign Leases for several Jobs.

This method selects lease_count Jobs from the data store with platform properties matching the worker capabilities given by capability_hash.

The given assignment_callback function is called with this list of Jobs, and is responsible for actually passing the Jobs to workers and arranging for execution.

The assignment_callback function must return a dictionary mapping Job names to Job objects. This dictionary must contain all the Jobs which were successfully given to workers, and is used by the data store to remove assigned Jobs from the queue.

Parameters:
  • capability_hash (str) – The hash of the worker capabilities to use when selecting Jobs. This is matched to the hash of the Job’s platform properties, and so should be generated using buildgrid.utils.hash_from_dict() for consistency.

  • lease_count (int) – How many Leases we want to create. This specifies the maximum length of the list of Jobs passed to assignment_callback.

  • assignment_callback (callable) – Function which takes a list of Jobs to be assigned to workers, and returns a dictionary of Job.name -> Job containing the Jobs which were successfully assigned.
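
A hedged sketch of a caller showing the expected shape of assignment_callback; the send_to_worker helper and the platform properties dictionary are assumptions for illustration:

    from buildgrid.utils import hash_from_dict

    def assignment_callback(jobs):
        # Must return a dict of Job.name -> Job containing only the Jobs that
        # were actually handed to workers; anything omitted stays queued.
        assigned = {}
        for job in jobs:
            if send_to_worker(job):  # hypothetical helper
                assigned[job.name] = job
        return assigned

    # The platform properties used to build the hash are illustrative.
    capability_hash = hash_from_dict({"OSFamily": ["linux"], "ISA": ["x86-64"]})
    data_store.assign_n_leases(
        capability_hash=capability_hash,
        lease_count=5,
        assignment_callback=assignment_callback,
    )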

abstract get_operation_request_metadata_by_name(operation_name: str) Dict[str, Any] | None

Return a dictionary containing metadata information that was sent by a client as part of a remote_execution_pb2.RequestMetadata message.

It contains the following keys: {'tool-name', 'tool-version', 'invocation-id', 'correlated-invocations-id'}.
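
For example (hedged; data_store and operation_name are assumed to be in scope):

    # Hypothetical lookup of the client-supplied request metadata.
    metadata = data_store.get_operation_request_metadata_by_name(operation_name)
    if metadata is not None:
        print(metadata["tool-name"], metadata["tool-version"])
        print(metadata["invocation-id"], metadata["correlated-invocations-id"])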

abstract get_operations_by_job(job_name: str) List[OperationModel]

Return a list of Operations associated with a job.

Parameters:

job_name (str) – The name of the job.

Returns:

The list of Operations associated with the job.

Return type:

List[OperationModel]