Internal Data Model
REAPI
The remote execution API has a concept of an Operation. These reflect the state of the work requested in an Execute request. A stream of Operation messages is returned by both Execute and WaitExecute requests. WaitExecute requests take the name of an operation to stream updates of, which means the server has to keep track of operations.
In BuildGrid, the state of each Operation is tracked across both the buildgrid.server.job.Job class and the Operation protobuf objects in the operations_by_name attribute of that class.
When an update is to be communicated to the peer (client) for a specific operation, the data in the Job is combined with the data already in the Operation, and the resulting Operation message is sent to the peer.
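A minimal sketch of that combining step follows. It assumes simplified dataclass stand-ins for the Operation protobuf and the Job class. The `operation_update` method and the `stage`/`action_digest` metadata keys (modelled loosely on REAPI's ExecuteOperationMetadata) are hypothetical illustrations, not BuildGrid's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Operation:
    """Hypothetical stand-in for the google.longrunning Operation message."""
    name: str
    done: bool = False
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Job:
    """Hypothetical, simplified stand-in for buildgrid.server.job.Job."""
    action_digest: str
    stage: str = "QUEUED"
    operations_by_name: Dict[str, Operation] = field(default_factory=dict)

    def operation_update(self, operation_name: str) -> Operation:
        # Start from the Operation already stored for this peer...
        stored = self.operations_by_name[operation_name]
        # ...and overlay the shared Job state to build the outgoing message.
        return Operation(
            name=stored.name,
            done=self.stage == "COMPLETED",
            metadata={**stored.metadata,
                      "stage": self.stage,
                      "action_digest": self.action_digest},
        )
```

For example, after `job.operations_by_name["op-1"] = Operation("op-1")` and `job.stage = "EXECUTING"`, calling `job.operation_update("op-1")` yields a not-yet-done Operation whose metadata carries the current stage.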
The Job abstraction exists for two main reasons:
It allows us to deduplicate work by tying multiple operations to the same actual execution task.
It allows us to tie together the REAPI Operation concept with the RWAPI Lease concept.
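Both reasons can be illustrated with one small data structure. The sketch below is hypothetical (the field and method names are not BuildGrid's): a single Job holds the names of every Operation requesting the same Action on the REAPI side, and the state of its lease(s) on the RWAPI side, so one completed lease fans out to all of those operations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Job:
    """Hypothetical Job bridging REAPI operations and RWAPI leases."""
    action_digest: str
    # REAPI side: every Execute request for this Action gets its own Operation.
    operation_names: Set[str] = field(default_factory=set)
    # RWAPI side: lease ID -> lease state for the lease(s) doing the work.
    lease_states: Dict[str, str] = field(default_factory=dict)
    result: Optional[str] = None

    def complete_lease(self, lease_id: str, result: str) -> None:
        # Worker progress is recorded once, on the Job...
        self.lease_states[lease_id] = "COMPLETED"
        self.result = result

    def operations_to_notify(self) -> List[str]:
        # ...and every Operation tied to the Job sees the same result.
        return sorted(self.operation_names)
```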
In addition to tracking the various operations and lease(s) for the work, the Job class stores the Action being executed by the relevant Execute request, plus some other attributes such as its priority and platform requirements.
The platform requirements are used to schedule work onto workers that provide an environment matching the constraints set by the peer.
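As an illustration of requirement matching, the sketch below assumes that requirements and capabilities are both maps from a platform property name to a set of values, and that a worker matches when it offers every required value for every required property. This rule and the `worker_satisfies` helper are assumptions for illustration; BuildGrid's real matching logic may differ.

```python
from typing import Dict, Set


def worker_satisfies(requirements: Dict[str, Set[str]],
                     capabilities: Dict[str, Set[str]]) -> bool:
    """Hypothetical check: every required value must be offered by the worker."""
    return all(
        values <= capabilities.get(key, set())
        for key, values in requirements.items()
    )
```

With this rule, a job requiring `{"OSFamily": {"linux"}}` can run on a worker advertising `{"OSFamily": {"linux"}, "ISA": {"x86-64"}}`, and a job with no requirements can run anywhere.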
Handling an Execute request
This diagram shows how the data in an Execute request is split up within BuildGrid, for a request to execute an Action that isn’t already queued or executing. The data from the Job is combined with the relevant Operation in update messages streamed back to the peer.
If the request is for an Action that is already queued or executing, creating a new Job is skipped in favour of updating the existing job's priority if needed.
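The deduplication path can be sketched as follows. This is a hypothetical `Scheduler`, not BuildGrid's code: it keys jobs by Action digest, assumes completed jobs are removed from `jobs_by_digest` elsewhere, and assumes the REAPI convention that a lower priority value means the action should run sooner.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    action_digest: str
    priority: int
    operation_names: List[str] = field(default_factory=list)


class Scheduler:
    """Hypothetical scheduler; only queued or executing jobs are tracked here."""

    def __init__(self) -> None:
        self.jobs_by_digest: Dict[str, Job] = {}

    def execute(self, action_digest: str, priority: int) -> str:
        job = self.jobs_by_digest.get(action_digest)
        if job is None:
            # New work: create the Job.
            job = Job(action_digest, priority)
            self.jobs_by_digest[action_digest] = job
        elif priority < job.priority:
            # Duplicate work: skip Job creation, but raise its priority
            # (a lower value is assumed to mean "run sooner").
            job.priority = priority
        # Each request still gets its own Operation tied to the shared Job.
        name = str(uuid.uuid4())
        job.operation_names.append(name)
        return name
```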
In the case of a WaitExecute request, neither a Job nor an Operation is created. Instead, a message queue is created for the peer to receive updates from the specified Operation.
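One way to picture the per-peer message queues is a registry keyed by operation name, where each WaitExecute caller gets its own queue. The `OperationWatchers` class below is a hypothetical sketch using the standard library's `queue.Queue`, not BuildGrid's implementation.

```python
import queue
from collections import defaultdict
from typing import Any, Dict, List


class OperationWatchers:
    """Hypothetical registry of per-peer update queues, keyed by operation name."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[queue.Queue]] = defaultdict(list)

    def watch(self, operation_name: str) -> queue.Queue:
        # Called for a WaitExecute request: no Job or Operation is created,
        # only a new queue subscribed to an existing operation.
        q: queue.Queue = queue.Queue()
        self._queues[operation_name].append(q)
        return q

    def publish(self, operation_name: str, update: Any) -> None:
        # Fan an update out to every peer watching this operation.
        for q in self._queues.get(operation_name, []):
            q.put(update)
```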
RWAPI
The remote worker API has a concept of a Lease, which contains the state of a given task being executed by a worker. This is implemented fairly straightforwardly in BuildGrid: a worker requests a new Lease from the server, and the server finds a Job in the queue with requirements that match the capabilities advertised by the worker. The server then creates a Lease for this job and sends it to the worker in the response.
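The lease assignment flow can be sketched as below. It assumes a simple list as the job queue, the same value-set matching rule for platform properties, and a `bytes` field standing in for the serialized Action used as the lease payload; all names here are hypothetical. New leases start in the RWAPI `PENDING` state, meaning the server expects the bot to accept them.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Lease:
    id: str
    payload: bytes  # stand-in for the Action message
    state: str = "PENDING"


@dataclass
class Job:
    action: bytes  # hypothetical serialized Action
    requirements: Dict[str, Set[str]]
    leases: List[Lease] = field(default_factory=list)


def matches(requirements: Dict[str, Set[str]],
            capabilities: Dict[str, Set[str]]) -> bool:
    return all(v <= capabilities.get(k, set()) for k, v in requirements.items())


def assign_lease(job_queue: List[Job],
                 capabilities: Dict[str, Set[str]]) -> Optional[Lease]:
    """Hypothetical: take the first queued job the worker can run."""
    for index, job in enumerate(job_queue):
        if matches(job.requirements, capabilities):
            job_queue.pop(index)
            lease = Lease(id=str(uuid.uuid4()), payload=job.action)
            job.leases.append(lease)
            return lease
    return None  # nothing suitable; the worker gets no lease this time
```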
The Lease message contains a payload field, which BuildGrid will populate with the Job’s Action message. [1]
All of the state of a Lease is held in the Lease objects themselves, rather than some of it being held in the Job. Each Job can track multiple leases, to handle retries.
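A hypothetical sketch of how a Job could track several leases for retries is shown below. The `max_attempts` limit and the method names are assumptions for illustration; the point is that each Lease keeps its own state, and the Job only keeps the list of attempts.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lease:
    id: str
    state: str = "PENDING"  # the lease's state lives on the Lease itself


@dataclass
class Job:
    leases: List[Lease] = field(default_factory=list)
    max_attempts: int = 3  # hypothetical retry limit

    def new_lease(self) -> Lease:
        # Each retry is a fresh Lease; earlier attempts keep their own state.
        lease = Lease(id=str(uuid.uuid4()))
        self.leases.append(lease)
        return lease

    def can_retry(self) -> bool:
        return len(self.leases) < self.max_attempts

    def current_lease(self) -> Optional[Lease]:
        return self.leases[-1] if self.leases else None
```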
[1] … which required a worker to fetch the latter from the CAS.
Handling a CreateBotSession request
The initial connection from a worker to BuildGrid should be a CreateBotSession request. In BuildGrid, this starts tracking the bot for metrics and then looks for queued jobs that match the platform properties of that worker.
If a job is found, a Lease is created and the response is sent, and the job state is updated to reflect that it’s now being worked on.
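The CreateBotSession steps above can be sketched as follows. Everything here is hypothetical: a set of bot IDs stands in for metrics tracking, leases are plain dicts, and the job stage names follow REAPI's `QUEUED`/`EXECUTING` stages.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Job:
    name: str
    requirements: Dict[str, Set[str]]
    stage: str = "QUEUED"


@dataclass
class BotSession:
    bot_id: str
    capabilities: Dict[str, Set[str]]
    leases: List[dict] = field(default_factory=list)


class BotsService:
    """Hypothetical CreateBotSession handler."""

    def __init__(self, job_queue: List[Job]) -> None:
        self.queue = job_queue
        self.known_bots: Set[str] = set()  # stand-in for metrics tracking

    def create_bot_session(self, session: BotSession) -> BotSession:
        # 1. Start tracking the bot.
        self.known_bots.add(session.bot_id)
        # 2. Look for a queued job matching the bot's platform properties.
        for index, job in enumerate(self.queue):
            if all(v <= session.capabilities.get(k, set())
                   for k, v in job.requirements.items()):
                self.queue.pop(index)
                # 3. Mark the job as being worked on and attach a lease.
                job.stage = "EXECUTING"
                session.leases.append(
                    {"id": str(uuid.uuid4()), "job": job.name, "state": "PENDING"})
                break
        return session
```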
Handling an UpdateBotSession request
Subsequent connections should be UpdateBotSession requests. Internally, these requests are handled very similarly to CreateBotSession. There is an initial step of checking the state of any leases held by the bot and updating the internal representation to match. If the change implies a change to the job state, that is also updated here.
After that, if the bot needs a new lease, BuildGrid looks for a queued job in the same way as before, and adds any new lease to the response.
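The two phases of UpdateBotSession handling can be sketched as below. This is a hypothetical handler, not BuildGrid's: platform matching is omitted for brevity, the bot reports lease states as a map from lease ID to RWAPI state name, and only a `COMPLETED` lease is treated as changing the job state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    stage: str = "QUEUED"


@dataclass
class Lease:
    id: str
    job: Job
    state: str = "PENDING"


class BotsService:
    """Hypothetical UpdateBotSession handler; platform matching omitted."""

    def __init__(self, job_queue: List[Job]) -> None:
        self.queue = job_queue
        self.leases_by_bot: Dict[str, List[Lease]] = {}
        self._lease_count = 0

    def _assign_lease(self) -> Optional[Lease]:
        if not self.queue:
            return None
        job = self.queue.pop(0)
        job.stage = "EXECUTING"
        self._lease_count += 1
        return Lease(id=f"lease-{self._lease_count}", job=job)

    def update_bot_session(self, bot_id: str,
                           reported: Dict[str, str]) -> List[Lease]:
        held = self.leases_by_bot.get(bot_id, [])
        # Phase 1: sync our lease records with the states the bot reports.
        for lease in held:
            new_state = reported.get(lease.id)
            if new_state is None or new_state == lease.state:
                continue
            lease.state = new_state
            if new_state == "COMPLETED":
                lease.job.stage = "COMPLETED"  # lease change implies job change
        active = [x for x in held if x.state in ("PENDING", "ACTIVE")]
        # Phase 2: an idle bot gets a new lease, found as in CreateBotSession.
        if not active:
            new_lease = self._assign_lease()
            if new_lease is not None:
                active.append(new_lease)
        self.leases_by_bot[bot_id] = active
        return active
```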