.. _data-model:
Internal Data Model
===================
.. _reapi-data-model-mapping:
REAPI
-----
The remote execution API has a concept of an ``Operation``, which reflects
the state of the work requested in an ``Execute`` `request`_. Both
``Execute`` and ``WaitExecute`` requests return a stream of ``Operation``
messages. A ``WaitExecute`` request takes the name of the operation whose
updates it should stream, so the server needs to keep track of operations.
.. _request: https://github.com/bazelbuild/remote-apis/blob/b5123b1bb2853393c7b9aa43236db924d7e32d61/build/bazel/remote/execution/v2/remote_execution.proto#L106
In BuildGrid, the state of each ``Operation`` is tracked across both the
:class:`buildgrid.server.job.Job` class, and the ``Operation`` protobuf
objects in the ``operations_by_name`` attribute of that class.
When an update is to be communicated to the peer (client) for a specific
operation, the data in the ``Job`` is combined with the data already in the
``Operation``, and the resulting ``Operation`` message is sent to the peer.
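
The combination step described above can be sketched as follows. This is a
minimal illustration rather than BuildGrid's actual code: the dataclasses are
simplified stand-ins for the real ``Job`` class and ``Operation`` protobuf,
and ``build_update`` is a hypothetical helper name. Only the field names are
taken from this document.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Dict, Optional


    @dataclass
    class Operation:
        # State that belongs to a single operation (stand-in for the protobuf).
        name: str
        done: bool = False
        cancelled: bool = False


    @dataclass
    class Job:
        # State shared by every operation attached to the same job.
        action: str
        operation_stage: str = "QUEUED"
        execute_response: Optional[str] = None
        operations_by_name: Dict[str, Operation] = field(default_factory=dict)


    def build_update(job: Job, operation_name: str) -> dict:
        """Combine the shared job state with one operation's own state."""
        operation = job.operations_by_name[operation_name]
        return {
            "name": operation.name,
            "done": operation.done,
            "cancelled": operation.cancelled,
            "stage": job.operation_stage,
            "response": job.execute_response,
        }


    job = Job(action="action-digest")
    job.operations_by_name["op-1"] = Operation(name="op-1")
    job.operations_by_name["op-2"] = Operation(name="op-2", cancelled=True)
    job.operation_stage = "EXECUTING"
    update = build_update(job, "op-2")

Both operations report the same stage because it is stored once on the job,
while ``cancelled`` differs because it is stored per operation.
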
The ``Job`` abstraction exists for two main reasons:

- It allows us to deduplicate work by tying multiple operations to the same
  actual execution task.
- It allows us to tie together the REAPI ``Operation`` concept with the RWAPI
  ``Lease`` concept.

In addition to tracking the various operations and lease(s) for the work,
the job class stores the ``Action`` being executed by the relevant ``Execute``
request plus some other attributes, such as its priority and
`platform requirements`_.
.. _platform requirements: https://github.com/bazelbuild/remote-apis/blob/b5123b1bb2853393c7b9aa43236db924d7e32d61/build/bazel/remote/execution/v2/remote_execution.proto#L564
The requirements are used to schedule work onto workers whose environment
matches the constraints set by the peer.
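
As a rough illustration of requirement matching, the sketch below treats
both the job's requirements and the worker's capabilities as mappings from a
property name to a set of values. The subset rule and the function name
``worker_satisfies`` are assumptions made for this example; BuildGrid's real
matching logic may differ.

.. code-block:: python

    from typing import Dict, Set


    def worker_satisfies(requirements: Dict[str, Set[str]],
                         capabilities: Dict[str, Set[str]]) -> bool:
        """Return True if the worker offers every value the job requires."""
        return all(
            values <= capabilities.get(key, set())
            for key, values in requirements.items()
        )


    # Example property names and values, chosen for illustration only.
    requirements = {"OSFamily": {"linux"}, "ISA": {"x86-64"}}
    linux_worker = {"OSFamily": {"linux"}, "ISA": {"x86-64", "x86-32"}}
    macos_worker = {"OSFamily": {"macos"}, "ISA": {"x86-64"}}

Under this rule a job with no requirements can run on any worker, and a
worker may advertise more than a job asks for.
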
.. _reapi-data-flow:
Handling an Execute request
~~~~~~~~~~~~~~~~~~~~~~~~~~~
This diagram shows how the data in an ``Execute`` request is split up
within BuildGrid, for a request to execute an ``Action`` that isn't
already queued or executing. The data from the ``Job`` is combined
with the relevant ``Operation`` in update messages streamed back to
the peer.
.. graphviz::
:align: center
digraph reapi_data_flow {
bgcolor="#fcfcfc";
rankdir=LR;
graph [fontsize=10 fontname="Verdana" compound=true]
node [shape=box fontsize=10 fontname="Verdana"];
edge [fontsize=10 fontname="Verdana"];
subgraph cluster_bgd_reapi {
label="BuildGrid Execution Service";
style="dashed";
node [shape=box];
servicer -> job, operation [
label="Creates"
];
operation -> job [
style="dashed"
label="Some state in"
];
{ rank=same; job, operation }
job, operation [
style=filled;
];
job [
label=<
Job
- action
- execute_response
- operation_stage
- priority
- platform_requirements
>
];
operation [
label=<
Operation
- name
- done
- cancelled
>
];
servicer [
label="ExecutionService"
shape="circle"
];
}
execute -> servicer [
dir="both"
lhead=cluster_bgd_reapi
label="Send Execute request\nStream Operation messages"
];
execute [
label="Peer\ne.g. recc, bazel, bst"
];
}
If the request is for an ``Action`` that is already queued or executing, no
new ``Job`` is created. Instead, the priority of the existing job is updated
if needed.
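
The deduplication path can be sketched like this. The ``Scheduler`` class,
its ``queue_action`` method, and the use of the action digest as a lookup key
are hypothetical names chosen for the example, and it assumes that a lower
priority value means more urgent. It also assumes each request still gets
its own operation name, in line with tying multiple operations to one job.

.. code-block:: python

    import uuid
    from dataclasses import dataclass, field
    from typing import Dict, Set


    @dataclass
    class Job:
        action_digest: str
        priority: int
        operation_names: Set[str] = field(default_factory=set)


    class Scheduler:
        """Hypothetical scheduler that keeps one job per distinct action."""

        def __init__(self) -> None:
            self.jobs_by_action: Dict[str, Job] = {}

        def queue_action(self, action_digest: str, priority: int) -> str:
            job = self.jobs_by_action.get(action_digest)
            if job is None:
                # First request for this action: create the job.
                job = Job(action_digest, priority)
                self.jobs_by_action[action_digest] = job
            elif priority < job.priority:
                # Assumption: a lower value is more urgent, so only raise urgency.
                job.priority = priority
            # Every request gets its own operation attached to the shared job.
            operation_name = str(uuid.uuid4())
            job.operation_names.add(operation_name)
            return operation_name


    scheduler = Scheduler()
    first = scheduler.queue_action("digest-a", priority=5)
    second = scheduler.queue_action("digest-a", priority=1)

After the second call there is still a single job for ``digest-a``, now with
two operations and the more urgent priority.
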
In the case of a ``WaitExecute`` request, neither the ``Job`` nor the
``Operation`` is created. Instead, a message queue is created for the peer,
which receives updates from the specified ``Operation``.
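
A per-peer message queue of this kind could look like the sketch below. The
``OperationUpdates`` class and its ``subscribe`` and ``publish`` methods are
invented for illustration, and the example does not check that the named
operation exists, which a real server would need to do.

.. code-block:: python

    import queue
    from typing import Dict, List


    class OperationUpdates:
        """Hypothetical fan-out of operation updates to waiting peers."""

        def __init__(self) -> None:
            self._queues: Dict[str, List[queue.Queue]] = {}

        def subscribe(self, operation_name: str) -> queue.Queue:
            # Called for a WaitExecute request: only a queue is created.
            peer_queue: queue.Queue = queue.Queue()
            self._queues.setdefault(operation_name, []).append(peer_queue)
            return peer_queue

        def publish(self, operation_name: str, message: str) -> None:
            # Deliver an update to every peer waiting on this operation.
            for peer_queue in self._queues.get(operation_name, []):
                peer_queue.put(message)


    updates = OperationUpdates()
    first_peer = updates.subscribe("op-1")
    second_peer = updates.subscribe("op-1")
    updates.publish("op-1", "EXECUTING")

Each peer reads from its own queue, so a slow peer does not hold up the
others.
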
.. _rwapi-data-model-mapping:
RWAPI
-----
The remote worker API has a concept of a ``Lease``, which contains the state
of a given task being executed by a worker. This is implemented fairly
straightforwardly in BuildGrid: a worker requests a new ``Lease`` from the
server, and the server finds a ``Job`` in the queue with requirements that
match the capabilities advertised by the worker. The server then creates a
``Lease`` for this job, and sends it to the worker in the response.
The ``Lease`` message contains a ``payload`` field, which BuildGrid will
populate with the ``Job``'s ``Action`` message. [#]_
All of the state of a ``Lease`` is held in the ``Lease`` object itself; none
of it is kept in the ``Job``. Each ``Job`` can track multiple leases, which
allows work to be retried.
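
The relationship between a job and its leases can be sketched as below. The
classes are simplified stand-ins, the ``create_lease`` method and the lease
``id`` format are hypothetical, and the state strings are shortened forms of
the RWAPI lease states.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class Lease:
        # All lease state lives here, not on the job.
        id: str
        payload: str
        state: str = "PENDING"


    @dataclass
    class Job:
        name: str
        action: str
        leases: List[Lease] = field(default_factory=list)

        def create_lease(self) -> Lease:
            # The payload carries the job's Action, as described above.
            lease = Lease(id=f"{self.name}/{len(self.leases)}", payload=self.action)
            self.leases.append(lease)
            return lease


    job = Job(name="job-1", action="action-message")
    first_attempt = job.create_lease()
    first_attempt.state = "CANCELLED"   # e.g. the worker went away
    retry = job.create_lease()

The job keeps both leases, so the history of the failed attempt is still
available after the retry is issued.
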
.. [#] Previously this field was filled with the ``Digest`` of the ``Action``,
which required a worker to fetch the latter from the CAS.
.. _rwapi-create-data-flow:
Handling a CreateBotSession request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The initial connection from a worker to BuildGrid should be a ``CreateBotSession``
request. On receiving it, BuildGrid starts tracking the bot for metrics and then
looks for queued jobs that match the platform properties of that worker.
If a job is found, a ``Lease`` is created and sent in the response, and the job
state is updated to reflect that it is now being worked on.
.. graphviz::
:align: center
digraph rwapi_data_flow {
bgcolor="#fcfcfc";
rankdir=LR;
graph [fontsize=10 fontname="Verdana" compound=true]
node [shape=box fontsize=10 fontname="Verdana"];
edge [fontsize=10 fontname="Verdana"];
subgraph cluster_bgd_rwapi {
label="BuildGrid Bots Service"
style="dashed";
node [shape=box];
servicer -> job [
dir="both"
label="Search,\nUpdate"
];
servicer -> lease [
label="Create"
];
lease -> job [
style="dotted"
label="Relates to"
];
{ rank=same; job, lease }
servicer [
label="BotsService"
shape="circle"
];
job, lease [
style="filled"
];
job [
label=<
Job
- action
- execute_response
- operation_stage
- priority
- platform_requirements
>
];
lease [
label=<
Lease
- id
- state
- status
>
];
}
worker -> servicer [
label="Send CreateBotSession request"
lhead="cluster_bgd_rwapi"
];
worker [
label="Worker\ne.g. buildbox-worker"
];
}
.. _rwapi-update-data-flow:
Handling an UpdateBotSession request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subsequent connections should be ``UpdateBotSession`` requests. Internally,
these are handled much like ``CreateBotSession`` requests. BuildGrid first checks
the state of any leases held by the bot and updates its internal representation
to match. If a change to a lease implies a change to the job state, the job is
also updated at this point.
After that, if the bot needs a new lease, BuildGrid looks for a queued job in
the same way as before, and adds any new lease to the response.
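
The two steps can be sketched as a single function. Everything here is a
simplified assumption: the function name, the plain dictionaries standing in
for BuildGrid's internal state, the lease ``id`` being equal to the job name,
and the rule that a bot needs a new lease only when it holds no unfinished
one. Platform matching is omitted for brevity.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class Lease:
        id: str      # assumption: same as the name of the job it relates to
        state: str


    def handle_update_bot_session(reported: List[Lease],
                                  tracked_leases: Dict[str, str],
                                  job_stages: Dict[str, str],
                                  queued_jobs: List[str]) -> List[Lease]:
        # Step 1: sync the internal view of each lease with what the bot reports.
        for lease in reported:
            tracked_leases[lease.id] = lease.state
            if lease.state == "COMPLETED":
                # A finished lease implies a change to the job state too.
                job_stages[lease.id] = "COMPLETED"
        response = [lease for lease in reported if lease.state != "COMPLETED"]

        # Step 2: hand out a new lease if the bot is idle and work is queued.
        if not response and queued_jobs:
            job_name = queued_jobs.pop(0)
            new_lease = Lease(id=job_name, state="PENDING")
            tracked_leases[new_lease.id] = new_lease.state
            job_stages[job_name] = "EXECUTING"
            response.append(new_lease)
        return response


    tracked = {"job-1": "ACTIVE"}
    stages = {"job-1": "EXECUTING", "job-2": "QUEUED"}
    pending = ["job-2"]
    reply = handle_update_bot_session(
        [Lease("job-1", "COMPLETED")], tracked, stages, pending
    )

Here the bot reports ``job-1`` as completed, so the job is marked completed
and the reply carries a fresh lease for ``job-2``.
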
.. graphviz::
:align: center
digraph rwapi_data_flow {
bgcolor="#fcfcfc";
rankdir=LR;
graph [fontsize=10 fontname="Verdana" compound=true]
node [shape=box fontsize=10 fontname="Verdana"];
edge [fontsize=10 fontname="Verdana"];
subgraph cluster_bgd_rwapi {
label="BuildGrid Bots Service"
style="dashed";
node [shape=box];
servicer -> job [
dir="both"
label="Search,\nUpdate"
];
servicer -> lease [
dir="both"
label="Create,\nUpdate"
];
lease -> job [
style="dotted"
label="Relates to"
];
{ rank=same; job, lease }
servicer [
label="BotsService"
];
job, lease [
style="filled"
];
job [
label=<
Job
- action
- execute_response
- operation_stage
- priority
- platform_requirements
>
];
lease [
label=<
Lease
- id
- state
- status
>
];
}
worker -> servicer [
label="Send UpdateBotSession request"
lhead="cluster_bgd_rwapi"
];
worker [
label="Worker\ne.g. buildbox-worker"
];
}