.. _architecture-overview:

Remote execution overview
=========================

Remote execution aims to speed up build processes by relying on two separate
but related concepts: remote caching and remote execution. Remote caching
allows users to share build outputs, while remote execution allows operations
to be run on a remote cluster of machines which may be more powerful (or have
different configurations) than what the user has access to locally.

The `Remote Execution API`_ (REAPI) describes a `gRPC`_ + `protocol-buffers`_
interface that has three main services for remote caching and execution:

- A ``ContentAddressableStorage`` (CAS) service: a remote storage end-point
  where content is addressed by digests, a digest being a pair of the hash
  and size of the data stored or retrieved.

- An ``ActionCache`` (AC) service: a mapping between build actions already
  performed and their corresponding resulting artifacts (usually lives with
  the CAS service).

- An ``Execution`` service: the main end-point allowing one to request a
  build job to be performed against the build farm.

The `Remote Worker API`_ (RWAPI) describes another `gRPC`_ +
`protocol-buffers`_ interface that allows a central ``BotsService`` to
manage a farm of pluggable workers.

BuildGrid combines these two interfaces in order to provide a complete
remote caching and execution service. The high-level architecture can be
represented like this:
.. graphviz::
   :align: center

   digraph remote_execution_overview {
      node [shape = record, width=2, height=1];
      ranksep = 2
      compound=true
      edge[arrowtail="vee"];
      edge[arrowhead="vee"];

      client [label = "Client", color="#0342af", fillcolor="#37c1e8", style=filled, shape=box]
      database [label = "Database", color="#8a2be2", fillcolor="#9370db", style=filled, shape=box]

      subgraph cluster_controller {
         label = "Controller";
         labeljust = "c";
         fillcolor="#42edae";
         style=filled;
         controller [label = "{ExecutionService|BotsInterface\n}", fillcolor="#17e86a", style=filled];
      }

      subgraph cluster_worker0 {
         label = "Worker 1";
         labeljust = "c";
         color="#8e7747";
         fillcolor="#ffda8e";
         style=filled;
         bot0 [label = "{Bot|Host-tools}", fillcolor="#ffb214", style=filled];
      }

      subgraph cluster_worker1 {
         label = "Worker 2";
         labeljust = "c";
         color="#8e7747";
         fillcolor="#ffda8e";
         style=filled;
         bot1 [label = "{Bot|BuildBox}", fillcolor="#ffb214", style=filled];
      }

      client -> controller [
         dir = "both",
         headlabel = "REAPI",
         labelangle = 20.0,
         labeldistance = 9,
         labelfontsize = 15.0,
         lhead=cluster_controller];
      database -> controller [
         dir = "both",
         headlabel = "SQL",
         labelangle = 20.0,
         labeldistance = 9,
         labelfontsize = 15.0,
         lhead=cluster_controller];
      controller -> bot0 [
         dir = "both",
         labelangle = 340.0,
         labeldistance = 7.5,
         labelfontsize = 15.0,
         taillabel = "RWAPI ",
         lhead=cluster_worker0,
         ltail=cluster_controller];
      controller -> bot1 [
         dir = "both",
         labelangle = 20.0,
         labeldistance = 7.5,
         labelfontsize = 15.0,
         taillabel = " RWAPI",
         lhead=cluster_worker1,
         ltail=cluster_controller];
   }

BuildGrid can be split up into separate endpoints. It is possible to have a
separate ``ActionCache`` and ``CAS`` from the ``Controller``. The following
diagram shows a typical setup.
.. graphviz::
   :align: center

   digraph remote_execution_overview {
      node [shape=record, width=2, height=1];
      compound=true
      graph [nodesep=1, ranksep=2]
      edge[arrowtail="vee"];
      edge[arrowhead="vee"];

      client [label="Client", color="#0342af", fillcolor="#37c1e8", style=filled, shape=box]
      database [label="Database", color="#8a2be2", fillcolor="#9370db", style=filled, shape=box]
      cas [label="CAS", color="#840202", fillcolor="#c1034c", style=filled, shape=box]

      subgraph cluster_controller {
         label="Controller";
         labeljust="c";
         fillcolor="#42edae";
         style=filled;
         controller [label="{ExecutionService|BotsInterface\n}", fillcolor="#17e86a", style=filled];
      }

      actioncache [label="ActionCache", color="#133f42", fillcolor="#219399", style=filled, shape=box]

      subgraph cluster_worker0 {
         label="Worker";
         labeljust="c";
         color="#8e7747";
         fillcolor="#ffda8e";
         style=filled;
         bot0 [label="{Bot}", fillcolor="#ffb214", style=filled];
      }

      client -> controller [dir="both"];
      database -> controller [dir="both"];
      client -> cas [dir="both", lhead=cluster_controller];
      controller -> bot0 [dir="both", lhead=cluster_worker0]; //ltail=cluster_controller];
      cas -> bot0 [dir="both", lhead=cluster_worker0];
      actioncache -> controller [dir="both"];
      client -> actioncache [dir="both", constraint=false];
   }

The flow of BuildGrid requests
==============================

BuildGrid uses various threads to achieve different tasks. The following
diagram is an overview of the interactions between components of BuildGrid
in response to a gRPC request. Light green is used to signify distinct
threads; entities outside of the green boxes are shared among all threads.
.. graphviz::
   :align: center

   digraph buildgrid_overview {
      node [shape=record, width=2, height=1];
      fontsize=16;
      compound=true;
      graph [nodesep=0.1, ranksep=0]
      edge [arrowtail="vee", arrowhead="vee", fontsize=16, fontcolor="#02075D", color="#02075D"];
      splines=polyline;
      rankdir=LR;

      subgraph cluster_clients {
         label="GRPC Clients\n(REAPI/RWAPI)";
         labeljust="c";
         fillcolor="#ffccdd";
         style=filled;
         clients [label="Remote Execution Clients|Bots|CAS Clients\n", fillcolor="#ff998e", style=filled]
      }

      subgraph cluster_bgd {
         label="BuildGrid Process";
         labeljust="c";
         fillcolor="#ffda8e";
         style=filled;

         subgraph cluster_bgd_services {
            label="BuildGrid Services";
            labeljust="c";
            fillcolor="#ffb214";
            fontsize=14;
            bgd_services [label="Execution|Bots|CAS\n", fillcolor="#ffb214", style=filled]
         }

         subgraph cluster_data {
            label="Persistent Data";
            labeljust="c";
            fillcolor="#9370db";
            data [label="CAS Backend", shape=cylinder]
            data_store [label="DataStore", shape=cylinder]
         }

         jobwatcher [label="Job Watcher Thread", labeljust="c", fillcolor="#42edae", fontsize=14, style=filled]
         pruner [label="Scheduler Pruner\nThread", fillcolor="#42edae", fontsize=14, style=filled]
         deferredwrites [label="Deferred Writer\nThreads", fillcolor="#42edae", fontsize=14, style=filled]

         subgraph cluster_mainthread {
            label="Main Thread";
            fillcolor="#42edae";
            fontsize=14;
            subgraph cluster_asyncioloop {
               label="asyncio loop";
               labeljust="c";
               fillcolor="#00A572";
               style=filled;
               asyncio_loop [label="Log Writer|BotSession Reaper|State Monitoring Worker\n", fillcolor="#29AB87", style=filled];
            }
         }

         subgraph cluster_grpc {
            label="GRPC Thread";
            fillcolor="#42edae";
            fontsize=14;
            subgraph cluster_grpcserver {
               label="GRPC Server";
               labeljust="c";
               fillcolor="#37c1e8";
               style=filled;
               grpc_server [label="unary_unary|unary_stream\n", fillcolor="#37c1cc", style=filled];
            }
            grpccb [label="Pluggable\nTermination Callback\n(per request type)", fillcolor="#37c1cc", style=filled]
         }

         subgraph cluster_grpcservicer {
            label="GRPC Servicer\n(Running within ThreadPool)\n`gRPC_Executor_n`";
            labeljust="c";
            fillcolor="#42edae";
            style=filled;
            fontsize=14;
            grpc_servicer [label="def Execute:\l|def WaitExecute:\l|def ...:\l", fillcolor="#17e86a", style=filled];
         }

         grpc_server -> grpc_servicer [
            dir="forward",
            label="2. ThreadPool.submit()",
            ltail=cluster_grpcserver,
            lhead=cluster_grpcservicer
         ]
         grpc_servicer -> grpc_server [
            dir="forward",
            label="3. Prepares response",
            lhead=cluster_grpcserver,
            ltail=cluster_grpcservicer
         ]
         grpc_server -> grpccb [
            dir="forward",
            label="5. Calls Termination Callback\n(optional)",
            lhead=cluster_grpcserver
         ]
      }

      subgraph cluster_metrics {
         label="Metrics Writer Process";
         labeljust="c";
         fillcolor="#ffda8e";
         style=filled;
         thread [label="Main Thread", fillcolor="#42edae", fontsize=14, style=filled]
      }

      clients -> grpc_server [
         dir="forward",
         label="1.\ngrpc:Execute\lgrpc:WaitExecute\lgrpc:...\l",
         lhead=cluster_grpcserver,
         ltail=cluster_clients
      ];
      grpc_server -> clients [
         dir="forward",
         label="4. Sends Response",
         ltail=cluster_grpcserver,
         lhead=cluster_clients
      ];

      # Invisible edges to improve the layout
      bgd_services -> data [style=invis];
      asyncio_loop -> data_store [style=invis];
      data -> jobwatcher [style=invis];
      data -> pruner [style=invis];
      data -> deferredwrites [style=invis];
   }

Concurrency Model
=================

As shown in the diagram above, there are several approaches to concurrency
in use in a ``bgd server`` process, and the purpose of each component isn't
necessarily immediately clear. This section describes the concurrency model
in more detail.

First, the two processes. The main ``bgd server`` process forks shortly
after startup, with the parent being the actual gRPC server process and the
child being a separate process to handle publishing metrics.

.. _metrics-writer-process:

Metrics Writer Process
----------------------

This subprocess consumes ``LogRecord`` and ``MonitoringRecord`` messages
from a :class:`multiprocessing.Queue`.
These messages are then processed and published to the configured monitoring
output, e.g. a StatsD server. Messages are sent to the queue in the main
BuildGrid process by the methods in :mod:`buildgrid.server.metrics_utils`.

This metrics writer lives in a subprocess because the required rate of
metrics throughput when BuildGrid is under load wasn't achievable using
coroutines or threads in the same process as the gRPC handler threads.

BuildGrid Process
-----------------

This is the main ``bgd server`` process. It uses both asyncio and threading
for concurrency, whilst running a gRPC server to handle incoming requests to
the configured BuildGrid services.

Coroutines
~~~~~~~~~~

There are potentially three long-running coroutines in a BuildGrid server.
There are also some other short-lived coroutines that run as implementation
details of the `Janus`_ queue used for logging and for submitting metrics
asynchronously to the :ref:`metrics-writer-process`.

Log Writer Coroutine
''''''''''''''''''''

This coroutine handles formatting and writing log messages to stdout. In
BuildGrid, the ``logging`` library is configured to write logs into a Janus
queue. A Janus queue is a simple wrapper around a :class:`queue.Queue` with
both synchronous and asynchronous get/put methods. The Log Writer coroutine
asynchronously reads messages from this Janus queue and writes them to
stdout. This approach is intended to ensure that writing log messages
doesn't become a bottleneck in a gRPC request handler.

State Monitoring Worker Coroutine
'''''''''''''''''''''''''''''''''

The State Monitoring Worker periodically inspects any Execution or Bots
services and related Schedulers, and generates metrics about their current
state. These metrics are then sent to the :class:`multiprocessing.Queue`
used by the Metrics Writer.
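The hand-off between the monitoring worker and the Metrics Writer subprocess can be sketched roughly as follows. This is a minimal illustration of the pattern, not BuildGrid's actual API: the names ``MetricRecord``, ``format_statsd``, ``metrics_writer``, and ``publish_state_metrics`` are all hypothetical.

```python
# Sketch of the metrics flow: a monitoring task puts records on a
# multiprocessing.Queue; a separate writer process consumes and publishes
# them. All names here are illustrative, not BuildGrid's real classes.
import multiprocessing
from dataclasses import dataclass


@dataclass
class MetricRecord:
    name: str
    value: int


def format_statsd(record: MetricRecord) -> str:
    # StatsD gauge wire format: "<name>:<value>|g"
    return f"{record.name}:{record.value}|g"


def metrics_writer(queue: multiprocessing.Queue) -> None:
    """Writer-process side: consume records until a None sentinel arrives."""
    while True:
        record = queue.get()
        if record is None:  # sentinel: shut the writer down
            break
        # A real writer would send this to e.g. a StatsD server.
        print(format_statsd(record))


def publish_state_metrics(queue: multiprocessing.Queue, queued_jobs: int) -> None:
    """Main-process side: enqueue a snapshot of scheduler state."""
    queue.put(MetricRecord("jobs.queued", queued_jobs))


if __name__ == "__main__":
    queue: multiprocessing.Queue = multiprocessing.Queue()
    writer = multiprocessing.Process(target=metrics_writer, args=(queue,))
    writer.start()
    publish_state_metrics(queue, queued_jobs=4)
    queue.put(None)  # tell the writer to exit
    writer.join()
```

The key design point mirrored here is that the expensive formatting and network I/O happen in the writer process, so the producing side only pays the cost of a queue put.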
If the server doesn't have any Execution or Bots services, and doesn't have
any Schedulers configured either, then this coroutine will currently still
exist but won't actually do anything.

BotSession Reaper Coroutine
'''''''''''''''''''''''''''

.. note::
   This is only present when the server contains a Bots service.

The BotSession Reaper is part of the Bots service, and keeps track of when
specific workers were last seen by BuildGrid. If a worker fails to reconnect
by the agreed deadline, then this coroutine handles killing the relevant
BotSession and requeueing the job that was being executed by the missing
worker (if any).

Threads
~~~~~~~

There are a significant number of threads in use in a ``bgd server``
process. The majority of these live inside the gRPC thread pool used to
handle gRPC requests concurrently, but there are a few threads created for
background tasks and for optimizing access to resources.

Main Thread
'''''''''''

This is the main execution thread. It contains the asyncio event loop which
runs the coroutines described above. It also starts the gRPC server, and
handles stopping the gRPC server cleanly.

gRPC Thread
'''''''''''

This is a daemon thread created by the gRPC server when ``Server.start`` is
called. This thread handles the actual running of the gRPC server, such as
receiving incoming requests and routing them to handler methods, and running
any callbacks when connections are closed.

gRPC Executor Threads
'''''''''''''''''''''

These are the threads in a ``ThreadPoolExecutor`` used by the gRPC server to
handle incoming requests. The number of threads is configurable using the
``thread-pool-size`` configuration key. When the gRPC server gets a new
request, it locates the correct handler method in the servicers that are
registered with the server, and then runs that handler in a thread from this
pool.

Job Watcher Thread
''''''''''''''''''

.. note::
   This is only present when the server contains a Scheduler.
This thread is part of the data store implementations used by the Scheduler.
BuildGrid needs to keep track of all the jobs it is currently serving update
streams for (via either ``Execute`` or ``WaitExecution`` requests), and
detect changes in the state of those jobs so that the update messages can be
sent. Rather than having each handler thread monitor the database (or
in-memory job state) independently, the handlers register their job with the
Job Watcher thread and wait for notification of updates.

The Job Watcher thread watches for updates in the data store (e.g. by using
PostgreSQL's ``LISTEN`` if available in the SQL implementation), and when it
finds an update to a job that is being watched it notifies the relevant
handler threads of the change.

Scheduler Pruner Thread
'''''''''''''''''''''''

.. note::
   This is only present when the server contains an SQL Scheduler with
   pruning enabled in the configuration.

This thread is part of the SQL Scheduler data store implementation. If
pruning is enabled, this thread is started to manage pruning the ``jobs``
table in the database, which would otherwise grow very large without some
external cleanup mechanism.

Deferred CAS Writer Threads
'''''''''''''''''''''''''''

.. note::
   These are only present when the server contains a ``!with-cache`` storage
   configured with deferred writes enabled.

These threads handle deferring writes of blobs to the cache's fallback
storage. This is another ``ThreadPoolExecutor``, this time used by the
implementation of the ``!with-cache`` storage backend to defer the write of
a blob to the configured fallback storage. This allows writes to return
early, after just the write to the cache layer, which is likely faster than
the fallback. The number of threads in this pool is configurable using the
``fallback-writer-threads`` configuration key in the ``!with-cache`` config
dictionary.
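The deferred-write pattern described above can be sketched like this. It is a simplified illustration under assumed names (``WithCacheStorage``, ``commit_write``, plain dicts standing in for storage backends), not BuildGrid's actual storage implementation.

```python
# Sketch of a deferred cache write: commit the blob to the fast cache layer
# synchronously, then hand the slower fallback write to a thread pool so the
# caller can return early. Names are illustrative, not BuildGrid's API.
from concurrent.futures import ThreadPoolExecutor


class WithCacheStorage:
    def __init__(self, cache: dict, fallback: dict, writer_threads: int = 2):
        self._cache = cache
        self._fallback = fallback
        # Plays the role of the ``fallback-writer-threads`` pool.
        self._deferred_writers = ThreadPoolExecutor(max_workers=writer_threads)

    def commit_write(self, digest: str, blob: bytes) -> None:
        # Fast path: the cache write completes before this method returns.
        self._cache[digest] = blob
        # Deferred path: the fallback write happens later on a pool thread.
        self._deferred_writers.submit(self._write_fallback, digest, blob)

    def _write_fallback(self, digest: str, blob: bytes) -> None:
        self._fallback[digest] = blob

    def shutdown(self) -> None:
        # Block until any outstanding deferred writes have finished.
        self._deferred_writers.shutdown(wait=True)


if __name__ == "__main__":
    cache, fallback = {}, {}
    storage = WithCacheStorage(cache, fallback)
    storage.commit_write("abc123/3", b"foo")
    # The caller sees the write as done; the fallback copy lands shortly
    # after, and is guaranteed to be there once the pool is shut down.
    storage.shutdown()
```

The trade-off this illustrates: callers see write latency close to that of the cache layer alone, at the cost of a window during which the fallback storage does not yet hold the blob.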
.. _Remote Execution API: https://github.com/bazelbuild/remote-apis/blob/master/build/bazel/remote/execution/v2
.. _gRPC: https://grpc.io
.. _protocol-buffers: https://developers.google.com/protocol-buffers
.. _Remote Worker API: https://github.com/googleapis/googleapis/tree/master/google/devtools/remoteworkers/v1test2
.. _Janus: https://pypi.org/project/janus/