Configuration
=============

.. _manual-configuration:

Manually deploying a BuildGrid
------------------------------

Before you can do anything else, you need a PostgreSQL database available
with the migrations from ``data/revisions/all.sql`` applied.

Configuration File
~~~~~~~~~~~~~~~~~~

To get started, use ``buildgrid/data/config/all-in-one.yml`` as an example
configuration. Copy its contents into a file called ``config.yml``, and edit
the ``connection-string`` option to point to your database.

To start BuildGrid with this configuration, run:

.. code-block:: sh

    bgd server start --verbose /path/to/config.yml

See the `reference-configuration`_ section to learn more about this file. For
now, we will continue setting up BuildGrid for work.

Setting up a bot
~~~~~~~~~~~~~~~~

Now we will need a worker. The recommended worker to use with BuildGrid is
`buildbox-worker`_. This worker works best when used alongside a local CAS
cache called `buildbox-casd`_.

First, build these tools following the instructions in their READMEs. Then,
start the CAS cache.

.. code-block:: sh

    buildbox-casd --cas-remote=http://localhost:50051 --bind=127.0.0.1:50011 ~/casd &

Once CASD is running we can start the worker itself, pointing it to CASD for
CAS requests.

.. code-block:: sh

    buildbox-worker --buildbox-run=buildbox-run-hosttools --bots-remote=http://localhost:50051 \
        --cas-remote=http://127.0.0.1:50011 --request-timeout=30 my_bot

We should be able to see this worker connecting, as log messages for
``CreateBotSession`` and ``UpdateBotSession`` requests appear in the server
logs.

.. _buildbox-worker: https://gitlab.com/BuildGrid/buildbox/buildbox/-/blob/master/worker/
.. _buildbox-casd: https://gitlab.com/BuildGrid/buildbox/buildbox/-/blob/master/casd/

Without CASD
''''''''''''

.. warning::

    Whilst this approach has fewer moving parts, it **will** make your build
    slower, because the input root must be fetched afresh for every Action
    rather than being kept in a local cache. With large input roots, this will
    completely wipe out any benefits gained by using remote execution.

    Production deployments should use ``buildbox-casd``.

``buildbox-worker`` supports running without ``buildbox-casd`` by pointing it
to the remote CAS rather than the local CASD, although this isn't recommended
because of the additional network load it causes.

When running in this configuration, it's important to tell the runner command
not to use the LocalCAS protocol to stage the input root.

.. code-block:: sh

    buildbox-worker --buildbox-run=buildbox-run-hosttools --bots-remote=http://localhost:50051 \
        --cas-remote=http://localhost:50051 --request-timeout=30 --runner-arg=--disable-localcas my_bot

.. _reference-configuration:

Reference configuration
-----------------------

Below is an example of the full configuration reference:

.. literalinclude:: ../../../buildgrid/server/app/settings/reference.yml
    :language: yaml

See the :ref:`Parser API reference ` for details on the tagged YAML nodes in
this configuration.

.. _deployment-architecture:

Deployment Architecture
-----------------------

BuildGrid is designed for flexibility in deployment topology. It can be
configured with any combination of the supported services in a single server
configuration.

Because BuildGrid uses a thread pool to handle gRPC requests, and because of
the Python `GIL`_, it is sensible to split services across several processes
to scale the number of concurrent connections. With the exception of the Build
Events related services, each service is horizontally scalable and supports
running multiple processes for the same service across multiple machines.

The recommended split is as follows:

1. Action Cache, ByteStream, and CAS
2. Execution, Operations, and Introspection
3. BotsInterface

.. _GIL: https://docs.python.org/3/glossary.html#term-global-interpreter-lock

Action Cache, ByteStream, and CAS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. graphviz::
    :align: center

    digraph cas_process_config {
        rankdir = LR;
        bgcolor = "#fcfcfc";

        graph [
            fontname = "Verdana",
            fontsize = 10,
        ];
        node [
            style = filled,
            shape = box,
            fontname = "Verdana",
            fontsize = 10
        ];

        subgraph cluster_cas {
            bgcolor = "#eaeaea";
            style = "dashed";

            subgraph cluster_storage {
                color = "#f4f4f4";
                style = filled;
                label = "Storage backends"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];
                edge [ color = "#294a2e" ];

                CAS [
                    label=<
                        <table>
                            <tr><td port="sharded" colspan="2">ShardedStorage</td></tr>
                            <tr><td port="redis" colspan="2">RedisIndex</td></tr>
                            <tr><td colspan="2">SizeDifferentiatedStorage</td></tr>
                            <tr><td port="sql">SQLStorage</td><td port="s3">S3Storage</td></tr>
                        </table>
                    >,
                ];
            }

            subgraph cluster_caches {
                color = "#f4f4f4";
                style = filled;
                label = "Cache backends"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];
                edge [ color = "#294a2e" ];

                caches [
                    label=<
                        <table>
                            <tr><td port="sharded">ShardedActionCache</td></tr>
                            <tr><td port="redis">RedisActionCache</td></tr>
                        </table>
                    >,
                ];
            }

            subgraph cluster_services {
                color = "#f4f4f4";
                style = filled;
                label = "gRPC Services"
                node [ color = lightgrey ];

                ByteStream [ label = "ByteStream" ];
                cas [ label = "CAS" ];
                actioncache [ label = "Action Cache" ];
            }

            label = "`bgd server` process";
        }

        S3 [ shape = "cylinder" ];
        PostgreSQL [ shape = "cylinder" ];
        Redis [ shape = "cylinder" ];

        caches:redis -> Redis;
        CAS:redis -> Redis;
        CAS:sql -> PostgreSQL;
        CAS:s3 -> S3;
        cas -> CAS:sharded;
        ByteStream -> CAS:sharded;
        actioncache -> caches:sharded;

        {rank=same Redis PostgreSQL S3}
    }

This configuration specifies all the services needed for cache-only usage. The
exact choice of storage backends depends on your expected workloads and on the
availability of other services.

Using an index somewhere in the stack is strongly recommended, so that
``FindMissingBlobs`` can be handled without querying the actual storage. The
Redis index used in this example is more performant than the SQL index, but
likely requires sharding to scale to production workloads.

As build workflows often involve many small blobs and a small number of much
larger blobs, it can be beneficial to store smaller blobs in a faster storage
location. In this example we use ``SizeDifferentiatedStorage`` to direct small
blobs to an ``SQLStorage``, whilst large blobs are stored in a slower but
significantly larger ``S3Storage``. It may be beneficial to add a cache layer
using ``WithCacheStorage`` for particularly slow storage backends, although
this backend doesn't support any special routing to reduce duplication of
storage.

The Action Cache backends are separate from the storage backends, though they
can reference them. Again, the choice here depends on service availability and
desired cache behaviour. An ``S3ActionCache`` will be slow but large and
resilient, whereas an ``LRUActionCache`` will be fast but small and
short-lived. The ``RedisActionCache`` used here provides a middle ground and
is generally the best option for most workloads.
Execution, Operations, and Introspection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. graphviz::
    :align: center

    digraph execution_process_config {
        rankdir = LR;
        bgcolor = "#fcfcfc";

        graph [
            fontname = "Verdana",
            fontsize = 10,
        ];
        node [
            style = filled,
            shape = box,
            fontname = "Verdana",
            fontsize = 10
        ];

        subgraph cluster_cas {
            bgcolor = "#eaeaea";
            style = "dashed";

            subgraph cluster_storage {
                color = "#f4f4f4";
                style = filled;
                label = "Storage backends"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];
                edge [ color = "#294a2e" ];

                cas_remote [ label = "Remote" ];
            }

            subgraph cluster_caches {
                color = "#f4f4f4";
                style = filled;
                label = "Cache backends"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];
                edge [ color = "#294a2e" ];

                cache_remote [ label = "RemoteActionCache" ];
            }

            subgraph cluster_schedulers {
                color = "#f4f4f4";
                style = filled;
                label = "Schedulers"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];

                scheduler [ label = "Scheduler" ];
            }

            subgraph cluster_connections {
                color = "#f4f4f4";
                style = filled;
                label = "Connections"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];

                sql_writeable [ label = "SQL (read/write)" ];
                sql_read_only [ label = "SQL (read-only)" ];
                sql_listen_notify [ label = "SQL (notifiers)" ];
            }

            subgraph cluster_services {
                color = "#f4f4f4";
                style = filled;
                label = "gRPC Services"
                node [ color = lightgrey ];

                execution [ label = "Execution" ];
                operations [ label = "Operations" ];
                cas [ label = "CAS" ];
                introspection [ label = "Introspection" ];
            }

            label = "`bgd server` process";
        }

        PostgreSQL [ shape = "cylinder" ];
        BuildGridCAS [ label = "BuildGrid CAS"; shape = "cylinder"; ];

        cache_remote -> BuildGridCAS;
        cas_remote -> BuildGridCAS;
        cas -> cas_remote;
        execution -> scheduler;
        operations -> scheduler;
        introspection -> scheduler;
        scheduler -> cas_remote;
        scheduler -> sql_writeable;
        scheduler -> sql_read_only;
        scheduler -> sql_listen_notify;
        scheduler -> cache_remote;
        sql_writeable -> PostgreSQL;
        sql_read_only -> PostgreSQL;
        sql_listen_notify -> PostgreSQL;

        {rank=same PostgreSQL BuildGridCAS}
    }

This configuration specifies the services needed to support the client-side
parts of remote execution.

The Execution service uses its configured ``Scheduler`` to queue incoming jobs
in the database for assignment. When using PostgreSQL as in this example, the
``Scheduler`` will use LISTEN/NOTIFY to listen for updates to job state, which
the Execution service then reports back to clients.

The Operations service and Introspection service are mainly for querying
information about the current internal state of BuildGrid. The Operations
service is also used for requesting cancellation of a previously queued job.

All three of these services use the ``Scheduler``, which in turn uses up to
three different SQL connection configurations. This allows, for example,
sending read-only traffic to a read-only database replica, or using an
external connection pool such as PgBouncer for regular queries whilst
maintaining an in-process pool for the long-running connections used for
LISTEN/NOTIFY.

The ``Scheduler`` also needs access to a cache backend and a storage backend.
The ``RemoteActionCache`` and ``RemoteStorage`` backends exist to support
splitting the configuration like this, and in this example should be pointed
to a BuildGrid running the Action Cache, ByteStream, and CAS configuration
above.

BotsInterface
~~~~~~~~~~~~~

.. graphviz::
    :align: center

    digraph bots_process_config {
        rankdir = LR;
        bgcolor = "#fcfcfc";

        graph [
            fontname = "Verdana",
            fontsize = 10,
        ];
        node [
            style = filled,
            shape = box,
            fontname = "Verdana",
            fontsize = 10
        ];

        subgraph cluster_cas {
            bgcolor = "#eaeaea";
            style = "dashed";

            subgraph cluster_storage {
                color = "#f4f4f4";
                style = filled;
                label = "Storage backends"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];
                edge [ color = "#294a2e" ];

                cas_remote [ label = "Remote" ];
            }

            subgraph cluster_caches {
                color = "#f4f4f4";
                style = filled;
                label = "Cache backends"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];
                edge [ color = "#294a2e" ];

                cache_remote [ label = "RemoteActionCache" ];
            }

            subgraph cluster_schedulers {
                color = "#f4f4f4";
                style = filled;
                label = "Schedulers"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];

                scheduler [ label = "Scheduler" ];
            }

            subgraph cluster_connections {
                color = "#f4f4f4";
                style = filled;
                label = "Connections"
                node [ fillcolor = "#bbf0c3", fontcolor = "#294a2e", color = "#294a2e" ];

                sql_writeable [ label = "SQL (read/write)" ];
                sql_read_only [ label = "SQL (read-only)" ];
                sql_listen_notify [ label = "SQL (notifiers)" ];
            }

            subgraph cluster_services {
                color = "#f4f4f4";
                style = filled;
                label = "gRPC Services"
                node [ color = lightgrey ];

                bots [ label = "Bots" ];
            }

            label = "`bgd server` process";
        }

        PostgreSQL [ shape = "cylinder" ];
        BuildGridCAS [ label = "BuildGrid CAS"; shape = "cylinder"; ];

        cache_remote -> BuildGridCAS;
        cas_remote -> BuildGridCAS;
        bots -> scheduler;
        scheduler -> cas_remote;
        scheduler -> sql_writeable;
        scheduler -> sql_read_only;
        scheduler -> sql_listen_notify;
        scheduler -> cache_remote;
        sql_writeable -> PostgreSQL;
        sql_read_only -> PostgreSQL;
        sql_listen_notify -> PostgreSQL;

        {rank=same PostgreSQL BuildGridCAS}
    }

This configuration is just for a BotsInterface, the server side of the Remote
Workers API (RWAPI).
It is very similar to the Execution configuration, using a ``Scheduler`` which
has all the same configuration options as before. This ``Scheduler`` also has
configuration for an assigner thread, which periodically fetches the next job
in the queue and attempts to assign it to an available bot. This thread could
run in the Execution process instead, with the same functionality.

A ``Scheduler`` used just for the RWAPI side like this also uses LISTEN/NOTIFY
when given a PostgreSQL database. In this case it is used to listen for
assignment of work to a connected bot. Bots can long-poll when sending
``UpdateBotSession`` and ``CreateBotSession`` requests: the request is held
open and returns as soon as work is assigned.
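
The long-polling behaviour described above can be sketched in plain Python.
This is only an illustration under stated assumptions: the
``AssignmentNotifier`` class and its methods are hypothetical, and a
``threading.Condition`` stands in for the notification that BuildGrid receives
through PostgreSQL LISTEN/NOTIFY. It shows a bot request blocking until either
work is assigned or a timeout expires.

```python
import threading
from typing import Dict, Optional


class AssignmentNotifier:
    """Hypothetical in-process stand-in for LISTEN/NOTIFY-driven wakeups."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._assigned: Dict[str, str] = {}

    def assign(self, bot_name: str, job_id: str) -> None:
        # Called by the assigner: record the job and wake any waiting request.
        with self._cond:
            self._assigned[bot_name] = job_id
            self._cond.notify_all()

    def wait_for_work(self, bot_name: str, timeout: float) -> Optional[str]:
        # Called by a bot request: block until work arrives or the timeout
        # expires, then return the job id (or None if nothing was assigned).
        with self._cond:
            self._cond.wait_for(lambda: bot_name in self._assigned,
                                timeout=timeout)
            return self._assigned.pop(bot_name, None)


if __name__ == "__main__":
    notifier = AssignmentNotifier()
    # Simulate the assigner thread handing out a job shortly after the poll
    # starts; the waiting call returns as soon as the job is recorded.
    timer = threading.Timer(0.1, notifier.assign, args=("my_bot", "job-1"))
    timer.start()
    print(notifier.wait_for_work("my_bot", timeout=5.0))
    timer.join()
```

The timeout matters in practice: a bot whose poll expires without work simply
sends another request, so no assignment is lost if a notification is missed.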