Notes for Developers

Adding dependencies to a requirements file.

Simply add the dependencies to the relevant requirements.*.in file, and run:

./requirements/manage-deps lock requirements/requirements.*.in # Where * is the name of a specific file

Warning

Do NOT add dependencies to requirements.in.txt. General dependencies should go in requirements.base.in.

Updating dependencies in the main requirements.in file, and creating the frozen requirements.txt file.

requirements.in.txt contains all the dependencies specified in the requirements files, while requirements.txt contains all the frozen dependencies.

To create this file, simply run:

./requirements/manage-deps lock

This should be run after a new dependency has been added to one of the requirements files.

Upgrading frozen requirements.

To bump up the versions of all the packages in a frozen requirements.*.txt file, run:

./requirements/manage-deps upgrade-all

Upgrading the gRPC protobuf files

BuildGrid’s gRPC stubs are not built as part of installation. Instead, they are precompiled and shipped with the source code. The protobufs are prone to compatibility-breaking changes, so we update them manually.

First bring the updated proto file into the source tree. For example, if updating the remote execution proto, replace the old buildgrid/_protos/build/bazel/remote/execution/v2/remote_execution.proto with the newer one.

Then, compile the protobufs. Continuing with the remote_execution.proto example…

pip install grpcio grpcio-tools grpc-stubs mypy-protobuf
python -m grpc_tools.protoc -Ibuildgrid/_protos/ --python_out=buildgrid/_protos --grpc_python_out=buildgrid/_protos --mypy_out=buildgrid/_protos build/bazel/remote/execution/v2/remote_execution.proto

The most important thing here is to make sure that buildgrid/_protos is in your include path with -Ibuildgrid/_protos. If all goes well, the new files should have been generated.
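When several proto files need regenerating, it can help to build the protoc arguments in one place so that the include path and the output directories always agree. The helper below is a minimal sketch: the name protoc_command is hypothetical, and the flags are simply the ones from the command above.

```python
import sys


def protoc_command(proto, root="buildgrid/_protos"):
    """Return the argv list for compiling one proto file.

    `proto` is given relative to `root`, which is used both as the include
    path and as the output directory for every generator.
    """
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{root}",                  # include path: must be the _protos directory
        f"--python_out={root}",       # message classes
        f"--grpc_python_out={root}",  # gRPC stubs
        f"--mypy_out={root}",         # type stubs from mypy-protobuf
        proto,
    ]
```

The returned list can be passed to subprocess.run(..., check=True) from the repository root.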

As of this writing, protoc does not support producing relative imports in the modules that it generates [1], which BuildGrid needs. A manual adjustment is therefore required to rewrite imports such as:

-from build.bazel.remote.execution.v2 import remote_execution_pb2 as ...
-from google.longrunning import operations_pb2 as ...
+from buildgrid._protos.build.bazel.remote.execution.v2 import remote_execution_pb2 as ...
+from buildgrid._protos.google.longrunning import operations_pb2 as ...

That is, include paths to other protobuf modules should be made relative to buildgrid._protos.
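The manual adjustment can also be scripted. The following is a minimal sketch, not part of BuildGrid: the function name rewrite_imports and the LOCAL_PREFIXES tuple are assumptions, and the tuple must list only the proto packages that actually live under buildgrid/_protos. Packages shipped by the protobuf distribution itself, such as google.protobuf, are left alone.

```python
import re

# Assumption: dotted prefixes of the proto packages kept under buildgrid/_protos.
# "google.protobuf" is deliberately absent because the protobuf package provides it.
LOCAL_PREFIXES = ("build.bazel", "google.longrunning")


def rewrite_imports(source, prefixes=LOCAL_PREFIXES, root="buildgrid._protos"):
    """Prefix `from <package> import ...` lines that refer to local proto packages."""
    alternatives = "|".join(re.escape(prefix) for prefix in prefixes)
    pattern = re.compile(
        rf"^from ((?:{alternatives})(?:\.[\w.]+)?) import ", re.MULTILINE
    )
    return pattern.sub(lambda m: f"from {root}.{m.group(1)} import ", source)
```

Because lines that already start with from buildgrid._protos do not match the pattern, running the function twice on the same text leaves it unchanged.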

[1] https://github.com/protocolbuffers/protobuf/issues/1491

Modifying the database schema

The database models are stored in buildgrid/server/persistence/sql/models.py. This is the source of truth for the database schema, and this file is what needs to be updated in order to modify the schema.

To update the schema, make any needed changes to the models.py file. Then generate a new revision and test it against a database. The easiest way to get a database is probably the postgres docker image (https://hub.docker.com/_/postgres).

Now, install alembic (with pip install alembic) and modify alembic.ini (the file in the root of this repository) to point at the dockerized postgres database by editing the sqlalchemy.url field.
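Editing the field by hand is usually enough, but the change can also be scripted. The sketch below assumes that sqlalchemy.url lives in the [alembic] section, as it does in Alembic's default template; the function name and the connection URL (user, password, host, and database) are hypothetical placeholders for a local container.

```python
import configparser
import io

# Hypothetical credentials for a local postgres container.
EXAMPLE_URL = "postgresql://postgres:example-password@localhost:5432/postgres"


def set_sqlalchemy_url(ini_text, url):
    """Return `ini_text` with sqlalchemy.url in the [alembic] section set to `url`."""
    # interpolation=None leaves '%' sequences (common in logging formats) untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the case of option names
    parser.read_string(ini_text)
    parser["alembic"]["sqlalchemy.url"] = url
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Note that configparser drops comments when it writes a file back, so this approach is best kept for throwaway local copies of alembic.ini.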

Then, upgrade the database to the latest pre-revision state. Run this from the repository root:

alembic upgrade head

Now, we can finally generate a new revision. Run this from the repository root:

alembic revision --autogenerate -m "<description of the schema change>"

This will generate a new revision file in buildgrid/server/persistence/sql/alembic/versions/ that contains the difference between your old database and the new, updated model.

Implementing a data store for the Scheduler

The buildgrid.server.scheduler.Scheduler can be configured to use one of multiple backends. Currently in-memory and SQL-based backends are available and supported.

A new backend can be added by implementing the data store interface (buildgrid.server.persistence.interface.DataStoreInterface) and adding a class to buildgrid/_app/settings/parser.py to allow the backend to be configured.

The implementation is free to decide how to persist the data, as long as all of the abstract methods of DataStoreInterface are implemented. The methods are only required to exist as specified; the details are left up to the author. An implementation can therefore make unnecessary methods do nothing, as long as the expected data type is returned. For example, some implementations may have nothing to do when monitoring is enabled or disabled.
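As an illustration of the "do nothing, but return the expected type" idea, here is a minimal sketch. DataStoreSketch is a hypothetical stand-in for DataStoreInterface, and all three method names are invented for the example; the real interface defines its own set of abstract methods.

```python
from abc import ABC, abstractmethod


class DataStoreSketch(ABC):
    """Hypothetical stand-in for DataStoreInterface; real method names differ."""

    @abstractmethod
    def activate_monitoring(self) -> None: ...

    @abstractmethod
    def deactivate_monitoring(self) -> None: ...

    @abstractmethod
    def get_job_names(self) -> list: ...


class DictDataStore(DataStoreSketch):
    """Toy backend that keeps jobs in a dict and has no monitoring to manage."""

    def __init__(self):
        self._jobs = {}

    def activate_monitoring(self) -> None:
        # Nothing to enable for this backend, so this is a deliberate no-op.
        return None

    def deactivate_monitoring(self) -> None:
        # Likewise nothing to disable.
        return None

    def get_job_names(self) -> list:
        # Always return the expected type, even when there is no data.
        return sorted(self._jobs)
```

Leaving out any abstract method makes the subclass impossible to instantiate, which is how the interface enforces that every method exists.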

Backend implementations must also provide some way of triggering the events that are used when streaming updates to clients. These events are instances of buildgrid.utils.TypedEvent. Call notify_change to indicate that an update message should be sent, and notify_stop to indicate that the thread handling the stream should check whether the client is still connected and stop if it is not. Implementations should only need to call notify_change, since the disconnect logic already lives in DataStoreInterface.

Both existing implementations start a thread which periodically checks the state of the data for jobs that are being watched and compares it with the previous state. If it detects a change, then it calls notify_change on the relevant event, which is stored in the buildgrid.utils.JobWatchSpec for the affected job, which is in the self.watched_jobs dictionary. Adding/removing entries from this dictionary is handled by the DataStoreInterface class, so doesn’t need to be a concern for implementations.
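The polling approach described above can be sketched roughly as follows. This is not BuildGrid's code: EventSketch is a hypothetical stand-in for buildgrid.utils.TypedEvent, and for brevity watched_jobs maps a job name directly to its event rather than to a JobWatchSpec that holds the event.

```python
import threading


class EventSketch:
    """Hypothetical stand-in for buildgrid.utils.TypedEvent."""

    def __init__(self):
        self.changes = 0
        self._event = threading.Event()

    def notify_change(self):
        self.changes += 1
        self._event.set()


def notify_changed_jobs(current, previous, watched_jobs):
    """Call notify_change for each watched job whose state differs from `previous`.

    Returns a snapshot of the watched jobs' state to pass as `previous` next time.
    """
    snapshot = {}
    # Copy the items so that another thread may add or remove watched jobs.
    for name, event in list(watched_jobs.items()):
        state = current.get(name)
        snapshot[name] = state
        if name in previous and previous[name] != state:
            event.notify_change()
    return snapshot


def watch(get_state, watched_jobs, stop, interval=1.0):
    """Poll `get_state()` every `interval` seconds until `stop` is set."""
    previous = {}
    while not stop.wait(interval):
        previous = notify_changed_jobs(get_state(), previous, watched_jobs)
```

A backend would typically start this with threading.Thread(target=watch, args=(get_state, watched_jobs, stop), daemon=True), where stop is a threading.Event that is set on shutdown.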

The class that parses the YAML tag used to configure the data store implementation should inherit from YamlFactory and have a __new__ method which returns an instance of the implementation. There are no other restrictions on what it does. For the parser to understand the tag, the get_parser function at the bottom of parser.py must also be modified to add a constructor for the new tag.
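The __new__ pattern may be unfamiliar, so here is a minimal sketch of it in isolation. Every name below is hypothetical: YamlFactorySketch stands in for the real YamlFactory base class, the tag name is invented, and the yaml_tag attribute is an assumption modelled on PyYAML's convention for tagged classes.

```python
class YamlFactorySketch:
    """Hypothetical stand-in for the YamlFactory base class in parser.py."""

    yaml_tag = None


class ExampleDataStore:
    """Hypothetical data store implementation."""

    def __init__(self, connection_string):
        self.connection_string = connection_string


class ExampleDataStoreFactory(YamlFactorySketch):
    # Assumed tag name, following PyYAML's `yaml_tag` convention.
    yaml_tag = "!example-data-store"

    def __new__(cls, connection_string="example://localhost"):
        # Returning an object that is not an instance of `cls` means Python
        # skips this class's __init__ and hands the data store to the caller.
        return ExampleDataStore(connection_string)
```

Calling ExampleDataStoreFactory(connection_string="example://db") therefore yields an ExampleDataStore, so the rest of the configuration code only ever sees the implementation itself.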

Working with timestamps and timezones in BuildGrid

Currently, if the Index is enabled, BuildGrid stores timestamps in the Index Database. These timestamps are stored as timezone-unaware objects, meaning that they carry no accompanying timezone information. By convention, BuildGrid treats all timestamps as UTC time.

This has some important consequences for contributors. Always default to UTC when dealing with timestamps; not doing so can break behavior in BuildGrid that requires proper ordering of timestamps. All timestamp objects should also be timezone-unaware. If you use timezone-aware objects, some libraries like SQLAlchemy will convert them to local time before comparing them with timezone-unaware objects, which can break systems that rely on accurate timestamps in BuildGrid.

Consequently, when contributing, please verify that any datetime objects in use are in UTC and timezone-unaware. For example, to get the current time, always use datetime.utcnow(). Variants that include timezone information can create subtle bugs!
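The short sketch below illustrates the convention. Both helper names are hypothetical. utc_now_naive produces the same kind of value as datetime.utcnow() (a UTC time with no tzinfo) and is written this way only because recent Python versions deprecate utcnow(); to_naive_utc shows one way to normalise a timezone-aware value received from elsewhere.

```python
from datetime import datetime, timedelta, timezone


def utc_now_naive():
    """Current UTC time without tzinfo, the same kind of value as datetime.utcnow()."""
    return datetime.now(timezone.utc).replace(tzinfo=None)


def to_naive_utc(ts):
    """Convert an aware datetime to naive UTC; naive input is assumed to be UTC."""
    if ts.tzinfo is None:
        return ts
    return ts.astimezone(timezone.utc).replace(tzinfo=None)


def is_comparable(a, b):
    """Return False if ordering `a` and `b` raises, as it does for naive vs aware."""
    try:
        a < b
    except TypeError:
        return False
    return True


# A timezone-aware value, two hours ahead of UTC.
aware = datetime(2021, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
```

Ordering a naive timestamp against an aware one raises TypeError in plain Python, so is_comparable(utc_now_naive(), aware) is False, while the normalised value to_naive_utc(aware) can be ordered safely against other naive UTC timestamps.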

Additionally, when updating timestamp-sensitive code, it is always best practice to write thorough unit tests. Even if the change may seem trivial, unit tests can reveal hidden assumptions you are making.