What is BuildGrid?
BuildGrid is a Python remote execution service that implements Google’s Remote Execution API and the Remote Workers API. The project’s goal is to execute build jobs remotely on a grid of computers in order to greatly speed up builds. Workers on the grid should be able to run in different environments. BuildGrid works with clients such as Bazel, BuildStream and RECC, and is designed to work with any client that conforms to these API protocols.
BuildGrid is designed to work with any worker that conforms to the Remote Workers API specification. Workers execute the jobs on the backend, while BuildGrid handles scheduling and storage. The BuildBox ecosystem provides a suite of workers and sandboxing tools that work with the Remote Workers API and can be used with BuildGrid.
What’s Going On?
Recently we removed the internal communication between the ExecutionService and the BotsService in favour of database polling (or LISTEN/NOTIFY when using PostgreSQL). This allows the services to be scaled without requiring clients and workers to connect to the same server. It also means the services can now be instantiated independently, so the ExecutionService can be deployed and scaled separately from the BotsService. Note, however, that a worker must keep using the BotsService instance it first connected to for expiry to work correctly, so take care with how workers are routed to BotsService instances when more than one is deployed.
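To illustrate the decoupling in isolation, the sketch below has one object standing in for the BotsService, which writes job state changes to a shared database, and another standing in for the ExecutionService, which discovers those changes by polling. The two never call each other directly. The `jobs` table, its `updated_seq` column, and the class names are hypothetical and are not BuildGrid’s actual schema or code; SQLite is used only because it ships with Python. With PostgreSQL, the polling loop could be replaced by LISTEN/NOTIFY.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema for illustration only; not BuildGrid's real tables.
SCHEMA = """
CREATE TABLE jobs (
    name TEXT PRIMARY KEY,
    stage TEXT NOT NULL,
    updated_seq INTEGER NOT NULL  -- increases on every change, so pollers can resume
)
"""


class BotsSide:
    """Stands in for the BotsService: records job state changes in the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def update_job(self, name: str, stage: str) -> None:
        # Assign the next sequence number inside the statement itself, so several
        # writers sharing the database do not need an in-memory counter.
        self.conn.execute(
            "INSERT OR REPLACE INTO jobs (name, stage, updated_seq) VALUES "
            "(?, ?, (SELECT COALESCE(MAX(updated_seq), 0) + 1 FROM jobs))",
            (name, stage),
        )
        self.conn.commit()


class ExecutionSide:
    """Stands in for the ExecutionService: polls the database for job updates."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.last_seen = 0  # highest updated_seq already delivered to clients

    def poll(self) -> List[Tuple[str, str]]:
        # Fetch only the changes made since the previous poll, oldest first.
        rows = self.conn.execute(
            "SELECT name, stage, updated_seq FROM jobs "
            "WHERE updated_seq > ? ORDER BY updated_seq",
            (self.last_seen,),
        ).fetchall()
        if rows:
            self.last_seen = rows[-1][2]
        return [(name, stage) for name, stage, _ in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    bots, execution = BotsSide(conn), ExecutionSide(conn)

    bots.update_job("job-1", "QUEUED")
    bots.update_job("job-2", "QUEUED")
    print(execution.poll())  # both jobs, in the order they changed
    bots.update_job("job-1", "EXECUTING")
    print(execution.poll())  # only job-1's new stage
```

In a real deployment the poll would run on a timer, and each service would open its own connection to a shared database server rather than to an in-memory SQLite database.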
We’ve also finished adding support for an indexed CAS server, which enables faster FindMissingBlobs() calls and CAS cleanup. See https://gitlab.com/BuildGrid/buildgrid/issues/181 for more details.
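As a rough illustration of why an index helps, the sketch below keeps a hypothetical SQL table of stored digests alongside the blob storage, which is a plain dict here. FindMissingBlobs() can then be answered from the index without touching storage, and a per-blob access time gives cleanup a natural eviction order. The class name, schema, logical clock, and least-recently-used eviction policy are all assumptions for illustration, not BuildGrid’s actual indexed CAS; for the real design, see the issue linked above.

```python
import hashlib
import sqlite3
from typing import Dict, Iterable, List, Tuple

# (SHA-256 hex hash, size in bytes), mirroring the fields of a REAPI Digest.
Digest = Tuple[str, int]


class IndexedCAS:
    """Hypothetical CAS: a dict stands in for blob storage, and an SQL table
    indexes which digests are stored and when each was last accessed."""

    def __init__(self) -> None:
        self.storage: Dict[Digest, bytes] = {}
        self.index = sqlite3.connect(":memory:")
        self.index.execute(
            "CREATE TABLE blobs (hash TEXT, size INTEGER, accessed INTEGER, "
            "PRIMARY KEY (hash, size))"
        )
        self.clock = 0  # logical clock standing in for access timestamps

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def write(self, data: bytes) -> Digest:
        digest = (hashlib.sha256(data).hexdigest(), len(data))
        self.storage[digest] = data
        self.index.execute(
            "INSERT OR REPLACE INTO blobs VALUES (?, ?, ?)", (*digest, self._tick())
        )
        self.index.commit()
        return digest

    def find_missing_blobs(self, digests: Iterable[Digest]) -> List[Digest]:
        # Answered from the index alone; the storage backend is never queried.
        missing = []
        for digest in digests:
            row = self.index.execute(
                "SELECT 1 FROM blobs WHERE hash = ? AND size = ?", digest
            ).fetchone()
            if row is None:
                missing.append(digest)
            else:
                # The client will skip uploading this blob, so mark it as recently used.
                self.index.execute(
                    "UPDATE blobs SET accessed = ? WHERE hash = ? AND size = ?",
                    (self._tick(), *digest),
                )
        self.index.commit()
        return missing

    def cleanup(self, max_blobs: int) -> List[Digest]:
        """Evict the least recently accessed blobs until at most max_blobs remain."""
        (count,) = self.index.execute("SELECT COUNT(*) FROM blobs").fetchone()
        excess = max(0, count - max_blobs)
        victims = self.index.execute(
            "SELECT hash, size FROM blobs ORDER BY accessed LIMIT ?", (excess,)
        ).fetchall()
        for digest in victims:
            self.index.execute("DELETE FROM blobs WHERE hash = ? AND size = ?", digest)
            self.storage.pop(digest, None)
        self.index.commit()
        return victims


if __name__ == "__main__":
    cas = IndexedCAS()
    hello = cas.write(b"hello")
    world = cas.write(b"world")
    unknown = (hashlib.sha256(b"other").hexdigest(), 5)
    print(cas.find_missing_blobs([hello, unknown]))  # only `unknown` is reported
    print(cas.cleanup(max_blobs=1))  # evicts `world`, the least recently accessed
```

Keeping the index separate from storage means both queries stay cheap even when the storage backend is slow or remote, such as an object store.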
Next, we’re exploring potential improvements to BuildGrid’s scalability by making the Execution and Bots services fully stateless, replacing the database with RabbitMQ to communicate updates and queue incoming work. Discussion of this can be found on the mailing list.
We’re also continuing to improve the performance of the existing implementation, and are planning to implement CAS expiry/cleanup.
See our release notes for the latest changes/updates.