To improve the performance of certain CAS-related requests, it is possible to configure an “indexed” CAS. This also enables intelligent cleanup of blobs, though currently only for S3.

The index keeps track of all the blobs currently stored in the CAS, along with the last time each blob was accessed.
In a configuration file, the CAS index acts just like another storage backend definition. It fully implements the same API interface as the other storage backends, so you can just pass it to the service definition as with any other storage implementation.
For example, here is a basic disk-based CAS configuration:

```yaml
server:
  - !channel
    port: 50051
    insecure-mode: true

authorization:
  method: none

monitoring:
  enabled: false

instances:
  - name: ''
    description: |
      The unique '' instance.

    storages:
      - !disk-storage &disk-store
        path: example-cas/

    services:
      - !cas
        storage: *disk-store
      - !bytestream
        storage: *disk-store
```
To add an index to this, we simply need to add a CAS index to the list of storages, point it at the disk storage, and then use the index in the service definitions rather than the disk storage.
```yaml
server:
  - !channel
    port: 50051
    insecure-mode: true

authorization:
  method: none

monitoring:
  enabled: false

instances:
  - name: ''
    description: |
      The unique '' instance.

    storages:
      - !disk-storage &disk-store
        path: example-cas/

      - !sql-index &cas-index
        storage: *disk-store
        connection_string: sqlite:///./index.example.db
        automigrate: yes

    services:
      - !cas
        storage: *cas-index
      - !bytestream
        storage: *cas-index
```
If you add an index to an existing storage that is used by another service which stores digests, such as an ActionCache or Asset API, then either those services need to be cleared or the `fallback_on_get` config option needs to be temporarily set to `True` on the index. By default, the index reports a blob as missing if it isn't in the index, regardless of whether it exists in the underlying storage.

`fallback_on_get` carries a significant performance cost, so it is disabled by default and ideally should not be left enabled permanently.
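As an illustration of such a migration, the earlier disk-backed example could temporarily enable the fallback while the index is being populated (the storage definition and anchors here follow the examples above; remove the option once migration is complete):

```yaml
storages:
  - !disk-storage &disk-store
    path: example-cas/

  - !sql-index &cas-index
    storage: *disk-store
    connection_string: sqlite:///./index.example.db
    automigrate: yes
    # Temporarily serve blobs that are in storage but not yet in the
    # index; drop this line once the index has been fully populated.
    fallback-on-get: yes
```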
The CAS index implementation is pluggable, similarly to the storage implementation. Currently only an SQL-based implementation exists, but it is possible to write a custom index implementation without too much effort.
As the name suggests, the SQL index uses an SQL database to store the index. It is tested against both SQLite and PostgreSQL, and in theory could work with any database backend that SQLAlchemy supports, provided the database supports window functions.
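For illustration, connection strings for the two tested databases follow the standard SQLAlchemy URL format; the file path, credentials, and hostname below are placeholders from the examples in this document, not defaults:

```yaml
# SQLite: a file-backed database, convenient for local testing
connection_string: sqlite:///./index.example.db

# PostgreSQL: a networked database, better suited to real deployments
# (user `bgd`, password `insecure`, host `database`, db name `bgd`)
connection_string: postgresql://bgd:insecure@database/bgd
```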
The storage instance and connection string must be provided, but the other parameters have defaults that should be functional.
The SQL index is implemented by `buildgrid.server.cas.storage.index.sql.SQLIndex` and is configured using the `!sql-index` tag:

```yaml
- !sql-index
  # This assumes that a storage instance is defined elsewhere
  # with a `&cas-storage` anchor
  storage: *cas-storage
  connection_string: postgresql://bgd:insecure@database/bgd
  automigrate: yes
  window-size: 1000
  inclause-limit: -1
  fallback-on-get: no
```
- `storage` (`buildgrid.server.cas.storage.storage_abc.StorageABC`) – Instance of storage to use. This must be a storage object constructed using a YAML tag ending in `-storage`, for example `!disk-storage`.
- `connection_string` (str) – SQLAlchemy connection string.
- `automigrate` (bool) – Attempt to automatically upgrade an existing DB schema to the newest version.
- `window_size` (uint) – Maximum number of blobs to fetch in one SQL operation (larger result sets are automatically split into multiple queries).
- `inclause_limit` (int) – If nonnegative, overrides the default number of variables permitted per “in” clause. See the `buildgrid.server.cas.storage.index.sql.SQLIndex` comments for more details.
- `fallback_on_get` (bool) – By default, the SQL index only fetches blobs from the underlying storage if they're present in the index on `bulk_read_blobs` requests, to minimize interactions with the storage. If this is set, the index instead checks the underlying storage directly on `bulk_read_blobs` requests, then loads all blobs found into the index.