CAS
CAS, or Content Addressable Storage, is the service responsible for
persistently storing data in BuildGrid. The data stored in CAS must exist
for the lifespan of a job from submission through to the ActionResult being
returned to the client. Cleanup and other TTLs should be configured accordingly
and this requirement should be kept in mind when determining storage size
requirements.
To benefit from remote caching, blobs in CAS need a longer minimum lifespan than this. The exact lifespan to aim for depends on your available storage and use case.
Cleanup
CAS object lifespans are enforced by the CAS cleanup daemon. This tool is
configured with a minimum blob age for deletion, a high watermark, and a low
watermark. Cleanup starts when the total CAS size exceeds the high watermark,
and stops once the size falls below the low watermark. Blobs accessed more
recently than the minimum blob age (only-if-unused-for), according to their
recorded access time, are not deleted.
Usage
To run the cleanup daemon:
bgd cleanup --high-watermark 10G --low-watermark 7.5G --batch-size 100M \
    --sleep-interval 10 deployment.yml
The batch size and high/low watermark parameters take sizes in bytes. Shorthands for kB, MB, GB, and TB are available as K, M, G, and T respectively, as seen in the example.
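To make the size shorthands concrete, here is a small Python sketch that converts strings such as 7.5G into byte counts. It assumes the suffixes are decimal multipliers (K = 1000, M = 1000², and so on), following the kB/MB/GB/TB wording above. parse_size is a hypothetical helper, not the parser that bgd cleanup actually uses, and the real tool may treat the suffixes differently (for example as powers of 1024).

```python
import re
from decimal import Decimal

# Assumption: decimal multipliers, matching the kB/MB/GB/TB wording in the text.
# The real bgd cleanup parser may use different (e.g. 1024-based) multipliers.
_MULTIPLIERS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
_SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)([KMGT]?)")


def parse_size(text: str) -> int:
    """Convert a size such as '100M' or '7.5G' into a whole number of bytes.

    Suffixes are matched case-insensitively; a bare number is taken as bytes.
    """
    match = _SIZE_PATTERN.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, suffix = match.groups()
    # Decimal avoids float rounding for fractional values such as 7.5G.
    return int(Decimal(number) * _MULTIPLIERS[suffix])


for value in ("10G", "7.5G", "100M", "4096"):
    print(value, "->", parse_size(value))
```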
The batch size is the minimum amount of space cleared in one go. The cleanup tool tries to stay as close as possible to the configured batch size, but depending on the sizes of the blobs in the CAS it will sometimes delete more than the specified batch size at a time.
A smaller batch size puts more load on the database and the storage backend, but space starts being freed sooner than with a large batch size.
If the batch size is larger than the difference between the current CAS size and the low watermark, all of the required deletions are done in a single batch.
The sleep interval is the time, in seconds, to sleep after each check of whether the CAS size has reached the configured high watermark. A lower sleep interval makes cleanup more responsive, at the cost of more database load.
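The interplay between the watermarks, batch size, minimum blob age, and sleep interval can be sketched as a small Python simulation. Everything here is hypothetical: FakeIndexedCas is an in-memory stand-in that maps digests to (size, last access time), and deleting the least recently accessed blobs first is an assumption made for illustration, not a description of BuildGrid's actual deletion order.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FakeIndexedCas:
    """Hypothetical in-memory stand-in: digest -> (size_bytes, last_access_time)."""

    blobs: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def total_size(self) -> int:
        return sum(size for size, _ in self.blobs.values())

    def delete_batch(self, batch_size: int, cutoff: float) -> int:
        """Delete the least recently accessed blobs last accessed before `cutoff`
        until at least `batch_size` bytes are freed; the last blob may overshoot."""
        freed = 0
        for digest, (size, last_access) in sorted(self.blobs.items(), key=lambda kv: kv[1][1]):
            if freed >= batch_size or last_access >= cutoff:
                break  # batch complete, or every remaining blob is too recently used
            del self.blobs[digest]
            freed += size
        return freed


def cleanup_pass(cas: FakeIndexedCas, high: int, low: int, batch_size: int,
                 only_if_unused_for: float, now: float) -> int:
    """One check: if the CAS exceeds `high`, delete batches until it is below `low`."""
    if cas.total_size() <= high:
        return 0
    cutoff = now - only_if_unused_for  # blobs accessed after this are protected
    freed_total = 0
    while cas.total_size() >= low:
        freed = cas.delete_batch(batch_size, cutoff)
        if freed == 0:
            break  # nothing left that is old enough to delete
        freed_total += freed
    return freed_total


def run_daemon(cas: FakeIndexedCas, high: int, low: int, batch_size: int,
               only_if_unused_for: float, sleep_interval: float, checks: int) -> None:
    """Poll `checks` times, sleeping `sleep_interval` seconds after each check."""
    for _ in range(checks):
        cleanup_pass(cas, high, low, batch_size, only_if_unused_for, time.time())
        time.sleep(sleep_interval)
```

Note how a batch request can free more than batch_size bytes when a single large blob is deleted, and how a batch larger than the distance to the low watermark finishes the whole cleanup in one iteration.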
The configuration file passed to the tool should contain the index and backing storage definitions. The easiest way to achieve this is to use the same configuration file that was used to deploy the indexed CAS.
Note that if monitoring is configured in this file (see Monitoring and Metrics), any metrics produced by the cleanup tool will be published to the configured destination. If the cleanup metrics should go somewhere other than the indexed CAS metrics, the monitoring section of the configuration will need to be changed.
Warning
If a !with-cache storage is used with a non-distributed cache storage type,
such as !lru-storage, the caches are not cleaned up along with the
backing storage. In rare cases this can cause issues. To minimize the risk,
the combined configured cache size across all BuildGrid instances should be
smaller than the configured low watermark.
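The sizing rule in the warning is simple arithmetic, sketched below in Python. The sketch reads "across all BuildGrids" as the sum of every instance's cache size, which is an interpretation, and caches_fit_under_low_watermark is a hypothetical helper. GB is assumed to be a decimal gigabyte, matching the G shorthand used by cleanup.

```python
from typing import Iterable

GB = 10**9  # assumption: decimal gigabytes, matching the G shorthand used by cleanup


def caches_fit_under_low_watermark(cache_sizes: Iterable[int], low_watermark: int) -> bool:
    """True if the combined size of all non-distributed caches is below the low watermark."""
    return sum(cache_sizes) < low_watermark


# Three BuildGrid instances, each with a 2 GB !lru-storage cache, and a 7.5G low watermark.
print(caches_fit_under_low_watermark([2 * GB] * 3, low_watermark=int(7.5 * GB)))  # True
```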
Index
In order to use the cleanup daemon, an index is currently needed. BuildGrid supports a CAS index stored in either Redis or PostgreSQL.
This index also provides performance improvements for FindMissingBlobs
requests, allowing the CAS to perform existence checking without using
potentially slow storage services.
Configuration
In a configuration file, the CAS index acts just like another storage backend definition. It implements the same interface as the other storage backends, so you can pass it to a service definition like any other storage implementation.
For example, here is a basic disk-based CAS configuration:
server:
  - !channel
    address: localhost:50051
    insecure-mode: true
authorization:
  method: none
monitoring:
  enabled: false
instances:
  - name: ''
    description: |
      The unique '' instance.
    storages:
      - !disk-storage &disk-store
        path: example-cas/
    services:
      - !cas
        storage: *disk-store
      - !bytestream
        storage: *disk-store
To add an index to this, add a CAS index to the list of storages, point it at the disk storage, and use the index in the service definitions instead of the disk storage. The SQL index also needs a database connection, defined here with a !sql-connection entry under connections.
server:
  - !channel
    address: localhost:50051
    insecure-mode: true
authorization:
  method: none
monitoring:
  enabled: false
connections:
  - !sql-connection &sql
    connection-string: postgresql://bgd:insecure@database/bgdindex
    pool-size: 5
    pool-timeout: 30
    max-overflow: 10
storages:
  - !disk-storage &disk-store
    path: example-cas/
  - !sql-index &cas-index
    sql: *sql
    storage: *disk-store
instances:
  - name: ''
    description: |
      The unique '' instance.
    services:
      - !cas
        storage: *cas-index
      - !bytestream
        storage: *cas-index
thread-pool-size: 100
Warning
Adding a new, empty index to an existing storage is the same as deleting everything from the storage from a client perspective. If the storage is used by another service which stores digests, such as an ActionCache or Asset API, the contents of those services will also be lost.
Consider a migration phase using a !replicated-storage wrapper to work
around this issue.
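The layering described above, where the index wraps a backing storage and exposes the same interface, can be sketched in Python. The three-method Storage protocol, DictStorage, and IndexedStorage are all hypothetical and much simpler than BuildGrid's real StorageABC. The sketch shows why an index answers existence checks without touching the backing storage, and why an empty index hides blobs that are already stored.

```python
import time
from typing import Dict, Iterable, List, Optional, Protocol


class Storage(Protocol):
    """Hypothetical, heavily simplified storage interface (not BuildGrid's StorageABC)."""

    def missing_blobs(self, digests: Iterable[str]) -> List[str]: ...
    def get_blob(self, digest: str) -> Optional[bytes]: ...
    def put_blob(self, digest: str, data: bytes) -> None: ...


class DictStorage:
    """Stand-in for a backing store such as !disk-storage."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def missing_blobs(self, digests: Iterable[str]) -> List[str]:
        return [d for d in digests if d not in self._blobs]

    def get_blob(self, digest: str) -> Optional[bytes]:
        return self._blobs.get(digest)

    def put_blob(self, digest: str, data: bytes) -> None:
        self._blobs[digest] = data


class IndexedStorage:
    """Wraps a backing storage and exposes the same interface, answering
    existence checks from its own index instead of the backing storage."""

    def __init__(self, backing: Storage) -> None:
        self._backing = backing
        self._index: Dict[str, float] = {}  # digest -> last access time

    def missing_blobs(self, digests: Iterable[str]) -> List[str]:
        return [d for d in digests if d not in self._index]

    def get_blob(self, digest: str) -> Optional[bytes]:
        if digest not in self._index:
            return None  # unindexed blobs look absent, even if the backing store has them
        self._index[digest] = time.time()
        return self._backing.get_blob(digest)

    def put_blob(self, digest: str, data: bytes) -> None:
        self._backing.put_blob(digest, data)
        self._index[digest] = time.time()
```

Because the wrapper answers existence from the index alone, wrapping a populated store with an empty index makes every pre-existing blob appear missing. This is the behavior the warning above describes.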
Index Types
The CAS index implementation is pluggable, similarly to the storage implementation. Currently both Redis and SQL are supported as storage indices.
Redis
The Redis Index uses Redis to store TTLs for CAS objects. The keys themselves are used to determine existence of a blob in CAS, and the TTL of the key is set to the expiry time of the blob. The value is set to a dummy value of 1, since we only care about existence checking and TTLs.
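The key scheme described above can be modelled in memory. In this Python sketch, a blob is present exactly when a live key exists. Setting the key behaves like Redis's SET key 1 EX ttl, and a FindMissingBlobs hit refreshes the expiry, as described in the Lifespan section. RedisIndexModel, its ttl_seconds value, and the prefix:hash_size key format are hypothetical; the real RedisIndex may encode keys and choose TTLs differently.

```python
from typing import Dict, Iterable, List, Tuple

Digest = Tuple[str, int]  # (hash, size_bytes)


class RedisIndexModel:
    """In-memory model of the Redis index key scheme described above.

    Each indexed blob is a key holding the dummy value 1; the key's expiry is
    the blob's expiry time, so key existence alone answers existence checks.
    """

    def __init__(self, ttl_seconds: float, prefix: str = "A"):
        self.ttl_seconds = ttl_seconds  # hypothetical: how far each access pushes expiry
        self.prefix = prefix
        self._keys: Dict[str, Tuple[int, float]] = {}  # key -> (value, expires_at)

    def key_for(self, digest: Digest) -> str:
        # Hypothetical key format; the real RedisIndex may encode digests differently.
        digest_hash, size_bytes = digest
        return f"{self.prefix}:{digest_hash}_{size_bytes}"

    def touch(self, digest: Digest, now: float) -> None:
        """Mark the blob as present and (re)set its expiry, like SET key 1 EX ttl."""
        self._keys[self.key_for(digest)] = (1, now + self.ttl_seconds)

    def find_missing(self, digests: Iterable[Digest], now: float) -> List[Digest]:
        missing = []
        for digest in digests:
            entry = self._keys.get(self.key_for(digest))
            if entry is None or entry[1] <= now:
                missing.append(digest)  # no live key: treated as absent
            else:
                self.touch(digest, now)  # referenced by FindMissingBlobs: refresh expiry
        return missing
```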
- buildgrid.server.app.settings.parser.load_redis_index(storage: StorageABC, redis: RedisProvider, prefix: str | None = None) → RedisIndex
Generates buildgrid.server.cas.storage.index.redis.RedisIndex using the tag !redis-index.
Usage
- !redis-index
  # This assumes that a storage instance is defined elsewhere
  # with a `&cas-storage` anchor
  storage: *cas-storage
  redis: *redis
  prefix: "B"
Parameters:
- storage (buildgrid.server.cas.storage.storage_abc.StorageABC) – Instance of storage to use. This must be a storage object constructed using a YAML tag ending in -storage, for example !disk-storage.
- redis (buildgrid.server.redis.provider.RedisProvider) – A configured Redis connection manager. This must be an object with a !redis-connection YAML tag.
- prefix (str) – An optional prefix for keys written by this index. If not specified, a prefix of “A” is used.
SQL
As the name suggests, the SQL index uses a PostgreSQL database to store the index.
The storage instance and SQL connection must be provided; the other parameters have working defaults.
- buildgrid.server.app.settings.parser.load_sql_index(storage: StorageABC, sql: SqlProvider, sql_ro: SqlProvider | None = None, window_size: int = 1000, inclause_limit: int = -1, max_inline_blob_size: int = 0, refresh_accesstime_older_than: int = 0) → SQLIndex
Generates buildgrid.server.cas.storage.index.sql.SQLIndex using the tag !sql-index.
Usage
- !sql-index
  # Assuming the YAML anchors are defined elsewhere in the config file
  storage: *cas-storage
  sql: *sql
  sql-ro: *readonly-sql
  window-size: 1000
  inclause-limit: -1
  max-inline-blob-size: 256
  refresh-accesstime-older-than: 0
Parameters:
- storage (buildgrid.server.cas.storage.storage_abc.StorageABC) – Instance of storage to use. This must be a storage object constructed using a YAML tag ending in -storage, for example !disk-storage.
- sql (buildgrid.server.sql.provider.SqlProvider) – A configured SQL connection manager. This must be an object with an !sql-connection YAML tag.
- sql_ro (buildgrid.server.sql.provider.SqlProvider) – Similar to sql, but used for read-only backend transactions. If set, it should be configured with a replica of the main database, preferably using a read-only role. BuildGrid does not check the role's permissions. If not set, read-only transactions are executed using the sql object.
- window_size (uint) – Maximum number of blobs to fetch in one SQL operation (larger result sets are automatically split into multiple queries).
- inclause_limit (int) – If nonnegative, overrides the default number of variables permitted per “in” clause. See the buildgrid.server.cas.storage.index.sql.SQLIndex comments for more details.
- max_inline_blob_size (int) – Blobs of this size or smaller are stored directly in the index and not in the backing storage (must be nonnegative).
- refresh_accesstime_older_than (int) – When a blob is read, its access timestamp is only updated if the stored timestamp is at least refresh_accesstime_older_than seconds old. Set this to reduce the load caused by frequent timestamp updates.
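Three of these parameters describe simple rules, sketched below as standalone Python functions. The function names are hypothetical, and each rule is a literal reading of the parameter description rather than BuildGrid's code. Under that literal reading, the default max_inline_blob_size of 0 would inline only empty blobs. The refresh rule with the default of 0 always rewrites the timestamp.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def query_windows(digests: Sequence[T], window_size: int) -> Iterator[Sequence[T]]:
    """window_size: split a large lookup into chunks, one SQL query per chunk."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(0, len(digests), window_size):
        yield digests[start:start + window_size]


def stored_inline(blob_size: int, max_inline_blob_size: int) -> bool:
    """max_inline_blob_size: blobs of this size or smaller live in the index itself."""
    return blob_size <= max_inline_blob_size


def should_refresh_access_time(now: float, recorded: float,
                               refresh_accesstime_older_than: float) -> bool:
    """refresh_accesstime_older_than: only rewrite a timestamp once it is this many seconds old."""
    return now - recorded >= refresh_accesstime_older_than


print([list(w) for w in query_windows(list(range(5)), window_size=2)])  # [[0, 1], [2, 3], [4]]
print(stored_inline(256, max_inline_blob_size=256))  # True
print(should_refresh_access_time(now=1000.0, recorded=900.0, refresh_accesstime_older_than=3600))  # False
```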
Important
It is strongly recommended to set refresh-accesstime-older-than to
a reasonably large value to minimize database churn when using an SQLIndex.
See Lifespan for details on how this affects blob lifespans.
Changed in version 0.4.0: fallback_on_get functionality was removed. !replicated-storage
allows for clean migrations more generally.
Lifespan
The index implementations maintain access timestamps, which cleanup uses to
determine whether or not a blob is old enough to be deleted. The Redis index
updates this timestamp whenever the blob is referenced by a FindMissingBlobs
request, but the SQL index can be configured to skip the update when the
existing timestamp is recent enough.
However, this configuration option (refresh-accesstime-older-than) affects
the guaranteed lifespan of blobs, and should be taken into account
when configuring only-if-unused-for in the cleanup process.
Specifically, the last access could have occurred anywhere in the window of
refresh-accesstime-older-than seconds since the recorded access time.
|------ SQLIndex ------|
|--------------------------- Cleanup only-if-unused-for ----------------------------|
                       |-------------------- Guaranteed Lifespan -------------------|
This leaves the actual guaranteed lifespan of blobs equal to
l = only_if_unused_for - refresh_accesstime_older_than
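The formula can be written as a small Python helper using datetime.timedelta, so that hours and seconds can be mixed safely. guaranteed_lifespan is a hypothetical name. Raising an error when the refresh window is longer than only-if-unused-for is a design choice of this sketch: in that case no lifespan is guaranteed at all.

```python
from datetime import timedelta


def guaranteed_lifespan(only_if_unused_for: timedelta,
                        refresh_accesstime_older_than: timedelta) -> timedelta:
    """l = only_if_unused_for - refresh_accesstime_older_than.

    Raises ValueError when the result would be negative, since that means
    no lifespan is guaranteed.
    """
    lifespan = only_if_unused_for - refresh_accesstime_older_than
    if lifespan < timedelta(0):
        raise ValueError("only_if_unused_for is shorter than refresh_accesstime_older_than")
    return lifespan


# Cleanup keeps blobs unused for 12 hours; the SQL index refreshes after 3600 seconds.
print(guaranteed_lifespan(timedelta(hours=12), timedelta(seconds=3600)))  # 11:00:00
```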
FMB Cache
BuildGrid also has a RedisFMBCache storage type. This stores blob existence
information in Redis in a similar way to the RedisIndex, however it sets
the value of each digest’s key to 1 if the blob exists, and 0 if not.
Keys in the RedisFMBCache also expire after a configured TTL, which is only
refreshed when changing the value of the key in the cache. This is to force
requests to sometimes hit the underlying storage layer, to allow that storage
implementation to update its access time metadata for the blob if necessary.
This cache is transparent, and gracefully falls back to the underlying
storages when Redis is unavailable. Although it also speeds up
FindMissingBlobs calls, it is a separate concept from the index and cannot
be used for cleanup. It is not an authoritative source for the existence
(or absence) of blobs whose digests are not cached.
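The caching behavior described above can be modelled in memory with Python. Several parts of this sketch are assumptions made for illustration:
- FmbCacheModel, storage_find_missing, and record_upload are hypothetical names.
- The assumption that an upload flips a cached 0 to 1 is an inference from the text, not documented behavior.

Each entry stores 1 or 0 with an expiry that is only reset when the value changes or the entry is recreated after expiring. As a result, lookups periodically reach the underlying storage again. When Redis is marked unavailable, every call goes straight to storage.

```python
from typing import Callable, Dict, List, Tuple

Digest = Tuple[str, int]  # (hash, size_bytes)


class FmbCacheModel:
    """In-memory model of the FMB cache behaviour described above.

    Each cached digest maps to (1 if present else 0, expires_at). An entry's
    expiry is only reset when its value changes (or it is recreated after
    expiring), so lookups periodically fall through to the underlying storage.
    """

    def __init__(self, storage_find_missing: Callable[[List[Digest]], List[Digest]], ttl: float):
        self._storage_find_missing = storage_find_missing
        self._ttl = ttl
        self._entries: Dict[Digest, Tuple[int, float]] = {}
        self.redis_available = True  # set to False to simulate a Redis outage

    def _is_cached(self, digest: Digest, now: float) -> bool:
        entry = self._entries.get(digest)
        return entry is not None and entry[1] > now

    def _set(self, digest: Digest, value: int, now: float) -> None:
        if self._is_cached(digest, now) and self._entries[digest][0] == value:
            return  # unchanged value: keep the existing expiry
        self._entries[digest] = (value, now + self._ttl)

    def record_upload(self, digest: Digest, now: float) -> None:
        """Hypothetical: a successful write flips the cached value to 1, resetting its TTL."""
        if self.redis_available:
            self._set(digest, 1, now)

    def find_missing(self, digests: List[Digest], now: float) -> List[Digest]:
        if not self.redis_available:
            return list(self._storage_find_missing(digests))  # graceful fallback
        uncached = [d for d in digests if not self._is_cached(d, now)]
        if uncached:
            absent = set(self._storage_find_missing(uncached))
            for digest in uncached:
                self._set(digest, 0 if digest in absent else 1, now)
        return [d for d in digests if self._entries[d][0] == 0]
```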
Lifespan
Adding this extra layer of caching complicates the minimum guaranteed lifespan calculation.
                       |- FMB Cache TTL -|
|------ SQLIndex ------|
|--------------------------- Cleanup only-if-unused-for ----------------------------|
                                         |----------- Guaranteed Lifespan ----------|
The pathological case is a digest which is included in a FindMissingBlobs
call close to the end of the SQLIndex access timestamp update window. This will
cache the existence, but not update the timestamp. As such, there is a window
equal to the size of the FMB cache TTL during which we may have received a
request which should have updated the access timestamp but did not.
The guaranteed lifespan for a blob in CAS is now
l = only_if_unused_for - refresh_accesstime_older_than - fmb_cache_ttl
This should be accounted for when configuring cleanup.
As a worked example, say we want to guarantee that blobs in CAS are available
for a minimum of 24 hours after their last access. If we have our SQLIndex
configured to refresh access timestamps every 6 hours, and an FMB cache in
front of the index with a 1 hour TTL, we need to set only-if-unused-for to
31 hours when configuring cleanup.
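The worked example is easy to check by solving the lifespan formula for only-if-unused-for. In this Python sketch, required_only_if_unused_for is a hypothetical helper. The 6-hour refresh window is written in seconds because refresh-accesstime-older-than is configured in seconds.

```python
from datetime import timedelta


def required_only_if_unused_for(target_lifespan: timedelta,
                                refresh_accesstime_older_than: timedelta,
                                fmb_cache_ttl: timedelta) -> timedelta:
    """Solve l = only_if_unused_for - refresh_accesstime_older_than - fmb_cache_ttl
    for only_if_unused_for, given the lifespan l we want to guarantee."""
    return target_lifespan + refresh_accesstime_older_than + fmb_cache_ttl


# Values from the worked example; refresh-accesstime-older-than is set in seconds.
needed = required_only_if_unused_for(
    target_lifespan=timedelta(hours=24),
    refresh_accesstime_older_than=timedelta(seconds=6 * 60 * 60),
    fmb_cache_ttl=timedelta(hours=1),
)
print(needed.total_seconds() / 3600, "hours")  # 31.0 hours
```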