CAS

CAS, or Content Addressable Storage, is the service responsible for persistently storing data in BuildGrid. The data stored in CAS must exist for the lifespan of a job from submission through to the ActionResult being returned to the client. Cleanup and other TTLs should be configured accordingly and this requirement should be kept in mind when determining storage size requirements.

In order to benefit from remote caching, blobs in CAS will need a longer minimum lifespan than this. The exact lifespan to aim for is dictated by your storage availability and use-case.

Cleanup

CAS object lifespans are enforced by the CAS cleanup daemon. This tool is configured with a minimum blob age for deletion, a high watermark, and a low watermark. Cleanup starts when the total CAS size exceeds the high watermark, and ends when the size is below the low watermark. Blobs with a recorded access time more recent than the minimum blob age (only-if-unused-for) will not be deleted.
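The start/stop rule and the age check described above can be sketched as follows. This is an illustration only, not BuildGrid's implementation: the names `Watermarks`, `cleanup_active` and `is_deletable` are hypothetical, and the exact boundary behaviour (`>` versus `>=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Watermarks:
    high: int  # bytes; cleanup starts when the total size exceeds this
    low: int   # bytes; cleanup stops once the total size is below this


def cleanup_active(total_size: int, marks: Watermarks, currently_cleaning: bool) -> bool:
    """Return whether cleanup should be running after observing total_size."""
    if currently_cleaning:
        # Keep going until the size has dropped below the low watermark.
        return total_size >= marks.low
    # Idle: only start once the high watermark is exceeded.
    return total_size > marks.high


def is_deletable(last_access: float, now: float, only_if_unused_for: float) -> bool:
    """A blob is eligible only if it has gone unused for the minimum blob age."""
    return now - last_access >= only_if_unused_for
```

The gap between the two watermarks gives the daemon hysteresis: once started, it keeps deleting until it reaches the low watermark, so it does not flip on and off around a single threshold.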

Usage

To run the cleanup daemon:

bgd cleanup --high-watermark 10G --low-watermark 7.5G --batch-size 100M \
    --sleep-interval 10 deployment.yml

The batch size and high/low watermark parameters take sizes in bytes. The shorthands K, M, G, and T stand for kB, MB, GB, and TB respectively, as seen in the example.
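To make the shorthand concrete, here is a minimal sketch of how such values could be converted to bytes. The function `parse_size` is hypothetical and is not the parser used by `bgd`. It assumes decimal multipliers (1 K = 1000 bytes); the real tool may interpret the suffixes differently.

```python
# Assumption: decimal multipliers. The actual tool may use binary ones.
_MULTIPLIERS = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_size(text: str) -> int:
    """Convert a size such as '7.5G' or '2048' into a number of bytes."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].upper()
    if suffix in _MULTIPLIERS:
        # Fractional values such as 7.5G are allowed with a suffix.
        return int(float(text[:-1]) * _MULTIPLIERS[suffix])
    # No suffix: the value is already a plain byte count.
    return int(text)
```

Under these assumptions, the example command's `7.5G` low watermark corresponds to 7,500,000,000 bytes.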

The batch size is the minimum amount of space cleared in one go. The cleanup tool will try to remain as close as possible to the configured batch size, but depending on the size of blobs in the CAS will sometimes delete more than the specified batch at a time.

A smaller batch size adds more load to the database and the storage backend, but space starts being freed sooner than with a large batch size.

If the batch size is larger than the difference between the current CAS size and the low watermark, then the whole set of deletions required will be done in one batch.
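The batching behaviour can be sketched like this. Both functions are hypothetical illustrations rather than BuildGrid code, and the assumption that blobs are considered in a fixed order is mine.

```python
from typing import List


def next_batch_target(current_size: int, low_watermark: int, batch_size: int) -> int:
    """Bytes to aim to free in the next batch (0 when nothing is required)."""
    remaining = current_size - low_watermark
    if remaining <= 0:
        return 0
    # If less than one batch remains, do all remaining deletions in one go.
    return min(batch_size, remaining)


def select_batch(blob_sizes: List[int], target: int) -> List[int]:
    """Pick blobs in order until at least `target` bytes are covered."""
    chosen: List[int] = []
    freed = 0
    for size in blob_sizes:
        if freed >= target:
            break
        chosen.append(size)
        freed += size
    return chosen
```

Because blobs are deleted whole, the selected batch can overshoot the target. With three 40-byte blobs and a 100-byte target, all three are chosen and 120 bytes are freed.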

The sleep interval is the time in seconds to sleep after checking whether the CAS size has reached the configured high watermark. A lower sleep interval means a more reactive cleanup, at the cost of more database load.

The configuration file used should contain the index and backend storage definitions. The easiest way to achieve this is to just use the same config file that was used to deploy the indexed CAS in the first place.

If monitoring is configured in the provided config file (see Monitoring and Metrics), any metrics produced by the cleanup tool will be published to the configured destination. If that should not be the same destination as the indexed CAS metrics, the config will need to be changed.

Warning

If using a !with-cache storage type and a non-distributed storage type, such as !lru-storage, the caches will not be cleaned up along with the backing storage. In rare cases this can cause issues. To minimize this issue, the configured cache size across all BuildGrids should be smaller than the configured low watermark.

Index

In order to use the cleanup daemon, an index is currently needed. BuildGrid supports a CAS index stored in either Redis or PostgreSQL.

This index also provides performance improvements for FindMissingBlobs requests, allowing the CAS to perform existence checking without using potentially slow storage services.

Configuration

In a configuration file, the CAS index acts just like another storage backend definition. It implements the same interface as the other storage backends, so you can pass it to the service definition as with any other storage implementation.

For example, here is a basic disk-based CAS configuration:

server:
  - !channel
    address: localhost:50051
    insecure-mode: true

authorization:
  method: none

monitoring:
  enabled: false

instances:
  - name: ''
    description: |
      The unique '' instance.

    storages:
      - !disk-storage &disk-store
        path: example-cas/

    services:
      - !cas
        storage: *disk-store

      - !bytestream
        storage: *disk-store

To add an index to this, we simply need to add a CAS index to the list of storages, point it at the disk storage, and then use the index in the service definitions rather than the disk storage.

server:
  - !channel
    address: localhost:50051
    insecure-mode: true

authorization:
  method: none

monitoring:
  enabled: false


connections:
  - !sql-connection &sql
    connection-string: postgresql://bgd:insecure@database/bgdindex
    pool-size: 5
    pool-timeout: 30
    max-overflow: 10

storages:
  - !disk-storage &disk-store
    path: example-cas/

  - !sql-index &cas-index
    sql: *sql
    storage: *disk-store

instances:
  - name: ''
    description: |
      The unique '' instance.

    services:
      - !cas
        storage: *cas-index

      - !bytestream
        storage: *cas-index

thread-pool-size: 100

Warning

Adding a new, empty index to an existing storage is the same as deleting everything from the storage from a client perspective. If the storage is used by another service which stores digests, such as an ActionCache or Asset API, the contents of those services will also be lost.

Consider a migration phase using a !replicated-storage wrapper to work around this issue.

Index Types

The CAS index implementation is pluggable, similarly to the storage implementation. Currently both Redis and SQL are supported as storage indices.

Redis

The Redis Index uses Redis to store TTLs for CAS objects. The keys themselves are used to determine existence of a blob in CAS, and the TTL of the key is set to the expiry time of the blob. The value is set to a dummy value of 1, since we only care about existence checking and TTLs.

buildgrid.server.app.settings.parser.load_redis_index(storage: StorageABC, redis: RedisProvider, prefix: str | None = None) → RedisIndex

Generates buildgrid.server.cas.storage.index.redis.RedisIndex using the tag !redis-index.

Usage

- !redis-index
  # This assumes that a storage instance is defined elsewhere
  # with a `&cas-storage` anchor
  storage: *cas-storage
  redis: *redis
  prefix: "B"

Parameters:

storage (buildgrid.server.cas.storage.storage_abc.StorageABC) – Instance of storage to use. This must be a storage object constructed using a YAML tag ending in -storage, for example !disk-storage.
redis (buildgrid.server.redis.provider.RedisProvider) – A configured Redis connection manager. This must be an object with an !redis-connection YAML tag.
prefix (str) – An optional prefix for keys written by this index. If not specified, a prefix of “A” is used.

SQL

As the name suggests, the SQL index uses a PostgreSQL database to store the index.

The storage instance and connection string must be provided, but the other parameters have defaults that should be functional.

buildgrid.server.app.settings.parser.load_sql_index(storage: StorageABC, sql: SqlProvider, sql_ro: SqlProvider | None = None, window_size: int = 1000, inclause_limit: int = -1, max_inline_blob_size: int = 0, refresh_accesstime_older_than: int = 0) → SQLIndex

Generates buildgrid.server.cas.storage.index.sql.SQLIndex using the tag !sql-index.

Usage

- !sql-index
  # Assuming the YAML anchors are defined elsewhere in the config file
  storage: *cas-storage
  sql: *sql
  sql-ro: *readonly-sql
  window-size: 1000
  inclause-limit: -1
  max-inline-blob-size: 256
  refresh-accesstime-older-than: 0

Parameters:

storage (buildgrid.server.cas.storage.storage_abc.StorageABC) – Instance of storage to use. This must be a storage object constructed using a YAML tag ending in -storage, for example !disk-storage.
sql (buildgrid.server.sql.provider.SqlProvider) – A configured SQL connection manager. This must be an object with an !sql-connection YAML tag.
sql_ro (buildgrid.server.sql.provider.SqlProvider) – Similar to sql, but used for read-only backend transactions. If set, it should be configured with a replica of the main DB, preferably using a read-only role (optional but encouraged). BuildGrid does not check the role's permissions. If not set, read-only transactions are executed by the sql object.
window_size (uint) – Maximum number of blobs to fetch in one SQL operation (larger resultsets will be automatically split into multiple queries)
inclause_limit (int) – If nonnegative, overrides the default number of variables permitted per “in” clause. See the buildgrid.server.cas.storage.index.sql.SQLIndex comments for more details.
max_inline_blob_size (int) – Blobs of this size or smaller are stored directly in the index and not in the backing storage (must be nonnegative).
refresh_accesstime_older_than (int) – When reading a blob, its access timestamp is updated only if the current time is at least refresh-accesstime-older-than seconds later than the recorded access timestamp. Set this to reduce the load associated with frequent timestamp updates.
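The access-time refresh rule can be expressed as a single comparison. The following is a minimal sketch with a hypothetical function name, not the SQLIndex implementation. Timestamps are assumed to be plain seconds.

```python
def should_refresh_access_time(recorded: float, now: float, refresh_older_than: float) -> bool:
    """Return True when the recorded timestamp is old enough to be rewritten."""
    # With refresh_older_than == 0 every read refreshes the timestamp.
    return now - recorded >= refresh_older_than
```

With a threshold of 3600 seconds, a blob read 50 seconds after its recorded access time keeps the old timestamp, which saves a database write but leaves the recorded time up to an hour stale.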

Important

It is strongly recommended to set refresh-accesstime-older-than to a reasonably large value to minimise database churn when using an SQLIndex.

See Lifespan for details on how this affects blob lifespans.

Changed in version 0.4.0: fallback_on_get functionality was removed. !replicated-storage allows for clean migrations more generally.

Lifespan

The index implementations maintain access timestamps which are used by cleanup to determine whether or not a blob is old enough to be deleted. The Redis index updates this timestamp whenever the blob is referenced by a FindMissingBlobs request, but the SQL index can be configured to skip this update when the timestamp is recent enough.

However, this configuration option (refresh-accesstime-older-than) affects the guaranteed lifespan of blobs, and should be taken into account when configuring only-if-unused-for in the cleanup process.

Specifically, the last access could have occurred at any point up to refresh-accesstime-older-than seconds after the recorded access time.

|------ SQLIndex ------|
|--------------------------- Cleanup only-if-unused-for ----------------------------|
                       |-------------------- Guaranteed Lifespan -------------------|

This leaves the actual guaranteed lifespan of blobs equal to

l = only_if_unused_for - refresh_accesstime_older_than
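As a small sketch of this formula (the function name is hypothetical, and both arguments are assumed to be in the same unit, for example seconds):

```python
def guaranteed_lifespan(only_if_unused_for: int, refresh_accesstime_older_than: int) -> int:
    """Worst-case time a blob survives after its true last access."""
    # The recorded access time may lag the true last access by up to the
    # refresh window, so that window is subtracted from the cleanup age.
    return only_if_unused_for - refresh_accesstime_older_than
```

A result of zero or less means the configuration provides no lifespan guarantee at all, so only-if-unused-for should always exceed the refresh window.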

FMB Cache

BuildGrid also has a RedisFMBCache storage type. This stores blob existence information in Redis in a similar way to the RedisIndex, but it sets the value of each digest’s key to 1 if the blob exists, and 0 if not.

Keys in the RedisFMBCache also expire after a configured TTL, which is only refreshed when changing the value of the key in the cache. This is to force requests to sometimes hit the underlying storage layer, to allow that storage implementation to update its access time metadata for the blob if necessary.

This cache is transparent, and will gracefully fall back to the underlying storages if Redis is unavailable. Although it also speeds up FindMissingBlobs calls, it is a separate concept from the index and cannot be used for cleanup. This cache is not an authoritative source for the existence (or not) of blobs whose digests are not cached.
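The expiry behaviour described above can be illustrated with an in-memory stand-in. This is a sketch under stated assumptions, not the RedisFMBCache implementation: the class `FMBCacheSketch` is hypothetical, it uses a dictionary instead of Redis, and the caller passes in the current time so the behaviour is easy to follow.

```python
from typing import Dict, Optional, Tuple


class FMBCacheSketch:
    """In-memory stand-in for the Redis-backed cache, for illustration only."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        # digest -> (1 if the blob exists else 0, absolute expiry time)
        self._entries: Dict[str, Tuple[int, float]] = {}

    def get(self, digest: str, now: float) -> Optional[int]:
        """Return 1 or 0 for a live entry, or None on a miss or expiry."""
        entry = self._entries.get(digest)
        if entry is None or now >= entry[1]:
            self._entries.pop(digest, None)
            return None
        return entry[0]

    def set(self, digest: str, exists: bool, now: float) -> None:
        """Record existence; the expiry is reset only when the value changes."""
        value = 1 if exists else 0
        if self.get(digest, now) == value:
            return  # same value: keep the original expiry
        self._entries[digest] = (value, now + self.ttl)
```

Because writing an unchanged value does not extend the expiry, every entry eventually lapses and the next request for that digest has to reach the underlying storage, which gives that storage a chance to update its access time.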

Lifespan

Adding this extra layer of caching complicates the minimum guaranteed lifespan calculation.

                       |- FMB Cache TTL -|
|------ SQLIndex ------|
|--------------------------- Cleanup only-if-unused-for ----------------------------|
                                         |----------- Guaranteed Lifespan ----------|

The pathological case is a digest which is included in a FindMissingBlobs call close to the end of the SQLIndex access timestamp update window. This will cache the existence, but not update the timestamp. As such, there is a window equal to the size of the FMB cache TTL during which we may have received a request which should have updated the access timestamp but did not.

The guaranteed lifespan for a blob in CAS is now

l = only_if_unused_for - refresh_accesstime_older_than - fmb_cache_ttl

This should be accounted for when configuring cleanup.

As a worked example, say we want to guarantee that blobs in CAS are available for a minimum of 24 hours after their last access. If we have our SQLIndex configured to refresh access timestamps every 6 hours, and an FMB cache in front of the index with a 1 hour TTL, we need to set only-if-unused-for to 31 hours (24 + 6 + 1) when configuring cleanup.
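The same arithmetic can be written as a small helper for checking a planned configuration. The function name is hypothetical, and all values are assumed to share one unit (hours here).

```python
def required_only_if_unused_for(
    desired_lifespan: int,
    refresh_accesstime_older_than: int,
    fmb_cache_ttl: int = 0,
) -> int:
    """Minimum only-if-unused-for that still guarantees desired_lifespan."""
    # Rearranged from:
    #   l = only_if_unused_for - refresh_accesstime_older_than - fmb_cache_ttl
    return desired_lifespan + refresh_accesstime_older_than + fmb_cache_ttl


hours = required_only_if_unused_for(24, 6, 1)
print(hours)
```

Leaving `fmb_cache_ttl` at its default of 0 covers deployments with no FMB cache in front of the index.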