.. _cas-operation:

CAS
===

CAS, or Content Addressable Storage, is the service responsible for
persistently storing data in BuildGrid. The data stored in CAS **must** exist
for the lifespan of a job, from submission through to the ``ActionResult``
being returned to the client. Cleanup and other TTLs should be configured
accordingly, and this requirement should be kept in mind when determining
storage size requirements.

To benefit from remote caching, blobs in CAS need a longer minimum lifespan
than this. The exact lifespan to aim for depends on your storage availability
and use case.

.. _cas-cleanup:

Cleanup
-------

CAS object lifespans are enforced by the CAS cleanup daemon. This tool is
configured with a minimum blob age for deletion, a high watermark, and a low
watermark. Cleanup starts when the total CAS size exceeds the high watermark,
and ends when the size is below the low watermark. Blobs with a recorded
access time more recent than the minimum blob age (``only-if-unused-for``)
will not be deleted.

Usage
~~~~~

To run the cleanup daemon:

.. code-block:: sh

    bgd cleanup --high-watermark 10G --low-watermark 7.5G --batch-size 100M \
        --sleep-interval 10 deployment.yml

The batch size and high/low watermark parameters take numbers in bytes.
Shorthands for kB, MB, GB, and TB are available as K, M, G, and T
respectively, as seen in the example.

The batch size is the minimum amount of space cleared in one go. The cleanup
tool tries to stay as close as possible to the configured batch size, but
depending on the sizes of blobs in the CAS it will sometimes delete more than
the specified batch at a time. A smaller batch size adds more load to the
database and the storage backend, but space starts being freed sooner than
with a large batch size. If the batch size is larger than the difference
between the current CAS size and the low watermark, then all of the required
deletions are done in a single batch.
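
As a rough illustration of how the watermarks, batch size, and minimum blob
age interact, here is a minimal Python sketch. It is a hypothetical model, not
the cleanup daemon's actual code: the ``Blob`` type and ``plan_cleanup``
function are invented for illustration, sizes are plain byte counts (parsing
of the K/M/G/T shorthands is omitted), and eligible blobs are *assumed* to be
deleted in order of oldest recorded access first.

.. code-block:: python

    from dataclasses import dataclass
    from typing import List


    @dataclass
    class Blob:
        # Hypothetical record: digest, size in bytes, last recorded access time.
        digest: str
        size: int
        accessed_at: float


    def plan_cleanup(blobs: List[Blob], total_size: int, high_watermark: int,
                     low_watermark: int, batch_size: int,
                     only_if_unused_for: float, now: float) -> List[List[str]]:
        """Return the digests to delete, grouped into batches."""
        if total_size <= high_watermark:
            return []  # Cleanup only starts once the high watermark is exceeded.

        # Blobs accessed within the minimum age are never eligible for deletion.
        cutoff = now - only_if_unused_for
        candidates = sorted((b for b in blobs if b.accessed_at <= cutoff),
                            key=lambda b: b.accessed_at)  # Assumed: oldest first.

        batches: List[List[str]] = []
        batch: List[str] = []
        batch_bytes = 0
        for blob in candidates:
            if total_size < low_watermark:
                break  # Cleanup ends once the size is below the low watermark.
            batch.append(blob.digest)
            batch_bytes += blob.size
            total_size -= blob.size
            # Blobs are deleted whole, so a batch may free more than batch_size.
            if batch_bytes >= batch_size:
                batches.append(batch)
                batch, batch_bytes = [], 0
        if batch:
            batches.append(batch)  # Final, possibly smaller, batch.
        return batches


    blobs = [Blob("a", 40, 100.0), Blob("b", 30, 200.0),
             Blob("c", 50, 300.0), Blob("d", 20, 950.0)]
    print(plan_cleanup(blobs, total_size=140, high_watermark=120,
                       low_watermark=30, batch_size=50,
                       only_if_unused_for=500, now=1000))
    # [['a', 'b'], ['c']]

With ``batch_size=10**9`` instead, the same call returns a single batch
containing ``a``, ``b``, and ``c``, matching the one-batch behaviour described
above. Blob ``d`` is never selected because it was accessed within
``only_if_unused_for``.
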

The sleep interval is the time, in seconds, that the cleanup daemon sleeps
after checking whether the CAS size has reached the configured high
watermark. A lower sleep interval makes cleanup more reactive, at the cost of
more database load.

The configuration file passed to the cleanup daemon should contain the index
and backend storage definitions. The easiest way to achieve this is to use the
same config file that was used to deploy the indexed CAS in the first place.

If monitoring is configured in the provided config file (see
:ref:`monitoring`), then any metrics produced by the cleanup tool will be
published to the configured destination. If the cleanup metrics should not go
to the same place as the indexed CAS metrics, the config will need to be
changed.

.. warning::

    If you use a ``!with-cache`` storage whose cache is a non-distributed
    storage type, such as ``!lru-storage``, the caches will *not* be cleaned
    up along with the backing storage. In rare cases this can cause issues.
    To minimize this, the configured cache size across all BuildGrid
    instances should be smaller than the configured low watermark.

.. _indexed-cas:

Index
-----

To use the cleanup daemon, an index is currently required. BuildGrid supports
a CAS index stored in either Redis or PostgreSQL. The index also speeds up
``FindMissingBlobs`` requests, allowing the CAS to check for blob existence
without querying potentially slow storage services.

Configuration
~~~~~~~~~~~~~

In a configuration file, the CAS index acts just like another storage backend
definition. It implements the same interface as the other storage backends,
so you can pass it to a service definition like any other storage
implementation.

For example, here is a basic disk-based CAS configuration:

.. literalinclude:: ../data/basic-disk-cas.yml
    :language: yaml

To add an index to this, add a CAS index to the list of storages, point it at
the disk storage, and then use the index rather than the disk storage in the
service definitions.

.. literalinclude:: ../data/postgresql-index-cas-only.yml
    :language: yaml

.. warning::

    From a client's perspective, adding a new, empty index to an existing
    storage is the same as deleting everything from that storage. If the
    storage is used by another service which stores digests, such as an
    ActionCache or Asset API, the contents of those services will also be
    lost. Consider a migration phase using a ``!replicated-storage`` wrapper
    to work around this issue.

Index Types
~~~~~~~~~~~

The CAS index implementation is pluggable, similarly to the storage
implementation. Currently, both Redis and SQL indexes are supported.

Redis
'''''

The Redis index uses Redis to store TTLs for CAS objects. The existence of a
key determines whether the corresponding blob exists in CAS, and the key's TTL
is set to the blob's expiry time. The value is a dummy value of ``1``, since
only existence and TTLs matter.

.. autofunction:: buildgrid.server.app.settings.parser.load_redis_index
    :noindex:

SQL
'''

As the name suggests, the SQL index uses a PostgreSQL database to store the
index. The storage instance and connection string must be provided; the other
parameters have defaults that should work out of the box.

.. autofunction:: buildgrid.server.app.settings.parser.load_sql_index
    :noindex:

.. important::

    It is **strongly recommended** to set ``refresh-accesstime-older-than`` to
    a reasonably large value to minimise database churn when using an
    SQLIndex. See :ref:`index-lifespan` for details on how this affects blob
    lifespans.

.. versionchanged:: 0.4.0

    ``fallback_on_get`` functionality was removed. ``!replicated-storage``
    allows for clean migrations more generally.

.. _index-lifespan:

Lifespan
~~~~~~~~

The index implementations maintain access timestamps, which cleanup uses to
determine whether a blob is old enough to be deleted. The Redis index updates
this timestamp whenever the blob is referenced by a ``FindMissingBlobs``
request, but the SQL index can be configured to skip the update when the
existing timestamp is recent enough.

This configuration option (``refresh-accesstime-older-than``) affects the
guaranteed lifespan of blobs, however, and should be taken into account when
configuring ``only-if-unused-for`` for the cleanup process. Specifically, the
last access could have occurred anywhere in the window of
``refresh-accesstime-older-than`` seconds after the recorded access time.

.. code-block:: text

    |------ SQLIndex ------|
    |--------------------------- Cleanup only-if-unused-for ----------------------------|
                           |-------------------- Guaranteed Lifespan -------------------|

This leaves the actual guaranteed lifespan of blobs equal to:

.. code-block:: text

    l = only_if_unused_for - refresh_accesstime_older_than

FMB Cache
---------

BuildGrid also has a ``RedisFMBCache`` storage type. This stores blob
existence information in Redis in a similar way to the ``RedisIndex``, but it
sets the value of each digest's key to ``1`` if the blob exists and ``0`` if
it does not.

Keys in the ``RedisFMBCache`` also expire after a configured TTL, which is
only refreshed when the value of the key in the cache changes. This forces
requests to periodically reach the underlying storage layer, allowing that
storage implementation to update its access time metadata for the blob if
necessary.

This cache is transparent, and gracefully falls back to the underlying
storages if Redis is unavailable. Although it also speeds up
``FindMissingBlobs`` calls, it is a separate concept from the index and
cannot be used for cleanup.
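
To make the TTL behaviour concrete, here is a small in-memory Python model of
a ``FindMissingBlobs`` existence cache. It is a hypothetical sketch, not
``RedisFMBCache`` itself: a dict stands in for Redis, ``backend_missing`` is an
invented callable standing in for the underlying storage's
``FindMissingBlobs``, ``note_upload`` is an invented hook for blob writes, and
the ``available`` flag simulates a Redis outage.

.. code-block:: python

    import time
    from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


    class FMBCacheModel:
        """Hypothetical in-memory model of a FindMissingBlobs existence cache."""

        def __init__(self, backend_missing: Callable[[Iterable[str]], Set[str]],
                     ttl: float, clock: Callable[[], float] = time.monotonic):
            self._backend_missing = backend_missing  # Stands in for the storage.
            self._ttl = ttl
            self._clock = clock
            # digest -> (1 if the blob exists else 0, expiry time)
            self._entries: Dict[str, Tuple[int, float]] = {}
            self.available = True  # False simulates Redis being unreachable.

        def _lookup(self, digest: str) -> Optional[int]:
            entry = self._entries.get(digest)
            if entry is None or entry[1] <= self._clock():
                return None  # Absent or expired: the cache knows nothing.
            return entry[0]

        def _set(self, digest: str, value: int) -> None:
            # Only a changed (or absent/expired) value rewrites the key, so
            # cache hits and repeated writes never extend its TTL.
            if self._lookup(digest) != value:
                self._entries[digest] = (value, self._clock() + self._ttl)

        def note_upload(self, digest: str) -> None:
            if self.available:
                self._set(digest, 1)  # A 0 -> 1 change resets the TTL.

        def find_missing(self, digests: List[str]) -> Set[str]:
            if not self.available:
                return self._backend_missing(digests)  # Transparent fallback.
            missing: Set[str] = set()
            unknown: List[str] = []
            for digest in digests:
                value = self._lookup(digest)
                if value is None:
                    unknown.append(digest)
                elif value == 0:
                    missing.add(digest)
            if unknown:
                # Only uncached digests reach the backing storage, which can
                # then update its own access-time metadata.
                backend = self._backend_missing(unknown)
                missing |= backend
                for digest in unknown:
                    self._set(digest, 0 if digest in backend else 1)
            return missing


    now = [0.0]
    stored = {"a"}
    calls = []


    def backend_missing(digests):
        digests = list(digests)
        calls.append(digests)
        return {d for d in digests if d not in stored}


    cache = FMBCacheModel(backend_missing, ttl=60, clock=lambda: now[0])
    print(cache.find_missing(["a", "b"]), len(calls))  # {'b'} 1
    print(cache.find_missing(["a", "b"]), len(calls))  # {'b'} 1 (cache hit)
    now[0] = 61.0
    print(cache.find_missing(["a"]), len(calls))       # set() 2 (TTL lapsed)

In this model a cache hit never extends a key's TTL, so a frequently queried
digest still reaches the backing storage at least once per TTL. That bounded
delay is what the lifespan calculation for the FMB cache has to account for.
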
The FMB cache is not an authoritative source of existence information for
blobs whose digests are not cached.

Lifespan
~~~~~~~~

Adding this extra layer of caching complicates the calculation of the minimum
guaranteed lifespan.

.. code-block:: text

                           |- FMB Cache TTL -|
    |------ SQLIndex ------|
    |--------------------------- Cleanup only-if-unused-for ----------------------------|
                                             |----------- Guaranteed Lifespan ----------|

The pathological case is a digest which is included in a ``FindMissingBlobs``
call close to the end of the SQLIndex access timestamp update window. This
call caches the blob's existence but does not update its timestamp. As a
result, there is a window equal to the FMB cache TTL during which requests
that should have updated the access timestamp may have been answered from the
cache instead.

The guaranteed lifespan for a blob in CAS is now:

.. code-block:: text

    l = only_if_unused_for - refresh_accesstime_older_than - fmb_cache_ttl

This should be accounted for when configuring cleanup. As a worked example,
say we want to guarantee that blobs in CAS remain available for at least 24
hours after their last access. If the SQLIndex has
``refresh-accesstime-older-than`` set to 6 hours, and an FMB cache with a
1 hour TTL sits in front of the index, we need to set ``only-if-unused-for``
to 31 hours (24 + 6 + 1) when configuring cleanup.
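
The worked example can be checked with a few lines of Python. The two helpers
below are hypothetical names that simply encode the lifespan formula and its
rearrangement. They use ``timedelta`` so the units stay explicit; converting
the result to whatever unit your cleanup configuration expects for
``only-if-unused-for`` is left to you.

.. code-block:: python

    from datetime import timedelta


    def guaranteed_lifespan(only_if_unused_for: timedelta,
                            refresh_accesstime_older_than: timedelta,
                            fmb_cache_ttl: timedelta = timedelta(0)) -> timedelta:
        # l = only_if_unused_for - refresh_accesstime_older_than - fmb_cache_ttl
        return only_if_unused_for - refresh_accesstime_older_than - fmb_cache_ttl


    def required_only_if_unused_for(min_lifespan: timedelta,
                                    refresh_accesstime_older_than: timedelta,
                                    fmb_cache_ttl: timedelta = timedelta(0)) -> timedelta:
        # Rearranged: only_if_unused_for = l + refresh_accesstime_older_than + fmb_cache_ttl
        return min_lifespan + refresh_accesstime_older_than + fmb_cache_ttl


    setting = required_only_if_unused_for(timedelta(hours=24), timedelta(hours=6),
                                          timedelta(hours=1))
    print(setting)                  # 1 day, 7:00:00
    print(setting.total_seconds())  # 111600.0

Leaving ``fmb_cache_ttl`` at its default of zero gives the index-only case,
``l = only_if_unused_for - refresh_accesstime_older_than``.
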