Indexed CAS
To improve the performance of CAS-related requests like FindMissingBlobs, it is possible to configure an “indexed” CAS. This also facilitates intelligent cleanup of blobs, which is currently supported only for S3.
This index keeps track of all the blobs that are currently stored in CAS, as well as the time each blob was last accessed.
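To make the idea concrete, the following is a minimal sketch of such an index, not the actual implementation. It assumes a hypothetical `index_entries` table in an in-memory SQLite database and hypothetical helper names (`record_blob`, `find_missing_blobs`). The point it illustrates is that a FindMissingBlobs-style request can be answered with a single index query instead of one storage lookup per blob.

```python
import sqlite3
import time

# Hypothetical schema: one row per blob, keyed by its digest hash.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE index_entries ("
    " digest_hash TEXT PRIMARY KEY,"
    " size_bytes INTEGER NOT NULL,"
    " accessed_at REAL NOT NULL)"
)


def record_blob(digest_hash, size_bytes, now=None):
    """Add a blob to the index, or refresh its last-access time."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO index_entries VALUES (?, ?, ?)",
        (digest_hash, size_bytes, now),
    )


def find_missing_blobs(digest_hashes):
    """Return the requested hashes that are absent from the index."""
    if not digest_hashes:
        return []
    placeholders = ",".join("?" for _ in digest_hashes)
    rows = conn.execute(
        "SELECT digest_hash FROM index_entries "
        f"WHERE digest_hash IN ({placeholders})",
        list(digest_hashes),
    ).fetchall()
    present = {row[0] for row in rows}
    # Preserve the order in which the caller asked for the blobs.
    return [h for h in digest_hashes if h not in present]


record_blob("aaa", 10)
record_blob("bbb", 20)
print(find_missing_blobs(["aaa", "ccc", "bbb", "ddd"]))
```

Only the index is consulted here; the underlying storage is never touched, which is where the performance benefit comes from.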
Configuration
In a configuration file, the CAS index acts just like another storage backend definition. It implements the same interface as the other storage backends, so you can pass it to a service definition exactly as you would any other storage implementation.
For example, here is a basic disk-based CAS configuration:
server:
  - !channel
    address: localhost:50051
    insecure-mode: true

authorization:
  method: none

monitoring:
  enabled: false

instances:
  - name: ''
    description: |
      The unique '' instance.
    storages:
      - !disk-storage &disk-store
        path: example-cas/
    services:
      - !cas
        storage: *disk-store
      - !bytestream
        storage: *disk-store
To add an index to this, we simply need to add a CAS index to the list of storages, point it at the disk storage, and then use the index in the service definitions rather than the disk storage.
server:
  - !channel
    address: localhost:50051
    insecure-mode: true

authorization:
  method: none

monitoring:
  enabled: false

connections:
  - !sql-connection &sql
    connection-string: sqlite:///./example.db
    automigrate: yes
    connection-timeout: 15

storages:
  - !disk-storage &disk-store
    path: example-cas/
  - !sql-index &cas-index
    sql: *sql
    storage: *disk-store

instances:
  - name: ''
    description: |
      The unique '' instance.
    services:
      - !cas
        storage: *cas-index
      - !bytestream
        storage: *cas-index
Warning
If you add an index to an existing storage that is also used by another service which stores digests, such as an ActionCache or the Asset API, either those services need to be cleared or the fallback_on_get config option needs to be temporarily set to True on the index.
By default, the index reports a blob as missing if it isn’t in the index, whether or not it exists in storage. Enabling fallback_on_get carries a significant performance cost, so it is off by default and should ideally not be left on permanently.
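The behaviour described in this warning can be sketched as follows. This is a toy model rather than the real index: `DictStorage` and `IndexedStorage` are hypothetical names, the index is just an in-memory set, and the backfilling of the index on a successful fallback is an assumption made for illustration.

```python
class DictStorage:
    """Stand-in for a real storage backend such as disk or S3."""

    def __init__(self):
        self.blobs = {}

    def get_blob(self, digest):
        return self.blobs.get(digest)


class IndexedStorage:
    """Toy index placed in front of a storage backend (hypothetical)."""

    def __init__(self, storage, fallback_on_get=False):
        self.storage = storage
        self.fallback_on_get = fallback_on_get
        self.index = set()

    def has_blob(self, digest):
        # Only the index is consulted, never the storage itself.
        return digest in self.index

    def get_blob(self, digest):
        if digest in self.index:
            return self.storage.get_blob(digest)
        if not self.fallback_on_get:
            # Reported as missing even if the storage holds the blob.
            return None
        # Extra round trip to storage on every index miss: this is
        # the performance cost of leaving the option enabled.
        blob = self.storage.get_blob(digest)
        if blob is not None:
            # Assumed behaviour: backfill so later lookups hit the index.
            self.index.add(digest)
        return blob


storage = DictStorage()
storage.blobs["old-digest"] = b"written before the index existed"

strict = IndexedStorage(storage)
lenient = IndexedStorage(storage, fallback_on_get=True)

print(strict.get_blob("old-digest"))   # index miss, no fallback
print(lenient.get_blob("old-digest"))  # index miss, falls back to storage
```

In this model, a blob referenced by an existing ActionCache entry but written before the index existed is invisible to `strict`, while `lenient` finds it at the price of an additional storage lookup.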
Index Types
The CAS index implementation is pluggable, like the storage implementation. Currently only an SQL-based implementation exists, but a custom index implementation can be written without too much effort.
SQL
As the name suggests, the SQL index uses an SQL database to store the index. It has been tested with both SQLite and PostgreSQL, and could in theory work with any database backend that SQLAlchemy supports (provided that the database supports window functions).
The storage instance and connection string must be provided; the other parameters have defaults that should work without changes.
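As an illustration of the window-function requirement mentioned above, here is a sketch of one plausible use: choosing the least recently accessed blobs whose combined size reaches a cleanup target. How the real index uses window functions is not stated here, so the query, the `index_entries` table, and the `least_recently_used` helper are all assumptions. The example uses Python's built-in `sqlite3` module and needs SQLite 3.25 or newer for `OVER` support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE index_entries ("
    " digest_hash TEXT PRIMARY KEY,"
    " size_bytes INTEGER NOT NULL,"
    " accessed_at INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO index_entries VALUES (?, ?, ?)",
    [("a", 100, 1), ("b", 200, 2), ("c", 300, 3), ("d", 400, 4)],
)


def least_recently_used(bytes_to_free):
    """Pick the oldest blobs until their total size reaches the target."""
    if bytes_to_free <= 0:
        return []
    # The window function computes a running total of blob sizes,
    # ordered from least to most recently accessed.
    rows = conn.execute(
        "SELECT digest_hash, "
        "SUM(size_bytes) OVER (ORDER BY accessed_at) AS running_total "
        "FROM index_entries ORDER BY accessed_at"
    ).fetchall()
    selected = []
    for digest_hash, running_total in rows:
        selected.append(digest_hash)
        if running_total >= bytes_to_free:
            break
    return selected


print(least_recently_used(250))
```

Without window functions the running total would have to be computed in application code or with a correlated subquery, which is one reason a database lacking them may be unsuitable.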