A SQL index implementation. It can be pointed at either a remote SQL server or a local SQLite database.

class SQLIndex(sql_provider: SqlProvider, storage: StorageABC, *, window_size: int = 1000, inclause_limit: int = -1, max_inline_blob_size: int = 0, refresh_accesstime_older_than: int = 0, **kwargs: Any)

Bases: IndexABC

TYPE: str = 'SQLIndex'
start() -> None
stop() -> None
has_blob(digest: Digest) -> bool

Return True if the blob with the given instance/digest exists.

get_blob(digest: Digest) -> IO[bytes] | None

Get a blob from the index or the backing storage. Optionally fall back to the backing storage and repair the index.

delete_blob(digest: Digest) -> None

Delete the blob from storage if it’s present.

commit_write(digest: Digest, write_session: IO[bytes]) -> None

Store the contents for a digest.

The storage object is not responsible for verifying that the data written to the write_session actually matches the digest. The caller must do that.
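Since verification is left to the caller, a check along the following lines could run before calling commit_write. This is only a sketch: the Digest class below is a hypothetical stand-in with hash and size_bytes fields, the helper name matches_digest is invented for illustration, and SHA-256 is assumed as the hash function.

```python
import hashlib
import io
from typing import IO, NamedTuple


class Digest(NamedTuple):
    """Hypothetical stand-in for the real Digest type (hash plus size)."""
    hash: str
    size_bytes: int


def matches_digest(digest: Digest, write_session: IO[bytes]) -> bool:
    """Return True if the session's contents match the digest (SHA-256 assumed)."""
    write_session.seek(0)
    hasher = hashlib.sha256()
    size = 0
    # Read in chunks so large blobs are not loaded into memory at once.
    for chunk in iter(lambda: write_session.read(64 * 1024), b""):
        hasher.update(chunk)
        size += len(chunk)
    write_session.seek(0)  # Rewind so the session can still be committed.
    return size == digest.size_bytes and hasher.hexdigest() == digest.hash


data = b"hello world"
session = io.BytesIO(data)
good = Digest(hashlib.sha256(data).hexdigest(), len(data))
bad = Digest("0" * 64, len(data))
```

With these values, matches_digest(good, session) is true and matches_digest(bad, session) is false; only in the first case would the caller go on to commit the write.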

missing_blobs(digests: List[Digest]) -> List[Digest]

Return a list of the digests whose blobs are not present in CAS.

bulk_update_blobs(digest_blob_pairs: List[Tuple[Digest, bytes]]) -> List[Status]

Implement the StorageABC’s bulk_update_blobs method.

The StorageABC interface takes in a list of digest/blob pairs and returns a list of results. The list of results MUST be ordered to correspond with the order of the input list.
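The ordering requirement can be illustrated with a small sketch. It does not use the real Digest or Status types: plain strings stand in for both, a dict stands in for the storage, and the function name bulk_update is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical status values standing in for the real Status type.
OK = "OK"
INVALID = "INVALID_ARGUMENT"


def bulk_update(store: Dict[str, bytes], pairs: List[Tuple[str, bytes]]) -> List[str]:
    """Write each blob and return one result per input pair, in input order."""
    results: List[str] = []
    for key, blob in pairs:
        if not key:
            # A failed entry still gets a result at its own position.
            results.append(INVALID)
            continue
        store[key] = blob
        results.append(OK)
    return results


store: Dict[str, bytes] = {}
results = bulk_update(store, [("aaa", b"1"), ("", b"2"), ("ccc", b"3")])
```

Appending exactly one result per input pair, including for failures, is what keeps results[i] aligned with the i-th input pair.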

bulk_read_blobs(digests: List[Digest]) -> Dict[str, bytes]

Given an iterable container of digests, return a {hash: file-like object} dictionary corresponding to the blobs represented by the input digests.

Each file-like object must be readable and seekable.

least_recent_digests() -> Iterator[Digest]

Generator that iterates through the digests in LRU order (least recently used first).
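A generator of this kind might page through the index ordered by access time, as in the sketch below. The table name blobs, its columns, the page_size parameter, and the function name least_recent are all assumptions made for illustration, not the actual schema or implementation.

```python
import sqlite3
from typing import Iterator


def least_recent(conn: sqlite3.Connection, page_size: int = 2) -> Iterator[str]:
    """Yield digest hashes from least to most recently accessed, one page at a time."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT digest_hash FROM blobs "
            "ORDER BY accessed_at ASC, digest_hash ASC LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            return
        for (digest_hash,) in rows:
            yield digest_hash
        offset += len(rows)


# Hypothetical schema: one row per blob with a numeric last-access time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (digest_hash TEXT PRIMARY KEY, accessed_at INTEGER)")
conn.executemany("INSERT INTO blobs VALUES (?, ?)", [("a", 30), ("b", 10), ("c", 20)])
conn.commit()

ordered = list(least_recent(conn))
```

Fetching in pages keeps memory use bounded when the index is large; the secondary sort on digest_hash makes the order stable when access times tie.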

get_total_size() -> int

Return the sum of the size of all blobs within the index.

delete_n_bytes(n_bytes: int, dry_run: bool = False, protect_blobs_after: datetime | None = None) -> int

When using a SQL Index, entries with a delete marker are “in the process of being deleted”. This is required because storage operations can’t be safely tied to the SQL index transaction: either one may fail independently of the other, leaving the index and the storage inconsistent.

The workflow is roughly as follows:

- Start a SQL transaction.
- Lock and mark the indexed items you want to delete.
- Close the SQL transaction.
- Perform the storage deletes.
- Start a SQL transaction.
- Actually delete the index entries.
- Close the SQL transaction.

This means anything with deleted=False will always be present in the backing store. If an entry is marked deleted=True and the process is killed while deleting from the backing storage, only some of the items might actually be gone.

The next time the cleaner starts up, it can try to do that delete again (ignoring 404s). Eventually that will succeed and the item will actually be removed from the DB. Only during the first run of batches do we consider already marked items. This prevents multiple cleanup daemons from competing with each other on every batch.
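The mark-then-delete workflow described above can be sketched with SQLite and a dict standing in for the backing storage. The table name blobs, the deleted column, and the function name delete_marked are hypothetical, and row locking is omitted because SQLite has no row-level locks; a real implementation against a remote SQL server would lock the rows it marks.

```python
import sqlite3
from typing import Dict, List


def delete_marked(conn: sqlite3.Connection, storage: Dict[str, bytes], hashes: List[str]) -> int:
    """Mark entries, delete them from storage, then remove the index rows."""
    if not hashes:
        return 0
    placeholders = ",".join("?" for _ in hashes)
    # Transaction 1: mark the entries as being deleted.
    with conn:
        conn.execute(
            f"UPDATE blobs SET deleted = 1 WHERE digest_hash IN ({placeholders})", hashes
        )
    # Storage deletes happen outside any SQL transaction. A blob that is
    # already gone is not an error (analogous to ignoring 404s on a retry).
    for digest_hash in hashes:
        storage.pop(digest_hash, None)
    # Transaction 2: remove the index entries now that storage is clean.
    with conn:
        cursor = conn.execute(
            f"DELETE FROM blobs WHERE deleted = 1 AND digest_hash IN ({placeholders})", hashes
        )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blobs (digest_hash TEXT PRIMARY KEY, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO blobs (digest_hash) VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()

# "b" is already missing from storage, as if an earlier run was interrupted.
storage = {"a": b"1", "c": b"3"}
removed = delete_marked(conn, storage, ["a", "b"])
remaining = conn.execute("SELECT digest_hash FROM blobs").fetchall()
```

If the process died between the two transactions, the rows for "a" and "b" would remain with deleted = 1, and a later run could repeat the storage deletes safely before removing them.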

bulk_delete(digests: List[Digest]) -> List[str]

Delete a list of blobs from storage.