@tyler-french
Contributor

This PR describes the implementation of content-defined chunking (CDC) for Bazel, using the FastCDC 2020 algorithm.

The key design considerations are:

During a read:

  • Check if a blob is above the chunking threshold
  • Call SplitBlob to check whether the remote has stored a chunk mapping for the blob (map[blob] -> chunks)
  • If so, fetch the chunks, assemble them into the blob, and use the result. Keep the chunks in the disk cache. The blob then counts as downloaded
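
The read steps above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `FakeRemote`, `read_blob`, `CHUNKING_THRESHOLD`, and the dict used as a disk cache are hypothetical names, and the digest check after assembly is an assumption added for safety.

```python
import hashlib
from typing import Dict, List, Optional

CHUNKING_THRESHOLD = 8  # hypothetical value in bytes, tiny for demonstration


def digest_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class FakeRemote:
    """In-memory stand-in for a remote cache (not a real client)."""

    def __init__(self) -> None:
        self.cas: Dict[str, bytes] = {}             # digest -> bytes
        self.chunk_map: Dict[str, List[str]] = {}   # blob digest -> chunk digests

    def split_blob(self, blob_digest: str) -> Optional[List[str]]:
        # Returns the chunk digests if the remote knows a mapping, else None.
        return self.chunk_map.get(blob_digest)

    def read(self, digest: str) -> bytes:
        return self.cas[digest]


def read_blob(digest: str, size: int, remote: FakeRemote,
              disk_cache: Dict[str, bytes]) -> bytes:
    if size > CHUNKING_THRESHOLD:
        chunk_digests = remote.split_blob(digest)
        if chunk_digests is not None:
            parts = []
            for chunk_digest in chunk_digests:
                # Only fetch chunks that are not already in the disk cache.
                if chunk_digest not in disk_cache:
                    disk_cache[chunk_digest] = remote.read(chunk_digest)
                parts.append(disk_cache[chunk_digest])
            blob = b"".join(parts)
            # Assumption: verify the assembled bytes against the blob digest.
            if digest_of(blob) != digest:
                raise ValueError("assembled chunks do not match blob digest")
            return blob
    # Small blob, or no chunk mapping on the remote: plain download.
    return remote.read(digest)
```

Because chunks stay in the disk cache, a later read of a different blob that shares chunks only needs to fetch the chunks it has not seen yet.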

During a write:

  • Check if a blob is above the chunking threshold
  • If so, run FastCDC and store the offsets + lengths of the chunks
  • While uploading, we have the files available to upload from the merkle tree. So, instead of uploading the whole blob:
    • Check if this blob -> chunk mapping already exists using SplitBlob
    • Call FindMissingBlobs (FMB) to find the chunks that need uploading (** key optimization: the remote often already has most chunks **)
    • Upload the chunks
    • Call SpliceBlob to register with the server that this blob digest can be constructed using these chunks
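
A minimal sketch of the upload steps above, assuming the chunk boundaries have already been computed. It is not the code in this PR: `FakeRemote`, `upload_chunked`, and `CHUNKING_THRESHOLD` are hypothetical names, and the blob is held in memory as `bytes` purely to keep the example short.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

CHUNKING_THRESHOLD = 8  # hypothetical value in bytes, tiny for demonstration


def digest_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class FakeRemote:
    """In-memory stand-in for a remote cache (not a real client)."""

    def __init__(self) -> None:
        self.cas: Dict[str, bytes] = {}
        self.chunk_map: Dict[str, List[str]] = {}
        self.uploaded: List[str] = []  # log of digests actually uploaded

    def split_blob(self, blob_digest: str) -> Optional[List[str]]:
        return self.chunk_map.get(blob_digest)

    def find_missing_blobs(self, digests: List[str]) -> List[str]:
        return [d for d in digests if d not in self.cas]

    def upload(self, data: bytes) -> None:
        d = digest_of(data)
        self.cas[d] = data
        self.uploaded.append(d)

    def splice_blob(self, blob_digest: str, chunk_digests: List[str]) -> None:
        self.chunk_map[blob_digest] = list(chunk_digests)


def upload_chunked(data: bytes, spans: List[Tuple[int, int]],
                   remote: FakeRemote) -> int:
    """Uploads `data`; returns how many uploads were needed."""
    if len(data) <= CHUNKING_THRESHOLD:
        remote.upload(data)  # below the threshold: upload the whole blob
        return 1
    blob_digest = digest_of(data)
    if remote.split_blob(blob_digest) is not None:
        return 0  # the blob -> chunk mapping already exists
    chunks: Dict[str, bytes] = {}
    order: List[str] = []
    for offset, length in spans:
        piece = data[offset:offset + length]
        d = digest_of(piece)
        chunks[d] = piece
        order.append(d)
    # Key optimization: only upload the chunks the remote does not have.
    missing = remote.find_missing_blobs(list(chunks))
    for d in missing:
        remote.upload(chunks[d])
    # Register that the blob digest can be built from these chunks.
    remote.splice_blob(blob_digest, order)
    return len(missing)
```

With a remote that already holds two of a blob's three chunks, `upload_chunked` uploads only the third, and a second call for the same blob uploads nothing.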

Optimization: we don't need to store the individual chunks for uploads, since we have the whole files locally anyway. Storing each chunk's offset + length is sufficient to do all of the work.
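
To illustrate why offsets and lengths are enough, here is a rough sketch that records only `(offset, length)` pairs and reads a chunk's bytes from the local file on demand. Note that the chunker below is a simplified gear-hash stand-in, not FastCDC 2020, and `chunk_spans`, `read_chunk`, and the size parameters are hypothetical.

```python
import hashlib
from typing import List, Tuple

# Deterministic 256-entry gear table (an arbitrary choice for this sketch).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]
MASK64 = (1 << 64) - 1


def chunk_spans(data: bytes, min_size: int = 64, avg_bits: int = 8,
                max_size: int = 1024) -> List[Tuple[int, int]]:
    """Returns (offset, length) pairs; no chunk bytes are copied or stored."""
    cut_mask = (1 << avg_bits) - 1
    spans: List[Tuple[int, int]] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64  # rolling gear hash
        length = i - start + 1
        # Cut where the hash matches the mask (content-defined boundary),
        # or force a cut once the chunk reaches max_size.
        if (length >= min_size and (h & cut_mask) == 0) or length >= max_size:
            spans.append((start, length))
            start = i + 1
            h = 0
    if start < len(data):
        spans.append((start, len(data) - start))  # trailing partial chunk
    return spans


def read_chunk(path: str, offset: int, length: int) -> bytes:
    """Reads one chunk straight from the local file when it is needed."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The spans are contiguous and cover the whole file, so any chunk that turns out to be missing on the remote can be re-read from disk at upload time instead of being kept in memory or in a separate store.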

Depends on a new release of the remote_apis repo.

@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch from 8a45f14 to dbc1af6 Compare January 27, 2026 04:17
@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch 11 times, most recently from 789ab23 to 4349030 Compare January 28, 2026 06:00
@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch 6 times, most recently from 438591d to 99dc964 Compare January 28, 2026 19:38
@tyler-french tyler-french force-pushed the tfrench/chunked-remote-cache branch from 99dc964 to ca960ca Compare January 28, 2026 21:12