Skip to content

Draft Fix chunk cache key and metadata retrieval #104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

maxstack
Copy link
Contributor

Fixes for the chunk cache:

  1. Request offset and size not included in the cache key so the different byte ranges received by these requests would be cached in the same file causing errors in results
  2. Fix thread issue where tasks processing requests could access the cache state whilst being written via get_metadata
  3. Take unnecessary request parameters out of the key so the cache may hit more, we only include the source, bucket, object key, offset, and size - these are the parameters used by S3 object download API calls

…nloaded byte range under a key that doesn't account for this range.

When another request is received we get a cache hit even though the requested byte range may differ.

Fix this by downloading and caching the entire chunk, applying the byte range post download or cache hit.

If caching is disabled honour the byte range in the S3 client download.
…ache chunks aren't shared between requests of differing byte range.

Revert the previous fix which suffers from memory over consumption when bombarded by requests for byte ranges from the same very large file - we would need a way to ensure a single download of that file before servicing concurrent requests against it.
 1) Only incorporate request fields that themselves are used in the S3 object download
 2) Immediately turn this key into a md5 hash so we're handling a much shorter string when interacting with the cache
…essing task to determine if a chunk is cached and if so its size.

There's a MPSC channel used to buffer all cache write requests to a single task responsible for cache updates but the update task could happen at the same time request processing tasks read the state.
Reading the state file whilst it's being written results in a serde parsing error.

Instead of adding some type of thread safety around the load/save of the state file a simpler solution is to store a chunk's metadata in its own metadata file.
This mirrors the chunk get which ignores the state file and simply retrieves files from disk.
@maxstack maxstack self-assigned this Apr 17, 2025
Copy link
Member

@sd109 sd109 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, seems like a sensible approach to me.

Do we also need to be worried about the pruning cycle trying to access the cache state file while the cache is being written to? I see we're using load_state in the remove method as well as in various places throughout the prune* methods.

@@ -320,6 +319,26 @@ impl SimpleDiskCache {
}
}

/// Retrieves chunk metadata from the cache for an unique key.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
/// Retrieves chunk metadata from the cache for an unique key.
/// Retrieves chunk metadata from the cache for a unique key.

Comment on lines 389 to 395
fs::remove_file(path.join(self.filename_for_key(key).await))
.await
.unwrap();
// Remove the metadata file
fs::remove_file(path.join(self.filename_for_key(key).await + ".meta"))
.await
.unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything we can do to avoid the unwraps here? Or we could at least log the error for easier debugging with something like

Suggested change
fs::remove_file(path.join(self.filename_for_key(key).await))
.await
.unwrap();
// Remove the metadata file
fs::remove_file(path.join(self.filename_for_key(key).await + ".meta"))
.await
.unwrap();
fs::remove_file(path.join(self.filename_for_key(key).await))
.await
.map_err(|e| log::error!("Failed to remove cache data with: {}", e)
.unwrap();
// Remove the metadata file
fs::remove_file(path.join(self.filename_for_key(key).await + ".meta"))
.await
.map_err(|e| log::error!("Failed to remove cache metadata with: {}", e)
.unwrap();

N.B. We might also want to add this map_error pattern anywhere else in the code that we need to use unwrap / expect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants