Draft Fix chunk cache key and metadata retrieval #104

maxstack · 2025-04-17T08:41:13Z

Fixes for the chunk cache:

Request offset and size not included in the cache key so the different byte ranges received by these requests would be cached in the same file causing errors in results
Fix thread issue where tasks processing requests could access the cache state whilst being written via get_metadata
Take unnecessary request parameters out of the key so the cache may hit more, we only include the source, bucket, object key, offset, and size - these are the parameters used by S3 object download API calls

…nloaded byte range under a key that doesn't account for this range. When another request is received we get a cache hit even though the requested byte range may differ. Fix this by downloading and caching the entire chunk, applying the byte range post download or cache hit. If caching is disabled honour the byte range in the S3 client download.

…ache chunks aren't shared between requests of differing byte range. Revert the previous fix which suffers from memory over consumption when bombarded by requests for byte ranges from the same very large file - we would need a way to ensure a single download of that file before servicing concurrent requests against it.

1) Only incorporate request fields that themselves are used in the S3 object download 2) Immediately turn this key into a md5 hash so we're handling a much shorter string when interacting with the cache

…essing task to determine if a chunk is cached and if so its size. There's a MPSC channel used to buffer all cache write requests to a single task responsible for cache updates but the update task could happen at the same time request processing tasks read the state. Reading the state file whilst it's being written results in a serde parsing error. Instead of adding some type of thread safety around the load/save of the state file a simpler solution is to store a chunk's metadata in its own metadata file. This mirrors the chunk get which ignores the state file and simply retrieves files from disk.

sd109

Nice, seems like a sensible approach to me.

Do we also need to be worried about the pruning cycle trying to access the cache state file while the cache is being written to? I see we're using load_state in the remove method as well as in various places throughout the prune* methods.

src/app.rs

sd109 · 2025-04-17T09:32:41Z

src/chunk_cache.rs

@@ -320,6 +319,26 @@ impl SimpleDiskCache {
        }
    }

+    /// Retrieves chunk metadata from the cache for an unique key.


Nit:

Suggested change

/// Retrieves chunk metadata from the cache for an unique key.

/// Retrieves chunk metadata from the cache for a unique key.

sd109 · 2025-04-17T09:39:48Z

src/chunk_cache.rs

+            fs::remove_file(path.join(self.filename_for_key(key).await))
+                .await
+                .unwrap();
+            // Remove the metadata file
+            fs::remove_file(path.join(self.filename_for_key(key).await + ".meta"))
+                .await
+                .unwrap();


Is there anything we can do to avoid the unwraps here? Or we could at least log the error for easier debugging with something like

Suggested change

fs::remove_file(path.join(self.filename_for_key(key).await))

.await

.unwrap();

// Remove the metadata file

fs::remove_file(path.join(self.filename_for_key(key).await + ".meta"))

.await

.unwrap();

fs::remove_file(path.join(self.filename_for_key(key).await))

.await

.map_err(|e| log::error!("Failed to remove cache data with: {}", e)

.unwrap();

// Remove the metadata file

fs::remove_file(path.join(self.filename_for_key(key).await + ".meta"))

.await

.map_err(|e| log::error!("Failed to remove cache metadata with: {}", e)

.unwrap();

N.B. We might also want to add this map_error pattern anywhere else in the code that we need to use unwrap / expect.

maxstack added 4 commits April 10, 2025 22:18

Alter the key used for the chunk cache:

57df55d

1) Only incorporate request fields that themselves are used in the S3 object download 2) Immediately turn this key into a md5 hash so we're handling a much shorter string when interacting with the cache

maxstack self-assigned this Apr 17, 2025

sd109 requested changes Apr 17, 2025

View reviewed changes

maxstack added 3 commits April 17, 2025 14:37

Allow the key of the chunk cache to be configurable.

cce5926

Mirror the default chunk cache key in the ansible group_vars.

a4af1cb

Replace unwraps with descriptive error messages.

de30640

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Draft Fix chunk cache key and metadata retrieval #104

Draft Fix chunk cache key and metadata retrieval #104

Uh oh!

maxstack commented Apr 17, 2025

Uh oh!

sd109 left a comment

Uh oh!

Uh oh!

sd109 Apr 17, 2025

Uh oh!

sd109 Apr 17, 2025

Uh oh!

Uh oh!

	/// Retrieves chunk metadata from the cache for an unique key.
	/// Retrieves chunk metadata from the cache for a unique key.

Draft Fix chunk cache key and metadata retrieval #104

Are you sure you want to change the base?

Draft Fix chunk cache key and metadata retrieval #104

Uh oh!

Conversation

maxstack commented Apr 17, 2025

Uh oh!

sd109 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sd109 Apr 17, 2025

Choose a reason for hiding this comment

Uh oh!

sd109 Apr 17, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!