
Support local cache with get api #1530

Closed
XucSh wants to merge 6 commits into kvcache-ai:main from XucSh:cache-op-without-evict

Conversation

@XucSh XucSh commented Feb 10, 2026

Description

This feature currently requires eviction to be disabled.

Test result:

--- Benchmark: Data size sweep (get_buffer) ---
Size | Avg Miss | Avg Hit | Speedup | Throughput
---------+------------+------------+----------+-------------
4 KB | 0.276ms | 0.001ms | 191.4x | 2.65 GB/s
64 KB | 0.174ms | 0.001ms | 150.1x | 52.50 GB/s
256 KB | 0.245ms | 0.002ms | 109.2x | 108.75 GB/s
1 MB | 0.590ms | 0.002ms | 277.5x | 459.56 GB/s
4 MB | 1.959ms | 0.002ms | 824.8x | 1644.83 GB/s

--- Benchmark: Data size sweep (get_into) ---
Size | Avg Miss | Avg Hit | Speedup | Throughput
---------+------------+------------+----------+-------------
4 KB | 0.042ms | 0.019ms | 2.2x | 0.20 GB/s
64 KB | 0.031ms | 0.027ms | 1.1x | 2.23 GB/s
256 KB | 0.053ms | 0.049ms | 1.1x | 4.94 GB/s
1 MB | 0.165ms | 0.163ms | 1.0x | 6.00 GB/s
4 MB | 0.604ms | 0.602ms | 1.0x | 6.48 GB/s

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

XucSh added 5 commits February 7, 2026 18:50
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
@gemini-code-assist

Summary of Changes

Hello @XucSh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial local caching mechanism for get API calls within the Mooncake Store. By enabling local caching, the system can store frequently accessed objects directly on the client's node, significantly reducing network latency and master server load for subsequent requests. The benchmarks provided demonstrate substantial performance improvements, with cache hits achieving speedups of over 100x and throughputs reaching hundreds of GB/s for get_buffer operations. This enhancement is designed to optimize data retrieval for applications requiring high-performance access to shared data.

Highlights

  • Local Cache Parameter: Introduced a local_cache boolean parameter to Python get APIs (get_buffer, batch_get_buffer, get_into, batch_get_into) to enable or disable local caching, defaulting to false.
  • New LocalCache Class: Implemented a new LocalCache C++ class for thread-safe in-memory storage of retrieved objects, designed to reduce redundant remote transfers and master queries.
  • Master-Coordinated Local Replica Allocation: Added new RPCs (CacheLocalStart, CacheLocalEnd, CacheLocalRevoke) and corresponding client/service methods to coordinate local replica allocation and status updates with the master server.
  • RealClient Integration: Integrated local cache lookup and insertion logic into the RealClient's get methods, ensuring cache hits bypass remote operations and data is cached locally for future access.
  • Cache Invalidation: Modified remove operations (remove, removeByRegex, removeAll) to invalidate corresponding entries in the local cache, maintaining data consistency across the system.
  • Comprehensive Testing and Benchmarking: Provided a comprehensive Python test script (test_local_cache.py) to validate the local cache functionality and measure performance benefits, demonstrating significant speedups for cache hits across various get operations and data sizes.
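Per the highlights, LocalCache is a thread-safe key-value store exposing Lookup, Insert, Erase, Clear, Contains, and Size. A minimal sketch of such a class follows; the real implementation lives in mooncake-store/include/local_cache.h, and the CacheEntry layout here (a byte string standing in for a BufferHandle) is an assumption for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Illustrative entry: the real code stores a BufferHandle, not a string.
struct CacheEntry {
    std::string data;
    uint64_t replica_id;  // replica id assigned by the master
};

class LocalCacheSketch {
   public:
    // Returns a copy of the entry, so it stays valid after the lock drops.
    std::optional<CacheEntry> Lookup(const std::string& key) const {
        std::shared_lock lock(mutex_);  // readers share the lock
        auto it = entries_.find(key);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }
    void Insert(const std::string& key, std::string data, uint64_t replica_id) {
        std::unique_lock lock(mutex_);  // writers take it exclusively
        entries_[key] = CacheEntry{std::move(data), replica_id};
    }
    bool Erase(const std::string& key) {
        std::unique_lock lock(mutex_);
        return entries_.erase(key) > 0;
    }
    bool Contains(const std::string& key) const {
        std::shared_lock lock(mutex_);
        return entries_.count(key) > 0;
    }
    size_t Size() const {
        std::shared_lock lock(mutex_);
        return entries_.size();
    }
    void Clear() {
        std::unique_lock lock(mutex_);
        entries_.clear();
    }

   private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, CacheEntry> entries_;
};
```

Returning entries by value is what makes the shared-lock scheme safe: no caller ever holds a pointer into the map after the lock is released.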


Changelog
  • mooncake-integration/store/store_py.cpp
    • Added local_cache boolean parameter to Python bindings for get_buffer, batch_get_buffer, get_into, and batch_get_into methods, defaulting to false.
  • mooncake-store/include/client_service.h
    • Declared a new CacheLocal method to enable local caching of objects.
  • mooncake-store/include/dummy_client.h
    • Updated method signatures for get_into, batch_get_into, get_buffer, and batch_get_buffer to include a local_cache boolean parameter.
  • mooncake-store/include/local_cache.h
    • Added a new header file defining the LocalCache class, which provides thread-safe key-value storage for local object replicas, including Lookup, Insert, Erase, Clear, Contains, and Size methods.
  • mooncake-store/include/master_client.h
    • Declared new methods CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke for coordinating local cache operations with the master.
  • mooncake-store/include/master_service.h
    • Declared new methods CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke for managing local cache replica allocation and status on the master side.
  • mooncake-store/include/pyclient.h
    • Modified virtual method signatures for get_into, batch_get_into, get_buffer, and batch_get_buffer to accept a local_cache boolean parameter.
  • mooncake-store/include/real_client.h
    • Included local_cache.h, added a LocalCache member, and updated get_into, batch_get_into, get_buffer, and batch_get_buffer method signatures to include a local_cache boolean parameter.
  • mooncake-store/include/rpc_service.h
    • Declared new RPC methods CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke within WrappedMasterService.
  • mooncake-store/src/client_service.cpp
    • Implemented the Client::CacheLocal method, which orchestrates local replica allocation, data copying, and master notification.
  • mooncake-store/src/dummy_client.cpp
    • Updated the implementations of get_buffer, batch_get_buffer, get_into, and batch_get_into to match their new signatures, incorporating the local_cache parameter.
  • mooncake-store/src/master_client.cpp
    • Defined RpcNameTraits for the new CacheLocal RPCs and implemented the client-side logic for MasterClient::CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke.
  • mooncake-store/src/master_service.cpp
    • Implemented the server-side logic for MasterService::CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke, handling replica allocation, completion, and revocation.
  • mooncake-store/src/real_client.cpp
    • Modified remove, removeByRegex, and removeAll methods to clear corresponding entries from the local cache upon object removal.
    • Integrated local cache lookup, insertion, and fallback logic into get_buffer, batch_get_buffer, get_into, and batch_get_into methods.
  • mooncake-store/src/rpc_service.cpp
    • Implemented the WrappedMasterService methods for CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke, and registered their RPC handlers with the server.
  • scripts/test_local_cache.py
    • Added a new Python script (test_local_cache.py) to perform two-machine local cache testing and benchmarking, covering various get operations and cache scenarios.
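Per the changelog, caching a local replica is a three-step handshake with the master: CacheLocalStart allocates, the client copies the data, and CacheLocalEnd confirms. A rough sketch of that client-side flow, with a stub standing in for MasterClient; the return types, the stub itself, and the revoke-on-failure cleanup are assumptions, not the PR's exact code.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Stub standing in for MasterClient; the real RPCs return expected<> types.
struct StubMasterClient {
    bool fail_end = false;  // test knob to simulate a failed CacheLocalEnd
    uint64_t next_id = 1;
    int revoked = 0;

    // CacheLocalStart: master allocates a local replica, returns its id.
    std::optional<uint64_t> CacheLocalStart(const std::string& /*key*/,
                                            size_t /*size*/) {
        return next_id++;
    }
    // CacheLocalEnd: mark the replica COMPLETE after the data copy.
    bool CacheLocalEnd(const std::string&, uint64_t) { return !fail_end; }
    // CacheLocalRevoke: release a replica that never completed.
    bool CacheLocalRevoke(const std::string&, uint64_t) {
        ++revoked;
        return true;
    }
};

// Sketch of the Client::CacheLocal shape: allocate, copy, confirm.
std::optional<uint64_t> CacheLocalSketch(StubMasterClient& master,
                                         const std::string& key,
                                         const void* /*data*/, size_t size) {
    auto replica_id = master.CacheLocalStart(key, size);
    if (!replica_id) return std::nullopt;  // master-side allocation failed
    // ... copy `size` bytes into the allocated local replica here ...
    if (!master.CacheLocalEnd(key, *replica_id)) {
        // Without this revoke the replica would linger in INITIALIZED state.
        master.CacheLocalRevoke(key, *replica_id);
        return std::nullopt;
    }
    return replica_id;
}
```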
Activity
  • The pull request introduces local caching functionality to the get API, aiming to significantly improve data retrieval performance.
  • Benchmarks were conducted to measure the speedup and throughput of cache hits versus misses for various data sizes and get operations.
  • The author provided detailed benchmark results in the pull request description, showing substantial performance gains.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a local caching mechanism to improve performance for get operations, covering get_buffer, get_into, and their batch versions. It includes coordination with the master service to manage local replicas, along with a thorough test suite. However, critical security vulnerabilities have been identified: a potential heap buffer overflow in the CacheLocal function due to inconsistent object size assumptions, a risk of arbitrary memory access from the Python wrapper's use of raw memory pointers, and a race condition in the caching logic that could lead to resource leaks on the master server. There is also an inefficiency in cache population for get_into operations, some code duplication, and a misleading comment that need addressing.

Comment on lines +1334 to 1336
               uintptr_t buffer_ptr, size_t size, bool local_cache = false) {
    // Get data directly into user-provided buffer
    void *buffer = reinterpret_cast<void *>(buffer_ptr);

Severity: high (security)

The get_into, batch_get_into, put_from, and put_from_with_metadata functions accept raw memory addresses as uintptr_t from Python and cast them directly to void* (e.g., line 1336). This allows a Python script to read from or write to arbitrary memory locations within the process. In shared or multi-tenant environments, this can lead to sensitive information disclosure, process crashes (if NULL or invalid pointers are passed), or Remote Code Execution. The library should ideally validate that these pointers belong to registered memory regions or use safer abstractions like BufferHandle objects instead of raw pointers.
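One way to mitigate this, as the comment suggests, is to check incoming pointers against the set of registered memory regions before dereferencing them. A hedged sketch follows; the RegionRegistry bookkeeping is illustrative, and Mooncake's actual memory-registration API may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Illustrative registry of registered memory regions, keyed by base address.
class RegionRegistry {
   public:
    void Register(uintptr_t base, size_t len) { regions_[base] = len; }

    // Returns true iff [ptr, ptr + len) lies entirely inside one region.
    bool Contains(uintptr_t ptr, size_t len) const {
        auto it = regions_.upper_bound(ptr);  // first region with base > ptr
        if (it == regions_.begin()) return false;
        --it;  // candidate region starting at or before ptr
        const uintptr_t base = it->first;
        const size_t region_len = it->second;
        // Overflow-safe: compare offsets instead of computing ptr + len.
        return len <= region_len && ptr - base <= region_len - len;
    }

   private:
    std::map<uintptr_t, size_t> regions_;
};
```

A check like this at the binding boundary would reject NULL and wild pointers before any memcpy, turning a potential arbitrary read/write into a clean error return.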

auto& mem_desc = target_desc.get_memory_descriptor();
void* target_addr =
    reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
memcpy(target_addr, data_ptr, data_size);

Severity: high (security)

Potential heap buffer overflow in Client::CacheLocal. The memcpy operation at line 1518 uses data_size provided by the caller without verifying it against the actual size of the allocated buffer (mem_desc.buffer_descriptor.size_). In a distributed environment, the object size on the master could change between the initial query and the CacheLocalStart call, leading to an allocation smaller than the data being copied. Additionally, the destination address target_addr should be validated to ensure it resides within a memory segment previously mounted by the client to prevent arbitrary memory writes if the master is compromised.

Comment on lines +1527 to +1532
auto end_result = master_client_.CacheLocalEnd(key, replica_id);
if (!end_result) {
    LOG(ERROR) << "CacheLocal: CacheLocalEnd failed for key=" << key
               << ", error=" << toString(end_result.error());
    return tl::unexpected(end_result.error());
}

Severity: high

There is a potential resource leak here. If master_client_.CacheLocalEnd(key, replica_id) fails, the replica allocated by CacheLocalStart is not revoked. This will leave the replica in an INITIALIZED state on the master, consuming resources indefinitely. You should call master_client_.CacheLocalRevoke(key, replica_id) in the failure case to clean up the allocated replica.

Suggested change (before → after):

    // before
    auto end_result = master_client_.CacheLocalEnd(key, replica_id);
    if (!end_result) {
        LOG(ERROR) << "CacheLocal: CacheLocalEnd failed for key=" << key
                   << ", error=" << toString(end_result.error());
        return tl::unexpected(end_result.error());
    }

    // after
    auto end_result = master_client_.CacheLocalEnd(key, replica_id);
    if (!end_result) {
        LOG(ERROR) << "CacheLocal: CacheLocalEnd failed for key=" << key
                   << ", error=" << toString(end_result.error());
        master_client_.CacheLocalRevoke(key, replica_id);
        return tl::unexpected(end_result.error());
    }

Comment on lines 1482 to 1496
if (client_ && !local_cache_.Contains(key)) {
    auto cache_result =
        client_->CacheLocal(key, buffer, result.value());
    if (cache_result) {
        auto alloc_result =
            client_buffer_allocator_->allocate(result.value());
        if (alloc_result) {
            memcpy(alloc_result->ptr(), buffer, result.value());
            auto handle = std::make_shared<BufferHandle>(
                std::move(*alloc_result));
            local_cache_.Insert(key, handle, cache_result.value(),
                                result.value());
        }
    }
}

Severity: high

The logic for populating the cache on a get_into miss seems inefficient. It performs two separate memcpy operations from the user-provided buffer: one inside client_->CacheLocal to create a segment replica, and another to populate a buffer from client_buffer_allocator_. This results in reading from the user buffer twice.

To improve this and make the data flow clearer, consider a more linear approach:

  1. Allocate a buffer from client_buffer_allocator_.
  2. Copy data from the user buffer into this new buffer once.
  3. Use this new buffer as the single source to create the segment replica via client_->CacheLocal.
    if (client_ && !local_cache_.Contains(key)) {
        auto alloc_result =
            client_buffer_allocator_->allocate(result.value());
        if (alloc_result) {
            // First, copy data to our managed buffer
            memcpy(alloc_result->ptr(), buffer, result.value());

            // Then, create the segment replica from our managed buffer
            auto cache_result =
                client_->CacheLocal(key, alloc_result->ptr(), result.value());
            if (cache_result) {
                auto handle = std::make_shared<BufferHandle>(
                    std::move(*alloc_result));
                local_cache_.Insert(key, handle, cache_result.value(),
                                    result.value());
            }
        }
    }

// Cache the result locally via master-coordinated allocation
if (client_) {
    auto cache_result =
        client_->CacheLocal(key, result->ptr(), result->size());

Severity: medium (security)

Race condition in get_buffer, get_into, and batch_get_into when local_cache=true. Multiple threads requesting the same key simultaneously may all miss the cache and call client_->CacheLocal. Since MasterService::CacheLocalStart allocates a new replica if the existing one is not yet COMPLETE, this can lead to multiple redundant replicas being allocated on the master for the same key and segment. If eviction is disabled (as noted in the PR description), this results in a permanent resource leak on the master, potentially leading to Denial of Service due to memory exhaustion.
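A common mitigation for this duplicate-miss race is a per-key in-flight guard on the client, so at most one thread attempts CacheLocal for a given key at a time while the rest skip the redundant round trip. A hedged sketch of such a guard (not part of the PR; the class and its method names are illustrative):

```cpp
#include <mutex>
#include <string>
#include <unordered_set>

// Tracks keys with a cache-fill already in flight; the first caller for a
// key wins the fill slot and later callers back off until Release.
class InflightGuard {
   public:
    // Returns true if the caller acquired the fill slot for `key`.
    bool TryAcquire(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        return inflight_.insert(key).second;  // false if already in flight
    }
    void Release(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        inflight_.erase(key);
    }

   private:
    std::mutex mutex_;
    std::unordered_set<std::string> inflight_;
};
```

Wrapping the CacheLocal call in TryAcquire/Release would bound the number of replicas the master allocates per key to one per concurrent miss burst, instead of one per thread.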

Comment on lines +43 to +47
* @return pointer to CacheEntry if found, nullptr otherwise.
* The pointer is valid only while the caller holds no
* write lock (i.e., use under shared lock externally,
* or copy the entry).
*/

Severity: medium

The documentation for LookupEntry is misleading. It states that the function returns a pointer to CacheEntry, but the implementation returns a std::optional<CacheEntry>, which is a value type (a copy of the entry). This could confuse future developers about the lifetime of the returned object. The comment should be updated to reflect that it returns a copy of the cache entry.

Suggested change:

* @return An optional containing a copy of the CacheEntry if found,
*         std::nullopt otherwise.

Comment on lines +1755 to +1772
// Cache locally on success only when local_cache=true
if (local_cache && internal_results[i] && client_ &&
    !local_cache_.Contains(miss_keys[i])) {
    int64_t data_size = internal_results[i].value();
    auto cache_result =
        client_->CacheLocal(miss_keys[i], miss_buffers[i], data_size);
    if (cache_result) {
        auto alloc_result =
            client_buffer_allocator_->allocate(data_size);
        if (alloc_result) {
            memcpy(alloc_result->ptr(), miss_buffers[i], data_size);
            auto handle = std::make_shared<BufferHandle>(
                std::move(*alloc_result));
            local_cache_.Insert(miss_keys[i], handle,
                                cache_result.value(), data_size);
        }
    }
}

Severity: medium

The cache population logic for a cache miss in batch_get_into is very similar to the logic in get_into. This code duplication could be avoided by extracting the caching logic into a private helper function. This would improve maintainability and ensure that any future bug fixes or optimizations to the caching logic are applied in both places.
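One possible shape for that extracted helper, combining the allocate-copy-insert tail shared by get_into and batch_get_into; the stand-in Buffer and CacheSketch types and the function name are illustrative, not the PR's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Illustrative stand-ins for the client buffer allocator and local cache.
struct Buffer {
    explicit Buffer(size_t n) : bytes(n) {}
    std::vector<char> bytes;
};

struct CacheSketch {
    std::map<std::string, std::shared_ptr<Buffer>> entries;
};

// The duplicated allocate-copy-insert tail of get_into/batch_get_into,
// extracted into one helper both call sites could share.
bool CacheMissFill(CacheSketch& cache, const std::string& key,
                   const void* src, size_t size, uint64_t /*replica_id*/) {
    auto buf = std::make_shared<Buffer>(size);
    std::memcpy(buf->bytes.data(), src, size);  // single copy into the cache
    cache.entries[key] = std::move(buf);
    return true;
}
```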

@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 23.00319% with 241 lines in your changes missing coverage. Please review.

Files with missing lines                 | Patch %  | Missing lines
-----------------------------------------+----------+--------------
mooncake-store/src/real_client.cpp       | 35.66%   | 92
mooncake-store/src/master_service.cpp    | 0.00%    | 57
mooncake-store/src/rpc_service.cpp       | 10.00%   | 27
mooncake-store/src/client_service.cpp    | 8.00%    | 23
mooncake-store/src/master_client.cpp     | 0.00%    | 20
mooncake-integration/store/store_py.cpp  | 0.00%    | 12
mooncake-store/include/local_cache.h     | 66.66%   | 8
mooncake-store/src/dummy_client.cpp      | 0.00%    | 2


@XucSh

XucSh commented Feb 10, 2026

see #1516 and #1226

@XucSh XucSh closed this Feb 10, 2026