
Support local cache with get api #1530

Closed
XucSh wants to merge 6 commits into kvcache-ai:main from XucSh:cache-op-without-evict

Conversation

@XucSh XucSh commented Feb 10, 2026

Description

This feature currently requires eviction to be disabled.

Test result:

--- Benchmark: Data size sweep (get_buffer) ---
Size | Avg Miss | Avg Hit | Speedup | Throughput
---------+------------+------------+----------+-------------
4 KB | 0.276ms | 0.001ms | 191.4x | 2.65 GB/s
64 KB | 0.174ms | 0.001ms | 150.1x | 52.50 GB/s
256 KB | 0.245ms | 0.002ms | 109.2x | 108.75 GB/s
1 MB | 0.590ms | 0.002ms | 277.5x | 459.56 GB/s
4 MB | 1.959ms | 0.002ms | 824.8x | 1644.83 GB/s

--- Benchmark: Data size sweep (get_into) ---
Size | Avg Miss | Avg Hit | Speedup | Throughput
---------+------------+------------+----------+-------------
4 KB | 0.042ms | 0.019ms | 2.2x | 0.20 GB/s
64 KB | 0.031ms | 0.027ms | 1.1x | 2.23 GB/s
256 KB | 0.053ms | 0.049ms | 1.1x | 4.94 GB/s
1 MB | 0.165ms | 0.163ms | 1.0x | 6.00 GB/s
4 MB | 0.604ms | 0.602ms | 1.0x | 6.48 GB/s

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

XucSh added 5 commits February 7, 2026 18:50
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
@gemini-code-assist

Summary of Changes

Hello @XucSh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial local caching mechanism for get API calls within the Mooncake Store. By enabling local caching, the system can store frequently accessed objects directly on the client's node, significantly reducing network latency and master server load for subsequent requests. The benchmarks provided demonstrate substantial performance improvements, with cache hits achieving speedups of over 100x and throughputs reaching hundreds of GB/s for get_buffer operations. This enhancement is designed to optimize data retrieval for applications requiring high-performance access to shared data.

Highlights

  • Local Cache Parameter: Introduced a local_cache boolean parameter to Python get APIs (get_buffer, batch_get_buffer, get_into, batch_get_into) to enable or disable local caching, defaulting to false.
  • New LocalCache Class: Implemented a new LocalCache C++ class for thread-safe in-memory storage of retrieved objects, designed to reduce redundant remote transfers and master queries.
  • Master-Coordinated Local Replica Allocation: Added new RPCs (CacheLocalStart, CacheLocalEnd, CacheLocalRevoke) and corresponding client/service methods to coordinate local replica allocation and status updates with the master server.
  • RealClient Integration: Integrated local cache lookup and insertion logic into the RealClient's get methods, ensuring cache hits bypass remote operations and data is cached locally for future access.
  • Cache Invalidation: Modified remove operations (remove, removeByRegex, removeAll) to invalidate corresponding entries in the local cache, maintaining data consistency across the system.
  • Comprehensive Testing and Benchmarking: Provided a comprehensive Python test script (test_local_cache.py) to validate the local cache functionality and measure performance benefits, demonstrating significant speedups for cache hits across various get operations and data sizes.
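Per the highlights, LocalCache is a thread-safe key-value store exposing Lookup, Insert, Erase, Clear, Contains, and Size. A minimal sketch of such a class follows; the real implementation lives in mooncake-store/include/local_cache.h, and the CacheEntry layout here (a byte string standing in for a BufferHandle) is an assumption for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Illustrative entry: the real code stores a BufferHandle, not a string.
struct CacheEntry {
    std::string data;
    uint64_t replica_id;  // replica id assigned by the master
};

class LocalCacheSketch {
   public:
    // Returns a copy of the entry, so it stays valid after the lock drops.
    std::optional<CacheEntry> Lookup(const std::string& key) const {
        std::shared_lock lock(mutex_);  // readers share the lock
        auto it = entries_.find(key);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }
    void Insert(const std::string& key, std::string data, uint64_t replica_id) {
        std::unique_lock lock(mutex_);  // writers take it exclusively
        entries_[key] = CacheEntry{std::move(data), replica_id};
    }
    bool Erase(const std::string& key) {
        std::unique_lock lock(mutex_);
        return entries_.erase(key) > 0;
    }
    bool Contains(const std::string& key) const {
        std::shared_lock lock(mutex_);
        return entries_.count(key) > 0;
    }
    size_t Size() const {
        std::shared_lock lock(mutex_);
        return entries_.size();
    }
    void Clear() {
        std::unique_lock lock(mutex_);
        entries_.clear();
    }

   private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, CacheEntry> entries_;
};
```

Returning entries by value is what makes the shared-lock scheme safe: no caller ever holds a pointer into the map after the lock is released.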


Changelog
  • mooncake-integration/store/store_py.cpp
    • Added local_cache boolean parameter to Python bindings for get_buffer, batch_get_buffer, get_into, and batch_get_into methods, defaulting to false.
  • mooncake-store/include/client_service.h
    • Declared a new CacheLocal method to enable local caching of objects.
  • mooncake-store/include/dummy_client.h
    • Updated method signatures for get_into, batch_get_into, get_buffer, and batch_get_buffer to include a local_cache boolean parameter.
  • mooncake-store/include/local_cache.h
    • Added a new header file defining the LocalCache class, which provides thread-safe key-value storage for local object replicas, including Lookup, Insert, Erase, Clear, Contains, and Size methods.
  • mooncake-store/include/master_client.h
    • Declared new methods CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke for coordinating local cache operations with the master.
  • mooncake-store/include/master_service.h
    • Declared new methods CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke for managing local cache replica allocation and status on the master side.
  • mooncake-store/include/pyclient.h
    • Modified virtual method signatures for get_into, batch_get_into, get_buffer, and batch_get_buffer to accept a local_cache boolean parameter.
  • mooncake-store/include/real_client.h
    • Included local_cache.h, added a LocalCache member, and updated get_into, batch_get_into, get_buffer, and batch_get_buffer method signatures to include a local_cache boolean parameter.
  • mooncake-store/include/rpc_service.h
    • Declared new RPC methods CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke within WrappedMasterService.
  • mooncake-store/src/client_service.cpp
    • Implemented the Client::CacheLocal method, which orchestrates local replica allocation, data copying, and master notification.
  • mooncake-store/src/dummy_client.cpp
    • Updated the implementations of get_buffer, batch_get_buffer, get_into, and batch_get_into to match their new signatures, incorporating the local_cache parameter.
  • mooncake-store/src/master_client.cpp
    • Defined RpcNameTraits for the new CacheLocal RPCs and implemented the client-side logic for MasterClient::CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke.
  • mooncake-store/src/master_service.cpp
    • Implemented the server-side logic for MasterService::CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke, handling replica allocation, completion, and revocation.
  • mooncake-store/src/real_client.cpp
    • Modified remove, removeByRegex, and removeAll methods to clear corresponding entries from the local cache upon object removal.
    • Integrated local cache lookup, insertion, and fallback logic into get_buffer, batch_get_buffer, get_into, and batch_get_into methods.
  • mooncake-store/src/rpc_service.cpp
    • Implemented the WrappedMasterService methods for CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke, and registered their RPC handlers with the server.
  • scripts/test_local_cache.py
    • Added a new Python script (test_local_cache.py) to perform two-machine local cache testing and benchmarking, covering various get operations and cache scenarios.
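Per the changelog, caching a local replica is a three-step handshake with the master: CacheLocalStart allocates, the client copies the data, and CacheLocalEnd confirms. A rough sketch of that client-side flow, with a stub standing in for MasterClient; the return types, the stub itself, and the revoke-on-failure cleanup are assumptions, not the PR's exact code.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Stub standing in for MasterClient; the real RPCs return expected<> types.
struct StubMasterClient {
    bool fail_end = false;  // test knob to simulate a failed CacheLocalEnd
    uint64_t next_id = 1;
    int revoked = 0;

    // CacheLocalStart: master allocates a local replica, returns its id.
    std::optional<uint64_t> CacheLocalStart(const std::string& /*key*/,
                                            size_t /*size*/) {
        return next_id++;
    }
    // CacheLocalEnd: mark the replica COMPLETE after the data copy.
    bool CacheLocalEnd(const std::string&, uint64_t) { return !fail_end; }
    // CacheLocalRevoke: release a replica that never completed.
    bool CacheLocalRevoke(const std::string&, uint64_t) {
        ++revoked;
        return true;
    }
};

// Sketch of the Client::CacheLocal shape: allocate, copy, confirm.
std::optional<uint64_t> CacheLocalSketch(StubMasterClient& master,
                                         const std::string& key,
                                         const void* /*data*/, size_t size) {
    auto replica_id = master.CacheLocalStart(key, size);
    if (!replica_id) return std::nullopt;  // master-side allocation failed
    // ... copy `size` bytes into the allocated local replica here ...
    if (!master.CacheLocalEnd(key, *replica_id)) {
        // Without this revoke the replica would linger in INITIALIZED state.
        master.CacheLocalRevoke(key, *replica_id);
        return std::nullopt;
    }
    return replica_id;
}
```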
Activity
  • The pull request introduces local caching functionality to the get API, aiming to significantly improve data retrieval performance.
  • Benchmarks were conducted to measure the speedup and throughput of cache hits versus misses for various data sizes and get operations.
  • The author provided detailed benchmark results in the pull request description, showing substantial performance gains.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a local caching mechanism to improve performance for get operations, covering get_buffer, get_into, and their batch versions. It includes coordination with the master service to manage local replicas, along with a thorough test suite. However, critical security vulnerabilities have been identified: a potential heap buffer overflow in the CacheLocal function due to inconsistent object size assumptions, a risk of arbitrary memory access from the Python wrapper's use of raw memory pointers, and a race condition in the caching logic that could lead to resource leaks on the master server. There is also an inefficiency in cache population for get_into operations, some code duplication, and a misleading comment that need addressing.

Comment on lines +1334 to 1336
               uintptr_t buffer_ptr, size_t size, bool local_cache = false) {
    // Get data directly into user-provided buffer
    void *buffer = reinterpret_cast<void *>(buffer_ptr);

Severity: high (security)

The get_into, batch_get_into, put_from, and put_from_with_metadata functions accept raw memory addresses as uintptr_t from Python and cast them directly to void* (e.g., line 1336). This allows a Python script to read from or write to arbitrary memory locations within the process. In shared or multi-tenant environments, this can lead to sensitive information disclosure, process crashes (if NULL or invalid pointers are passed), or Remote Code Execution. The library should ideally validate that these pointers belong to registered memory regions or use safer abstractions like BufferHandle objects instead of raw pointers.
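One way to mitigate this, as the comment suggests, is to check incoming pointers against the set of registered memory regions before dereferencing them. A hedged sketch follows; the RegionRegistry bookkeeping is illustrative, and Mooncake's actual memory-registration API may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Illustrative registry of registered memory regions, keyed by base address.
class RegionRegistry {
   public:
    void Register(uintptr_t base, size_t len) { regions_[base] = len; }

    // Returns true iff [ptr, ptr + len) lies entirely inside one region.
    bool Contains(uintptr_t ptr, size_t len) const {
        auto it = regions_.upper_bound(ptr);  // first region with base > ptr
        if (it == regions_.begin()) return false;
        --it;  // candidate region starting at or before ptr
        const uintptr_t base = it->first;
        const size_t region_len = it->second;
        // Overflow-safe: compare offsets instead of computing ptr + len.
        return len <= region_len && ptr - base <= region_len - len;
    }

   private:
    std::map<uintptr_t, size_t> regions_;
};
```

A check like this at the binding boundary would reject NULL and wild pointers before any memcpy, turning a potential arbitrary read/write into a clean error return.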

auto& mem_desc = target_desc.get_memory_descriptor();
void* target_addr =
    reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
memcpy(target_addr, data_ptr, data_size);

Severity: high (security)

Potential heap buffer overflow in Client::CacheLocal. The memcpy operation at line 1518 uses data_size provided by the caller without verifying it against the actual size of the allocated buffer (mem_desc.buffer_descriptor.size_). In a distributed environment, the object size on the master could change between the initial query and the CacheLocalStart call, leading to an allocation smaller than the data being copied. Additionally, the destination address target_addr should be validated to ensure it resides within a memory segment previously mounted by the client to prevent arbitrary memory writes if the master is compromised.

Comment on lines +1527 to +1532
auto end_result = master_client_.CacheLocalEnd(key, replica_id);
if (!end_result) {
    LOG(ERROR) << "CacheLocal: CacheLocalEnd failed for key=" << key
               << ", error=" << toString(end_result.error());
    return tl::unexpected(end_result.error());
}

Severity: high

There is a potential resource leak here. If master_client_.CacheLocalEnd(key, replica_id) fails, the replica allocated by CacheLocalStart is not revoked. This will leave the replica in an INITIALIZED state on the master, consuming resources indefinitely. You should call master_client_.CacheLocalRevoke(key, replica_id) in the failure case to clean up the allocated replica.

Suggested change (before → after):

    // before
    auto end_result = master_client_.CacheLocalEnd(key, replica_id);
    if (!end_result) {
        LOG(ERROR) << "CacheLocal: CacheLocalEnd failed for key=" << key
                   << ", error=" << toString(end_result.error());
        return tl::unexpected(end_result.error());
    }

    // after
    auto end_result = master_client_.CacheLocalEnd(key, replica_id);
    if (!end_result) {
        LOG(ERROR) << "CacheLocal: CacheLocalEnd failed for key=" << key
                   << ", error=" << toString(end_result.error());
        master_client_.CacheLocalRevoke(key, replica_id);
        return tl::unexpected(end_result.error());
    }

Comment on lines 1482 to 1496
if (client_ && !local_cache_.Contains(key)) {
    auto cache_result =
        client_->CacheLocal(key, buffer, result.value());
    if (cache_result) {
        auto alloc_result =
            client_buffer_allocator_->allocate(result.value());
        if (alloc_result) {
            memcpy(alloc_result->ptr(), buffer, result.value());
            auto handle = std::make_shared<BufferHandle>(
                std::move(*alloc_result));
            local_cache_.Insert(key, handle, cache_result.value(),
                                result.value());
        }
    }
}

Severity: high

The logic for populating the cache on a get_into miss seems inefficient. It performs two separate memcpy operations from the user-provided buffer: one inside client_->CacheLocal to create a segment replica, and another to populate a buffer from client_buffer_allocator_. This results in reading from the user buffer twice.

To improve this and make the data flow clearer, consider a more linear approach:

  1. Allocate a buffer from client_buffer_allocator_.
  2. Copy data from the user buffer into this new buffer once.
  3. Use this new buffer as the single source to create the segment replica via client_->CacheLocal.
    if (client_ && !local_cache_.Contains(key)) {
        auto alloc_result =
            client_buffer_allocator_->allocate(result.value());
        if (alloc_result) {
            // First, copy data to our managed buffer
            memcpy(alloc_result->ptr(), buffer, result.value());

            // Then, create the segment replica from our managed buffer
            auto cache_result =
                client_->CacheLocal(key, alloc_result->ptr(), result.value());
            if (cache_result) {
                auto handle = std::make_shared<BufferHandle>(
                    std::move(*alloc_result));
                local_cache_.Insert(key, handle, cache_result.value(),
                                    result.value());
            }
        }
    }

// Cache the result locally via master-coordinated allocation
if (client_) {
    auto cache_result =
        client_->CacheLocal(key, result->ptr(), result->size());

Severity: medium (security)

Race condition in get_buffer, get_into, and batch_get_into when local_cache=true. Multiple threads requesting the same key simultaneously may all miss the cache and call client_->CacheLocal. Since MasterService::CacheLocalStart allocates a new replica if the existing one is not yet COMPLETE, this can lead to multiple redundant replicas being allocated on the master for the same key and segment. If eviction is disabled (as noted in the PR description), this results in a permanent resource leak on the master, potentially leading to Denial of Service due to memory exhaustion.
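A common mitigation for this duplicate-miss race is a per-key in-flight guard on the client, so at most one thread attempts CacheLocal for a given key at a time while the rest skip the redundant round trip. A hedged sketch of such a guard (not part of the PR; the class and its method names are illustrative):

```cpp
#include <mutex>
#include <string>
#include <unordered_set>

// Tracks keys with a cache-fill already in flight; the first caller for a
// key wins the fill slot and later callers back off until Release.
class InflightGuard {
   public:
    // Returns true if the caller acquired the fill slot for `key`.
    bool TryAcquire(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        return inflight_.insert(key).second;  // false if already in flight
    }
    void Release(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        inflight_.erase(key);
    }

   private:
    std::mutex mutex_;
    std::unordered_set<std::string> inflight_;
};
```

Wrapping the CacheLocal call in TryAcquire/Release would bound the number of replicas the master allocates per key to one per concurrent miss burst, instead of one per thread.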

Comment on lines +43 to +47
* @return pointer to CacheEntry if found, nullptr otherwise.
* The pointer is valid only while the caller holds no
* write lock (i.e., use under shared lock externally,
* or copy the entry).
*/

Severity: medium

The documentation for LookupEntry is misleading. It states that the function returns a pointer to CacheEntry, but the implementation returns a std::optional<CacheEntry>, which is a value type (a copy of the entry). This could confuse future developers about the lifetime of the returned object. The comment should be updated to reflect that it returns a copy of the cache entry.

Suggested change:

* @return An optional containing a copy of the CacheEntry if found,
*         std::nullopt otherwise.

Comment on lines +1755 to +1772
// Cache locally on success only when local_cache=true
if (local_cache && internal_results[i] && client_ &&
    !local_cache_.Contains(miss_keys[i])) {
    int64_t data_size = internal_results[i].value();
    auto cache_result =
        client_->CacheLocal(miss_keys[i], miss_buffers[i], data_size);
    if (cache_result) {
        auto alloc_result =
            client_buffer_allocator_->allocate(data_size);
        if (alloc_result) {
            memcpy(alloc_result->ptr(), miss_buffers[i], data_size);
            auto handle = std::make_shared<BufferHandle>(
                std::move(*alloc_result));
            local_cache_.Insert(miss_keys[i], handle,
                                cache_result.value(), data_size);
        }
    }
}

Severity: medium

The cache population logic for a cache miss in batch_get_into is very similar to the logic in get_into. This code duplication could be avoided by extracting the caching logic into a private helper function. This would improve maintainability and ensure that any future bug fixes or optimizations to the caching logic are applied in both places.
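One possible shape for that extracted helper, combining the allocate-copy-insert tail shared by get_into and batch_get_into; the stand-in Buffer and CacheSketch types and the function name are illustrative, not the PR's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Illustrative stand-ins for the client buffer allocator and local cache.
struct Buffer {
    explicit Buffer(size_t n) : bytes(n) {}
    std::vector<char> bytes;
};

struct CacheSketch {
    std::map<std::string, std::shared_ptr<Buffer>> entries;
};

// The duplicated allocate-copy-insert tail of get_into/batch_get_into,
// extracted into one helper both call sites could share.
bool CacheMissFill(CacheSketch& cache, const std::string& key,
                   const void* src, size_t size, uint64_t /*replica_id*/) {
    auto buf = std::make_shared<Buffer>(size);
    std::memcpy(buf->bytes.data(), src, size);  // single copy into the cache
    cache.entries[key] = std::move(buf);
    return true;
}
```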

@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 23.00319% with 241 lines in your changes missing coverage. Please review.

Files with missing lines                 | Patch %  | Missing lines
-----------------------------------------+----------+--------------
mooncake-store/src/real_client.cpp       | 35.66%   | 92
mooncake-store/src/master_service.cpp    | 0.00%    | 57
mooncake-store/src/rpc_service.cpp       | 10.00%   | 27
mooncake-store/src/client_service.cpp    | 8.00%    | 23
mooncake-store/src/master_client.cpp     | 0.00%    | 20
mooncake-integration/store/store_py.cpp  | 0.00%    | 12
mooncake-store/include/local_cache.h     | 66.66%   | 8
mooncake-store/src/dummy_client.cpp      | 0.00%    | 2


@XucSh

XucSh commented Feb 10, 2026

see #1516 and #1226

@XucSh XucSh closed this Feb 10, 2026