Conversation
Summary of Changes

Hello @XucSh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the Mooncake Store's data retrieval performance by integrating a local caching layer: it introduces client-side caching for get operations.
Activity
Code Review
This pull request introduces a local caching mechanism for get operations to improve performance by avoiding redundant remote data transfers. However, a critical stale cache vulnerability exists in the get_into and batch_get_into paths, which could lead to information leaks if memory is reallocated by the master service. Additionally, there's a race condition in the master service that could cause resource exhaustion, and inconsistent use of memcpy instead of the transfer engine might lead to crashes with device memory. The current implementation also has performance inefficiencies in the cache population logic for get_into and batch_get_into operations, resulting in extra memory copies, and a minor documentation inaccuracy in the new LocalCache class.
mooncake-store/src/real_client.cpp
Outdated
```cpp
auto entry = local_cache_.LookupEntry(key);
if (entry) {
    const auto &replica = entry->replica_desc;
    uint64_t total_size = calculate_total_size(replica);
    if (size < total_size) {
        LOG(ERROR) << "User buffer too small. Required: " << total_size
                   << ", provided: " << size;
        return -1;
    }
    std::vector<mooncake::Slice> slices;
    allocateSlices(slices, replica, buffer);
    QueryResult qr({replica},
                   std::chrono::steady_clock::time_point::max());
    auto get_result = client_->Get(key, qr, slices);
```
The local_cache_ implementation is vulnerable to an information leak due to stale cache entries. The cache hit path in get_into uses a Replica::Descriptor stored in the cache to perform a data transfer from a master-managed memory segment. However, the cache does not implement any lease validation or invalidation mechanism. If an object is removed or its lease expires on the master, the underlying memory may be reallocated to a different object. A subsequent cache hit will then read data from the new object, potentially leaking sensitive information belonging to another user or task.
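One way the suggested fix could take shape is sketched below in Python. This is a minimal illustration only, not the Mooncake API: `CacheEntry`, `lease_expiry`, and the TTL-based invalidation are assumed names for the lease-validation idea, where an entry whose master lease has expired is treated as a miss and dropped so a later hit can never read memory the master may have reallocated.

```python
import time


class CacheEntry:
    """Hypothetical cache entry carrying a lease deadline from the master."""

    def __init__(self, replica_desc, lease_expiry: float):
        self.replica_desc = replica_desc
        self.lease_expiry = lease_expiry


class LeaseValidatedCache:
    """Sketch: a lookup that treats lease-expired entries as misses."""

    def __init__(self):
        self._entries = {}

    def insert(self, key, replica_desc, lease_ttl: float):
        self._entries[key] = CacheEntry(replica_desc,
                                        time.monotonic() + lease_ttl)

    def lookup(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        if time.monotonic() >= entry.lease_expiry:
            # Lease expired: the master may have reused the memory,
            # so drop the entry instead of serving stale (or foreign) data.
            del self._entries[key]
            return None
        return entry


cache = LeaseValidatedCache()
cache.insert("k1", "replica-desc", lease_ttl=10.0)   # still valid
cache.insert("k2", "replica-desc", lease_ttl=-1.0)   # already expired
print(cache.lookup("k1") is not None)  # True
print(cache.lookup("k2") is None)      # True
```

A real implementation would also need the master to push invalidations (or the client to revalidate on hit), since TTLs alone only bound, not eliminate, the staleness window.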
good point. will fix it
mooncake-store/src/real_client.cpp
Outdated
```cpp
auto entry = local_cache_.LookupEntry(keys[i]);
if (entry) {
    const auto &replica = entry->replica_desc;
    uint64_t total_size = calculate_total_size(replica);
    if (sizes[i] >= total_size) {
        std::vector<Slice> slices;
        allocateSlices(slices, replica, buffers[i]);
        cache_hits.push_back(
            {.original_index = i,
             .key = keys[i],
             .query_result = QueryResult(
                 {replica},
                 std::chrono::steady_clock::time_point::max()),
             .slices = std::move(slices),
             .total_size = total_size});
        results[i] = static_cast<int64_t>(total_size);
        continue;
    }
}
miss_keys.push_back(keys[i]);
miss_buffers.push_back(buffers[i]);
miss_sizes.push_back(sizes[i]);
miss_indices.push_back(i);
}

// BatchGet for cache hits
if (!cache_hits.empty()) {
    std::vector<std::string> hit_keys;
    std::vector<QueryResult> hit_qrs;
    std::unordered_map<std::string, std::vector<Slice>> hit_slices;
    hit_keys.reserve(cache_hits.size());
    hit_qrs.reserve(cache_hits.size());
    for (auto &hit : cache_hits) {
        hit_keys.push_back(hit.key);
        hit_qrs.push_back(std::move(hit.query_result));
        hit_slices[hit.key] = std::move(hit.slices);
    }
    auto batch_results =
        client_->BatchGet(hit_keys, hit_qrs, hit_slices);
```
mooncake-store/src/real_client.cpp
Outdated
```cpp
auto alloc_result =
    client_buffer_allocator_->allocate(result.value());
if (alloc_result) {
    memcpy(alloc_result->ptr(), buffer, result.value());
```
The get_into and batch_get_into methods currently use memcpy to populate the local cache from a user-provided buffer. This is a critical issue as it can lead to a process crash if the buffer resides in device memory; the transfer engine should be used instead to ensure compatibility with all supported memory types. Furthermore, the cache population logic on a cache miss for get_into is inefficient, involving multiple memcpy operations. A more efficient approach would be to reuse the caching logic from get_buffer to avoid duplicating logic and eliminate extra memory copies. This inefficiency also applies to batch_get_into.
```cpp
auto* existing = metadata.GetReplicaBySegmentName(tgt_segment);
if (existing != nullptr && existing->is_completed()) {
    return tl::make_unexpected(ErrorCode::OBJECT_ALREADY_EXISTS);
}
```
A race condition in CacheLocalStart allows for redundant memory allocations on the same segment for the same key. The code only checks if a completed replica exists. If multiple concurrent requests are made for the same key before the first allocation is marked as complete, the master will proceed with multiple allocations. This can be exploited to exhaust the memory capacity of a segment, leading to a Denial of Service (DoS).
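The usual remedy for this kind of check-then-allocate race is to reserve an in-flight placeholder atomically before the (slow) allocation. The sketch below illustrates the idea in Python with hypothetical names, not the master service's actual data structures: the first `start` for a `(key, segment)` pair wins, and concurrent starts back off instead of allocating a second replica.

```python
import threading


class CacheLocalCoordinator:
    """Sketch: serialize CacheLocalStart per (key, segment) pair."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()   # (key, segment) pairs being allocated
        self._completed = set()   # (key, segment) pairs with a finished replica

    def start(self, key: str, segment: str) -> bool:
        slot = (key, segment)
        with self._lock:
            if slot in self._completed or slot in self._in_flight:
                return False           # already exists or being allocated
            self._in_flight.add(slot)  # reserve BEFORE the slow allocation
        return True

    def end(self, key: str, segment: str):
        slot = (key, segment)
        with self._lock:
            self._in_flight.discard(slot)
            self._completed.add(slot)


coord = CacheLocalCoordinator()
print(coord.start("obj", "seg-A"))  # True: first start reserves the slot
print(coord.start("obj", "seg-A"))  # False: concurrent start is rejected
coord.end("obj", "seg-A")
print(coord.start("obj", "seg-A"))  # False: replica already completed
```

In the master service the same effect could be achieved by recording a PROCESSING replica under the metadata lock before leaving it, so the existence check and the reservation are one atomic step.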
```cpp
auto& mem_desc = target_desc.get_memory_descriptor();
void* target_addr =
    reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
memcpy(target_addr, data_ptr, data_size);
```
The use of memcpy for data transfer in CacheLocal is unsafe when dealing with device memory (e.g., GPU or NPU memory). While the cache hit path correctly utilizes the transfer engine, this path uses memcpy, which will cause a process crash or undefined behavior if the data_ptr points to device memory. This is particularly relevant for protocols like ascend where device memory is common.
mooncake-store/include/local_cache.h
Outdated
```cpp
 * @brief Look up a key and return the full cache entry.
 * @return pointer to CacheEntry if found, nullptr otherwise.
 *         The pointer is valid only while the caller holds no
 *         write lock (i.e., use under shared lock externally,
 *         or copy the entry).
 */
```
The Doxygen comment for LookupEntry is inaccurate. It states that the function returns a pointer and discusses pointer validity, but the function actually returns a std::optional<CacheEntry>, which is a value type (a copy of the entry). This can be misleading for developers using this class. The comment should be updated to reflect the actual return type and its semantics.
```cpp
/**
 * @brief Look up a key and return the full cache entry.
 * @return An `std::optional` containing a copy of the `CacheEntry` if found,
 *         `std::nullopt` otherwise.
 */
```
Pull request overview
This PR introduces a “local_cache” option for GET-style APIs in Mooncake Store, adding master-coordinated local replica allocation to avoid redundant remote transfers, and exposes the flag through the C++/Python client surfaces. It also adds a two-machine script to manually validate cache hit behavior.
Changes:
- Add new master RPCs and master logic for cache-local replica allocation/completion/revocation (CacheLocalStart/End/Revoke).
- Extend RealClient/PyClient APIs (get_buffer, batch_get_buffer, get_into, batch_get_into) with a `local_cache` flag and implement a per-process LocalCache.
- Expose `local_cache` to Python bindings and add a two-machine validation script.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 17 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/test_local_cache.py | Adds a two-machine manual test/benchmark script for local cache behavior. |
| mooncake-store/src/rpc_service.cpp | Wraps and registers new master RPC handlers for cache-local operations. |
| mooncake-store/src/real_client.cpp | Implements local cache lookups/population and clears cache on remove paths. |
| mooncake-store/src/master_service.cpp | Implements CacheLocalStart/End/Revoke in the master service. |
| mooncake-store/src/master_client.cpp | Adds master client RPC name traits and client methods for cache-local ops. |
| mooncake-store/src/dummy_client.cpp | Updates dummy client method signatures to include local_cache parameters. |
| mooncake-store/src/client_service.cpp | Adds Client::CacheLocal to allocate and populate a local replica via the master. |
| mooncake-store/include/rpc_service.h | Declares WrappedMasterService cache-local RPC wrappers. |
| mooncake-store/include/real_client.h | Adds local_cache parameters and introduces a LocalCache member. |
| mooncake-store/include/pyclient.h | Extends virtual API with local_cache parameters for get/get_into methods. |
| mooncake-store/include/master_service.h | Declares CacheLocalStart/End/Revoke and documents behavior. |
| mooncake-store/include/master_client.h | Declares CacheLocalStart/End/Revoke methods on MasterClient. |
| mooncake-store/include/local_cache.h | Introduces a thread-safe in-process LocalCache container. |
| mooncake-store/include/dummy_client.h | Updates dummy client method signatures with local_cache defaults. |
| mooncake-store/include/client_service.h | Declares Client::CacheLocal helper API. |
| mooncake-integration/store/store_py.cpp | Exposes local_cache flag to Python bindings for get/get_into APIs. |
```cpp
// Check if the target segment already has a completed replica
auto* existing = metadata.GetReplicaBySegmentName(tgt_segment);
if (existing != nullptr && existing->is_completed()) {
    return tl::make_unexpected(ErrorCode::OBJECT_ALREADY_EXISTS);
}

// Allocate a new replica on the target segment
Replica replica = [&]() -> Replica {
    ScopedAllocatorAccess allocator_access =
        segment_manager_.getAllocatorAccess();
    const auto& allocator_manager = allocator_access.getAllocatorManager();

    auto result = allocation_strategy_->AllocateFrom(
        allocator_manager, metadata.size, tgt_segment);
    if (!result.has_value()) {
        return Replica(nullptr, ReplicaStatus::UNDEFINED);
    }
    return std::move(*result);
}();

if (replica.status() == ReplicaStatus::UNDEFINED) {
    LOG(ERROR) << "key=" << key << ", tgt_segment=" << tgt_segment
               << ", failed to allocate replica for cache local";
    return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
}
```
MasterService::CacheLocalStart only treats an existing COMPLETED replica on tgt_segment as a conflict. If there is an existing replica on tgt_segment in PROCESSING/FAILED state (e.g., from a previous crashed cache_local attempt), this code will allocate an additional replica and leave the old one in metadata, potentially leaking memory/replicas. Consider revoking/cleaning up non-completed replicas on tgt_segment before allocating a new one, or reusing the existing replica when safe.
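The cleanup policy suggested above could look like the following sketch (Python, with a hypothetical `Replica` stand-in for the master's replica metadata): a COMPLETED replica on the target segment is still a conflict, but stale PROCESSING/FAILED leftovers from a crashed attempt are dropped before a new allocation proceeds, so they do not leak.

```python
class Replica:
    """Hypothetical stand-in for the master's per-object replica metadata."""

    def __init__(self, segment: str, status: str):
        self.segment = segment
        self.status = status  # "PROCESSING" | "COMPLETED" | "FAILED"


def prepare_target_segment(replicas, tgt_segment):
    """Return (surviving_replicas, ok).

    A COMPLETED replica on tgt_segment is a conflict (ok=False);
    non-completed leftovers on tgt_segment are revoked before allocating.
    """
    for r in replicas:
        if r.segment == tgt_segment and r.status == "COMPLETED":
            return replicas, False  # OBJECT_ALREADY_EXISTS
    survivors = [r for r in replicas
                 if not (r.segment == tgt_segment
                         and r.status != "COMPLETED")]
    return survivors, True


# A crashed attempt left a PROCESSING replica on "seg-A"; it is revoked.
replicas = [Replica("seg-A", "PROCESSING"), Replica("seg-B", "COMPLETED")]
survivors, ok = prepare_target_segment(replicas, "seg-A")
print(ok, [r.segment for r in survivors])  # True ['seg-B']
```

Reusing the existing replica instead of revoking it would also work when its allocation is known to still be valid; the key property is that at most one replica per (key, segment) survives.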
mooncake-store/include/local_cache.h
Outdated
```cpp
/**
 * @brief Look up a key and return the full cache entry.
 * @return pointer to CacheEntry if found, nullptr otherwise.
 *         The pointer is valid only while the caller holds no
 *         write lock (i.e., use under shared lock externally,
 *         or copy the entry).
 */
std::optional<CacheEntry> LookupEntry(const std::string& key) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
        return it->second;
    }
    return std::nullopt;
}
```
The LookupEntry() docstring describes returning a pointer with lifetime constraints, but the function actually returns std::optional by value (a copy). Please update the comment to match the actual API to avoid misuse/confusion.
mooncake-store/src/real_client.cpp
Outdated
```cpp
if (size < total_size) {
    LOG(ERROR) << "User buffer too small. Required: " << total_size
               << ", provided: " << size;
    return -1;
```
RealClient::get_into returns -1 when the user buffer is too small on a cache hit, but other error paths use to_py_ret(ErrorCode) for consistent Python-facing error codes. Consider returning the appropriate error code (e.g., INVALID_PARAMS) via to_py_ret to keep error handling consistent.
```diff
- return -1;
+ return to_py_ret(ErrorCode::INVALID_PARAMS);
```
mooncake-store/src/real_client.cpp
Outdated
```cpp
auto get_result = client_->Get(key, qr, slices);
if (get_result) {
    return static_cast<int64_t>(total_size);
}
// Cache hit but transfer failed, fall through to normal path
LOG(WARNING) << "Local cache transfer failed for key: " << key
             << ", falling back to normal get";
}
```
In RealClient::get_into, the cache-hit transfer failure path falls back to the normal get but leaves the failing cache entry intact. If the local replica is corrupted/invalid, this will cause repeated warnings and repeated failed cache-hit attempts. Consider erasing (and possibly revoking) the local cache entry when a cache-hit transfer fails.
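The evict-on-failure behavior suggested here is sketched below (Python, hypothetical names — `local_transfer` and `remote_get` are stand-ins for the local-replica transfer and the normal get path): a broken entry is tried at most once, then removed before falling back.

```python
def get_with_cache(key, cache, local_transfer, remote_get):
    """Sketch: on a cache-hit transfer failure, evict the entry before
    falling back, so the broken local replica is not retried forever."""
    if key in cache:
        data = local_transfer(cache[key])
        if data is not None:
            return data
        del cache[key]          # evict the failing entry
    return remote_get(key)


attempts = {"local": 0}

def failing_transfer(entry):
    attempts["local"] += 1
    return None                 # simulate a corrupted local replica

cache = {"k": "replica-desc"}
get_with_cache("k", cache, failing_transfer, lambda k: b"remote")
get_with_cache("k", cache, failing_transfer, lambda k: b"remote")
print(attempts["local"])  # 1 -- the broken entry was only tried once
```

In the real client the eviction could additionally trigger a `CacheLocalRevoke` so the master reclaims the unusable local replica.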
mooncake-store/src/real_client.cpp
Outdated
```cpp
// Cache the data locally for future calls
if (client_ && !local_cache_.Contains(key)) {
    auto cache_result =
        client_->CacheLocal(key, buffer, result.value());
    if (cache_result) {
        auto alloc_result =
            client_buffer_allocator_->allocate(result.value());
        if (alloc_result) {
            memcpy(alloc_result->ptr(), buffer, result.value());
            auto handle = std::make_shared<BufferHandle>(
                std::move(*alloc_result));
            local_cache_.Insert(key, handle, cache_result.value(),
                                result.value());
```
Caching on get_into/batch_get_into copies from the user-provided buffer via memcpy into a newly allocated BufferHandle and also calls Client::CacheLocal which memcpy's from the same buffer into the store replica. If the user buffer can be GPU/device memory (the comment explicitly mentions GPU), these memcpy operations can crash or corrupt data. Either gate local_cache support to host memory only (with explicit checks) or perform device-aware copies via the transfer engine.
```diff
-// Cache the data locally for future calls
-if (client_ && !local_cache_.Contains(key)) {
-    auto cache_result =
-        client_->CacheLocal(key, buffer, result.value());
-    if (cache_result) {
-        auto alloc_result =
-            client_buffer_allocator_->allocate(result.value());
-        if (alloc_result) {
-            memcpy(alloc_result->ptr(), buffer, result.value());
-            auto handle = std::make_shared<BufferHandle>(
-                std::move(*alloc_result));
-            local_cache_.Insert(key, handle, cache_result.value(),
-                                result.value());
+// Cache the data locally for future calls.
+// IMPORTANT: Do not read from the user-provided buffer here, as it may be
+// device (e.g., GPU) memory. Instead, allocate a host buffer, re-fetch
+// into it via the transfer engine, and cache from that host buffer.
+if (client_ && !local_cache_.Contains(key)) {
+    auto alloc_result =
+        client_buffer_allocator_->allocate(result.value());
+    if (alloc_result) {
+        // Re-fetch the object into our own (host) buffer using the
+        // existing transfer engine. This avoids memcpy from a potentially
+        // device-resident user buffer.
+        auto cache_fetch =
+            get_into_internal(key, alloc_result->ptr(), result.value());
+        if (cache_fetch) {
+            auto cache_result =
+                client_->CacheLocal(key, alloc_result->ptr(), result.value());
+            if (cache_result) {
+                auto handle = std::make_shared<BufferHandle>(
+                    std::move(*alloc_result));
+                local_cache_.Insert(key, handle, cache_result.value(),
+                                    result.value());
+            }
```
```cpp
if (target_desc.is_memory_replica()) {
    auto& mem_desc = target_desc.get_memory_descriptor();
    void* target_addr =
        reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
```
Client::CacheLocal memcpy's data_size bytes into the allocated replica address without validating that data_size <= allocated buffer size (mem_desc.buffer_descriptor.size_). Add a bounds check to prevent buffer overflows/corruption if the caller passes a mismatched size.
```diff
     reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
+const size_t target_size = mem_desc.buffer_descriptor.size_;
+if (data_size > target_size) {
+    LOG(ERROR) << "CacheLocal: data_size (" << data_size
+               << ") exceeds allocated buffer size (" << target_size
+               << ") for key=" << key;
+    master_client_.CacheLocalRevoke(key, replica_id);
+    return tl::unexpected(ErrorCode::INVALID_PARAMS);
+}
```
mooncake-store/src/rpc_service.cpp
Outdated
```cpp
tl::expected<Replica::Descriptor, ErrorCode>
WrappedMasterService::CacheLocalStart(const UUID& client_id,
                                      const std::string& key,
                                      const std::string& tgt_segment) {
    return execute_rpc(
        "CacheLocalStart",
        [&] {
            return master_service_.CacheLocalStart(client_id, key, tgt_segment);
        },
        [&](auto& timer) {
            timer.LogRequest("client_id=", client_id, ", key=", key,
                             ", tgt_segment=", tgt_segment);
        },
        [] {}, [] {});
}

tl::expected<void, ErrorCode> WrappedMasterService::CacheLocalEnd(
    const UUID& client_id, const std::string& key, ReplicaID replica_id) {
    return execute_rpc(
        "CacheLocalEnd",
        [&] {
            return master_service_.CacheLocalEnd(client_id, key, replica_id);
        },
        [&](auto& timer) {
            timer.LogRequest("client_id=", client_id, ", key=", key,
                             ", replica_id=", replica_id);
        },
        [] {}, [] {});
}

tl::expected<void, ErrorCode> WrappedMasterService::CacheLocalRevoke(
    const UUID& client_id, const std::string& key, ReplicaID replica_id) {
    return execute_rpc(
        "CacheLocalRevoke",
        [&] {
            return master_service_.CacheLocalRevoke(client_id, key, replica_id);
        },
        [&](auto& timer) {
            timer.LogRequest("client_id=", client_id, ", key=", key,
                             ", replica_id=", replica_id);
        },
        [] {}, [] {});
}
```
WrappedMasterService::{CacheLocalStart,CacheLocalEnd,CacheLocalRevoke} pass empty metric callbacks to execute_rpc, unlike other RPC wrappers in this file which increment request/failure metrics. If metrics are expected for operational visibility, add corresponding counters in MasterMetricManager and hook them up here (or add a clear comment explaining why metrics are intentionally omitted).
```python
# Benchmark: cache miss vs hit latency
# ------------------------------------------
print("\n--- Benchmark: get_buffer cache hit speedup ---")
bench_data = b"\xAB" * BENCH_DATA_SIZE
```
Variable bench_data is not used.
```diff
-bench_data = b"\xAB" * BENCH_DATA_SIZE
```
scripts/test_local_cache.py
Outdated
```python
for _ in range(BENCH_ITERATIONS):
    for key in BENCH_KEYS:
        t0 = time.perf_counter()
        buf = store.get_buffer(key, local_cache=True)
```
Variable buf is not used.
```diff
 buf = store.get_buffer(key, local_cache=True)
+assert buf is not None
```
```python
print("\n--- Benchmark: batch_get_buffer cache hit speedup ---")
t0 = time.perf_counter()
bufs = store.batch_get_buffer(BENCH_KEYS, local_cache=True)
t_batch_hit = time.perf_counter() - t0
```
Variable bufs is not used.
```diff
 t_batch_hit = time.perf_counter() - t0
+assert bufs is not None
+assert len(bufs) == len(BENCH_KEYS)
```
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
/gemini review
Code Review
This pull request introduces a cache-on-get feature to improve performance for remote reads, with extensive changes across the codebase. However, two significant security vulnerabilities were identified: a potential Use-After-Free in client-side background thread management and a missing access control check in the master service. Additionally, a critical thread safety issue in the client-side caching implementation could lead to a crash on shutdown, and there's a minor inconsistency in metrics reporting for the new cache operation.
```cpp
tl::expected<CacheOnGetResponse, ErrorCode> MasterService::CacheOnGet(
    const UUID& client_id, const std::string& key,
    const std::string& local_segment) {
```
The MasterService::CacheOnGet function accepts a local_segment name from the client but fails to verify that this segment is actually owned by or associated with the requesting client_id. A malicious client could provide the name of a segment belonging to another node/client. The master would then proceed to allocate memory on that remote segment and return a descriptor (including access keys) to the attacker. This allows an attacker to exhaust memory on arbitrary nodes in the cluster or potentially gain unauthorized access to memory regions on other nodes.
Implement a check in MasterService::CacheOnGet to verify that the local_segment provided in the request is owned by the client identified by client_id.
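The ownership check could take the following shape (a Python sketch with hypothetical names — `segment_owner` and `allocate` stand in for the master's segment registry and allocator): the master refuses to allocate on any segment that is not registered to the requesting client, so one client cannot exhaust memory on another node's segment.

```python
def cache_on_get(client_id, key, local_segment, segment_owner, allocate):
    """Sketch of the suggested guard: reject requests naming a segment
    that the requesting client does not own."""
    if segment_owner.get(local_segment) != client_id:
        return ("UNAUTHORIZED", None)   # e.g. map to an ErrorCode
    return ("OK", allocate(local_segment, key))


owners = {"seg-A": "client-1", "seg-B": "client-2"}
alloc = lambda seg, key: f"replica:{seg}:{key}"

# Owner may allocate on its own segment...
print(cache_on_get("client-1", "obj", "seg-A", owners, alloc))
# ('OK', 'replica:seg-A:obj')

# ...but not on a segment registered to another client.
print(cache_on_get("client-1", "obj", "seg-B", owners, alloc))
# ('UNAUTHORIZED', None)
```

In the actual master this mapping presumably already exists from segment mount/registration, so the check is a lookup, not new bookkeeping.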
```cpp
[] { MasterMetricManager::instance().inc_copy_start_requests(); },
[] { MasterMetricManager::instance().inc_copy_start_failures(); });
```
The metrics being incremented here (inc_copy_start_requests, inc_copy_start_failures) appear to be for a CopyStart operation, but this function is for CacheOnGet. This could lead to incorrect metrics. It would be better to add and use dedicated metrics for CacheOnGet to ensure accurate monitoring. For example (assuming new metric names):
```diff
-[] { MasterMetricManager::instance().inc_copy_start_requests(); },
-[] { MasterMetricManager::instance().inc_copy_start_failures(); });
+[] { MasterMetricManager::instance().inc_cache_on_get_requests(); },
+[] { MasterMetricManager::instance().inc_cache_on_get_failures(); });
```
Description
--- Benchmark: Data size sweep (get_buffer) ---

| Size | Avg Miss | Avg Hit | Speedup | Throughput |
|---|---|---|---|---|
| 4 KB | 0.292ms | 0.081ms | 3.6x | 0.05 GB/s |
| 64 KB | 0.227ms | 0.091ms | 2.5x | 0.67 GB/s |
| 256 KB | 0.254ms | 0.108ms | 2.4x | 2.26 GB/s |
| 1 MB | 0.573ms | 0.266ms | 2.2x | 3.67 GB/s |
| 4 MB | 1.757ms | 0.865ms | 2.0x | 4.52 GB/s |

--- Benchmark: Data size sweep (get_into) ---

| Size | Avg Miss | Avg Hit | Speedup | Throughput |
|---|---|---|---|---|
| 4 KB | 0.136ms | 0.079ms | 1.7x | 0.05 GB/s |
| 64 KB | 0.093ms | 0.086ms | 1.1x | 0.71 GB/s |
| 256 KB | 0.111ms | 0.108ms | 1.0x | 2.27 GB/s |
| 1 MB | 0.273ms | 0.266ms | 1.0x | 3.67 GB/s |
| 4 MB | 0.848ms | 0.865ms | 1.0x | 4.52 GB/s |
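The miss/hit columns in tables like these come from a two-pass timing pattern, sketched generically below (this is an illustration against a generic `get` callable, not the actual `scripts/test_local_cache.py`): the first pass touches each key cold, later passes are warm.

```python
import time


def bench_miss_vs_hit(get, keys, iterations=5):
    """Return (miss_avg, hit_avg): average per-key latency in seconds for
    the cold first pass and the warm subsequent passes."""
    t0 = time.perf_counter()
    for key in keys:
        get(key)                         # first touch: cache miss
    miss_avg = (time.perf_counter() - t0) / len(keys)

    t0 = time.perf_counter()
    for _ in range(iterations):
        for key in keys:
            get(key)                     # warm: cache hit
    hit_avg = (time.perf_counter() - t0) / (iterations * len(keys))
    return miss_avg, hit_avg


# Stand-in for store.get_buffer(key, local_cache=True): first access is slow.
seen = set()
def fake_get(key):
    if key not in seen:
        seen.add(key)
        time.sleep(0.001)   # simulate a remote transfer on a miss

miss, hit = bench_miss_vs_hit(fake_get, [f"k{i}" for i in range(4)])
```

Note that averaging over whole passes hides per-call variance; the real script times each call individually with `time.perf_counter()`.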
Type of Change
How Has This Been Tested?
Checklist
Ran `./scripts/code_format.sh` before submitting.