
Add cache op for get api #1516

Open
XucSh wants to merge 6 commits into kvcache-ai:main from XucSh:cache_op

Conversation

Collaborator

@XucSh XucSh commented Feb 7, 2026

Description

--- Benchmark: Data size sweep (get_buffer) ---
Size | Avg Miss | Avg Hit | Speedup | Throughput
---------+------------+------------+----------+-------------
4 KB | 0.292ms | 0.081ms | 3.6x | 0.05 GB/s
64 KB | 0.227ms | 0.091ms | 2.5x | 0.67 GB/s
256 KB | 0.254ms | 0.108ms | 2.4x | 2.26 GB/s
1 MB | 0.573ms | 0.266ms | 2.2x | 3.67 GB/s
4 MB | 1.757ms | 0.865ms | 2.0x | 4.52 GB/s

--- Benchmark: Data size sweep (get_into) ---
Size | Avg Miss | Avg Hit | Speedup | Throughput
---------+------------+------------+----------+-------------
4 KB | 0.136ms | 0.079ms | 1.7x | 0.05 GB/s
64 KB | 0.093ms | 0.086ms | 1.1x | 0.71 GB/s
256 KB | 0.111ms | 0.108ms | 1.0x | 2.27 GB/s
1 MB | 0.273ms | 0.266ms | 1.0x | 3.67 GB/s
4 MB | 0.848ms | 0.865ms | 1.0x | 4.52 GB/s

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

@gemini-code-assist
Contributor

Summary of Changes

Hello @XucSh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Mooncake Store's data retrieval performance by integrating a local caching layer. It introduces client-side caching for get operations, allowing clients to store and retrieve data locally without repeatedly querying the master or performing remote transfers. This is achieved through a new LocalCache component and a set of master-coordinated RPCs that manage local replica allocation. The changes aim to reduce network latency and improve overall efficiency for frequently accessed data.

Highlights

  • Local Cache Implementation: Introduced a new LocalCache class for client-side data caching, designed to be thread-safe and used by the RealClient to store and retrieve BufferHandle objects locally.
  • Extended Get API with Caching Option: Extended get_buffer, batch_get_buffer, get_into, and batch_get_into methods across MooncakeStorePyWrapper, PyClient, DummyClient, and RealClient to include an optional local_cache boolean parameter, allowing callers to explicitly request local caching.
  • Master-Coordinated Local Replica Management: Added new RPC operations (CacheLocalStart, CacheLocalEnd, CacheLocalRevoke) to MasterClient, MasterService, and WrappedMasterService to enable master-coordinated allocation and management of local replicas for caching, ensuring consistency in the distributed system.
  • Cache Invalidation on Data Removal: Ensured cache consistency by modifying remove, removeByRegex, and removeAll operations in RealClient to clear corresponding entries from the local cache when data is deleted or invalidated.
  • New Integration Test for Local Cache: Included a new Python integration test (scripts/test_local_cache.py) to validate the local caching mechanism in a two-machine setup, covering various get operations and demonstrating performance improvements from cache hits.


Changelog
  • mooncake-integration/store/store_py.cpp
    • Modified Python bindings for get_buffer, batch_get_buffer, get_into, and batch_get_into to accept a new local_cache boolean argument, defaulting to false.
  • mooncake-store/include/client_service.h
    • Declared a new Client::CacheLocal method for local replica allocation and data copying.
  • mooncake-store/include/dummy_client.h
    • Updated method signatures for get_into, batch_get_into, get_buffer, and batch_get_buffer to include the local_cache parameter.
  • mooncake-store/include/local_cache.h
    • Added a new header defining the LocalCache class, which provides thread-safe storage and lookup for BufferHandles.
  • mooncake-store/include/master_client.h
    • Declared new RPC client methods: CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke.
  • mooncake-store/include/master_service.h
    • Declared new master service methods: CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke for managing local replica lifecycle.
  • mooncake-store/include/pyclient.h
    • Updated virtual method signatures for get_into, batch_get_into, get_buffer, and batch_get_buffer to include the local_cache parameter.
  • mooncake-store/include/real_client.h
    • Included local_cache.h.
    • Added local_cache_ member to the RealClient class.
    • Updated method signatures for get_into, batch_get_into, get_buffer, and batch_get_buffer to support local caching.
  • mooncake-store/include/rpc_service.h
    • Declared new RPC service methods for CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke.
  • mooncake-store/src/client_service.cpp
    • Implemented Client::CacheLocal to coordinate with the master for local replica creation and data population.
  • mooncake-store/src/dummy_client.cpp
    • Updated implementations of get_buffer, batch_get_buffer, get_into, and batch_get_into to match new signatures, passing local_cache parameter.
  • mooncake-store/src/master_client.cpp
    • Added RpcNameTraits specializations for CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke.
    • Implemented MasterClient methods for CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke to invoke corresponding RPCs.
  • mooncake-store/src/master_service.cpp
    • Implemented MasterService::CacheLocalStart to allocate a new memory replica on the target segment.
    • Implemented MasterService::CacheLocalEnd to mark a replica as complete.
    • Implemented MasterService::CacheLocalRevoke to remove an allocated replica.
  • mooncake-store/src/real_client.cpp
    • Modified remove, removeByRegex, and removeAll to clear corresponding entries from the local_cache_.
    • Implemented local cache lookup and insertion logic within get_buffer, batch_get_buffer, get_into, and batch_get_into methods.
    • Added logic to handle cache hits (local data transfer) and misses (remote fetch) for get_into and batch_get_into.
  • mooncake-store/src/rpc_service.cpp
    • Implemented WrappedMasterService methods for CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke.
    • Registered new RPC handlers for CacheLocalStart, CacheLocalEnd, and CacheLocalRevoke.
  • scripts/test_local_cache.py
    • Added a new Python script for end-to-end testing of the local cache feature in a two-machine setup, including benchmarks for cache hit/miss latency.
Activity
  • The pull request is currently marked as a Work In Progress (WIP).
  • The author, XucSh, has indicated that testing on a two-machine setup is still in progress.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a local caching mechanism for get operations to improve performance by avoiding redundant remote data transfers. However, a critical stale cache vulnerability exists in the get_into and batch_get_into paths, which could lead to information leaks if memory is reallocated by the master service. Additionally, there's a race condition in the master service that could cause resource exhaustion, and inconsistent use of memcpy instead of the transfer engine might lead to crashes with device memory. The current implementation also has performance inefficiencies in the cache population logic for get_into and batch_get_into operations, resulting in extra memory copies, and a minor documentation inaccuracy in the new LocalCache class.

Comment on lines 1453 to 1466
auto entry = local_cache_.LookupEntry(key);
if (entry) {
    const auto& replica = entry->replica_desc;
    uint64_t total_size = calculate_total_size(replica);
    if (size < total_size) {
        LOG(ERROR) << "User buffer too small. Required: " << total_size
                   << ", provided: " << size;
        return -1;
    }
    std::vector<mooncake::Slice> slices;
    allocateSlices(slices, replica, buffer);
    QueryResult qr({replica},
                   std::chrono::steady_clock::time_point::max());
    auto get_result = client_->Get(key, qr, slices);
Contributor


security-high

The local_cache_ implementation is vulnerable to an information leak due to stale cache entries. The cache hit path in get_into uses a Replica::Descriptor stored in the cache to perform a data transfer from a master-managed memory segment. However, the cache does not implement any lease validation or invalidation mechanism. If an object is removed or its lease expires on the master, the underlying memory may be reallocated to a different object. A subsequent cache hit will then read data from the new object, potentially leaking sensitive information belonging to another user or task.
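
One possible mitigation for the stale-entry hazard described above (a sketch under assumed semantics, not the PR's code — LeasedEntry and LeaseCheckedCache are illustrative names) is to store a lease deadline alongside each cached descriptor and treat expired entries as misses, so a descriptor is never reused after the master may have reclaimed the memory behind it:

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Hypothetical cache entry carrying a lease deadline.
struct LeasedEntry {
    std::string replica_desc;  // stand-in for Replica::Descriptor
    Clock::time_point lease_until;
};

class LeaseCheckedCache {
   public:
    void Insert(const std::string& key, std::string desc,
                Clock::time_point lease_until) {
        cache_[key] = LeasedEntry{std::move(desc), lease_until};
    }

    // A hit is only returned while the lease is still valid; expired
    // entries are dropped eagerly and reported as misses, forcing the
    // caller back to the normal master-validated path.
    std::optional<LeasedEntry> Lookup(const std::string& key,
                                      Clock::time_point now) {
        auto it = cache_.find(key);
        if (it == cache_.end()) return std::nullopt;
        if (now >= it->second.lease_until) {
            cache_.erase(it);
            return std::nullopt;
        }
        return it->second;
    }

   private:
    std::unordered_map<std::string, LeasedEntry> cache_;
};
```

The deadline would have to come from the master (e.g. piggybacked on the CacheLocal RPCs), since only the master knows the object's actual lease.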

Collaborator Author


good point. will fix it

Comment on lines 1692 to 1730
auto entry = local_cache_.LookupEntry(keys[i]);
if (entry) {
    const auto& replica = entry->replica_desc;
    uint64_t total_size = calculate_total_size(replica);
    if (sizes[i] >= total_size) {
        std::vector<Slice> slices;
        allocateSlices(slices, replica, buffers[i]);
        cache_hits.push_back(
            {.original_index = i,
             .key = keys[i],
             .query_result = QueryResult(
                 {replica},
                 std::chrono::steady_clock::time_point::max()),
             .slices = std::move(slices),
             .total_size = total_size});
        results[i] = static_cast<int64_t>(total_size);
        continue;
    }
}
miss_keys.push_back(keys[i]);
miss_buffers.push_back(buffers[i]);
miss_sizes.push_back(sizes[i]);
miss_indices.push_back(i);
}

// BatchGet for cache hits
if (!cache_hits.empty()) {
    std::vector<std::string> hit_keys;
    std::vector<QueryResult> hit_qrs;
    std::unordered_map<std::string, std::vector<Slice>> hit_slices;
    hit_keys.reserve(cache_hits.size());
    hit_qrs.reserve(cache_hits.size());
    for (auto& hit : cache_hits) {
        hit_keys.push_back(hit.key);
        hit_qrs.push_back(std::move(hit.query_result));
        hit_slices[hit.key] = std::move(hit.slices);
    }
    auto batch_results =
        client_->BatchGet(hit_keys, hit_qrs, hit_slices);
Contributor


security-high

Similar to the get_into method, batch_get_into is also vulnerable to information leaks from stale cache entries. It uses cached Replica::Descriptor objects to read from memory segments that may have been reallocated by the master service after the original object was removed or its lease expired.

auto alloc_result =
    client_buffer_allocator_->allocate(result.value());
if (alloc_result) {
    memcpy(alloc_result->ptr(), buffer, result.value());
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium

The get_into and batch_get_into methods currently use memcpy to populate the local cache from a user-provided buffer. This is a critical issue as it can lead to a process crash if the buffer resides in device memory; the transfer engine should be used instead to ensure compatibility with all supported memory types. Furthermore, the cache population logic on a cache miss for get_into is inefficient, involving multiple memcpy operations. A more efficient approach would be to reuse the caching logic from get_buffer to avoid duplicating logic and eliminate extra memory copies. This inefficiency also applies to batch_get_into.

Comment on lines 1072 to 1075
auto* existing = metadata.GetReplicaBySegmentName(tgt_segment);
if (existing != nullptr && existing->is_completed()) {
    return tl::make_unexpected(ErrorCode::OBJECT_ALREADY_EXISTS);
}
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium

A race condition in CacheLocalStart allows for redundant memory allocations on the same segment for the same key. The code only checks if a completed replica exists. If multiple concurrent requests are made for the same key before the first allocation is marked as complete, the master will proceed with multiple allocations. This can be exploited to exhaust the memory capacity of a segment, leading to a Denial of Service (DoS).
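
One way to close this window (a sketch under assumed semantics, not the PR's code — CacheLocalReservations is a hypothetical helper) is to atomically reserve the (key, segment) pair before allocating, so a second concurrent CacheLocalStart fails before any memory is committed:

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch: serialize CacheLocalStart per (key, segment) pair by
// reserving the pair before allocating, so concurrent starts for the same
// pair are rejected instead of each allocating its own replica. This only
// closes the in-flight window; completed replicas are still tracked by the
// master's existing metadata check.
class CacheLocalReservations {
   public:
    // Returns true if the caller may proceed to allocate; false if another
    // start for this (key, segment) is already in flight.
    bool TryStart(const std::string& key, const std::string& segment) {
        std::lock_guard<std::mutex> lock(mutex_);
        return in_flight_.emplace(key, segment).second;
    }

    // Called on CacheLocalEnd or CacheLocalRevoke to release the slot.
    void Finish(const std::string& key, const std::string& segment) {
        std::lock_guard<std::mutex> lock(mutex_);
        in_flight_.erase({key, segment});
    }

   private:
    std::mutex mutex_;
    std::set<std::pair<std::string, std::string>> in_flight_;
};
```

A crash-safety variant would also attach a deadline to each reservation so orphaned slots from dead clients are eventually reclaimed.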

auto& mem_desc = target_desc.get_memory_descriptor();
void* target_addr =
    reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
memcpy(target_addr, data_ptr, data_size);
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium

The use of memcpy for data transfer in CacheLocal is unsafe when dealing with device memory (e.g., GPU or NPU memory). While the cache hit path correctly utilizes the transfer engine, this path uses memcpy, which will cause a process crash or undefined behavior if the data_ptr points to device memory. This is particularly relevant for protocols like ascend where device memory is common.

Comment on lines 42 to 47
* @brief Look up a key and return the full cache entry.
* @return pointer to CacheEntry if found, nullptr otherwise.
* The pointer is valid only while the caller holds no
* write lock (i.e., use under shared lock externally,
* or copy the entry).
*/
Contributor


medium

The Doxygen comment for LookupEntry is inaccurate. It states that the function returns a pointer and discusses pointer validity, but the function actually returns a std::optional<CacheEntry>, which is a value type (a copy of the entry). This can be misleading for developers using this class. The comment should be updated to reflect the actual return type and its semantics.

    /**
     * @brief Look up a key and return the full cache entry.
     * @return An `std::optional` containing a copy of the `CacheEntry` if found,
     *         `std::nullopt` otherwise.
     */

@codecov-commenter

codecov-commenter commented Feb 7, 2026


Copilot AI left a comment


Pull request overview

This PR introduces a “local_cache” option for GET-style APIs in Mooncake Store, adding master-coordinated local replica allocation to avoid redundant remote transfers, and exposes the flag through the C++/Python client surfaces. It also adds a two-machine script to manually validate cache hit behavior.

Changes:

  • Add new master RPCs and master logic for cache-local replica allocation/completion/revocation (CacheLocalStart/End/Revoke).
  • Extend RealClient/PyClient APIs (get_buffer, batch_get_buffer, get_into, batch_get_into) with a local_cache flag and implement a per-process LocalCache.
  • Expose local_cache to Python bindings and add a two-machine validation script.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
scripts/test_local_cache.py Adds a two-machine manual test/benchmark script for local cache behavior.
mooncake-store/src/rpc_service.cpp Wraps and registers new master RPC handlers for cache-local operations.
mooncake-store/src/real_client.cpp Implements local cache lookups/population and clears cache on remove paths.
mooncake-store/src/master_service.cpp Implements CacheLocalStart/End/Revoke in the master service.
mooncake-store/src/master_client.cpp Adds master client RPC name traits and client methods for cache-local ops.
mooncake-store/src/dummy_client.cpp Updates dummy client method signatures to include local_cache parameters.
mooncake-store/src/client_service.cpp Adds Client::CacheLocal to allocate and populate a local replica via the master.
mooncake-store/include/rpc_service.h Declares WrappedMasterService cache-local RPC wrappers.
mooncake-store/include/real_client.h Adds local_cache parameters and introduces a LocalCache member.
mooncake-store/include/pyclient.h Extends virtual API with local_cache parameters for get/get_into methods.
mooncake-store/include/master_service.h Declares CacheLocalStart/End/Revoke and documents behavior.
mooncake-store/include/master_client.h Declares CacheLocalStart/End/Revoke methods on MasterClient.
mooncake-store/include/local_cache.h Introduces a thread-safe in-process LocalCache container.
mooncake-store/include/dummy_client.h Updates dummy client method signatures with local_cache defaults.
mooncake-store/include/client_service.h Declares Client::CacheLocal helper API.
mooncake-integration/store/store_py.cpp Exposes local_cache flag to Python bindings for get/get_into APIs.


Comment on lines 1071 to 1095
// Check if the target segment already has a completed replica
auto* existing = metadata.GetReplicaBySegmentName(tgt_segment);
if (existing != nullptr && existing->is_completed()) {
    return tl::make_unexpected(ErrorCode::OBJECT_ALREADY_EXISTS);
}

// Allocate a new replica on the target segment
Replica replica = [&]() -> Replica {
    ScopedAllocatorAccess allocator_access =
        segment_manager_.getAllocatorAccess();
    const auto& allocator_manager = allocator_access.getAllocatorManager();

    auto result = allocation_strategy_->AllocateFrom(
        allocator_manager, metadata.size, tgt_segment);
    if (!result.has_value()) {
        return Replica(nullptr, ReplicaStatus::UNDEFINED);
    }
    return std::move(*result);
}();

if (replica.status() == ReplicaStatus::UNDEFINED) {
    LOG(ERROR) << "key=" << key << ", tgt_segment=" << tgt_segment
               << ", failed to allocate replica for cache local";
    return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
}

Copilot AI Feb 7, 2026


MasterService::CacheLocalStart only treats an existing COMPLETED replica on tgt_segment as a conflict. If there is an existing replica on tgt_segment in PROCESSING/FAILED state (e.g., from a previous crashed cache_local attempt), this code will allocate an additional replica and leave the old one in metadata, potentially leaking memory/replicas. Consider revoking/cleaning up non-completed replicas on tgt_segment before allocating a new one, or reusing the existing replica when safe.

Comment on lines 41 to 56
/**
 * @brief Look up a key and return the full cache entry.
 * @return pointer to CacheEntry if found, nullptr otherwise.
 *         The pointer is valid only while the caller holds no
 *         write lock (i.e., use under shared lock externally,
 *         or copy the entry).
 */
std::optional<CacheEntry> LookupEntry(const std::string& key) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
        return it->second;
    }
    return std::nullopt;
}

Copilot AI Feb 7, 2026


The LookupEntry() docstring describes returning a pointer with lifetime constraints, but the function actually returns std::optional by value (a copy). Please update the comment to match the actual API to avoid misuse/confusion.

if (size < total_size) {
    LOG(ERROR) << "User buffer too small. Required: " << total_size
               << ", provided: " << size;
    return -1;

Copilot AI Feb 7, 2026


RealClient::get_into returns -1 when the user buffer is too small on a cache hit, but other error paths use to_py_ret(ErrorCode) for consistent Python-facing error codes. Consider returning the appropriate error code (e.g., INVALID_PARAMS) via to_py_ret to keep error handling consistent.

Suggested change
-    return -1;
+    return to_py_ret(ErrorCode::INVALID_PARAMS);

Comment on lines 1466 to 1473
auto get_result = client_->Get(key, qr, slices);
if (get_result) {
    return static_cast<int64_t>(total_size);
}
// Cache hit but transfer failed, fall through to normal path
LOG(WARNING) << "Local cache transfer failed for key: " << key
             << ", falling back to normal get";
}

Copilot AI Feb 7, 2026


In RealClient::get_into, the cache-hit transfer failure path falls back to the normal get but leaves the failing cache entry intact. If the local replica is corrupted/invalid, this will cause repeated warnings and repeated failed cache-hit attempts. Consider erasing (and possibly revoking) the local cache entry when a cache-hit transfer fails.

Comment on lines 1481 to 1493
// Cache the data locally for future calls
if (client_ && !local_cache_.Contains(key)) {
    auto cache_result =
        client_->CacheLocal(key, buffer, result.value());
    if (cache_result) {
        auto alloc_result =
            client_buffer_allocator_->allocate(result.value());
        if (alloc_result) {
            memcpy(alloc_result->ptr(), buffer, result.value());
            auto handle = std::make_shared<BufferHandle>(
                std::move(*alloc_result));
            local_cache_.Insert(key, handle, cache_result.value(),
                                result.value());

Copilot AI Feb 7, 2026


Caching on get_into/batch_get_into copies from the user-provided buffer via memcpy into a newly allocated BufferHandle and also calls Client::CacheLocal which memcpy's from the same buffer into the store replica. If the user buffer can be GPU/device memory (the comment explicitly mentions GPU), these memcpy operations can crash or corrupt data. Either gate local_cache support to host memory only (with explicit checks) or perform device-aware copies via the transfer engine.

Suggested change
-// Cache the data locally for future calls
-if (client_ && !local_cache_.Contains(key)) {
-    auto cache_result =
-        client_->CacheLocal(key, buffer, result.value());
-    if (cache_result) {
-        auto alloc_result =
-            client_buffer_allocator_->allocate(result.value());
-        if (alloc_result) {
-            memcpy(alloc_result->ptr(), buffer, result.value());
-            auto handle = std::make_shared<BufferHandle>(
-                std::move(*alloc_result));
-            local_cache_.Insert(key, handle, cache_result.value(),
-                                result.value());
+// Cache the data locally for future calls.
+// IMPORTANT: Do not read from the user-provided buffer here, as it may be
+// device (e.g., GPU) memory. Instead, allocate a host buffer, re-fetch
+// into it via the transfer engine, and cache from that host buffer.
+if (client_ && !local_cache_.Contains(key)) {
+    auto alloc_result =
+        client_buffer_allocator_->allocate(result.value());
+    if (alloc_result) {
+        // Re-fetch the object into our own (host) buffer using the
+        // existing transfer engine. This avoids memcpy from a potentially
+        // device-resident user buffer.
+        auto cache_fetch =
+            get_into_internal(key, alloc_result->ptr(), result.value());
+        if (cache_fetch) {
+            auto cache_result =
+                client_->CacheLocal(key, alloc_result->ptr(), result.value());
+            if (cache_result) {
+                auto handle = std::make_shared<BufferHandle>(
+                    std::move(*alloc_result));
+                local_cache_.Insert(key, handle, cache_result.value(),
+                                    result.value());
+            }
if (target_desc.is_memory_replica()) {
    auto& mem_desc = target_desc.get_memory_descriptor();
    void* target_addr =
        reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);

Copilot AI Feb 7, 2026


Client::CacheLocal memcpy's data_size bytes into the allocated replica address without validating that data_size <= allocated buffer size (mem_desc.buffer_descriptor.size_). Add a bounds check to prevent buffer overflows/corruption if the caller passes a mismatched size.

Suggested change
     reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
+const size_t target_size = mem_desc.buffer_descriptor.size_;
+if (data_size > target_size) {
+    LOG(ERROR) << "CacheLocal: data_size (" << data_size
+               << ") exceeds allocated buffer size (" << target_size
+               << ") for key=" << key;
+    master_client_.CacheLocalRevoke(key, replica_id);
+    return tl::unexpected(ErrorCode::INVALID_PARAMS);
+}

Comment on lines 768 to 810
tl::expected<Replica::Descriptor, ErrorCode>
WrappedMasterService::CacheLocalStart(const UUID& client_id,
                                      const std::string& key,
                                      const std::string& tgt_segment) {
    return execute_rpc(
        "CacheLocalStart",
        [&] {
            return master_service_.CacheLocalStart(client_id, key,
                                                   tgt_segment);
        },
        [&](auto& timer) {
            timer.LogRequest("client_id=", client_id, ", key=", key,
                             ", tgt_segment=", tgt_segment);
        },
        [] {}, [] {});
}

tl::expected<void, ErrorCode> WrappedMasterService::CacheLocalEnd(
    const UUID& client_id, const std::string& key, ReplicaID replica_id) {
    return execute_rpc(
        "CacheLocalEnd",
        [&] {
            return master_service_.CacheLocalEnd(client_id, key, replica_id);
        },
        [&](auto& timer) {
            timer.LogRequest("client_id=", client_id, ", key=", key,
                             ", replica_id=", replica_id);
        },
        [] {}, [] {});
}

tl::expected<void, ErrorCode> WrappedMasterService::CacheLocalRevoke(
    const UUID& client_id, const std::string& key, ReplicaID replica_id) {
    return execute_rpc(
        "CacheLocalRevoke",
        [&] {
            return master_service_.CacheLocalRevoke(client_id, key,
                                                    replica_id);
        },
        [&](auto& timer) {
            timer.LogRequest("client_id=", client_id, ", key=", key,
                             ", replica_id=", replica_id);
        },
        [] {}, [] {});
}

Copilot AI Feb 7, 2026


WrappedMasterService::{CacheLocalStart,CacheLocalEnd,CacheLocalRevoke} pass empty metric callbacks to execute_rpc, unlike other RPC wrappers in this file which increment request/failure metrics. If metrics are expected for operational visibility, add corresponding counters in MasterMetricManager and hook them up here (or add a clear comment explaining why metrics are intentionally omitted).

# Benchmark: cache miss vs hit latency
# ------------------------------------------
print("\n--- Benchmark: get_buffer cache hit speedup ---")
bench_data = b"\xAB" * BENCH_DATA_SIZE

Copilot AI Feb 7, 2026


Variable bench_data is not used.

Suggested change
-bench_data = b"\xAB" * BENCH_DATA_SIZE

for _ in range(BENCH_ITERATIONS):
    for key in BENCH_KEYS:
        t0 = time.perf_counter()
        buf = store.get_buffer(key, local_cache=True)

Copilot AI Feb 7, 2026


Variable buf is not used.

Suggested change
     buf = store.get_buffer(key, local_cache=True)
+    assert buf is not None

print("\n--- Benchmark: batch_get_buffer cache hit speedup ---")
t0 = time.perf_counter()
bufs = store.batch_get_buffer(BENCH_KEYS, local_cache=True)
t_batch_hit = time.perf_counter() - t0

Copilot AI Feb 7, 2026


Variable bufs is not used.

Suggested change
     t_batch_hit = time.perf_counter() - t0
+    assert bufs is not None
+    assert len(bufs) == len(BENCH_KEYS)

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Collaborator Author

XucSh commented Feb 10, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a cache-on-get feature to improve performance for remote reads, with extensive changes across the codebase. However, two significant security vulnerabilities were identified: a potential Use-After-Free in client-side background thread management and a missing access control check in the master service. Additionally, a critical thread safety issue in the client-side caching implementation could lead to a crash on shutdown, and there's a minor inconsistency in metrics reporting for the new cache operation.

Comment on lines +1060 to +1062
tl::expected<CacheOnGetResponse, ErrorCode> MasterService::CacheOnGet(
    const UUID& client_id, const std::string& key,
    const std::string& local_segment) {
Contributor


security-high

The MasterService::CacheOnGet function accepts a local_segment name from the client but fails to verify that this segment is actually owned by or associated with the requesting client_id. A malicious client could provide the name of a segment belonging to another node/client. The master would then proceed to allocate memory on that remote segment and return a descriptor (including access keys) to the attacker. This allows an attacker to exhaust memory on arbitrary nodes in the cluster or potentially gain unauthorized access to memory regions on other nodes.

Implement a check in MasterService::CacheOnGet to verify that the local_segment provided in the request is owned by the client identified by client_id.
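
A minimal sketch of that ownership check, assuming the master keeps a segment-to-owner mapping (SegmentOwnership, ClientID, and the method names here are hypothetical illustrations, not the PR's API):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for the master's UUID type.
using ClientID = std::string;

// Hypothetical registry: each mounted segment records its owning client,
// and allocation requests are rejected unless the requester owns the
// target segment.
class SegmentOwnership {
   public:
    void RegisterSegment(const std::string& segment, const ClientID& owner) {
        owner_of_[segment] = owner;
    }

    // The access check the review asks for: the segment must exist and be
    // owned by the requesting client before any allocation proceeds.
    bool MayAllocateOn(const ClientID& client,
                       const std::string& segment) const {
        auto it = owner_of_.find(segment);
        return it != owner_of_.end() && it->second == client;
    }

   private:
    std::unordered_map<std::string, ClientID> owner_of_;
};
```

In the real service this lookup would happen at the top of CacheOnGet, returning an error code (e.g. an unauthorized/invalid-params error) instead of a bool.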

Comment on lines +780 to +781
[] { MasterMetricManager::instance().inc_copy_start_requests(); },
[] { MasterMetricManager::instance().inc_copy_start_failures(); });
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The metrics being incremented here (inc_copy_start_requests, inc_copy_start_failures) appear to be for a CopyStart operation, but this function is for CacheOnGet. This could lead to incorrect metrics. It would be better to add and use dedicated metrics for CacheOnGet to ensure accurate monitoring. For example (assuming new metric names):

Suggested change
-[] { MasterMetricManager::instance().inc_copy_start_requests(); },
-[] { MasterMetricManager::instance().inc_copy_start_failures(); });
+[] { MasterMetricManager::instance().inc_cache_on_get_requests(); },
+[] { MasterMetricManager::instance().inc_cache_on_get_failures(); });

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
@XucSh XucSh changed the title from "[WIP]Add cache op for get api" to "Add cache op for get api" Feb 10, 2026
@XucSh XucSh mentioned this pull request Feb 10, 2026
16 tasks
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>