Conversation
Summary of Changes

Hello @XucSh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the Mooncake Store's data retrieval performance by integrating a local caching layer: it introduces client-side caching for get operations.
Activity
Code Review
This pull request introduces a local caching mechanism for get operations to improve performance by avoiding redundant remote data transfers. However, a critical stale cache vulnerability exists in the get_into and batch_get_into paths, which could lead to information leaks if memory is reallocated by the master service. Additionally, there's a race condition in the master service that could cause resource exhaustion, and inconsistent use of memcpy instead of the transfer engine might lead to crashes with device memory. The current implementation also has performance inefficiencies in the cache population logic for get_into and batch_get_into operations, resulting in extra memory copies, and a minor documentation inaccuracy in the new LocalCache class.
mooncake-store/src/real_client.cpp
Outdated
```cpp
auto entry = local_cache_.LookupEntry(key);
if (entry) {
    const auto &replica = entry->replica_desc;
    uint64_t total_size = calculate_total_size(replica);
    if (size < total_size) {
        LOG(ERROR) << "User buffer too small. Required: " << total_size
                   << ", provided: " << size;
        return -1;
    }
    std::vector<mooncake::Slice> slices;
    allocateSlices(slices, replica, buffer);
    QueryResult qr({replica},
                   std::chrono::steady_clock::time_point::max());
    auto get_result = client_->Get(key, qr, slices);
```
The local_cache_ implementation is vulnerable to an information leak due to stale cache entries. The cache hit path in get_into uses a Replica::Descriptor stored in the cache to perform a data transfer from a master-managed memory segment. However, the cache does not implement any lease validation or invalidation mechanism. If an object is removed or its lease expires on the master, the underlying memory may be reallocated to a different object. A subsequent cache hit will then read data from the new object, potentially leaking sensitive information belonging to another user or task.
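One way the suggested fix could take shape is sketched below in Python. This is a minimal illustration only, not the Mooncake API: `CacheEntry`, `lease_expiry`, and the TTL-based invalidation are assumed names for the lease-validation idea, where an entry whose master lease has expired is treated as a miss and dropped so a later hit can never read memory the master may have reallocated.

```python
import time


class CacheEntry:
    """Hypothetical cache entry carrying a lease deadline from the master."""

    def __init__(self, replica_desc, lease_expiry: float):
        self.replica_desc = replica_desc
        self.lease_expiry = lease_expiry


class LeaseValidatedCache:
    """Sketch: a lookup that treats lease-expired entries as misses."""

    def __init__(self):
        self._entries = {}

    def insert(self, key, replica_desc, lease_ttl: float):
        self._entries[key] = CacheEntry(replica_desc,
                                        time.monotonic() + lease_ttl)

    def lookup(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        if time.monotonic() >= entry.lease_expiry:
            # Lease expired: the master may have reused the memory,
            # so drop the entry instead of serving stale (or foreign) data.
            del self._entries[key]
            return None
        return entry


cache = LeaseValidatedCache()
cache.insert("k1", "replica-desc", lease_ttl=10.0)   # still valid
cache.insert("k2", "replica-desc", lease_ttl=-1.0)   # already expired
print(cache.lookup("k1") is not None)  # True
print(cache.lookup("k2") is None)      # True
```

A real implementation would also need the master to push invalidations (or the client to revalidate on hit), since TTLs alone only bound, not eliminate, the staleness window.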
good point. will fix it
mooncake-store/src/real_client.cpp
Outdated
```cpp
auto entry = local_cache_.LookupEntry(keys[i]);
if (entry) {
    const auto &replica = entry->replica_desc;
    uint64_t total_size = calculate_total_size(replica);
    if (sizes[i] >= total_size) {
        std::vector<Slice> slices;
        allocateSlices(slices, replica, buffers[i]);
        cache_hits.push_back(
            {.original_index = i,
             .key = keys[i],
             .query_result = QueryResult(
                 {replica},
                 std::chrono::steady_clock::time_point::max()),
             .slices = std::move(slices),
             .total_size = total_size});
        results[i] = static_cast<int64_t>(total_size);
        continue;
    }
}
miss_keys.push_back(keys[i]);
miss_buffers.push_back(buffers[i]);
miss_sizes.push_back(sizes[i]);
miss_indices.push_back(i);
}

// BatchGet for cache hits
if (!cache_hits.empty()) {
    std::vector<std::string> hit_keys;
    std::vector<QueryResult> hit_qrs;
    std::unordered_map<std::string, std::vector<Slice>> hit_slices;
    hit_keys.reserve(cache_hits.size());
    hit_qrs.reserve(cache_hits.size());
    for (auto &hit : cache_hits) {
        hit_keys.push_back(hit.key);
        hit_qrs.push_back(std::move(hit.query_result));
        hit_slices[hit.key] = std::move(hit.slices);
    }
    auto batch_results =
        client_->BatchGet(hit_keys, hit_qrs, hit_slices);
```
mooncake-store/src/real_client.cpp
Outdated
```cpp
auto alloc_result =
    client_buffer_allocator_->allocate(result.value());
if (alloc_result) {
    memcpy(alloc_result->ptr(), buffer, result.value());
```
The get_into and batch_get_into methods currently use memcpy to populate the local cache from a user-provided buffer. This is a critical issue as it can lead to a process crash if the buffer resides in device memory; the transfer engine should be used instead to ensure compatibility with all supported memory types. Furthermore, the cache population logic on a cache miss for get_into is inefficient, involving multiple memcpy operations. A more efficient approach would be to reuse the caching logic from get_buffer to avoid duplicating logic and eliminate extra memory copies. This inefficiency also applies to batch_get_into.
```cpp
auto* existing = metadata.GetReplicaBySegmentName(tgt_segment);
if (existing != nullptr && existing->is_completed()) {
    return tl::make_unexpected(ErrorCode::OBJECT_ALREADY_EXISTS);
}
```
A race condition in CacheLocalStart allows for redundant memory allocations on the same segment for the same key. The code only checks if a completed replica exists. If multiple concurrent requests are made for the same key before the first allocation is marked as complete, the master will proceed with multiple allocations. This can be exploited to exhaust the memory capacity of a segment, leading to a Denial of Service (DoS).
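The usual remedy for this kind of check-then-allocate race is to reserve an in-flight placeholder atomically before the (slow) allocation. The sketch below illustrates the idea in Python with hypothetical names, not the master service's actual data structures: the first `start` for a `(key, segment)` pair wins, and concurrent starts back off instead of allocating a second replica.

```python
import threading


class CacheLocalCoordinator:
    """Sketch: serialize CacheLocalStart per (key, segment) pair."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()   # (key, segment) pairs being allocated
        self._completed = set()   # (key, segment) pairs with a finished replica

    def start(self, key: str, segment: str) -> bool:
        slot = (key, segment)
        with self._lock:
            if slot in self._completed or slot in self._in_flight:
                return False           # already exists or being allocated
            self._in_flight.add(slot)  # reserve BEFORE the slow allocation
        return True

    def end(self, key: str, segment: str):
        slot = (key, segment)
        with self._lock:
            self._in_flight.discard(slot)
            self._completed.add(slot)


coord = CacheLocalCoordinator()
print(coord.start("obj", "seg-A"))  # True: first start reserves the slot
print(coord.start("obj", "seg-A"))  # False: concurrent start is rejected
coord.end("obj", "seg-A")
print(coord.start("obj", "seg-A"))  # False: replica already completed
```

In the master service the same effect could be achieved by recording a PROCESSING replica under the metadata lock before leaving it, so the existence check and the reservation are one atomic step.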
```cpp
auto& mem_desc = target_desc.get_memory_descriptor();
void* target_addr =
    reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
memcpy(target_addr, data_ptr, data_size);
```
The use of memcpy for data transfer in CacheLocal is unsafe when dealing with device memory (e.g., GPU or NPU memory). While the cache hit path correctly utilizes the transfer engine, this path uses memcpy, which will cause a process crash or undefined behavior if the data_ptr points to device memory. This is particularly relevant for protocols like ascend where device memory is common.
mooncake-store/include/local_cache.h
Outdated
```cpp
 * @brief Look up a key and return the full cache entry.
 * @return pointer to CacheEntry if found, nullptr otherwise.
 *         The pointer is valid only while the caller holds no
 *         write lock (i.e., use under shared lock externally,
 *         or copy the entry).
 */
```
The Doxygen comment for LookupEntry is inaccurate. It states that the function returns a pointer and discusses pointer validity, but the function actually returns a std::optional<CacheEntry>, which is a value type (a copy of the entry). This can be misleading for developers using this class. The comment should be updated to reflect the actual return type and its semantics.
```cpp
/**
 * @brief Look up a key and return the full cache entry.
 * @return An `std::optional` containing a copy of the `CacheEntry` if found,
 *         `std::nullopt` otherwise.
 */
```
Pull request overview
This PR introduces a “local_cache” option for GET-style APIs in Mooncake Store, adding master-coordinated local replica allocation to avoid redundant remote transfers, and exposes the flag through the C++/Python client surfaces. It also adds a two-machine script to manually validate cache hit behavior.
Changes:
- Add new master RPCs and master logic for cache-local replica allocation/completion/revocation (CacheLocalStart/End/Revoke).
- Extend RealClient/PyClient APIs (get_buffer, batch_get_buffer, get_into, batch_get_into) with a `local_cache` flag and implement a per-process LocalCache.
- Expose `local_cache` to Python bindings and add a two-machine validation script.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 17 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/test_local_cache.py | Adds a two-machine manual test/benchmark script for local cache behavior. |
| mooncake-store/src/rpc_service.cpp | Wraps and registers new master RPC handlers for cache-local operations. |
| mooncake-store/src/real_client.cpp | Implements local cache lookups/population and clears cache on remove paths. |
| mooncake-store/src/master_service.cpp | Implements CacheLocalStart/End/Revoke in the master service. |
| mooncake-store/src/master_client.cpp | Adds master client RPC name traits and client methods for cache-local ops. |
| mooncake-store/src/dummy_client.cpp | Updates dummy client method signatures to include local_cache parameters. |
| mooncake-store/src/client_service.cpp | Adds Client::CacheLocal to allocate and populate a local replica via the master. |
| mooncake-store/include/rpc_service.h | Declares WrappedMasterService cache-local RPC wrappers. |
| mooncake-store/include/real_client.h | Adds local_cache parameters and introduces a LocalCache member. |
| mooncake-store/include/pyclient.h | Extends virtual API with local_cache parameters for get/get_into methods. |
| mooncake-store/include/master_service.h | Declares CacheLocalStart/End/Revoke and documents behavior. |
| mooncake-store/include/master_client.h | Declares CacheLocalStart/End/Revoke methods on MasterClient. |
| mooncake-store/include/local_cache.h | Introduces a thread-safe in-process LocalCache container. |
| mooncake-store/include/dummy_client.h | Updates dummy client method signatures with local_cache defaults. |
| mooncake-store/include/client_service.h | Declares Client::CacheLocal helper API. |
| mooncake-integration/store/store_py.cpp | Exposes local_cache flag to Python bindings for get/get_into APIs. |
```cpp
// Check if the target segment already has a completed replica
auto* existing = metadata.GetReplicaBySegmentName(tgt_segment);
if (existing != nullptr && existing->is_completed()) {
    return tl::make_unexpected(ErrorCode::OBJECT_ALREADY_EXISTS);
}

// Allocate a new replica on the target segment
Replica replica = [&]() -> Replica {
    ScopedAllocatorAccess allocator_access =
        segment_manager_.getAllocatorAccess();
    const auto& allocator_manager = allocator_access.getAllocatorManager();

    auto result = allocation_strategy_->AllocateFrom(
        allocator_manager, metadata.size, tgt_segment);
    if (!result.has_value()) {
        return Replica(nullptr, ReplicaStatus::UNDEFINED);
    }
    return std::move(*result);
}();

if (replica.status() == ReplicaStatus::UNDEFINED) {
    LOG(ERROR) << "key=" << key << ", tgt_segment=" << tgt_segment
               << ", failed to allocate replica for cache local";
    return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
}
```
MasterService::CacheLocalStart only treats an existing COMPLETED replica on tgt_segment as a conflict. If there is an existing replica on tgt_segment in PROCESSING/FAILED state (e.g., from a previous crashed cache_local attempt), this code will allocate an additional replica and leave the old one in metadata, potentially leaking memory/replicas. Consider revoking/cleaning up non-completed replicas on tgt_segment before allocating a new one, or reusing the existing replica when safe.
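The cleanup policy suggested above could look like the following sketch (Python, with a hypothetical `Replica` stand-in for the master's replica metadata): a COMPLETED replica on the target segment is still a conflict, but stale PROCESSING/FAILED leftovers from a crashed attempt are dropped before a new allocation proceeds, so they do not leak.

```python
class Replica:
    """Hypothetical stand-in for the master's per-object replica metadata."""

    def __init__(self, segment: str, status: str):
        self.segment = segment
        self.status = status  # "PROCESSING" | "COMPLETED" | "FAILED"


def prepare_target_segment(replicas, tgt_segment):
    """Return (surviving_replicas, ok).

    A COMPLETED replica on tgt_segment is a conflict (ok=False);
    non-completed leftovers on tgt_segment are revoked before allocating.
    """
    for r in replicas:
        if r.segment == tgt_segment and r.status == "COMPLETED":
            return replicas, False  # OBJECT_ALREADY_EXISTS
    survivors = [r for r in replicas
                 if not (r.segment == tgt_segment
                         and r.status != "COMPLETED")]
    return survivors, True


# A crashed attempt left a PROCESSING replica on "seg-A"; it is revoked.
replicas = [Replica("seg-A", "PROCESSING"), Replica("seg-B", "COMPLETED")]
survivors, ok = prepare_target_segment(replicas, "seg-A")
print(ok, [r.segment for r in survivors])  # True ['seg-B']
```

Reusing the existing replica instead of revoking it would also work when its allocation is known to still be valid; the key property is that at most one replica per (key, segment) survives.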
mooncake-store/include/local_cache.h
Outdated
```cpp
/**
 * @brief Look up a key and return the full cache entry.
 * @return pointer to CacheEntry if found, nullptr otherwise.
 *         The pointer is valid only while the caller holds no
 *         write lock (i.e., use under shared lock externally,
 *         or copy the entry).
 */
std::optional<CacheEntry> LookupEntry(const std::string& key) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
        return it->second;
    }
    return std::nullopt;
}
```
The LookupEntry() docstring describes returning a pointer with lifetime constraints, but the function actually returns std::optional by value (a copy). Please update the comment to match the actual API to avoid misuse/confusion.
mooncake-store/src/real_client.cpp
Outdated
```cpp
if (size < total_size) {
    LOG(ERROR) << "User buffer too small. Required: " << total_size
               << ", provided: " << size;
    return -1;
```
RealClient::get_into returns -1 when the user buffer is too small on a cache hit, but other error paths use to_py_ret(ErrorCode) for consistent Python-facing error codes. Consider returning the appropriate error code (e.g., INVALID_PARAMS) via to_py_ret to keep error handling consistent.
```diff
- return -1;
+ return to_py_ret(ErrorCode::INVALID_PARAMS);
```
mooncake-store/src/real_client.cpp
Outdated
```cpp
auto get_result = client_->Get(key, qr, slices);
if (get_result) {
    return static_cast<int64_t>(total_size);
}
// Cache hit but transfer failed, fall through to normal path
LOG(WARNING) << "Local cache transfer failed for key: " << key
             << ", falling back to normal get";
}
```
In RealClient::get_into, the cache-hit transfer failure path falls back to the normal get but leaves the failing cache entry intact. If the local replica is corrupted/invalid, this will cause repeated warnings and repeated failed cache-hit attempts. Consider erasing (and possibly revoking) the local cache entry when a cache-hit transfer fails.
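The evict-on-failure behavior suggested here is sketched below (Python, hypothetical names — `local_transfer` and `remote_get` are stand-ins for the local-replica transfer and the normal get path): a broken entry is tried at most once, then removed before falling back.

```python
def get_with_cache(key, cache, local_transfer, remote_get):
    """Sketch: on a cache-hit transfer failure, evict the entry before
    falling back, so the broken local replica is not retried forever."""
    if key in cache:
        data = local_transfer(cache[key])
        if data is not None:
            return data
        del cache[key]          # evict the failing entry
    return remote_get(key)


attempts = {"local": 0}

def failing_transfer(entry):
    attempts["local"] += 1
    return None                 # simulate a corrupted local replica

cache = {"k": "replica-desc"}
get_with_cache("k", cache, failing_transfer, lambda k: b"remote")
get_with_cache("k", cache, failing_transfer, lambda k: b"remote")
print(attempts["local"])  # 1 -- the broken entry was only tried once
```

In the real client the eviction could additionally trigger a `CacheLocalRevoke` so the master reclaims the unusable local replica.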
mooncake-store/src/real_client.cpp
Outdated
```cpp
// Cache the data locally for future calls
if (client_ && !local_cache_.Contains(key)) {
    auto cache_result =
        client_->CacheLocal(key, buffer, result.value());
    if (cache_result) {
        auto alloc_result =
            client_buffer_allocator_->allocate(result.value());
        if (alloc_result) {
            memcpy(alloc_result->ptr(), buffer, result.value());
            auto handle = std::make_shared<BufferHandle>(
                std::move(*alloc_result));
            local_cache_.Insert(key, handle, cache_result.value(),
                                result.value());
```
Caching on get_into/batch_get_into copies from the user-provided buffer via memcpy into a newly allocated BufferHandle and also calls Client::CacheLocal which memcpy's from the same buffer into the store replica. If the user buffer can be GPU/device memory (the comment explicitly mentions GPU), these memcpy operations can crash or corrupt data. Either gate local_cache support to host memory only (with explicit checks) or perform device-aware copies via the transfer engine.
```diff
-// Cache the data locally for future calls
-if (client_ && !local_cache_.Contains(key)) {
-    auto cache_result =
-        client_->CacheLocal(key, buffer, result.value());
-    if (cache_result) {
-        auto alloc_result =
-            client_buffer_allocator_->allocate(result.value());
-        if (alloc_result) {
-            memcpy(alloc_result->ptr(), buffer, result.value());
-            auto handle = std::make_shared<BufferHandle>(
-                std::move(*alloc_result));
-            local_cache_.Insert(key, handle, cache_result.value(),
-                                result.value());
+// Cache the data locally for future calls.
+// IMPORTANT: Do not read from the user-provided buffer here, as it may be
+// device (e.g., GPU) memory. Instead, allocate a host buffer, re-fetch
+// into it via the transfer engine, and cache from that host buffer.
+if (client_ && !local_cache_.Contains(key)) {
+    auto alloc_result =
+        client_buffer_allocator_->allocate(result.value());
+    if (alloc_result) {
+        // Re-fetch the object into our own (host) buffer using the
+        // existing transfer engine. This avoids memcpy from a potentially
+        // device-resident user buffer.
+        auto cache_fetch =
+            get_into_internal(key, alloc_result->ptr(), result.value());
+        if (cache_fetch) {
+            auto cache_result =
+                client_->CacheLocal(key, alloc_result->ptr(), result.value());
+            if (cache_result) {
+                auto handle = std::make_shared<BufferHandle>(
+                    std::move(*alloc_result));
+                local_cache_.Insert(key, handle, cache_result.value(),
+                                    result.value());
+            }
```
```cpp
if (target_desc.is_memory_replica()) {
    auto& mem_desc = target_desc.get_memory_descriptor();
    void* target_addr =
        reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
```
Client::CacheLocal memcpy's data_size bytes into the allocated replica address without validating that data_size <= allocated buffer size (mem_desc.buffer_descriptor.size_). Add a bounds check to prevent buffer overflows/corruption if the caller passes a mismatched size.
```diff
     reinterpret_cast<void*>(mem_desc.buffer_descriptor.buffer_address_);
+const size_t target_size = mem_desc.buffer_descriptor.size_;
+if (data_size > target_size) {
+    LOG(ERROR) << "CacheLocal: data_size (" << data_size
+               << ") exceeds allocated buffer size (" << target_size
+               << ") for key=" << key;
+    master_client_.CacheLocalRevoke(key, replica_id);
+    return tl::unexpected(ErrorCode::INVALID_PARAMS);
+}
```
mooncake-store/src/rpc_service.cpp
Outdated
```cpp
tl::expected<Replica::Descriptor, ErrorCode>
WrappedMasterService::CacheLocalStart(const UUID& client_id,
                                      const std::string& key,
                                      const std::string& tgt_segment) {
    return execute_rpc(
        "CacheLocalStart",
        [&] {
            return master_service_.CacheLocalStart(client_id, key, tgt_segment);
        },
        [&](auto& timer) {
            timer.LogRequest("client_id=", client_id, ", key=", key,
                             ", tgt_segment=", tgt_segment);
        },
        [] {}, [] {});
}

tl::expected<void, ErrorCode> WrappedMasterService::CacheLocalEnd(
    const UUID& client_id, const std::string& key, ReplicaID replica_id) {
    return execute_rpc(
        "CacheLocalEnd",
        [&] {
            return master_service_.CacheLocalEnd(client_id, key, replica_id);
        },
        [&](auto& timer) {
            timer.LogRequest("client_id=", client_id, ", key=", key,
                             ", replica_id=", replica_id);
        },
        [] {}, [] {});
}

tl::expected<void, ErrorCode> WrappedMasterService::CacheLocalRevoke(
    const UUID& client_id, const std::string& key, ReplicaID replica_id) {
    return execute_rpc(
        "CacheLocalRevoke",
        [&] {
            return master_service_.CacheLocalRevoke(client_id, key, replica_id);
        },
        [&](auto& timer) {
            timer.LogRequest("client_id=", client_id, ", key=", key,
                             ", replica_id=", replica_id);
        },
        [] {}, [] {});
}
```
WrappedMasterService::{CacheLocalStart,CacheLocalEnd,CacheLocalRevoke} pass empty metric callbacks to execute_rpc, unlike other RPC wrappers in this file which increment request/failure metrics. If metrics are expected for operational visibility, add corresponding counters in MasterMetricManager and hook them up here (or add a clear comment explaining why metrics are intentionally omitted).
```python
# Benchmark: cache miss vs hit latency
# ------------------------------------------
print("\n--- Benchmark: get_buffer cache hit speedup ---")
bench_data = b"\xAB" * BENCH_DATA_SIZE
```
Variable bench_data is not used.
```diff
-bench_data = b"\xAB" * BENCH_DATA_SIZE
```
scripts/test_local_cache.py
Outdated
```python
for _ in range(BENCH_ITERATIONS):
    for key in BENCH_KEYS:
        t0 = time.perf_counter()
        buf = store.get_buffer(key, local_cache=True)
```
Variable buf is not used.
```diff
 buf = store.get_buffer(key, local_cache=True)
+assert buf is not None
```
```python
print("\n--- Benchmark: batch_get_buffer cache hit speedup ---")
t0 = time.perf_counter()
bufs = store.batch_get_buffer(BENCH_KEYS, local_cache=True)
t_batch_hit = time.perf_counter() - t0
```
Variable bufs is not used.
```diff
 t_batch_hit = time.perf_counter() - t0
+assert bufs is not None
+assert len(bufs) == len(BENCH_KEYS)
```
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
/gemini review
Code Review
This pull request introduces a cache-on-get feature to improve performance for remote reads, with extensive changes across the codebase. However, two significant security vulnerabilities were identified: a potential Use-After-Free in client-side background thread management and a missing access control check in the master service. Additionally, a critical thread safety issue in the client-side caching implementation could lead to a crash on shutdown, and there's a minor inconsistency in metrics reporting for the new cache operation.
```cpp
tl::expected<CacheOnGetResponse, ErrorCode> MasterService::CacheOnGet(
    const UUID& client_id, const std::string& key,
    const std::string& local_segment) {
```
The MasterService::CacheOnGet function accepts a local_segment name from the client but fails to verify that this segment is actually owned by or associated with the requesting client_id. A malicious client could provide the name of a segment belonging to another node/client. The master would then proceed to allocate memory on that remote segment and return a descriptor (including access keys) to the attacker. This allows an attacker to exhaust memory on arbitrary nodes in the cluster or potentially gain unauthorized access to memory regions on other nodes.
Implement a check in MasterService::CacheOnGet to verify that the local_segment provided in the request is owned by the client identified by client_id.
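The ownership check could take the following shape (a Python sketch with hypothetical names — `segment_owner` and `allocate` stand in for the master's segment registry and allocator): the master refuses to allocate on any segment that is not registered to the requesting client, so one client cannot exhaust memory on another node's segment.

```python
def cache_on_get(client_id, key, local_segment, segment_owner, allocate):
    """Sketch of the suggested guard: reject requests naming a segment
    that the requesting client does not own."""
    if segment_owner.get(local_segment) != client_id:
        return ("UNAUTHORIZED", None)   # e.g. map to an ErrorCode
    return ("OK", allocate(local_segment, key))


owners = {"seg-A": "client-1", "seg-B": "client-2"}
alloc = lambda seg, key: f"replica:{seg}:{key}"

# Owner may allocate on its own segment...
print(cache_on_get("client-1", "obj", "seg-A", owners, alloc))
# ('OK', 'replica:seg-A:obj')

# ...but not on a segment registered to another client.
print(cache_on_get("client-1", "obj", "seg-B", owners, alloc))
# ('UNAUTHORIZED', None)
```

In the actual master this mapping presumably already exists from segment mount/registration, so the check is a lookup, not new bookkeeping.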
```cpp
[] { MasterMetricManager::instance().inc_copy_start_requests(); },
[] { MasterMetricManager::instance().inc_copy_start_failures(); });
```
The metrics being incremented here (inc_copy_start_requests, inc_copy_start_failures) appear to be for a CopyStart operation, but this function is for CacheOnGet. This could lead to incorrect metrics. It would be better to add and use dedicated metrics for CacheOnGet to ensure accurate monitoring. For example (assuming new metric names):
```diff
-[] { MasterMetricManager::instance().inc_copy_start_requests(); },
-[] { MasterMetricManager::instance().inc_copy_start_failures(); });
+[] { MasterMetricManager::instance().inc_cache_on_get_requests(); },
+[] { MasterMetricManager::instance().inc_cache_on_get_failures(); });
```
Description
--- Benchmark: Data size sweep (get_buffer) ---

| Size | Avg Miss | Avg Hit | Speedup | Throughput |
|---|---|---|---|---|
| 4 KB | 0.292ms | 0.081ms | 3.6x | 0.05 GB/s |
| 64 KB | 0.227ms | 0.091ms | 2.5x | 0.67 GB/s |
| 256 KB | 0.254ms | 0.108ms | 2.4x | 2.26 GB/s |
| 1 MB | 0.573ms | 0.266ms | 2.2x | 3.67 GB/s |
| 4 MB | 1.757ms | 0.865ms | 2.0x | 4.52 GB/s |

--- Benchmark: Data size sweep (get_into) ---

| Size | Avg Miss | Avg Hit | Speedup | Throughput |
|---|---|---|---|---|
| 4 KB | 0.136ms | 0.079ms | 1.7x | 0.05 GB/s |
| 64 KB | 0.093ms | 0.086ms | 1.1x | 0.71 GB/s |
| 256 KB | 0.111ms | 0.108ms | 1.0x | 2.27 GB/s |
| 1 MB | 0.273ms | 0.266ms | 1.0x | 3.67 GB/s |
| 4 MB | 0.848ms | 0.865ms | 1.0x | 4.52 GB/s |
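The miss/hit columns in tables like these come from a two-pass timing pattern, sketched generically below (this is an illustration against a generic `get` callable, not the actual `scripts/test_local_cache.py`): the first pass touches each key cold, later passes are warm.

```python
import time


def bench_miss_vs_hit(get, keys, iterations=5):
    """Return (miss_avg, hit_avg): average per-key latency in seconds for
    the cold first pass and the warm subsequent passes."""
    t0 = time.perf_counter()
    for key in keys:
        get(key)                         # first touch: cache miss
    miss_avg = (time.perf_counter() - t0) / len(keys)

    t0 = time.perf_counter()
    for _ in range(iterations):
        for key in keys:
            get(key)                     # warm: cache hit
    hit_avg = (time.perf_counter() - t0) / (iterations * len(keys))
    return miss_avg, hit_avg


# Stand-in for store.get_buffer(key, local_cache=True): first access is slow.
seen = set()
def fake_get(key):
    if key not in seen:
        seen.add(key)
        time.sleep(0.001)   # simulate a remote transfer on a miss

miss, hit = bench_miss_vs_hit(fake_get, [f"k{i}" for i in range(4)])
```

Note that averaging over whole passes hides per-call variance; the real script times each call individually with `time.perf_counter()`.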
Type of Change
How Has This Been Tested?
Checklist
Ran `./scripts/code_format.sh` before submitting.