[Store] feat: CXL storage full features.#1531
[Store] feat: CXL storage full features.#1531qiuweit7 wants to merge 15 commits intokvcache-ai:mainfrom
Conversation
Summary of ChangesHello @qiuweit7, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the CXL storage capabilities by introducing comprehensive test coverage for its core functionalities. It ensures the reliability of zero-copy data transfers and the memory eviction process, which are critical for efficient CXL memory management. The changes also refine how memory usage is reported, providing clearer insights into system performance. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This PR completes the CXL storage functionality by adding test verification for zero-copy operations and the eviction mechanism. The changes are generally well-implemented. My review includes two main points: first, the C++ integration test for eviction should be improved to actually verify that keys are evicted. Second, the new Python tests contain commented-out code that should be removed for better clarity.
| } | ||
| } | ||
| LOG(INFO) << "Test finished, put_data=" << test_put_size / 1048576 << "MB"; | ||
| } |
There was a problem hiding this comment.
The EvictOperation test is incomplete as it doesn't verify that any keys were actually evicted. This makes it difficult to confirm that the eviction mechanism is working correctly. Adding assertions to check for evicted keys would make this test more robust.
LOG(INFO) << "Test finished, put_data=" << test_put_size / 1048576 << "MB";
// Verify that some keys have been evicted.
// It may take some time for eviction to complete.
std::this_thread::sleep_for(std::chrono::seconds(5));
bool eviction_happened = false;
for (uint64_t i = 0; i < 100; ++i) {
std::string key = "evict_key_" + std::to_string(i);
auto exist_result = test_client_->IsExist(key);
ASSERT_TRUE(exist_result.has_value());
if (!exist_result.value()) {
eviction_happened = true;
LOG(INFO) << "Key " << key << " was evicted as expected.";
break;
}
}
ASSERT_TRUE(eviction_happened)
<< "No keys were evicted, the eviction mechanism might not be working.";
}| # Register the buffer for zero-copy operations | ||
| # result = self.store.register_buffer(buffer_ptr, buffer_size) | ||
| # self.assertEqual(result, 0, "Buffer registration should succeed") |
There was a problem hiding this comment.
The calls to register_buffer are commented out in this test and other zero-copy tests (test_batch_get_into_operations, test_batch_put_from_operations). If buffer registration is not required for CXL zero-copy operations, these commented lines and their corresponding unregister_buffer calls in cleanup sections should be removed to improve code clarity. If registration is conditionally required, a comment explaining why it's disabled here would be helpful.
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
merge main: sync latest storage_backend_test fix
There was a problem hiding this comment.
Please delete these changes
Code Review: CXL Storage Full FeaturesOverviewThis PR completes the CXL storage feature implementation with comprehensive testing for zero-copy operations and eviction mechanism. Overall, this is a solid PR with excellent test coverage that significantly enhances CXL support. ✅ Strengths
|
| MasterMetricManager::instance().inc_total_mem_capacity(segment.name, | ||
| size); |
There was a problem hiding this comment.
Why delete these two lines?
|
Don't merge it until pybind dependency is resolved. |
…vcache-ai#1523) When building with USE_EFA=ON, auto_discover is disabled to prevent RDMA transport installation (QP creation fails on EFA devices). This means TCP transport is also not installed automatically. Add explicit TCP transport installation for non-EFA protocols in the EFA build path. Documentation changes: - build.md: Add USE_EFA option and clarify USE_CUDA default/purpose - supported-protocols.md: Add EFA as a supported protocol - efa_transport.md: Add USE_CUDA=ON to build command, document GPU memory requirement Co-authored-by: whn09 <whn09@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Teng Ma <teng-ma@linux.alibaba.com>
… add delete safety (kvcache-ai#1456)
…ai#1226) * feat(Store): add local hot cache for client * feat(Store): add client local hot cache log to show performance * fix: local hot cache initialize bug * fix(Store): Mooncake put slice is max 16MB, so make local hot cache block 16MB * feat(Store): move local hot cache initialization to Client::Create * feat(Store): local hot cache remove unused small block implementation * feat(Store): add client local hot cache unit test * fix(Store): modify client local hot cache suit with v0.3.7 * feat(Store): change local hot cache unit tes * fix: initialize local hot cache with negative value * feat: use in process master and metadata fro local hot cache unit test. * feat: update local hot cache to one replica one slice version * fix: local hot cache unit test use in process master service * fix: code style fix * fix: fix dirty read when client wants to read a previously hitted hot block but the hot block is modified by incoming put actions * fix: local hot cache unit test use in process master service * fix: code format fix * fix: fix comment problems for * feat: add local hot asynchronous queue size limit * fix: local hot cache task involves the block so that there is no memcpy operation when inserting local hot cache * fix: code check fix * fix: update block in_use prop to reference count --------- Co-authored-by: shichangzhang064 <zhangshichang@h-partners.com>
Description
This is the second PR for the CXL feature, which mainly completes the CXL storage function (based on the first submitted PR PR #1365). The main changes introduced by this PR:
get_into/put_fromoperations, batch zero-copy operations, and others.Type of Change
How Has This Been Tested?
Checklist
./scripts/code_format.shbefore submitting.