[Store][Feature] Add CXL storage for mooncake_store #1365

Merged
stmatengss merged 5 commits into kvcache-ai:main from qiuweit7:mooncake_qwt on Jan 28, 2026

@qiuweit7
Contributor

Description

This is the first PR for the CXL feature.
It introduces Mooncake storage backed by CXL shared memory: the Master centrally manages and allocates CXL segments, currently using CacheLib as the default allocator. Clients can read and write the CXL device through the Put/Get APIs, and BatchPut/BatchGet interfaces are also provided.

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

@gemini-code-assist
Contributor

Summary of Changes

Hello @qiuweit7, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for CXL shared memory within the Mooncake Store. It enables the master service to efficiently manage and allocate CXL segments, leveraging CacheLib for memory management. The client-side interfaces are updated to transparently utilize CXL devices, ensuring that existing data manipulation APIs function correctly with this new high-speed memory. The changes also include new configuration options for CXL and a dedicated test suite to verify its integration and functionality.

Highlights

  • CXL Storage Integration: Mooncake Store now supports Compute Express Link (CXL) shared memory as a storage backend, enabling high-performance memory access for data operations.
  • Centralized CXL Memory Management: The master service has been enhanced to centrally manage and allocate CXL segments, utilizing CacheLib as the default allocator for these segments.
  • Client API Compatibility for CXL: Existing client Put/Get and BatchPut/BatchGet APIs are extended to work seamlessly with CXL devices, abstracting the underlying memory technology from the application layer.
  • CXL-Specific Configuration and Control: New configuration parameters (cxl_path, cxl_size, enable_cxl) are introduced for the master service, allowing fine-grained control over CXL memory setup and usage.
  • Dedicated CXL Integration Testing: A new integration test suite has been added to thoroughly validate the CXL client functionality, ensuring correct behavior of CXL-backed storage operations.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces CXL storage support for mooncake_store, which is a significant new feature. The changes span across configuration, allocation strategies, client and master services, and the transfer engine to accommodate the CXL memory protocol. While the overall approach is sound, I've identified several critical syntax errors that will prevent compilation, as well as some logical issues in the new CXL allocation strategy. Additionally, a potential regression in hugepage support for non-CXL protocols was noted. The newly added integration tests for CXL are a great addition, but they should be made self-contained to improve portability and reliability.

Comment on lines 362 to 414
class CxlAllocationStrategy : public AllocationStrategy {
public:
CxlAllocationStrategy() = default;
tl::expected<std::vector<Replica>, ErrorCode> Allocate(
const AllocatorManager& allocator_manager, const size_t slice_length,
const size_t replica_num = 1,
const std::vector<std::string>& preferred_segments =
std::vector<std::string>(),
const std::set<std::string>& excluded_segments =
std::set<std::string>()) {
if (slice_length == 0 || replica_num == 0) {
return tl::make_unexpected(ErrorCode::INVALID_PARAMS);
}

if (preferred_segments.empty()) {
LOG(ERROR) << "Preferred_segments is empty.";
return tl::make_unexpected(ErrorCode::INVALID_PARAMS);
}

const std::string& cxl_segment_name = preferred_segments[0];
// if (excluded_segments.contains(cxl_segment_name)) {
// return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
// }
VLOG(1) << "Do cxl allocate, overwriten segment=" << cxl_segment_name;

const auto cxl_allocators = allocator_manager.getAllocators(cxl_segment_name);

if (cxl_allocators == nullptr || cxl_allocators->size() == 0) {
return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
}
std::shared_ptr<BufferAllocatorBase> cxl_allocator = (*cxl_allocators)[0];
if (!cxl_allocator) {
LOG(ERROR) << "No CXL allocator in preferred_segment";
return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
}

std::vector<Replica> replicas;
replicas.reserve(replica_num);

auto buffer = cxl_allocator->allocate(slice_length);
if(buffer){
buffer->change_to_cxl(cxl_segment_name);
}
replicas.emplace_back(std::move(buffer),
ReplicaStatus::PROCESSING);

if (replicas.empty()) {
return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
}
VLOG(1) << "Successfully allocated " << replicas.size() << " CXL replica.";
return replicas;
}
};
Contributor

critical

The class CxlAllocationStrategy does not implement the pure virtual function AllocateFrom from its base class AllocationStrategy. This will result in a compilation error. You should add an implementation for AllocateFrom, even if it's just to return an 'unsupported' error for now.
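The missing-override problem can be illustrated with a reduced sketch. Every type here (AllocationStrategySketch, ReplicaSketch, the std::optional return values, the AllocateFrom parameter list) is a simplified stand-in chosen for illustration. The real Mooncake interface uses tl::expected and takes more parameters, so treat this only as a picture of the language rule, not of the actual signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <type_traits>
#include <vector>

// Simplified stand-in for the store's replica type (assumption).
struct ReplicaSketch {
    std::size_t length;
};

// Hypothetical, reduced base interface with two pure virtual functions.
class AllocationStrategySketch {
   public:
    virtual ~AllocationStrategySketch() = default;
    virtual std::optional<std::vector<ReplicaSketch>> Allocate(
        std::size_t slice_length, std::size_t replica_num) = 0;
    virtual std::optional<ReplicaSketch> AllocateFrom(
        const std::string& segment_name, std::size_t slice_length) = 0;
};

// Overrides only Allocate, so the class stays abstract and cannot be
// instantiated. This is the situation the review comment describes.
class IncompleteCxlStrategy : public AllocationStrategySketch {
   public:
    std::optional<std::vector<ReplicaSketch>> Allocate(
        std::size_t slice_length, std::size_t replica_num) override {
        if (slice_length == 0 || replica_num == 0) return std::nullopt;
        return std::vector<ReplicaSketch>{ReplicaSketch{slice_length}};
    }
};

// Adds a stub AllocateFrom that reports "unsupported" as an empty result.
class CompleteCxlStrategy : public IncompleteCxlStrategy {
   public:
    std::optional<ReplicaSketch> AllocateFrom(const std::string&,
                                              std::size_t) override {
        return std::nullopt;
    }
};

static_assert(std::is_abstract_v<IncompleteCxlStrategy>,
              "a class with an unimplemented pure virtual is abstract");
static_assert(!std::is_abstract_v<CompleteCxlStrategy>,
              "overriding every pure virtual makes the class concrete");

// Usage demo: only the complete strategy can be constructed.
inline void exercise_complete_strategy() {
    CompleteCxlStrategy strategy;
    assert(strategy.Allocate(4096, 1).has_value());
    assert(!strategy.AllocateFrom("cxl_segment", 4096).has_value());
}
```

Strictly speaking, the compiler rejects an abstract class only at the point where it is instantiated, which is why the stub is enough to unblock the build.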

Comment on lines 462 to 464
MasterServiceConfigBuilder& set_processing_task_timeout_sec(uint64_t sec) {
processing_task_timeout_sec_ = sec;
MasterServiceConfigBuilder& set_cxl_path(const std::string& path) {
Contributor

critical

There's a syntax error here. The method set_processing_task_timeout_sec is missing its closing brace } and a return *this; statement. This will cause a compilation error.

    MasterServiceConfigBuilder& set_processing_task_timeout_sec(uint64_t sec) {
        processing_task_timeout_sec_ = sec;
        return *this;
    }

    MasterServiceConfigBuilder& set_cxl_path(const std::string& path) {
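For context, a minimal sketch of the chained-setter pattern this fix restores. Only set_processing_task_timeout_sec and set_cxl_path appear in the quoted diff. The other setters, the struct layout, and the check inside build() are assumptions added for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical reduced config; field names follow those quoted in the review.
struct MasterServiceConfigSketch {
    std::uint64_t processing_task_timeout_sec = 0;
    std::string cxl_path;
    std::uint64_t cxl_size = 0;
    bool enable_cxl = false;
};

class MasterServiceConfigBuilderSketch {
   public:
    MasterServiceConfigBuilderSketch& set_processing_task_timeout_sec(
        std::uint64_t sec) {
        config_.processing_task_timeout_sec = sec;
        return *this;  // returning *this is what allows call chaining
    }
    MasterServiceConfigBuilderSketch& set_cxl_path(const std::string& path) {
        config_.cxl_path = path;
        return *this;
    }
    // The two setters below are assumed, not taken from the PR.
    MasterServiceConfigBuilderSketch& set_cxl_size(std::uint64_t size) {
        config_.cxl_size = size;
        return *this;
    }
    MasterServiceConfigBuilderSketch& set_enable_cxl(bool enable) {
        config_.enable_cxl = enable;
        return *this;
    }
    MasterServiceConfigSketch build() const {
        // Illustrative invariant: enabling CXL requires a device path.
        assert(!config_.enable_cxl || !config_.cxl_path.empty());
        return config_;
    }

   private:
    MasterServiceConfigSketch config_;
};
```

With each setter closed by its own brace and `return *this;`, a call such as `MasterServiceConfigBuilderSketch().set_cxl_path("/dev/dax0.0").set_enable_cxl(true).build()` compiles and reads top to bottom.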

Comment on lines 37 to 40
task_manager_(config.task_manager_config) {
cxl_path_(config.cxl_path),
cxl_size_(config.cxl_size),
enable_cxl_(config.enable_cxl) {
Contributor

critical

There is a syntax error in the constructor's member initializer list. The CXL-related members are being initialized inside the constructor body instead of the initializer list, which is incorrect syntax.

Suggested change
- task_manager_(config.task_manager_config) {
- cxl_path_(config.cxl_path),
- cxl_size_(config.cxl_size),
- enable_cxl_(config.enable_cxl) {
+ task_manager_(config.task_manager_config),
+ cxl_path_(config.cxl_path),
+ cxl_size_(config.cxl_size),
+ enable_cxl_(config.enable_cxl) {
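A self-contained sketch of the corrected initializer-list shape. The class and config struct are reduced stand-ins (the real MasterService has many more members), and the assert in the constructor body is an illustrative invariant, not something taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical reduced config with the fields named in the review.
struct ConfigSketch {
    int task_manager_config = 0;
    std::string cxl_path;
    std::uint64_t cxl_size = 0;
    bool enable_cxl = false;
};

class MasterServiceSketch {
   public:
    explicit MasterServiceSketch(const ConfigSketch& config)
        : task_manager_(config.task_manager_config),  // comma, not '{'
          cxl_path_(config.cxl_path),
          cxl_size_(config.cxl_size),
          enable_cxl_(config.enable_cxl) {  // body opens after the last member
        // Illustrative check only: an enabled CXL tier needs a non-zero size.
        assert(!enable_cxl_ || cxl_size_ > 0);
    }

    const std::string& cxl_path() const { return cxl_path_; }
    std::uint64_t cxl_size() const { return cxl_size_; }
    bool enable_cxl() const { return enable_cxl_; }
    int task_manager() const { return task_manager_; }

   private:
    // Members are initialized in declaration order, so the list above
    // follows the same order to avoid reorder warnings.
    int task_manager_;
    std::string cxl_path_;
    std::uint64_t cxl_size_;
    bool enable_cxl_;
};
```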

Comment on lines 401 to 412
auto buffer = cxl_allocator->allocate(slice_length);
if(buffer){
buffer->change_to_cxl(cxl_segment_name);
}
replicas.emplace_back(std::move(buffer),
ReplicaStatus::PROCESSING);

if (replicas.empty()) {
return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
}
VLOG(1) << "Successfully allocated " << replicas.size() << " CXL replica.";
return replicas;
Contributor

high

The logic for handling allocation failure and replica_num seems incorrect:

  1. The replica_num parameter is not used to create multiple replicas. The code always creates one replica. If replica_num is intended to be always 1 for CXL, there should be an assertion or check.
  2. If cxl_allocator->allocate(slice_length) returns a null pointer, a Replica is still created with a null buffer and pushed into replicas. The function then returns a vector containing an invalid replica, which is likely a bug.
  3. The check if (replicas.empty()) is redundant as a Replica is always pushed into the vector.

The suggested change corrects the handling of allocation failures.

Suggested change
- auto buffer = cxl_allocator->allocate(slice_length);
- if(buffer){
- buffer->change_to_cxl(cxl_segment_name);
- }
- replicas.emplace_back(std::move(buffer),
- ReplicaStatus::PROCESSING);
- if (replicas.empty()) {
- return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
- }
- VLOG(1) << "Successfully allocated " << replicas.size() << " CXL replica.";
- return replicas;
+ auto buffer = cxl_allocator->allocate(slice_length);
+ if (!buffer) {
+ return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
+ }
+ buffer->change_to_cxl(cxl_segment_name);
+ replicas.emplace_back(std::move(buffer),
+ ReplicaStatus::PROCESSING);
+ VLOG(1) << "Successfully allocated " << replicas.size() << " CXL replica.";
+ return replicas;
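The three points in this comment can be tried out in isolation with the sketch below. It uses a fake allocator and std::optional in place of the PR's allocator and tl::expected, and it assumes that CXL allocation supports exactly one replica. That assumption is a choice made for this example, since the PR does not say so.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for AllocatedBuffer (assumption): a size plus the segment tag.
struct BufferSketch {
    std::size_t size;
    std::string segment;
};

// Fake allocator that returns nullptr once its capacity is exhausted.
class FakeAllocator {
   public:
    explicit FakeAllocator(std::size_t capacity) : remaining_(capacity) {}
    std::unique_ptr<BufferSketch> allocate(std::size_t n) {
        if (n > remaining_) return nullptr;
        remaining_ -= n;
        return std::make_unique<BufferSketch>(BufferSketch{n, std::string()});
    }

   private:
    std::size_t remaining_;
};

struct ReplicaSketch {
    std::unique_ptr<BufferSketch> buffer;
};

std::optional<std::vector<ReplicaSketch>> allocate_cxl(
    FakeAllocator& allocator, const std::string& segment,
    std::size_t slice_length, std::size_t replica_num) {
    // Point 1: make the single-replica assumption explicit.
    if (slice_length == 0 || replica_num != 1) return std::nullopt;

    auto buffer = allocator.allocate(slice_length);
    // Point 2: fail before a replica with a null buffer can be created.
    if (!buffer) return std::nullopt;
    assert(buffer->size == slice_length);
    buffer->segment = segment;  // plays the role of change_to_cxl()

    // Point 3: no empty() check is needed, one replica is always pushed.
    std::vector<ReplicaSketch> replicas;
    replicas.push_back(ReplicaSketch{std::move(buffer)});
    return std::make_optional(std::move(replicas));
}
```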

Comment on lines 287 to 317
} else {
auto max_mr_size = globalConfig().max_mr_size; // Max segment size
uint64_t total_glbseg_size = global_segment_size; // For logging
uint64_t current_glbseg_size = 0; // For logging
while (global_segment_size > 0) {
size_t segment_size = std::min(global_segment_size, max_mr_size);
global_segment_size -= segment_size;
current_glbseg_size += segment_size;
LOG(INFO) << "Mounting segment: " << segment_size << " bytes, "
<< current_glbseg_size << " of " << total_glbseg_size;
void *ptr =allocate_buffer_allocator_memory(segment_size, this->protocol);
if (!ptr) {
LOG(ERROR) << "Failed to allocate segment memory";
return tl::unexpected(ErrorCode::INVALID_PARAMS);
}
if (this->protocol == "ascend") {
ascend_segment_ptrs_.emplace_back(ptr);
} else {
segment_ptrs_.emplace_back(ptr);
}
auto mount_result = client_->MountSegment(ptr, segment_size, protocol);
if (!mount_result.has_value()) {
LOG(ERROR) << "Failed to mount segment: "
<< toString(mount_result.error());
return tl::unexpected(mount_result.error());
}
}
if (total_glbseg_size == 0) {
LOG(INFO) << "Global segment size is 0, skip mounting segment";
}
}
Contributor

medium

The refactoring of setup_internal seems to have removed support for huge pages for non-CXL protocols. The original code had logic to handle should_use_hugepage, but it's missing in the new version. This might be an unintentional regression. If huge page support is still desired for non-CXL protocols, the logic should be restored.
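As a rough illustration of what keeping the hugepage branch involves, the sketch below plans the per-chunk sizes only: each chunk is capped at max_mr_size and, when hugepages are requested, its mapped size is rounded up to the hugepage size. The function plan_segments and the Chunk struct are hypothetical names, and no memory is actually mapped here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds value up to the next multiple of alignment (alignment must be > 0).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

struct Chunk {
    std::size_t segment_size;  // bytes requested for this chunk
    std::size_t mapped_size;   // bytes actually mapped (hugepage aligned)
};

// Splits a global segment into chunks of at most max_mr_size bytes.
std::vector<Chunk> plan_segments(std::size_t global_segment_size,
                                 std::size_t max_mr_size, bool use_hugepage,
                                 std::size_t hugepage_size) {
    assert(max_mr_size > 0 && hugepage_size > 0);
    std::vector<Chunk> chunks;
    while (global_segment_size > 0) {
        const std::size_t segment_size =
            std::min(global_segment_size, max_mr_size);
        global_segment_size -= segment_size;
        const std::size_t mapped_size =
            use_hugepage ? align_up(segment_size, hugepage_size)
                         : segment_size;
        chunks.push_back({segment_size, mapped_size});
    }
    return chunks;
}
```

For a 5 MiB global segment with a 4 MiB cap and 2 MiB hugepages, this yields a 4 MiB chunk mapped as 4 MiB and a 1 MiB chunk mapped as 2 MiB.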

DEFINE_string(cxl_device_name, "/dev/dax0.0", "Device name for cxl");
DEFINE_uint64(cxl_device_size, 8589934592, "Device Size for cxl");
DEFINE_bool(auto_disc, false, "Auto discover tcp devices");
DEFINE_string(transfer_engine_metadata_url, "etcd://10.130.5.132:2379", "Metadata connection string for transfer engine");
Contributor

medium

The new test cxl_client_integration_test.cpp contains hardcoded IP addresses for services like etcd. This makes the test dependent on an external environment and not portable. It would be better to make the test self-contained, for example by using InProcMaster as done in client_integration_test.cpp to run a master service within the test process.
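Running an in-process master, as suggested, is the more self-contained option. A smaller and different step is to stop baking a site-specific address into the defaults: the sketch below reads the endpoint from an environment variable and falls back to localhost. The variable name MC_TEST_METADATA_URL and the fallback URL are made up for this example and are not part of Mooncake.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the value of the environment variable `name`, or `fallback`
// when the variable is unset or empty.
inline std::string env_or(const char* name, const std::string& fallback) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return (value != nullptr && *value != '\0') ? std::string(value)
                                                : fallback;
}

// Hypothetical helper a test fixture could call instead of a hardcoded IP.
inline std::string test_metadata_url() {
    return env_or("MC_TEST_METADATA_URL", "etcd://127.0.0.1:2379");
}
```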

@magicYang1573

Good feature!

@qiuweit7 qiuweit7 force-pushed the mooncake_qwt branch 2 times, most recently from 9f4fa4b to 6366b88 on January 14, 2026 02:05
void setAutoDiscover(bool auto_discover);

void* getBaseAddr();
// std::shared_ptr<TransferMetadata> getMetadata() { return metadata_; }
Collaborator

It would be better to remove this commented-out line.

@ykwd
Collaborator

ykwd commented Jan 14, 2026

Thanks for this work. Could you share more details about the system design? At the moment, most reviewers do not have access to CXL memory, so they are unable to run tests. When reviewing the code alone, there are many points that are confusing. For example, is there only a single global segment, and why does the client need to mount a segment?

@stmatengss stmatengss requested a review from Copilot January 16, 2026 03:38
Contributor

Copilot AI left a comment

Pull request overview

This PR adds CXL (Compute Express Link) storage support to Mooncake Store, enabling shared memory-backed storage with centralized segment management. The Master service manages CXL allocation using CacheLib, while clients can access CXL devices through Put/Get and BatchPut/BatchGet APIs.

Changes:

  • Added CXL transport integration in the transfer engine layer with base address retrieval
  • Implemented CXL-specific allocation strategy and segment management on the Master
  • Extended Client API to support CXL protocol with segment mounting and data operations

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| mooncake-transfer-engine/src/transfer_engine.cpp | Added getBaseAddr() method for CXL base address retrieval |
| mooncake-transfer-engine/src/multi_transport.cpp | Implemented CXL transport base address extraction |
| mooncake-transfer-engine/include/transfer_engine_impl.h | Exposed getBaseAddr() through implementation layer |
| mooncake-transfer-engine/include/transfer_engine.h | Added public getBaseAddr() API |
| mooncake-transfer-engine/include/multi_transport.h | Declared getBaseAddr() method |
| mooncake-store/tests/cxl_client_integration_test.cpp | Added integration tests for CXL client operations |
| mooncake-store/tests/client_integration_test.cpp | Updated MountSegment calls to include protocol parameter |
| mooncake-store/tests/CMakeLists.txt | Registered new CXL integration test |
| mooncake-store/src/segment.cpp | Implemented CXL segment mounting and global allocator initialization |
| mooncake-store/src/real_client.cpp | Added CXL-specific segment setup logic |
| mooncake-store/src/master_service.cpp | Integrated CXL allocator and strategy initialization |
| mooncake-store/src/master.cpp | Added CXL configuration flags and initialization |
| mooncake-store/src/client_service.cpp | Added CXL transport installation and base address retrieval |
| mooncake-store/src/allocator.cpp | Implemented CXL offset handling and deallocation logic |
| mooncake-store/include/types.h | Added CXL constants and protocol field to Segment |
| mooncake-store/include/segment.h | Added CXL allocator support to SegmentManager |
| mooncake-store/include/master_service.h | Added CXL configuration fields |
| mooncake-store/include/master_config.h | Extended configuration structures with CXL parameters |
| mooncake-store/include/client_service.h | Updated MountSegment signature and added GetBaseAddr() |
| mooncake-store/include/allocator.h | Added CXL conversion methods and protocol field to AllocatedBuffer |
| mooncake-store/include/allocation_strategy.h | Implemented CxlAllocationStrategy |
| extern/pybind11 | Updated pybind11 submodule commit |

void setAutoDiscover(bool auto_discover);

void* getBaseAddr();
// std::shared_ptr<TransferMetadata> getMetadata() { return metadata_; }
Copilot AI Jan 16, 2026

Commented-out code should be removed rather than left in the codebase. If this functionality is planned for future use, track it in an issue instead.

Suggested change
- // std::shared_ptr<TransferMetadata> getMetadata() { return metadata_; }

Comment on lines 265 to 266
// std::this_thread::sleep_for(std::chrono::seconds(30));

Copilot AI Jan 16, 2026

Remove commented-out debug code before merging. If this sleep is needed for debugging, consider using a configurable flag instead.

Suggested change
- // std::this_thread::sleep_for(std::chrono::seconds(30));

FLAGS_transfer_engine_metadata_url, // Metadata connection string
FLAGS_protocol, // Transfer protocol
std::nullopt, // RDMA device names (auto-discovery)
// master_address_ // Master server address (non-HA)
Copilot AI Jan 16, 2026

Remove commented-out code blocks. If these represent alternative configuration approaches, document them separately or remove entirely.

Suggested change
- // master_address_ // Master server address (non-HA)

Comment on lines 126 to 131
// Start an in-process non-HA master without HTTP metadata server
// ASSERT_TRUE(master_.Start(InProcMasterConfigBuilder().build()));
// master_address_ = master_.master_address();
// metadata_url_ = master_.metadata_url();
// LOG(INFO) << "Started in-proc master at " << master_address_
// << ", metadata=P2PHANDSHAKE";
Copilot AI Jan 16, 2026

Remove commented-out code blocks. If these represent alternative configuration approaches, document them separately or remove entirely.

Suggested change
- // Start an in-process non-HA master without HTTP metadata server
- // ASSERT_TRUE(master_.Start(InProcMasterConfigBuilder().build()));
- // master_address_ = master_.master_address();
- // metadata_url_ = master_.metadata_url();
- // LOG(INFO) << "Started in-proc master at " << master_address_
- // << ", metadata=P2PHANDSHAKE";
+ // Tests expect an already running master/metadata service configured via flags.

#include "allocation_strategy.h"
#include "allocator.h"
#include "types.h"
// #include "master_config.h"
Copilot AI Jan 16, 2026

Remove commented-out include directive. If the include is not needed, it should be deleted completely.

Suggested change
- // #include "master_config.h"

Comment on lines 382 to 386
// if (excluded_segments.contains(cxl_segment_name)) {
// return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
// }
VLOG(1) << "Do cxl allocate, overwritten segment=" << cxl_segment_name;

Copilot AI Jan 16, 2026

Remove commented-out code. If excluding segments functionality is planned for future implementation, add a TODO comment or track it in an issue instead.

Suggested change
- // if (excluded_segments.contains(cxl_segment_name)) {
- // return tl::make_unexpected(ErrorCode::NO_AVAILABLE_HANDLE);
- // }
- VLOG(1) << "Do cxl allocate, overwritten segment=" << cxl_segment_name;
+ // TODO: If excluded_segments support is implemented for CXL allocation,
+ // ensure cxl_segment_name is not excluded before proceeding.
+ VLOG(1) << "Do cxl allocate, overwritten segment=" << cxl_segment_name;

}

void AllocatedBuffer::change_to_cxl(std::string client_segment_name) {
u_int64_t offset_raw = reinterpret_cast<uintptr_t>(buffer_ptr_);
Copilot AI Jan 16, 2026

Use standard type 'uint64_t' instead of non-standard 'u_int64_t' for consistency with the rest of the codebase.
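A small sketch of the same conversion written with the fixed-width types from <cstdint>. The helper names offset_to_ptr and ptr_to_offset are hypothetical and only mirror the idea of carrying an offset in a pointer-typed field, which is what the quoted line appears to do.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(std::uint64_t) == 8,
              "uint64_t is exactly 64 bits wherever it is provided");

// Packs a region-relative offset into a pointer-typed value.
inline void* offset_to_ptr(std::uint64_t offset) {
    assert(offset <= UINTPTR_MAX);  // must fit in a pointer-sized integer
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(offset));
}

// Recovers the offset that was packed by offset_to_ptr().
inline std::uint64_t ptr_to_offset(const void* ptr) {
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(ptr));
}
```

u_int64_t is a BSD-style typedef that glibc happens to expose through <sys/types.h>, while uint64_t is the name defined by the C and C++ standards.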

}

void* AllocatedBuffer::get_vaddr_from_cxl() {
u_int64_t offset_raw = reinterpret_cast<uintptr_t>(buffer_ptr_);
Copilot AI Jan 16, 2026

Use standard type 'uint64_t' instead of non-standard 'u_int64_t' for consistency with the rest of the codebase.

@qiuweit7 qiuweit7 closed this Jan 21, 2026
@qiuweit7
Contributor Author

> Thanks for this work. Could you share more details about the system design? At the moment, most reviewers do not have access to CXL memory, so they are unable to run tests. When reviewing the code alone, there are many points that are confusing. For example, is there only a single global segment, and why does the client need to mount a segment?

Thank you for the feedback and questions. Here is a more detailed description of the system design.

CXL is a high-speed interconnect protocol built on PCIe for high-performance computing. We use CXL to extend pooled memory and to share memory across multiple hosts. Today, Mooncake Store is a distributed store that shares data over the network (TCP/RDMA). Because CXL pooled memory is shared and offers ultra-low-latency, memory-semantic access, it can serve as a new storage medium. To this end, we have completed the following work.

First, we implement CXL shared memory as an independent tier in the store, with cxl_transport (#670) serving as the client-side transfer engine. In cxl-store, the Master holds a single, globally unified Allocator for CXL memory and records the offset of each piece of data within the CXL memory. Each client uses the Mount operation to mmap the CXL shared memory (which the OS exposes as the character device /dev/dax0.0) into its local address space, and keeps the resulting BaseAddr. The Mount operation also associates the client's Segment with the global CXL Allocator. Clients then perform put/get operations by accessing the same CXL memory offset through their own mapping (BaseAddr + offset).

[image]

Future plans: Building on this work, we have also implemented tiered storage for Mooncake Store, forming a three-tier DRAM-CXL-SSD architecture. This includes multi-protocol support on the client side and hot/cold data promotion/demotion between DRAM and CXL. If there is interest, we will open-source these features in the future.
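The BaseAddr + offset scheme described in this comment can be tried without CXL hardware. The sketch below is an illustration under stated assumptions: a temporary file stands in for /dev/dax0.0, and two MAP_SHARED mappings in one process stand in for two clients. The names SharedRegionView and put_then_get are hypothetical, and nothing here uses Mooncake's own API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// One client's view of the shared region: a local base address plus a size.
class SharedRegionView {
   public:
    SharedRegionView(int fd, std::size_t size) : size_(size) {
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        base_ = (p == MAP_FAILED) ? nullptr : static_cast<char*>(p);
    }
    ~SharedRegionView() {
        if (base_ != nullptr) munmap(base_, size_);
    }
    SharedRegionView(const SharedRegionView&) = delete;
    SharedRegionView& operator=(const SharedRegionView&) = delete;

    bool ok() const { return base_ != nullptr; }
    // Translates a region-relative offset into this view's local address.
    char* at(std::uint64_t offset) const { return base_ + offset; }

   private:
    char* base_ = nullptr;
    std::size_t size_;
};

// Client A writes `value` at `offset`; client B reads it back through its
// own mapping, which normally has a different base address.
std::string put_then_get(const std::string& value, std::uint64_t offset) {
    const std::size_t region_size = 1u << 16;
    std::FILE* tmp = std::tmpfile();  // stand-in for the CXL device
    if (tmp == nullptr) return {};
    const int fd = fileno(tmp);
    if (ftruncate(fd, static_cast<off_t>(region_size)) != 0) {
        std::fclose(tmp);
        return {};
    }
    std::string out;
    {
        SharedRegionView client_a(fd, region_size);
        SharedRegionView client_b(fd, region_size);
        if (client_a.ok() && client_b.ok() &&
            offset + value.size() <= region_size) {
            std::memcpy(client_a.at(offset), value.data(), value.size());
            out.assign(client_b.at(offset), value.size());
        }
    }  // both mappings are released here, before the file is closed
    std::fclose(tmp);
    return out;
}
```

Only the offset has to be shared between clients (in the PR's design, it is tracked by the Master). Each client adds its own base address locally, so the mappings do not need to land at the same virtual address.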

@stmatengss stmatengss reopened this Jan 21, 2026
Comment on lines +274 to +304
LOG(FATAL) << "MC_CXL_DEV_SIZE not set";
return tl::unexpected(ErrorCode::INVALID_PARAMS);
}
if (this->protocol == "ascend") {
ascend_segment_ptrs_.emplace_back(ptr);
} else if (should_use_hugepage) {
hugepage_segment_ptrs_.emplace_back(
ptr, HugepageSegmentDeleter{mapped_size});
} else {
segment_ptrs_.emplace_back(ptr);
}
auto mount_result = client_->MountSegment(ptr, mapped_size);

void *ptr = client_->GetBaseAddr();
LOG(INFO) << "Mounting CXL segment: " << cxl_dev_size << " bytes, "
<< ptr;
auto mount_result = client_->MountSegment(ptr, cxl_dev_size, protocol);
if (!mount_result.has_value()) {
LOG(ERROR) << "Failed to mount segment: "
<< toString(mount_result.error());
return tl::unexpected(mount_result.error());
}
}
if (total_glbseg_size == 0) {
LOG(INFO) << "Global segment size is 0, skip mounting segment";

} else {
auto max_mr_size = globalConfig().max_mr_size; // Max segment size
uint64_t total_glbseg_size = global_segment_size; // For logging
uint64_t current_glbseg_size = 0; // For logging
while (global_segment_size > 0) {
size_t segment_size = std::min(global_segment_size, max_mr_size);
global_segment_size -= segment_size;
current_glbseg_size += segment_size;
LOG(INFO) << "Mounting segment: " << segment_size << " bytes, "
<< current_glbseg_size << " of " << total_glbseg_size;

size_t mapped_size = segment_size;
void *ptr = nullptr;
if (should_use_hugepage) {
mapped_size =
align_up(segment_size, get_hugepage_size_from_env());
ptr = allocate_buffer_mmap_memory(mapped_size,
Collaborator

rebase

@stmatengss stmatengss reopened this Jan 23, 2026
@XucSh XucSh added run-ci and removed run-ci labels Jan 23, 2026
@qiuweit7 qiuweit7 force-pushed the mooncake_qwt branch 2 times, most recently from d705b26 to 6926deb on January 27, 2026 11:31
@stmatengss stmatengss requested a review from Ann-1024 as a code owner January 27, 2026 13:30
@codecov-commenter

codecov-commenter commented Jan 27, 2026

@qiuweit7 qiuweit7 force-pushed the mooncake_qwt branch 2 times, most recently from a793cd3 to 63bd273 on January 28, 2026 02:05
option(USE_UBSHMEM "option for using ascend npu with shmem" OFF)
option(USE_ASCEND_HETEROGENEOUS "option for transferring between ascend npu and gpu" OFF)
option(USE_MNNVL "option for using Multi-Node NVLink transport" OFF)
option(USE_CXL "option for using CXL protocol" OFF)
Collaborator

We don't need to enable it for all scenarios. Just test it in CI.


// From WrappedMasterServiceConfig
MasterServiceConfig(const WrappedMasterServiceConfig& config) {
auto cxl_allocator_type = BufferAllocatorType::CACHELIB;
Collaborator

For CXL, must we use CACHELIB?

Contributor Author

No, using CACHELIB for CXL is just a temporary design choice.

Comment on lines 312 to 317
VLOG(1) << "Init CXL global allocator.";
VLOG(1) << "[CXL] create allocator with "
<< "path=" << cxl_path << " base=0x" << std::hex << DEFAULT_CXL_BASE
<< std::dec << " size=" << cxl_size << " (" << std::fixed
<< std::setprecision(2) << cxl_size / (1024.0 * 1024 * 1024)
<< " GB)";
Collaborator

Could you use LOG(INFO) instead?
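Whichever log level is chosen, the message formatting can be checked on its own. The sketch below builds the same text with a std::ostringstream instead of glog. The function name describe_cxl_allocator and the base value used in the usage note are assumptions, since the value of DEFAULT_CXL_BASE is not shown in the quoted lines.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Formats the allocator description the same way the quoted log line does.
inline std::string describe_cxl_allocator(const std::string& path,
                                          std::uint64_t base,
                                          std::uint64_t size) {
    assert(!path.empty());
    std::ostringstream os;
    os << "[CXL] create allocator with path=" << path
       << " base=0x" << std::hex << base  // std::hex stays in effect...
       << std::dec                        // ...until it is reset here
       << " size=" << size << " (" << std::fixed << std::setprecision(2)
       << static_cast<double>(size) / (1024.0 * 1024 * 1024) << " GB)";
    return os.str();
}
```

For example, `describe_cxl_allocator("/dev/dax0.0", 0x100000000ULL, 8589934592ULL)` produces `[CXL] create allocator with path=/dev/dax0.0 base=0x100000000 size=8589934592 (8.00 GB)`. The divisor is 1024^3, so the figure is strictly GiB even though the label says GB.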

Collaborator

@stmatengss stmatengss left a comment

LGTM

@stmatengss stmatengss merged commit 67d0afc into kvcache-ai:main Jan 28, 2026
16 checks passed
JasonZhang517 pushed a commit to JasonZhang517/Mooncake that referenced this pull request Feb 9, 2026
* [Store] feat: add cxl storage for mooncake store

* Update extern/pybind11 to match main

* fix: use fake cxl device to bypass ci-test error

* Fix code formatting in segment.cpp

---------

Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
7 participants