Skip to content

Comments

[TransferEngine] feat: Support CXL shared memory, and provide simple unit tests.#670

Merged
stmatengss merged 11 commits intokvcache-ai:mainfrom
hemist:cxl-engine-pr
Aug 1, 2025
Merged

[TransferEngine] feat: Support CXL shared memory, and provide simple unit tests.#670
stmatengss merged 11 commits intokvcache-ai:mainfrom
hemist:cxl-engine-pr

Conversation

@hemist
Copy link
Contributor

@hemist hemist commented Jul 25, 2025

This PR providesd a transfer-engine framework based on CXL shared memory. In the current implementation, submitTransfer used memcpy() for data transfer. Future work may switch to DSA to avoid CPU cache flushes.

@stmatengss stmatengss requested review from Copilot and stmatengss July 25, 2025 07:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces CXL (Compute Express Link) shared memory support to the transfer-engine framework by implementing a new CxlTransport class and providing basic unit tests. The implementation uses memcpy() for data transfer with plans to potentially upgrade to DSA (Data Streaming Accelerator) in the future.

Key changes include:

  • Implementation of CxlTransport class with CXL device initialization, memory mapping, and data transfer capabilities
  • Extension of metadata system to support CXL segment descriptors and buffer management
  • Addition of comprehensive unit tests for CXL transport operations

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
cxl_transport.cpp Core implementation of CXL transport with device initialization, memory mapping, and transfer operations
cxl_transport.h Header defining CxlTransport class interface and CXL-specific functionality
cxl_transport_test.cpp Unit tests for CXL transport operations including multi-write and multi-read scenarios
transfer_metadata.cpp Extended metadata encoding/decoding to support CXL segment descriptors
transfer_metadata.h Added CXL-specific fields to metadata structures
CMakeLists.txt Build configuration for CXL transport test
multi_transport.cpp Integration of CXL transport into multi-transport system
transfer_engine.cpp Auto-discovery setup for CXL transport
transport.h Updated Slice structure for CXL transport
common.cmake Build system support for CXL feature flag

@stmatengss stmatengss self-assigned this Jul 27, 2025
cxl_dev_path = std::string(env_cxl_dev_path);
cxl_dev_size = cxlGetDeviceSize(cxl_dev_path);
} else {
cxl_dev_path = "/dev/dax0.0";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is hard-coded. Can you modify it to support auto-detection? If no CXL device is found, it should trigger error handling.

#else

#ifdef USE_CXL
multi_transports_->installTransport("cxl", local_topology_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to check its return status?

Comment on lines 71 to 76
while (fgets(line, sizeof(line), fp) != NULL) {
char *size_str = strstr(line, "\"size\":");
if (size_str) {
size_str += 7; // skip over "\"size\":"

while (*size_str && (*size_str < '0' || *size_str > '9')) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not calculate its size using posix API?

LOG(ERROR) << "CxlTransport::cxlMemcpy invalid arguments: null pointer provided.";
return -1; // null pointer
}
std::memcpy(dest, src, size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is the same address, don't require memcpy

Copy link
Collaborator

@stmatengss stmatengss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass Tests. LGTM

const char* env_cxl_dev_path = std::getenv("MC_CXL_DEV_PATH");

CxlTransport::~CxlTransport() {}
LOG(INFO) << "MC_CXL_DEV_PATH: " << env_cxl_dev_path;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print it only when MC_CXL_DEV_PATH is set?

// for now, get cxl_shm size from env
const char* env_cxl_dev_size = std::getenv("MC_CXL_DEV_SIZE");

LOG(INFO) << "MC_CXL_DEV_SIZE: " << env_cxl_dev_size;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as line R44

TransferStatus &status) {
return 0;
CxlTransport::~CxlTransport() {
munmap(cxl_base_addr, cxl_dev_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check cxl_base_addr sanity before munmap it


int CxlTransport::cxlDevInit()
{
int fd = open(cxl_dev_path, O_RDWR);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check cxl_dev_path sanity

@staryxchen
Copy link
Collaborator

Hi @stmatengss
I have reviewed but forget to submit, sry. Could you modify it a little more?

@stmatengss
Copy link
Collaborator

Hi @stmatengss I have reviewed but forget to submit, sry. Could you modify it a little more?

Sure. Any reviews are welcome.

@stmatengss stmatengss merged commit 7cdae81 into kvcache-ai:main Aug 1, 2025
10 checks passed
wanyue-wy pushed a commit to wanyue-wy/Mooncake that referenced this pull request Dec 14, 2025
…unit tests. (kvcache-ai#670)

* [TransgerEngine] Add cxl transport test

* [Transfer Engine] Add cxl transfer support

* [TransferEngine] Update the way of obtaining the size of CXL memory from daxctl to env variable && fix some bugs.

* [CXL transport] refactor code & add test

* [TransferEngine] fix: modify the parameters of submitTransferTask() to match the base class 'Transport'.

* [TransferEngine] fix: update etcd address

* [TransferEngine] fix: unit tests error caused by cxl_transport

* Update mooncake-common/common.cmake

* [TransferEngine] fix: add some validity checks.

---------

Co-authored-by: YiHong Lian <lianyihong@ieisystem.com>
Co-authored-by: Karl Zhao <zhaozhiyuan05@ieisystem.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: Teng Ma <805522925@qq.com>
JasonZhang517 pushed a commit to JasonZhang517/Mooncake that referenced this pull request Feb 9, 2026
…unit tests. (kvcache-ai#670)

* [TransgerEngine] Add cxl transport test

* [Transfer Engine] Add cxl transfer support

* [TransferEngine] Update the way of obtaining the size of CXL memory from daxctl to env variable && fix some bugs.

* [CXL transport] refactor code & add test

* [TransferEngine] fix: modify the parameters of submitTransferTask() to match the base class 'Transport'.

* [TransferEngine] fix: update etcd address

* [TransferEngine] fix: unit tests error caused by cxl_transport

* Update mooncake-common/common.cmake

* [TransferEngine] fix: add some validity checks.

---------

Co-authored-by: YiHong Lian <lianyihong@ieisystem.com>
Co-authored-by: Karl Zhao <zhaozhiyuan05@ieisystem.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: Teng Ma <805522925@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants