[TransferEngine] feat: Support CXL shared memory, and provide simple unit tests.#670
[TransferEngine] feat: Support CXL shared memory, and provide simple unit tests.#670stmatengss merged 11 commits intokvcache-ai:mainfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR introduces CXL (Compute Express Link) shared memory support to the transfer-engine framework by implementing a new CxlTransport class and providing basic unit tests. The implementation uses memcpy() for data transfer with plans to potentially upgrade to DSA (Data Streaming Accelerator) in the future.
Key changes include:
- Implementation of CxlTransport class with CXL device initialization, memory mapping, and data transfer capabilities
- Extension of metadata system to support CXL segment descriptors and buffer management
- Addition of comprehensive unit tests for CXL transport operations
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| cxl_transport.cpp | Core implementation of CXL transport with device initialization, memory mapping, and transfer operations |
| cxl_transport.h | Header defining CxlTransport class interface and CXL-specific functionality |
| cxl_transport_test.cpp | Unit tests for CXL transport operations including multi-write and multi-read scenarios |
| transfer_metadata.cpp | Extended metadata encoding/decoding to support CXL segment descriptors |
| transfer_metadata.h | Added CXL-specific fields to metadata structures |
| CMakeLists.txt | Build configuration for CXL transport test |
| multi_transport.cpp | Integration of CXL transport into multi-transport system |
| transfer_engine.cpp | Auto-discovery setup for CXL transport |
| transport.h | Updated Slice structure for CXL transport |
| common.cmake | Build system support for CXL feature flag |
mooncake-transfer-engine/src/transport/cxl_transport/cxl_transport.cpp
Outdated
Show resolved
Hide resolved
mooncake-transfer-engine/src/transport/cxl_transport/cxl_transport.cpp
Outdated
Show resolved
Hide resolved
mooncake-transfer-engine/src/transport/cxl_transport/cxl_transport.cpp
Outdated
Show resolved
Hide resolved
mooncake-transfer-engine/src/transport/cxl_transport/cxl_transport.cpp
Outdated
Show resolved
Hide resolved
mooncake-transfer-engine/src/transport/cxl_transport/cxl_transport.cpp
Outdated
Show resolved
Hide resolved
mooncake-transfer-engine/src/transport/cxl_transport/cxl_transport.cpp
Outdated
Show resolved
Hide resolved
mooncake-transfer-engine/src/transport/cxl_transport/cxl_transport.cpp
Outdated
Show resolved
Hide resolved
mooncake-transfer-engine/src/transport/cxl_transport/cxl_transport.cpp
Outdated
Show resolved
Hide resolved
| cxl_dev_path = std::string(env_cxl_dev_path); | ||
| cxl_dev_size = cxlGetDeviceSize(cxl_dev_path); | ||
| } else { | ||
| cxl_dev_path = "/dev/dax0.0"; |
There was a problem hiding this comment.
The code is hard-coded. Can you modify it to support auto-detection? If no CXL device is found, it should trigger error handling.
| #else | ||
|
|
||
| #ifdef USE_CXL | ||
| multi_transports_->installTransport("cxl", local_topology_); |
There was a problem hiding this comment.
need to check its return status?
| while (fgets(line, sizeof(line), fp) != NULL) { | ||
| char *size_str = strstr(line, "\"size\":"); | ||
| if (size_str) { | ||
| size_str += 7; // skip over "\"size\":" | ||
|
|
||
| while (*size_str && (*size_str < '0' || *size_str > '9')) { |
There was a problem hiding this comment.
Why not calculate its size using posix API?
| LOG(ERROR) << "CxlTransport::cxlMemcpy invalid arguments: null pointer provided."; | ||
| return -1; // null pointer | ||
| } | ||
| std::memcpy(dest, src, size); |
There was a problem hiding this comment.
If it is the same address, don't require memcpy
…rom daxctl to env variable && fix some bugs.
Cxl engine pr
…o match the base class 'Transport'.
| const char* env_cxl_dev_path = std::getenv("MC_CXL_DEV_PATH"); | ||
|
|
||
| CxlTransport::~CxlTransport() {} | ||
| LOG(INFO) << "MC_CXL_DEV_PATH: " << env_cxl_dev_path; |
There was a problem hiding this comment.
print it only when MC_CXL_DEV_PATH is set?
| // for now, get cxl_shm size from env | ||
| const char* env_cxl_dev_size = std::getenv("MC_CXL_DEV_SIZE"); | ||
|
|
||
| LOG(INFO) << "MC_CXL_DEV_SIZE: " << env_cxl_dev_size; |
| TransferStatus &status) { | ||
| return 0; | ||
| CxlTransport::~CxlTransport() { | ||
| munmap(cxl_base_addr, cxl_dev_size); |
There was a problem hiding this comment.
check cxl_base_addr sanity before munmap it
|
|
||
| int CxlTransport::cxlDevInit() | ||
| { | ||
| int fd = open(cxl_dev_path, O_RDWR); |
There was a problem hiding this comment.
check cxl_dev_path sanity
|
Hi @stmatengss |
Sure. Any reviews are welcome. |
…unit tests. (kvcache-ai#670) * [TransgerEngine] Add cxl transport test * [Transfer Engine] Add cxl transfer support * [TransferEngine] Update the way of obtaining the size of CXL memory from daxctl to env variable && fix some bugs. * [CXL transport] refactor code & add test * [TransferEngine] fix: modify the parameters of submitTransferTask() to match the base class 'Transport'. * [TransferEngine] fix: update etcd address * [TransferEngine] fix: unit tests error caused by cxl_transport * Update mooncake-common/common.cmake * [TransferEngine] fix: add some validity checks. --------- Co-authored-by: YiHong Lian <lianyihong@ieisystem.com> Co-authored-by: Karl Zhao <zhaozhiyuan05@ieisystem.com> Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com> Co-authored-by: Teng Ma <805522925@qq.com>
…unit tests. (kvcache-ai#670) * [TransgerEngine] Add cxl transport test * [Transfer Engine] Add cxl transfer support * [TransferEngine] Update the way of obtaining the size of CXL memory from daxctl to env variable && fix some bugs. * [CXL transport] refactor code & add test * [TransferEngine] fix: modify the parameters of submitTransferTask() to match the base class 'Transport'. * [TransferEngine] fix: update etcd address * [TransferEngine] fix: unit tests error caused by cxl_transport * Update mooncake-common/common.cmake * [TransferEngine] fix: add some validity checks. --------- Co-authored-by: YiHong Lian <lianyihong@ieisystem.com> Co-authored-by: Karl Zhao <zhaozhiyuan05@ieisystem.com> Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com> Co-authored-by: Teng Ma <805522925@qq.com>
This PR providesd a transfer-engine framework based on CXL shared memory. In the current implementation,
submitTransferusedmemcpy()for data transfer. Future work may switch to DSA to avoid CPU cache flushes.