feat(transfer_engine_bench): Add multi-GPU support by staryxchen · Pull Request #675 · kvcache-ai/Mooncake

staryxchen · 2025-07-26T04:10:55Z

Summary

This PR extends the Mooncake Transfer Engine benchmark (transfer_engine_bench) to support multi-GPU configurations and reworks its memory allocation logic so the benchmark can better demonstrate achievable performance.

Changes

Added support for using all available GPUs: when gpu_id=-1, the benchmark automatically detects and uses every available GPU
Dynamic buffer allocation: The number of buffers now scales with the number of GPUs when using VRAM
Enhanced logging: Added detailed logging for GPU memory allocation to aid in debugging and monitoring
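The selection rule described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `PlanBufferDevices` and `buffers_per_gpu` are hypothetical names, and `device_count` stands in for whatever the GPU runtime reports (for example, the value returned by `cudaGetDeviceCount`). The sketch assumes the buffer count is simply proportional to the number of GPUs; the PR states only that it scales with it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): returns, for each buffer to allocate,
// the index of the GPU that should back it.
//   gpu_id >= 0  : all buffers go on that single GPU (existing behavior).
//   gpu_id == -1 : buffers_per_gpu buffers on every detected GPU, so the
//                  total number of buffers scales with device_count.
// An empty plan means the request cannot be satisfied.
std::vector<int> PlanBufferDevices(int gpu_id, int device_count,
                                   int buffers_per_gpu) {
    std::vector<int> plan;
    if (device_count <= 0 || buffers_per_gpu <= 0) {
        return plan;
    }
    if (gpu_id == -1) {
        // Multi-GPU mode: walk every device and give each its share.
        for (int dev = 0; dev < device_count; ++dev) {
            for (int i = 0; i < buffers_per_gpu; ++i) {
                plan.push_back(dev);
            }
        }
    } else if (gpu_id >= 0 && gpu_id < device_count) {
        // Single-GPU mode: every buffer lives on the requested device.
        plan.assign(static_cast<std::size_t>(buffers_per_gpu), gpu_id);
    }
    return plan;
}
```

Under these assumptions, `PlanBufferDevices(-1, 8, 2)` yields 16 entries (two per GPU), while `PlanBufferDevices(0, 8, 2)` yields two entries, both on GPU 0. Returning an empty plan for an out-of-range gpu_id is a choice made for this sketch; how the benchmark itself handles that case is not stated in the PR.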

Usage

Single GPU: --gpu_id=0 (existing behavior)
Multi-GPU: --gpu_id=-1 (new feature - uses all available GPUs)

Test Results

The changes maintain full backward compatibility with existing single-GPU configurations while enabling multi-GPU scenarios for performance testing. I tested performance with the following commands:

target:

./transfer_engine_bench --mode=target --metadata_server=redis://${redis_ip}:${redis_port} --local_server_name=${local_ip} --use_vram=true --block_size=$((1*1024*1024)) --threads=16 --auto_discovery --gpu_id=-1

initiator:

./transfer_engine_bench --metadata_server=redis://${redis_ip}:${redis_port} --local_server_name=${local_ip} --segment_id=${target_ip} --block_size=$((1*1024*1024)) --operation=write --duration=30 --threads=16 --auto_discovery --use_vram=true --gpu_id=-1

The results show that when memory is allocated from eight H20 GPUs simultaneously (which is the purpose of this PR), the Mooncake Transfer Engine can fully utilize the bandwidth of eight 400 Gbps RDMA NICs.
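For scale, the aggregate line rate implied by that hardware is simple arithmetic. It is a theoretical upper bound that ignores protocol overhead; the PR does not state the measured throughput.

```latex
8 \times 400\ \text{Gb/s} = 3200\ \text{Gb/s} = \frac{3200}{8}\ \text{GB/s} = 400\ \text{GB/s}
```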

… allocation logic
- Added support for using all available GPUs when gpu_id=-1
- Improved memory allocation logic with dynamic buffer_num
- Enhanced logging for GPU memory allocation
- Refactored buffer naming convention
Signed-off-by: staryxchen <staryxchen@tencent.com>

Signed-off-by: staryxchen <staryxchen@tencent.com>

alogfans

LGTM

* feat(transfer_engine_bench): add multi-GPU support and improve memory allocation logic
  - Added support for using all available GPUs when gpu_id=-1
  - Improved memory allocation logic with dynamic buffer_num
  - Enhanced logging for GPU memory allocation
  - Refactored buffer naming convention
  Signed-off-by: staryxchen <staryxchen@tencent.com>
* replace thread array with vector
  Signed-off-by: staryxchen <staryxchen@tencent.com>
* move DRAM log messages to initialization blocks
  Signed-off-by: staryxchen <staryxchen@tencent.com>
* fix typo
  Signed-off-by: staryxchen <staryxchen@tencent.com>
---------
Signed-off-by: staryxchen <staryxchen@tencent.com>

staryxchen added 4 commits July 26, 2025 03:59

replace thread array with vector

3fac6ad

Signed-off-by: staryxchen <staryxchen@tencent.com>

move DRAM log messages to initialization blocks

d77bc7f

Signed-off-by: staryxchen <staryxchen@tencent.com>

fix typo

8c6a218

Signed-off-by: staryxchen <staryxchen@tencent.com>

alogfans approved these changes Jul 28, 2025


alogfans merged commit 68893af into kvcache-ai:main Jul 28, 2025
10 checks passed

staryxchen deleted the opt/bench branch July 28, 2025 03:34

staryxchen mentioned this pull request Feb 11, 2026

[TENT] Improve tebench: GPU selection, graceful interruption, and build fixes #1537

Merged

