feat(transfer_engine_bench): Add multi-GPU support #675

Merged
alogfans merged 4 commits into kvcache-ai:main from staryxchen:opt/bench
Jul 28, 2025

@staryxchen
Collaborator

Summary

This PR extends the Mooncake Transfer Engine benchmark to support multi-GPU configurations and improves its memory allocation logic, so the benchmark can better demonstrate achievable performance.

Changes

  • Added support for using all available GPUs: When gpu_id=-1, the benchmark now automatically detects and utilizes all available GPUs
  • Dynamic buffer allocation: The number of buffers now scales with the number of GPUs when using VRAM
  • Enhanced logging: Added detailed logging for GPU memory allocation to aid in debugging and monitoring
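The behavior described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `resolveGpuIds` and `bufferCount` are hypothetical helper names, the device count is passed in as a parameter (in a CUDA build it would typically come from `cudaGetDeviceCount`), and "one buffer per selected GPU" is an assumption about how the buffer count scales.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: map the --gpu_id flag to the device ordinals to use.
// device_count is supplied by the caller; a CUDA build would typically
// obtain it from cudaGetDeviceCount().
std::vector<int> resolveGpuIds(int gpu_id, int device_count) {
    if (device_count <= 0) {
        throw std::runtime_error("no GPU available");
    }
    if (gpu_id == -1) {
        // -1 selects every device: 0, 1, ..., device_count - 1.
        std::vector<int> ids(static_cast<std::size_t>(device_count));
        std::iota(ids.begin(), ids.end(), 0);
        return ids;
    }
    if (gpu_id < 0 || gpu_id >= device_count) {
        throw std::out_of_range("gpu_id out of range");
    }
    return {gpu_id};
}

// Hypothetical helper: assumes one buffer per selected GPU when VRAM is used,
// and a caller-chosen count (dram_buffers) otherwise.
std::size_t bufferCount(bool use_vram, const std::vector<int>& gpu_ids,
                        std::size_t dram_buffers) {
    return use_vram ? gpu_ids.size() : dram_buffers;
}
```

Under these assumptions, `--gpu_id=-1` on an eight-GPU host would yield eight buffers, while `--gpu_id=0` keeps the single-buffer behavior.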

Usage

  • Single GPU: --gpu_id=0 (existing behavior)
  • Multi-GPU: --gpu_id=-1 (new feature - uses all available GPUs)

Testing Result

The changes maintain full backward compatibility with existing single-GPU configurations while enabling multi-GPU scenarios for enhanced performance testing. I tested the performance with the commands below:

  • target:
./transfer_engine_bench --mode=target --metadata_server=redis://${redis_ip}:${redis_port} --local_server_name=${local_ip} --use_vram=true --block_size=$((1*1024*1024)) --threads=16 --auto_discovery --gpu_id=-1
  • initiator:
./transfer_engine_bench --metadata_server=redis://${redis_ip}:${redis_port} --local_server_name=${local_ip} --segment_id=${target_ip} --block_size=$((1*1024*1024)) --operation=write --duration=30 --threads=16 --auto_discovery --use_vram=true --gpu_id=-1

The results show that when memory is allocated from eight H20 GPUs simultaneously (which is the purpose of this PR), the Mooncake Transfer Engine can fully utilize the bandwidth of eight 400 Gbps RDMA NICs.
[Screenshot: Clipboard_Screenshot_1753502699]
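
For reference, the nominal aggregate line rate implied by eight 400 Gbps NICs works out as below. This is only the theoretical link ceiling, ignoring protocol overhead; it is not a measured figure from this PR.

```latex
8 \times 400\ \text{Gb/s} = 3200\ \text{Gb/s} = \frac{3200}{8}\ \text{GB/s} = 400\ \text{GB/s}
```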

… allocation logic

- Added support for using all available GPUs when gpu_id=-1
- Improved memory allocation logic with dynamic buffer_num
- Enhanced logging for GPU memory allocation
- Refactored buffer naming convention

Signed-off-by: staryxchen <staryxchen@tencent.com>
Collaborator

@alogfans alogfans left a comment

LGTM

@alogfans alogfans merged commit 68893af into kvcache-ai:main Jul 28, 2025
10 checks passed
@staryxchen staryxchen deleted the opt/bench branch July 28, 2025 03:34
wanyue-wy pushed a commit to wanyue-wy/Mooncake that referenced this pull request Dec 14, 2025
* feat(transfer_engine_bench): add multi-GPU support and improve memory allocation logic

- Added support for using all available GPUs when gpu_id=-1
- Improved memory allocation logic with dynamic buffer_num
- Enhanced logging for GPU memory allocation
- Refactored buffer naming convention

Signed-off-by: staryxchen <staryxchen@tencent.com>

* replace thread array with vector

Signed-off-by: staryxchen <staryxchen@tencent.com>

* move DRAM log messages to initialization blocks

Signed-off-by: staryxchen <staryxchen@tencent.com>

* fix typo

Signed-off-by: staryxchen <staryxchen@tencent.com>

---------

Signed-off-by: staryxchen <staryxchen@tencent.com>
JasonZhang517 pushed a commit to JasonZhang517/Mooncake that referenced this pull request Feb 9, 2026