[TENT] Improve tebench: GPU selection, graceful interruption, and build fixes #1537
alogfans merged 9 commits into kvcache-ai:main
Conversation
- Include csignal for signal handling in RPC module Signed-off-by: staryxchen <staryxchen@tencent.com>
- Configure tebench target to use relative RPATH for runtime library resolution - Ensure libasio.so can be found at runtime from the ../lib directory Signed-off-by: staryxchen <staryxchen@tencent.com>
- Update runners to return error when stopped - Add early exit logic in main loop Signed-off-by: staryxchen <staryxchen@tencent.com>
- Update flags to accept -1 for all GPUs. - Adjust buffer allocation and address calculation for specific GPU selection. Signed-off-by: staryxchen <staryxchen@tencent.com>
- Extract common logic for device prefix and buffer count - Merge allocation and registration loops to reduce code duplication - Improve variable const-correctness Signed-off-by: staryxchen <staryxchen@tencent.com>
- Add `rpc_server_port` field to `XferBenchConfig` - Load port value from command line flags Signed-off-by: staryxchen <staryxchen@tencent.com>
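A minimal sketch of how such a flag could be loaded into the config (a plain-C++ stand-in for whatever flag library the benchmark actually uses; the loader name and struct layout here are hypothetical, only the `rpc_server_port` field name comes from the PR):

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Field name from the PR; the rest of the real struct is omitted here.
struct XferBenchConfig {
    static int rpc_server_port;  // 0 = let the server auto-select a port
};
int XferBenchConfig::rpc_server_port = 0;

// Hypothetical loader: scan argv for a --rpc_server_port=N flag.
void loadFlags(int argc, char** argv) {
    const std::string prefix = "--rpc_server_port=";
    for (int i = 1; i < argc; ++i) {
        if (std::strncmp(argv[i], prefix.c_str(), prefix.size()) == 0) {
            XferBenchConfig::rpc_server_port =
                std::atoi(argv[i] + prefix.size());
        }
    }
}
```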
Summary of Changes
Hello @staryxchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the `tebench` benchmark tool.
Highlights
Activity
Code Review
This pull request introduces several valuable improvements to the tebench tool, including GPU selection, graceful interruption handling, and build fixes. The refactoring to unify buffer allocation logic in TENTBenchRunner::allocateBuffers is a notable improvement to code quality, reducing duplication. However, the new GPU selection logic lacks sufficient validation for negative ID values (other than the special -1 value), which can lead to out-of-bounds memory access and potential crashes or undefined behavior. It is recommended to validate that local_gpu_id and target_gpu_id are either -1 or non-negative before they are used as offsets or device indices. Additionally, there is a minor suggestion regarding error handling to improve robustness.
```diff
  auto target_gpu_offset = XferBenchConfig::target_gpu_id == -1
      ? 0
      : XferBenchConfig::target_gpu_id;
  uint64_t local_addr = runner.getLocalBufferBase(
      local_gpu_offset + thread_id, max_block_size, max_batch_size);
  uint64_t target_addr = runner.getTargetBufferBase(
-     XferBenchConfig::target_gpu_id + thread_id, max_block_size,
-     max_batch_size);
+     target_gpu_offset + thread_id, max_block_size, max_batch_size);
```
The logic for target_gpu_offset allows XferBenchConfig::target_gpu_id to be a negative value other than -1. When such a value is provided, target_gpu_offset + thread_id can be negative. This value is passed to runner.getTargetBufferBase, which in both TEBenchRunner and TENTBenchRunner implementations uses it as an index into an array after a modulo operation. In C++, the result of the modulo operator % with a negative operand is negative, resulting in an out-of-bounds array access to info_->buffers or pinned_buffer_list_.
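The reviewer's point about `%` can be verified directly: in C++, `%` truncates toward zero, so a negative left operand yields a negative remainder, and using that raw remainder as a buffer index reads out of bounds. A defensive index wrapper (illustrative only, not code from the PR) would look like:

```cpp
#include <cstddef>

// In C++, -2 % 8 == -2 (truncation toward zero), so buffers[idx % count]
// is an out-of-bounds access for negative idx. This wrapper maps any
// index, negative or positive, into the valid range [0, count).
inline std::size_t wrapIndex(long long idx, std::size_t count) {
    long long m = idx % static_cast<long long>(count);
    if (m < 0) m += static_cast<long long>(count);
    return static_cast<std::size_t>(m);
}
```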
```cpp
if (XferBenchConfig::local_gpu_id != -1) {
    start_idx = XferBenchConfig::local_gpu_id;
    num_buffers = 1;
    LOG_ASSERT(start_idx < gpu_count)
        << "local_gpu_id " << start_idx << " exceeds device count "
        << gpu_count;
}
```
Similar to the issue in te_backend.cpp, local_gpu_id is not validated to be non-negative. While TENTBenchRunner uses CHECK_FAIL to handle allocation errors, providing a negative GPU ID will still cause the benchmark to fail and exit unexpectedly due to an invalid device location string (e.g., 'cuda:-2') being passed to the transfer engine.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Replace ternary operators with `std::max` for offset calculation - Update assertions to check for non-negative GPU IDs Signed-off-by: staryxchen <staryxchen@tencent.com>
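The fix described in this commit might look roughly like the following (a sketch under stated assumptions: the helper name is hypothetical, and `LOG_ASSERT` is replaced with `assert` to keep the example self-contained):

```cpp
#include <algorithm>
#include <cassert>

// -1 means "all GPUs" and maps to offset 0; any other negative id is
// rejected up front instead of flowing into buffer-index arithmetic.
int computeGpuOffset(int gpu_id) {
    assert(gpu_id >= -1 && "GPU id must be -1 (all GPUs) or non-negative");
    return std::max(gpu_id, 0);
}
```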
Signed-off-by: staryxchen <staryxchen@tencent.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@staryxchen The original test benchmark uses GPU VRAM (#675), while tebench may not use it by default.
@alogfans I specified seg_type as VRAM; the exact command used and its result are as follows:
I believe there's still significant room for improvement. Have I missed any important configuration?
…ld fixes (kvcache-ai#1537)
* build: add csignal header to rpc.h - Include csignal for signal handling in RPC module
* build(benchmark): set RPATH for tebench executable - Configure tebench target to use relative RPATH for runtime library resolution - Ensure libasio.so can be found at runtime from the ../lib directory
* feat(benchmark): support graceful interruption during execution - Update runners to return error when stopped - Add early exit logic in main loop
* feat(benchmark): support specifying single GPU ID or all GPUs - Update flags to accept -1 for all GPUs - Adjust buffer allocation and address calculation for specific GPU selection
* refactor(benchmark): unify buffer allocation logic for DRAM and VRAM - Extract common logic for device prefix and buffer count - Merge allocation and registration loops to reduce code duplication - Improve variable const-correctness
* feat(benchmark): add rpc server port configuration - Add `rpc_server_port` field to `XferBenchConfig` - Load port value from command line flags
* Update mooncake-transfer-engine/benchmark/tent_backend.cpp
* refactor(benchmark): simplify gpu id validation and offset calculation - Replace ternary operators with `std::max` for offset calculation - Update assertions to check for non-negative GPU IDs
* retrigger CI
Signed-off-by: staryxchen <staryxchen@tencent.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
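The graceful-interruption behavior from the feat(benchmark) commit above can be sketched as follows (the function names and loop body here are hypothetical, not the actual runner code; `std::raise` stands in for a real Ctrl-C so the sketch is self-contained):

```cpp
#include <atomic>
#include <csignal>  // the header the first commit adds to rpc.h

std::atomic<bool> g_stopped{false};

extern "C" void handleStop(int) { g_stopped.store(true); }

// Hypothetical main loop: returns an error code once a stop signal arrives,
// mirroring "runners return error when stopped" + "early exit in main loop".
int runLoop(int iterations) {
    std::signal(SIGINT, handleStop);
    for (int i = 0; i < iterations; ++i) {
        if (g_stopped.load()) return -1;  // early exit: runner reports stopped
        // ... perform one transfer iteration ...
        if (i == 10) std::raise(SIGINT);  // simulate Ctrl-C for this sketch
    }
    return 0;
}
```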

Description

This PR includes several improvements to the `tebench` benchmark tool:

Features
- Select a single GPU via `--local_gpu_id`/`--target_gpu_id`, or `-1` for all GPUs.
- Add a `--rpc_server_port` flag to configure the RPC server port (0 = auto-select).

Refactor
- Unify buffer allocation logic in `TENTBenchRunner::allocateBuffers()`, eliminating duplicated code across the two paths.

Build
- Set a relative RPATH (`$ORIGIN/../lib`) on the `tebench` executable so it can find `libasio.so` at runtime.
- Add the missing `<csignal>` include in `rpc.h`.

Type of Change

How Has This Been Tested?

Checklist
- Ran `./scripts/code_format.sh` before submitting.