Skip to content

Conversation

@alexm-redhat
Copy link
Collaborator

@alexm-redhat alexm-redhat commented Sep 16, 2025

This PR fixes the hang issue with cutlass_mla when batch size is sufficiently large and kv_splits is high.
The solution is to limit the max kv_splits to 2 when batch size >= 1. We avoid limiting batch_size == 1, since larger kv_splits improve low-latency performance.

@mergify mergify bot added documentation Improvements or additions to documentation ci/build rocm Related to AMD ROCm labels Sep 16, 2025
@alexm-redhat alexm-redhat self-assigned this Sep 16, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a workaround to fix a hang in cutlass_mla for large batch sizes by limiting kv_splits. The fix itself seems reasonable given the context. However, I've noticed that several debugging print statements were uncommented in csrc/attention/mla/cutlass_sm100_mla/device/sm100_mla.hpp. These should be removed before merging to keep the codebase clean and avoid performance issues. The PR also includes substantial changes to dependency management files, which seem unrelated to the bugfix. It would be beneficial to address these dependency changes in a separate pull request with a dedicated description.

@alexm-redhat alexm-redhat force-pushed the fix_kv_split branch 2 times, most recently from bb41cec to 37f1a09 Compare September 16, 2025 13:45
@robertgshaw2-redhat robertgshaw2-redhat changed the title [Bugfix] Fix cutlass_mla hang for large batch size by limiting the kv… [Bugfix][B200] Fix cutlass_mla hang Sep 16, 2025
@mgoin mgoin added ready ONLY add when PR is ready to merge/full CI is needed deepseek Related to DeepSeek models bug Something isn't working and removed documentation Improvements or additions to documentation rocm Related to AMD ROCm ci/build labels Sep 16, 2025
@pavanimajety
Copy link
Collaborator

In my small model tests with few prompts(BS < 8), the engine still hangs. Would it be worth investing in why there is a hang?

@alexm-redhat
Copy link
Collaborator Author

@pavanimajety I will limit for B>1

@alexm-redhat alexm-redhat force-pushed the fix_kv_split branch 2 times, most recently from 145c28f to 6870d15 Compare September 16, 2025 21:12
…for larger batch size

Signed-off-by: Alexander Matveev <amatveev@redhat.com>
@mgoin mgoin merged commit fedb75f into main Sep 17, 2025
82 checks passed
@mgoin mgoin deleted the fix_kv_split branch September 17, 2025 22:06
@pavanimajety
Copy link
Collaborator

@pavanimajety I will limit for B>1

Thanks, that works for now.

debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: charlifu <charlifu@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working deepseek Related to DeepSeek models ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants