[Bugfix][B200] Fix `cutlass_mla` hang #24966

alexm-redhat · 2025-09-16T13:37:37Z

This PR fixes the hang issue with cutlass_mla when batch size is sufficiently large and kv_splits is high.
The solution is to limit the max kv_splits to 2 when batch size > 1. We avoid limiting batch_size == 1, since larger kv_splits improve low-latency performance.
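The heuristic above can be illustrated with a small Python sketch. This is not the code from this PR. The function name `cap_num_kv_splits` and its `max_splits_batched` parameter are hypothetical and exist only for illustration. The sketch assumes the kernel receives a requested split count and a batch size, and it applies the rule as described: batch size 1 keeps its split count, and any larger batch is capped at 2.

```python
def cap_num_kv_splits(batch_size: int, num_kv_splits: int,
                      max_splits_batched: int = 2) -> int:
    """Hypothetical helper sketching the kv_splits cap described in this PR.

    - batch_size == 1: return num_kv_splits unchanged, because more KV splits
      help low-latency (single-request) decoding.
    - batch_size > 1: cap at max_splits_batched (2 in this PR) to avoid the
      cutlass_mla hang.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if batch_size == 1:
        return num_kv_splits
    return min(num_kv_splits, max_splits_batched)


if __name__ == "__main__":
    # Print the capped split count for a few (batch_size, requested splits) pairs.
    for bs, splits in [(1, 16), (2, 16), (64, 8), (128, 1)]:
        print(bs, splits, "->", cap_num_kv_splits(bs, splits))
```

Here only the single-request case keeps a high split count. Every batched case gets at most 2 splits, and a request that already asks for fewer splits than the cap is left alone.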

gemini-code-assist

Code Review

This pull request introduces a workaround to fix a hang in cutlass_mla for large batch sizes by limiting kv_splits. The fix itself seems reasonable given the context. However, I've noticed that several debugging print statements were uncommented in csrc/attention/mla/cutlass_sm100_mla/device/sm100_mla.hpp. These should be removed before merging to keep the codebase clean and avoid performance issues. The PR also includes substantial changes to dependency management files, which seem unrelated to the bugfix. It would be beneficial to address these dependency changes in a separate pull request with a dedicated description.

csrc/attention/mla/cutlass_sm100_mla/device/sm100_mla.hpp

pavanimajety · 2025-09-16T15:08:25Z

In my small model tests with few prompts (BS < 8), the engine still hangs. Would it be worth investigating why there is a hang?

alexm-redhat · 2025-09-16T21:09:43Z

@pavanimajety I will limit for B>1


pavanimajety · 2025-09-18T00:21:10Z

> @pavanimajety I will limit for B>1

Thanks, that works for now.


alexm-redhat requested review from mgoin and robertgshaw2-redhat September 16, 2025 13:37

mergify bot added the documentation, ci/build, and rocm labels Sep 16, 2025

alexm-redhat self-assigned this Sep 16, 2025

gemini-code-assist bot reviewed Sep 16, 2025


csrc/attention/mla/cutlass_sm100_mla/device/sm100_mla.hpp (outdated, resolved)

alexm-redhat force-pushed the fix_kv_split branch 2 times, most recently from bb41cec to 37f1a09 Compare September 16, 2025 13:45

robertgshaw2-redhat changed the title ~~[Bugfix] Fix cutlass_mla hang for large batch size by limiting the kv…~~ [Bugfix][B200] Fix cutlass_mla hang Sep 16, 2025

mgoin added the ready, deepseek, and bug labels and removed the documentation, rocm, and ci/build labels Sep 16, 2025

alexm-redhat force-pushed the fix_kv_split branch 2 times, most recently from 145c28f to 6870d15 Compare September 16, 2025 21:12

[Bugfix] Fix hanging issue with cutlass_mla by limiting the kv_split for larger batch size (2975c3f)

Signed-off-by: Alexander Matveev <amatveev@redhat.com>

alexm-redhat force-pushed the fix_kv_split branch from 6870d15 to 2975c3f Compare September 16, 2025 21:13

mgoin added 2 commits September 17, 2025 13:08

Merge branch 'main' into fix_kv_split

7f22006

Merge branch 'main' into fix_kv_split

9850284

mgoin approved these changes Sep 17, 2025


mgoin merged commit fedb75f into main Sep 17, 2025
82 checks passed

mgoin deleted the fix_kv_split branch September 17, 2025 22:06

debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025

[Bugfix][B200] Fix cutlass_mla hang (vllm-project#24966)

0c141a6

Signed-off-by: Alexander Matveev <amatveev@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Bugfix][B200] Fix cutlass_mla hang (vllm-project#24966)

cc65153

Signed-off-by: Alexander Matveev <amatveev@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>

LucasWilkinson mentioned this pull request Oct 1, 2025

revert max split heuristics neuralmagic/vllm#121

Draft

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[Bugfix][B200] Fix cutlass_mla hang (vllm-project#24966)

1efe0b6

Signed-off-by: Alexander Matveev <amatveev@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>
