[BugFix] Fix MLA assert with CUTLASS MLA #25478

LucasWilkinson · 2025-09-23T14:07:30Z

CUTLASS MLA has a block_size of 128, so following #25290 this would assert, since the default max_num_seqs is 1024.

Also make sure we capture the size of the up-projection in the profile run (this was in the v0 backend but failed to make it to v1).
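
The failure described above can be sketched with plain integers. This is a minimal illustration, not the vLLM code: `WORKSPACE_CAP` is a hypothetical stand-in for whatever upper bound #25290 introduced (its real value is not stated in this thread), and both function names are made up for the example. The second function mirrors the `max()` in the diff quoted in the review.

```python
# Hypothetical cap on the chunked-prefill workspace, in tokens. The real
# bound introduced by #25290 is not given in this thread.
WORKSPACE_CAP = 64 * 1024


def workspace_size_with_assert(max_num_seqs: int, block_size: int) -> int:
    """Asserting variant: the capped workspace must fit one page per request."""
    size = WORKSPACE_CAP
    assert size >= max_num_seqs * block_size, (
        "workspace is smaller than one page per request")
    return size


def workspace_size_with_floor(max_num_seqs: int, block_size: int) -> int:
    """max() variant: grow the workspace to one page per request instead."""
    return max(WORKSPACE_CAP, max_num_seqs * block_size)


# CUTLASS MLA block size (128) with the default max_num_seqs (1024) needs
# 1024 * 128 = 131072 tokens, which exceeds the hypothetical cap above.
print(workspace_size_with_floor(max_num_seqs=1024, block_size=128))
```

With these inputs the asserting variant raises `AssertionError`, while the `max()` variant returns the larger of the two sizes.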

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

gemini-code-assist

Code Review

This pull request addresses an assertion failure in the Multi-head Latent Attention (MLA) implementation by replacing a strict assert with a max() function to ensure a minimum workspace size. This change makes the code more robust against certain configurations. However, this fix introduces a critical issue: the newly calculated chunked_prefill_workspace_size may no longer be divisible by dcp_world_size, which is a requirement enforced by a subsequent assertion and is crucial for the correctness of distributed computations. I have provided a comment detailing the issue and a suggested fix to prevent this new potential crash.

gemini-code-assist · 2025-09-23T14:09:37Z

vllm/v1/attention/backends/mla/common.py

+        self.chunked_prefill_workspace_size = max(
+            self.chunked_prefill_workspace_size,
+            scheduler_config.max_num_seqs * cache_config.block_size)


This change correctly replaces the assert with max() to ensure a minimum workspace size. However, this can introduce a new critical issue.

The chunked_prefill_workspace_size is required to be divisible by dcp_world_size when dcp_world_size > 1, as enforced by an assert on line 495. This is necessary for the correctness of the all-gather workspace calculation later on (line 1368).

Your change can result in chunked_prefill_workspace_size not being a multiple of dcp_world_size, which will cause the assertion on line 495 to fail.

For example, if dcp_world_size = 3 and scheduler_config.max_num_seqs * cache_config.block_size is not a multiple of 3, this will lead to a crash.

To fix this, you should round up chunked_prefill_workspace_size to the nearest multiple of dcp_world_size when dcp_world_size > 1.

A possible fix would be to add the following inside the if self.dcp_world_size > 1: block, before the assertion at line 495:

    self.chunked_prefill_workspace_size = cdiv(
        self.chunked_prefill_workspace_size,
        self.dcp_world_size) * self.dcp_world_size
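
The rounding suggested here can be tried in isolation. The sketch below is a standalone illustration under stated assumptions: `cdiv` is defined locally as a ceiling-division helper rather than imported from vLLM, and `round_up_to_multiple` is a hypothetical name used only for this example.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers (local helper for this sketch)."""
    return -(a // -b)


def round_up_to_multiple(size: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= `size`."""
    return cdiv(size, multiple) * multiple


# 1024 * 128 = 131072 is not divisible by 3, so it is bumped by one token.
workspace = round_up_to_multiple(1024 * 128, 3)
print(workspace)           # 131073
print(workspace % 3 == 0)  # True
```

A size that is already a multiple is returned unchanged, so applying the rounding unconditionally inside the `dcp_world_size > 1` branch would not alter configurations that already satisfy the assertion.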

yewentao256 · 2025-09-23T15:18:30Z

vllm/v1/attention/backends/mla/common.py

+        # Enforce that we have enough for at least 1 page per request
+        self.chunked_prefill_workspace_size = max(
+            self.chunked_prefill_workspace_size,
+            scheduler_config.max_num_seqs * cache_config.block_size)


In the CUTLASS MLA case the block size is 128, so with max_num_seqs == 1024 the workspace is back to 128 * 1024 again. Doesn't that mean we will hit the OOM issue again?
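
For a rough sense of scale behind this question, the memory cost of a 128 * 1024 token workspace can be estimated by hand. This is a back-of-the-envelope sketch only: the per-token element counts are assumptions based on DeepSeek-V3-style dimensions (a 512 + 64 element latent per token, and 128 heads with 128 + 128 elements each after up-projection), 2-byte elements are assumed, and how vLLM actually lays out and allocates these buffers is not stated in this thread.

```python
def buffer_bytes(num_tokens: int, elems_per_token: int, bytes_per_elem: int) -> int:
    """Size of a dense (num_tokens, elems_per_token) buffer in bytes."""
    return num_tokens * elems_per_token * bytes_per_elem


tokens = 128 * 1024  # block_size * max_num_seqs from the comment above

# Assumed latent width per token: 512 (compressed KV) + 64 (rope) = 576.
latent = buffer_bytes(tokens, elems_per_token=576, bytes_per_elem=2)

# Assumed up-projected width per token: 128 heads * (128 + 128) elements.
up_projected = buffer_bytes(tokens, elems_per_token=128 * (128 + 128), bytes_per_elem=2)

print(f"latent workspace: {latent / 2**20:.0f} MiB")        # 144 MiB
print(f"up-projected K/V: {up_projected / 2**30:.0f} GiB")  # 8 GiB
```

Under these assumptions the latent workspace itself is modest, while anything sized by the up-projection over the same number of tokens is far larger, which is consistent with the PR description's point about capturing the up-projection in the profile run.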

smarterclayton · 2025-09-23T15:20:06Z

This allowed me to start vLLM with DeepSeek V3.1 again (DP=16, B200).

Deferring to Wentao

LucasWilkinson added 2 commits September 23, 2025 14:05

testing

5c74b1e

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

format

2b97c14

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

LucasWilkinson requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners September 23, 2025 14:07

mergify bot added the v1 label Sep 23, 2025

gemini-code-assist bot reviewed Sep 23, 2025

cleanup

3b56a18

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

yewentao256 reviewed Sep 23, 2025

simon-mo previously approved these changes Sep 23, 2025

LucasWilkinson added 4 commits September 23, 2025 16:04

allocate workspace on boot

395214e

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

fix

192ddfc

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

fixes

a29c6b5

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

cleanup

486d9fe

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

mgoin added the ready label Sep 23, 2025

robertgshaw2-redhat approved these changes Sep 24, 2025

robertgshaw2-redhat merged commit 9df8da5 into main Sep 24, 2025
56 checks passed

robertgshaw2-redhat deleted the lwilkinson/fix-mla-assert branch September 24, 2025 01:09

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[BugFix] Fix MLA assert with CUTLASS MLA (vllm-project#25478)

291fa11

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[BugFix] Fix MLA assert with CUTLASS MLA (#25478)

a986f17

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>

gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025

[BugFix] Fix MLA assert with CUTLASS MLA (vllm-project#25478)

5be6509

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: gaojc <1055866782@qq.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[BugFix] Fix MLA assert with CUTLASS MLA (vllm-project#25478)

11b3bee

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[BugFix] Fix MLA assert with CUTLASS MLA (vllm-project#25478)

c59542b

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[BugFix] Fix MLA assert with CUTLASS MLA (vllm-project#25478)

4b16430

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[BugFix] Fix MLA assert with CUTLASS MLA (vllm-project#25478)

dbaf50c

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
