[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch #25505
Conversation
Is this a duplicate of #25498?
Code Review
This pull request addresses an AssertionError that occurs during dummy runs for CUDA graph capturing, specifically when the number of requests (num_reqs) exceeds the configured maximum (max_num_reqs). The fix involves removing the assertion and allowing the cudagraph_runtime_mode to be determined dynamically by the dispatcher. This is a sensible change, as dummy runs for profiling or graph capturing may need to simulate scenarios that don't adhere to normal request limits. The code modifications are clean and directly address the issue. I have one suggestion to improve the logic for determining the cudagraph_runtime_mode.
```python
# filter out the valid batch descriptor
_cg_mode, batch_descriptor = self.cudagraph_dispatcher.dispatch(
    BatchDescriptor(num_tokens=num_tokens,
                    uniform_decode=uniform_decode))
if cudagraph_runtime_mode is not None:
    # sanity check
    assert cudagraph_runtime_mode == _cg_mode, (
        f"Cudagraph runtime mode mismatch at dummy_run. "
        f"Expected {_cg_mode}, but got {cudagraph_runtime_mode}.")
```
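To make the dispatcher-driven mode selection above concrete, here is a minimal, self-contained sketch. `CUDAGraphMode`, `BatchDescriptor`, and `ToyCudagraphDispatcher` below are simplified local stand-ins with invented dispatch rules and thresholds, not the actual vLLM implementations.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class CUDAGraphMode(Enum):
    # Simplified stand-in for vLLM's CUDAGraphMode enum.
    NONE = auto()
    PIECEWISE = auto()
    FULL = auto()


@dataclass(frozen=True)
class BatchDescriptor:
    # Simplified stand-in: only the two fields used in the snippet above.
    num_tokens: int
    uniform_decode: bool


class ToyCudagraphDispatcher:
    """Toy dispatcher that decides which CUDA graph mode a dummy batch gets."""

    def __init__(self, max_capture_tokens: int):
        self.max_capture_tokens = max_capture_tokens

    def dispatch(
        self, desc: BatchDescriptor
    ) -> Tuple[CUDAGraphMode, Optional[BatchDescriptor]]:
        # Batches larger than anything that was captured fall back to eager mode.
        if desc.num_tokens > self.max_capture_tokens:
            return CUDAGraphMode.NONE, None
        mode = CUDAGraphMode.FULL if desc.uniform_decode else CUDAGraphMode.PIECEWISE
        return mode, desc


def dummy_run_mode(
    dispatcher: ToyCudagraphDispatcher,
    num_tokens: int,
    uniform_decode: bool,
    cudagraph_runtime_mode: Optional[CUDAGraphMode] = None,
) -> CUDAGraphMode:
    # Let the dispatcher decide; only sanity-check when the caller forced a mode.
    _cg_mode, _ = dispatcher.dispatch(
        BatchDescriptor(num_tokens=num_tokens, uniform_decode=uniform_decode))
    if cudagraph_runtime_mode is not None:
        assert cudagraph_runtime_mode == _cg_mode, (
            f"mode mismatch: expected {_cg_mode}, got {cudagraph_runtime_mode}")
        return cudagraph_runtime_mode
    return _cg_mode


# An oversized dummy batch (the num_reqs > max_num_reqs case) no longer has to
# satisfy an assertion up front; the dispatcher simply resolves it to NONE.
dispatcher = ToyCudagraphDispatcher(max_capture_tokens=512)
print(dummy_run_mode(dispatcher, num_tokens=1024, uniform_decode=True))  # NONE
print(dummy_run_mode(dispatcher, num_tokens=256, uniform_decode=True))   # FULL
```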
The logic for determining cudagraph_runtime_mode has been refactored. While the new implementation is cleaner, it introduces a redundant call to self.cudagraph_dispatcher.dispatch in cases where cudagraph_runtime_mode is already CUDAGraphMode.NONE. In the previous version, this call was correctly skipped. To optimize this, we can check for CUDAGraphMode.NONE upfront and only call the dispatcher when necessary.
```python
if cudagraph_runtime_mode == CUDAGraphMode.NONE:
    _cg_mode = CUDAGraphMode.NONE
    batch_descriptor = None
else:
    # filter out the valid batch descriptor
    _cg_mode, batch_descriptor = self.cudagraph_dispatcher.dispatch(
        BatchDescriptor(num_tokens=num_tokens,
                        uniform_decode=uniform_decode))
    if cudagraph_runtime_mode is not None:
        # sanity check
        assert cudagraph_runtime_mode == _cg_mode, (
            f"Cudagraph runtime mode mismatch at dummy_run. "
            f"Expected {_cg_mode}, but got {cudagraph_runtime_mode}.")
    else:
        cudagraph_runtime_mode = _cg_mode
```
Thanks for the work! Just a question.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from ca661f4 to 4058de8
Force-pushed from 4058de8 to c745540
Verified that this also solves #25494.
```python
# warm ups for cudagraph capture
assert cudagraph_runtime_mode == CUDAGraphMode.NONE or \
    cudagraph_runtime_mode == _cg_mode, (
    f"Cudagraph runtime mode mismatch at dummy_run. "
```
Suggested change:
```diff
-assert cudagraph_runtime_mode == CUDAGraphMode.NONE or \
-    cudagraph_runtime_mode == _cg_mode, (
+assert cudagraph_runtime_mode in [CUDAGraphMode.NONE, _cg_mode], (
     f"Cudagraph runtime mode mismatch at dummy_run. "
```
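For what it's worth, the two assertion forms check the same condition; here is a tiny self-contained sanity check (the enum is a local stub, not vLLM's `CUDAGraphMode`):

```python
from enum import Enum, auto

class CUDAGraphMode(Enum):
    # Local stub with three members, only so the comparison runs standalone.
    NONE = auto()
    PIECEWISE = auto()
    FULL = auto()

_cg_mode = CUDAGraphMode.FULL
for cudagraph_runtime_mode in CUDAGraphMode:
    disjunction = (cudagraph_runtime_mode == CUDAGraphMode.NONE
                   or cudagraph_runtime_mode == _cg_mode)
    membership = cudagraph_runtime_mode in [CUDAGraphMode.NONE, _cg_mode]
    # Same semantics; the membership form avoids the backslash continuation.
    assert disjunction == membership
```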
Fix for #25494: optionally let the dispatcher decide the cudagraph_mode in the dummy run, and remove the assert.