Skip to content

Conversation

@LucasWilkinson
Copy link
Collaborator

Fix for #25494 , optionally let the dispatcher decide the cudagraph_mode in the dummy run and remove assert

@ywang96
Copy link
Member

ywang96 commented Sep 23, 2025

Is this a duplicate of #25498?

@LucasWilkinson
Copy link
Collaborator Author

LucasWilkinson commented Sep 23, 2025

Is this a duplicate of #25498?

No not quite; #25498 will regress the fix #25407

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an AssertionError that occurs during dummy runs for CUDA graph capturing, specifically when the number of requests (num_reqs) exceeds the configured maximum (max_num_reqs). The fix involves removing the assertion and allowing the cudagraph_runtime_mode to be determined dynamically by the dispatcher. This is a sensible change, as dummy runs for profiling or graph capturing may need to simulate scenarios that don't adhere to normal request limits. The code modifications are clean and directly address the issue. I have one suggestion to improve the logic for determining the cudagraph_runtime_mode.

Comment on lines 3037 to 3055

# filter out the valid batch descriptor
_cg_mode, batch_descriptor = self.cudagraph_dispatcher.dispatch(
BatchDescriptor(num_tokens=num_tokens,
uniform_decode=uniform_decode))
if cudagraph_runtime_mode is not None:
# sanity check
assert cudagraph_runtime_mode == _cg_mode, (
f"Cudagraph runtime mode mismatch at dummy_run. "
f"Expected {_cg_mode}, but got {cudagraph_runtime_mode}.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The logic for determining cudagraph_runtime_mode has been refactored. While the new implementation is cleaner, it introduces a redundant call to self.cudagraph_dispatcher.dispatch in cases where cudagraph_runtime_mode is already CUDAGraphMode.NONE. In the previous version, this call was correctly skipped. To optimize this, we can check for CUDAGraphMode.NONE upfront and only call the dispatcher when necessary.

            if cudagraph_runtime_mode == CUDAGraphMode.NONE:
                _cg_mode = CUDAGraphMode.NONE
                batch_descriptor = None
            else:
                # filter out the valid batch descriptor
                _cg_mode, batch_descriptor = self.cudagraph_dispatcher.dispatch(
                    BatchDescriptor(num_tokens=num_tokens,
                                    uniform_decode=uniform_decode))
                if cudagraph_runtime_mode is not None:
                    # sanity check
                    assert cudagraph_runtime_mode == _cg_mode, (
                        f"Cudagraph runtime mode mismatch at dummy_run. "
                        f"Expected {_cg_mode}, but got {cudagraph_runtime_mode}."
                    )
                else:
                    cudagraph_runtime_mode = _cg_mode

Copy link
Member

@yewentao256 yewentao256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work! Just question

@mergify
Copy link

mergify bot commented Sep 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @LucasWilkinson.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 23, 2025
@LucasWilkinson LucasWilkinson force-pushed the lwilkinson/fix-dummy-run-assert branch from ca661f4 to 4058de8 Compare September 23, 2025 20:22
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@LucasWilkinson LucasWilkinson force-pushed the lwilkinson/fix-dummy-run-assert branch from 4058de8 to c745540 Compare September 23, 2025 20:23
@mergify mergify bot removed the needs-rebase label Sep 23, 2025
Copy link
Member

@yewentao256 yewentao256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified that this also could solve #25494

# warm ups for cudagraph capture
assert cudagraph_runtime_mode == CUDAGraphMode.NONE or \
cudagraph_runtime_mode == _cg_mode, (
f"Cudagraph runtime mode mismatch at dummy_run. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Cudagraph runtime mode mismatch at dummy_run. "
assert cudagraph_runtime_mode in [CUDAGraphMode.NONE, _cg_mode], (

@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 23, 2025
@mgoin mgoin merged commit dc464a3 into vllm-project:main Sep 24, 2025
51 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…niform batch (vllm-project#25505)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
…niform batch (#25505)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
…niform batch (vllm-project#25505)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: gaojc <1055866782@qq.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…niform batch (vllm-project#25505)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
…niform batch (vllm-project#25505)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…niform batch (vllm-project#25505)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…niform batch (vllm-project#25505)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants