[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch #25505
Conversation
Is this a duplicate of #25498?
Code Review
This pull request addresses an AssertionError that occurs during dummy runs for CUDA graph capturing, specifically when the number of requests (num_reqs) exceeds the configured maximum (max_num_reqs). The fix involves removing the assertion and allowing the cudagraph_runtime_mode to be determined dynamically by the dispatcher. This is a sensible change, as dummy runs for profiling or graph capturing may need to simulate scenarios that don't adhere to normal request limits. The code modifications are clean and directly address the issue. I have one suggestion to improve the logic for determining the cudagraph_runtime_mode.
```python
# filter out the valid batch descriptor
_cg_mode, batch_descriptor = self.cudagraph_dispatcher.dispatch(
    BatchDescriptor(num_tokens=num_tokens,
                    uniform_decode=uniform_decode))
if cudagraph_runtime_mode is not None:
    # sanity check
    assert cudagraph_runtime_mode == _cg_mode, (
        f"Cudagraph runtime mode mismatch at dummy_run. "
        f"Expected {_cg_mode}, but got {cudagraph_runtime_mode}.")
```
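To make the dispatcher-driven mode selection above concrete, here is a minimal, self-contained sketch. `CUDAGraphMode`, `BatchDescriptor`, and `ToyCudagraphDispatcher` below are simplified local stand-ins with invented dispatch rules and thresholds, not the actual vLLM implementations.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class CUDAGraphMode(Enum):
    # Simplified stand-in for vLLM's CUDAGraphMode enum.
    NONE = auto()
    PIECEWISE = auto()
    FULL = auto()


@dataclass(frozen=True)
class BatchDescriptor:
    # Simplified stand-in: only the two fields used in the snippet above.
    num_tokens: int
    uniform_decode: bool


class ToyCudagraphDispatcher:
    """Toy dispatcher that decides which CUDA graph mode a dummy batch gets."""

    def __init__(self, max_capture_tokens: int):
        self.max_capture_tokens = max_capture_tokens

    def dispatch(
        self, desc: BatchDescriptor
    ) -> Tuple[CUDAGraphMode, Optional[BatchDescriptor]]:
        # Batches larger than anything that was captured fall back to eager mode.
        if desc.num_tokens > self.max_capture_tokens:
            return CUDAGraphMode.NONE, None
        mode = CUDAGraphMode.FULL if desc.uniform_decode else CUDAGraphMode.PIECEWISE
        return mode, desc


def dummy_run_mode(
    dispatcher: ToyCudagraphDispatcher,
    num_tokens: int,
    uniform_decode: bool,
    cudagraph_runtime_mode: Optional[CUDAGraphMode] = None,
) -> CUDAGraphMode:
    # Let the dispatcher decide; only sanity-check when the caller forced a mode.
    _cg_mode, _ = dispatcher.dispatch(
        BatchDescriptor(num_tokens=num_tokens, uniform_decode=uniform_decode))
    if cudagraph_runtime_mode is not None:
        assert cudagraph_runtime_mode == _cg_mode, (
            f"mode mismatch: expected {_cg_mode}, got {cudagraph_runtime_mode}")
        return cudagraph_runtime_mode
    return _cg_mode


# An oversized dummy batch (the num_reqs > max_num_reqs case) no longer has to
# satisfy an assertion up front; the dispatcher simply resolves it to NONE.
dispatcher = ToyCudagraphDispatcher(max_capture_tokens=512)
print(dummy_run_mode(dispatcher, num_tokens=1024, uniform_decode=True))  # NONE
print(dummy_run_mode(dispatcher, num_tokens=256, uniform_decode=True))   # FULL
```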
The logic for determining cudagraph_runtime_mode has been refactored. While the new implementation is cleaner, it introduces a redundant call to self.cudagraph_dispatcher.dispatch in cases where cudagraph_runtime_mode is already CUDAGraphMode.NONE. In the previous version, this call was correctly skipped. To optimize this, we can check for CUDAGraphMode.NONE upfront and only call the dispatcher when necessary.
```python
if cudagraph_runtime_mode == CUDAGraphMode.NONE:
    _cg_mode = CUDAGraphMode.NONE
    batch_descriptor = None
else:
    # filter out the valid batch descriptor
    _cg_mode, batch_descriptor = self.cudagraph_dispatcher.dispatch(
        BatchDescriptor(num_tokens=num_tokens,
                        uniform_decode=uniform_decode))
    if cudagraph_runtime_mode is not None:
        # sanity check
        assert cudagraph_runtime_mode == _cg_mode, (
            f"Cudagraph runtime mode mismatch at dummy_run. "
            f"Expected {_cg_mode}, but got {cudagraph_runtime_mode}.")
    else:
        cudagraph_runtime_mode = _cg_mode
```
Thanks for the work! Just a question.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from ca661f4 to 4058de8
Force-pushed from 4058de8 to c745540
Verified that this also solves #25494.
```python
# warm ups for cudagraph capture
assert cudagraph_runtime_mode == CUDAGraphMode.NONE or \
    cudagraph_runtime_mode == _cg_mode, (
    f"Cudagraph runtime mode mismatch at dummy_run. "
```
Suggested change:
```diff
-assert cudagraph_runtime_mode == CUDAGraphMode.NONE or \
-    cudagraph_runtime_mode == _cg_mode, (
+assert cudagraph_runtime_mode in [CUDAGraphMode.NONE, _cg_mode], (
     f"Cudagraph runtime mode mismatch at dummy_run. "
```
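For what it's worth, the two assertion forms check the same condition; here is a tiny self-contained sanity check (the enum is a local stub, not vLLM's `CUDAGraphMode`):

```python
from enum import Enum, auto

class CUDAGraphMode(Enum):
    # Local stub with three members, only so the comparison runs standalone.
    NONE = auto()
    PIECEWISE = auto()
    FULL = auto()

_cg_mode = CUDAGraphMode.FULL
for cudagraph_runtime_mode in CUDAGraphMode:
    disjunction = (cudagraph_runtime_mode == CUDAGraphMode.NONE
                   or cudagraph_runtime_mode == _cg_mode)
    membership = cudagraph_runtime_mode in [CUDAGraphMode.NONE, _cg_mode]
    # Same semantics; the membership form avoids the backslash continuation.
    assert disjunction == membership
```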
Fix for #25494: optionally let the dispatcher decide the cudagraph_mode in the dummy run, and remove the assert.