[Bugfix] fix qwen3-next crash #28202
Code Review
This pull request addresses a bug in the qwen3-next model's Qwen3NextGatedDeltaNet layer. The change correctly adjusts the slicing of non_spec_state_indices_tensor by using attn_metadata.num_actual_tokens instead of attn_metadata.num_decodes. This is a critical fix for scenarios involving CUDA graph capture, where tensors are padded to a fixed size. The original code could lead to shape mismatches and assertion failures, while the new code ensures the tensor size is correct, preventing potential crashes. The fix is accurate and necessary for robust model execution.
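The shape mismatch described above can be illustrated with a small, framework-free sketch. It is not vLLM code: `FakeAttnMetadata`, `pad_to_captured_size`, the captured sizes, and the `-1` padding value are all hypothetical, and it assumes (as the review implies) that `num_actual_tokens` reflects the padded batch while `num_decodes` counts only real decode requests.

```python
from dataclasses import dataclass


@dataclass
class FakeAttnMetadata:
    """Hypothetical stand-in for the attention metadata (not vLLM's class)."""
    num_decodes: int        # real decode requests in this step
    num_actual_tokens: int  # assumed here to equal the padded batch size


def pad_to_captured_size(n, captured_sizes):
    """Return the smallest pre-captured CUDA graph size that fits n."""
    return min(s for s in captured_sizes if s >= n)


def shapes_match(indices, batch):
    """Mimic the kind of check a kernel wrapper might assert on."""
    return len(indices) == batch


captured_sizes = [1, 2, 4, 8]  # assumed capture sizes
num_decodes = 3
padded = pad_to_captured_size(num_decodes, captured_sizes)

meta = FakeAttnMetadata(num_decodes=num_decodes, num_actual_tokens=padded)

# Index buffer sized for the largest graph; unused slots hold an assumed pad value.
PAD = -1
state_indices = [10, 11, 12] + [PAD] * (max(captured_sizes) - num_decodes)

batch = padded  # number of rows the kernel sees when the graph is replayed

old_slice = state_indices[: meta.num_decodes]        # slicing before the fix
new_slice = state_indices[: meta.num_actual_tokens]  # slicing after the fix

print(shapes_match(old_slice, batch), shapes_match(new_slice, batch))
```

With three real decodes padded up to a graph size of four, the old slice is one entry short of the batch, while the new slice has one entry per row.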
It seems this PR has an accuracy issue.
At first glance it looks like the right change ...
Sometimes it crashes with an illegal memory access and sometimes with an assertion error. I think the root cause is
Could you check
much worse
This is most likely due to the Triton kernel cache
I tested it on 2 H200s and there is no problem now. Could you please help test this PR on your machine? @vadiklyutiy
After merging from main this pr
heheda12345
left a comment
Nice fix!
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Partially fixes #27571.
In the decoding phase with CUDA graphs, the batch is padded to a pre-captured CUDA graph size.
This makes `batch` not equal to `attn_metadata.num_decodes` and triggers an assertion error in `causal_conv1d_update`.
Test Plan