
Conversation

@ZJY0516 (Contributor) commented Nov 6, 2025

Purpose

Partially fixes #27571.

In the decode phase with CUDA graphs, the batch is padded up to the nearest pre-captured CUDA graph size. As a result, batch no longer equals attn_metadata.num_decodes, which triggers an assertion error in causal_conv1d_update:

mixed_qkv_non_spec = causal_conv1d_update(
    mixed_qkv_non_spec,
    conv_state,
    conv_weights,
    self.conv1d.bias,
    self.activation,
    conv_state_indices=non_spec_state_indices_tensor[
        : attn_metadata.num_decodes
    ],
    validate_data=True,
)

# inside causal_conv1d_update
if conv_state_indices is None:
    assert conv_state.size(0) >= batch
else:
    assert (batch,) == conv_state_indices.shape
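
The mismatch can be reproduced in isolation. A minimal sketch, assuming a real decode batch of 3 padded to a pre-captured graph size of 4 (the helper mirrors the shape check above and is illustrative, not vLLM's actual code):

```python
import torch

def passes_validate_data(batch: int, conv_state_indices: torch.Tensor) -> bool:
    # Mirrors the shape check inside causal_conv1d_update shown above:
    # one state index per row of the (padded) batch.
    return (batch,) == conv_state_indices.shape

num_decodes = 3   # real decode requests this step
padded_batch = 4  # batch after padding to the pre-captured CUDA graph size
state_indices = torch.arange(padded_batch)

# Slicing by num_decodes gives shape (3,), but the kernel sees batch == 4,
# so the assertion fires.
assert not passes_validate_data(padded_batch, state_indices[:num_decodes])
```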

Test Plan

vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --enable-expert-parallel -tp 4 -dp 2
vllm bench serve \
--model Qwen/Qwen3-Next-80B-A3B-Instruct \
--dataset-name random \
--tokenizer Qwen/Qwen3-Next-80B-A3B-Instruct \
--num-prompts 512 \
--random-input-len 2048 \
--random-output-len 1024 --request-rate 30
lm_eval --model local-chat-completions \
  --model_args model=Qwen/Qwen3-Next-80B-A3B-Instruct,base_url=http://localhost:8000/v1/chat/completions,num_concurrent=280 \
  --tasks gsm8k --apply_chat_template --num_fewshot 5

Tasks  Version  Filter            n-shot  Metric       Value   Stderr
gsm8k  3        flexible-extract  5       exact_match  0.5967  ± 0.0135
                strict-match      5       exact_match  0.4170  ± 0.0136

vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --enable-expert-parallel -tp 4
Tasks  Version  Filter            n-shot  Metric       Value   Stderr
gsm8k  3        flexible-extract  5       exact_match  0.7839  ± 0.0113
                strict-match      5       exact_match  0.6611  ± 0.0130


Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ZJY0516 ZJY0516 requested a review from sighingnow as a code owner November 6, 2025 09:23
@mergify mergify bot added the qwen Related to Qwen models label Nov 6, 2025
@gemini-code-assist (bot) left a comment:
Code Review

This pull request addresses a bug in the qwen3-next model's Qwen3NextGatedDeltaNet layer. The change correctly adjusts the slicing of non_spec_state_indices_tensor by using attn_metadata.num_actual_tokens instead of attn_metadata.num_decodes. This is a critical fix for scenarios involving CUDA graph capture, where tensors are padded to a fixed size. The original code could lead to shape mismatches and assertion failures, while the new code ensures the tensor size is correct, preventing potential crashes. The fix is accurate and necessary for robust model execution.
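
The fix described above amounts to slicing the state-indices tensor by the padded token count so its length matches the batch the captured kernel actually runs. A minimal sketch (the helper name is illustrative, not vLLM's actual code):

```python
import torch

def select_state_indices(indices: torch.Tensor, num_actual_tokens: int) -> torch.Tensor:
    # Slice by the (possibly padded) token count so the result has one
    # entry per row of the batch seen by the CUDA-graph-captured kernel.
    return indices[:num_actual_tokens]

indices = torch.arange(8)  # state indices, one per padded batch slot
padded_batch = 8           # batch after CUDA graph padding
sliced = select_state_indices(indices, padded_batch)
assert sliced.shape == (padded_batch,)  # satisfies the kernel's shape check
```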

@ZJY0516 ZJY0516 changed the title [Bugfix] fix qwen3-next ima [Bugfix] fix qwen3-next crash Nov 6, 2025
@ZJY0516 (Contributor, author) commented Nov 6, 2025

It seems this PR has an accuracy issue:

lm_eval --model local-completions --model_args model=Qwen/Qwen3-Next-80B-A3B-Instruct,base_url=http://localhost:8000/v1/completions -t gsm8k --num_fewshot 5 --batch_size 250
Tasks  Version  Filter            n-shot  Metric       Value   Stderr
gsm8k  3        flexible-extract  5       exact_match  0.3480  ± 0.0131
                strict-match      5       exact_match  0.2782  ± 0.0123

@vadiklyutiy vadiklyutiy self-requested a review November 6, 2025 11:48
@vadiklyutiy (Collaborator) commented:

at first glance it looks like the right change ...

@vadiklyutiy (Collaborator) commented:

@ZJY0516 could you describe what happens in #27571? Why does it cause an illegal memory access?

@ZJY0516 (Contributor, author) commented Nov 6, 2025

Sometimes it crashes with an illegal memory access and sometimes with an assertion error.

I think the root cause is the assert (batch,) == conv_state_indices.shape check. @vadiklyutiy

@vadiklyutiy (Collaborator) commented:

Could you check lm_eval with this PR's changes but without -dp?

@ZJY0516 (Contributor, author) commented Nov 6, 2025

Could you check lm_eval with this PR's changes but without -dp?

Much worse:

Tasks  Version  Filter            n-shot  Metric       Value  Stderr
gsm8k  3        flexible-extract  5       exact_match  0      ± 0
                strict-match      5       exact_match  0      ± 0

This is most likely due to the Triton kernel cache.

@ZJY0516 (Contributor, author) commented Nov 7, 2025

I tested it on 2 H200 GPUs and there is no problem now. Could you please help test this PR on your machine? @vadiklyutiy

vllm serve /data/datasets/models-hf/Qwen3-Next-80B-A3B-Instruct --served-model-name Qwen/Qwen3-Next-80B-A3B-Instruct -tp 2 --enable-expert-parallel --compilation-config '{"cudagraph_mode": "NONE"}'

lm_eval --model local-chat-completions --model_args model=Qwen/Qwen3-Next-80B-A3B-Instruct,base_url=http://localhost:8000/v1/chat/completions,num_concurrent=280 --tasks gsm8k --apply_chat_template --num_fewshot 5
Tasks  Version  Filter            n-shot  Metric       Value   Stderr
gsm8k  3        flexible-extract  5       exact_match  0.7892  ± 0.0112
                strict-match      5       exact_match  0.6649  ± 0.0130

@vadiklyutiy (Collaborator) commented:

--no-enable-prefix-caching is missing.

@ZJY0516 (Contributor, author) commented Nov 10, 2025

After merging from main

lm_eval --model local-chat-completions --model_args model=Qwen/Qwen3-Next-80B-A3B-Instruct,base_url=http://localhost:8000/v1/chat/completions,num_concurrent=280 --tasks gsm8k --apply_chat_template --num_fewshot 5

With this PR:

Tasks  Version  Filter            n-shot  Metric       Value   Stderr
gsm8k  3        flexible-extract  5       exact_match  0.5967  ± 0.0135
                strict-match      5       exact_match  0.4170  ± 0.0136

@heheda12345 (Collaborator) left a comment:
Nice fix!

@heheda12345 heheda12345 enabled auto-merge (squash) November 11, 2025 04:13
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 11, 2025
@heheda12345 heheda12345 merged commit f0359ff into vllm-project:main Nov 11, 2025
60 checks passed
@ZJY0516 ZJY0516 deleted the fix-q3n branch November 13, 2025 09:39
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed)


Development

Successfully merging this pull request may close these issues.

[Bug]: qwen3-next failed with CUDA error: an illegal memory access was encountered
