-
Notifications
You must be signed in to change notification settings - Fork 362
[Bugfix] Fix the bug that qwen3 moe doesn't work with aclgraph #2478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: shen-shanshan <467638484@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to fix an issue where qwen3 MoE models do not work with aclgraph. The changes involve moving the Ascend-specific MoE block implementation to the qwen3 model file and conditionally disabling it when aclgraph is enabled. While the overall approach is sound, the current implementation introduces critical bugs. Specifically, the method signatures for the newly introduced CustomSparseMoeBlock
and the conditionally used Qwen3MoeSparseMoeBlock
are incompatible with how they are called, which will lead to runtime TypeError
s. I have provided detailed comments and suggestions to address these critical issues.
f3fb510
to
ff50da6
Compare
example_prompts = [ | ||
"Hello, my name is", | ||
] | ||
dtype = "half" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vllm runner actually runs in eager mode by default.
What this PR does / why we need it?
What's the PR does:
Does this PR introduce any user-facing change?
How was this patch tested?