add qwen3-moe optimization #1441
base: main
Please make the commit message more meaningful: for example, describe what changes were made compared to the upstream implementation and include the performance test results.
@@ -35,6 +35,7 @@
MODELS = [
    "Qwen/Qwen2.5-0.5B-Instruct",
    "Qwen/Qwen3-0.6B-Base",
    "Qwen/Qwen3-30B-A3B",
This model is too large and will take a long time to run in CI; please use a reduced-layer model instead: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/testing.html#e2e-test-example
@@ -33,3 +57,89 @@ class CustomQwen3MoeForCausalLM(Qwen3MoeForCausalLM):
        "experts":
        ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
    }


class AscendQwen3MoeSparseMoeBlock(nn.Module):
Please add unit tests for this: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/testing.html
cc28837 to 0461ef2
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
3c8a113 to db4c8c3
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
What this PR does / why we need it?
Reuses some optimizations from DeepSeek.
Does this PR introduce any user-facing change?
How was this patch tested?
Tested with the 235B model:

| parallelism | tps | optimization |
|-------------|-----|--------------|
| dp16tp2ep32 | 160 | off |
| dp16tp2ep32 | 192 | on |
| dp8tp4ep32 | 76 | off |
| dp8tp4ep32 | 128 | on |
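
To make the reported numbers easier to compare, here is a minimal sketch that computes the relative throughput gain for each parallelism setting. The tps values are copied from the results above; the dictionary layout and the `speedup` helper are illustrative names, not part of the PR.

```python
# Throughput (tps) figures copied from the PR description.
# "off" = optimization disabled, "on" = optimization enabled.
results = {
    "dp16tp2ep32": {"off": 160, "on": 192},
    "dp8tp4ep32": {"off": 76, "on": 128},
}


def speedup(off_tps: float, on_tps: float) -> float:
    """Return the ratio of optimized to baseline throughput."""
    return on_tps / off_tps


# Hypothetical helper structure: one speedup ratio per parallelism setting.
speedups = {name: speedup(r["off"], r["on"]) for name, r in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% higher tps)")
```

On these figures the gain is about 1.2x for dp16tp2ep32 and about 1.68x for dp8tp4ep32, so the smaller data-parallel configuration benefits more from the optimization in this test.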