
add qwen3-moe optimization #1441


Open: shiyuan680 wants to merge 2 commits into main

@shiyuan680 shiyuan680 commented Jun 26, 2025

What this PR does / why we need it?

Reuse some optimizations from deepseek.

Does this PR introduce any user-facing change?

How was this patch tested?

Tested on the 235B model:

| parallelism | tps | optimization |
| --- | --- | --- |
| dp16tp2ep32 | 160 | off |
| dp16tp2ep32 | 192 | on |
| dp8tp4ep32 | 76 | off |
| dp8tp4ep32 | 128 | on |
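For readers comparing the numbers, here is a minimal sketch that computes the relative throughput gain from the figures listed above. It assumes that "close" means the optimization is disabled and "on" means it is enabled; the helper names `parse_parallelism` and `speedup` are hypothetical and not part of this PR.

```python
import re

# Throughput figures copied from the test results above:
# (parallelism label, tps with optimization off, tps with optimization on)
RESULTS = [
    ("dp16tp2ep32", 160, 192),
    ("dp8tp4ep32", 76, 128),
]


def parse_parallelism(label):
    """Split a label such as 'dp16tp2ep32' into {'dp': 16, 'tp': 2, 'ep': 32}."""
    return {name: int(size) for name, size in re.findall(r"([a-z]+)(\d+)", label)}


def speedup(tps_off, tps_on):
    """Ratio of optimized throughput to baseline throughput."""
    return tps_on / tps_off


for label, off, on in RESULTS:
    sizes = parse_parallelism(label)
    ratio = speedup(off, on)
    print(f"{label}: {sizes} -> {ratio:.2f}x ({(ratio - 1) * 100:.0f}% more tps)")
```

By these figures the gain is about 1.2x for dp16tp2ep32 and about 1.68x for dp8tp4ep32.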

Collaborator

@Yikun Yikun left a comment


Please make the commit message more meaningful, e.g. mention what changes are applied compared to the upstream implementation, and include the performance test results.

@@ -35,6 +35,7 @@
MODELS = [
"Qwen/Qwen2.5-0.5B-Instruct",
"Qwen/Qwen3-0.6B-Base",
"Qwen/Qwen3-30B-A3B",
Collaborator


This model is too large and will take a long time to run in CI; please try using a reduced-layer model: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contribution/testing.html#e2e-test-example

@@ -33,3 +57,89 @@ class CustomQwen3MoeForCausalLM(Qwen3MoeForCausalLM):
"experts":
["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
}


class AscendQwen3MoeSparseMoeBlock(nn.Module):
Collaborator

@Yikun Yikun Jun 26, 2025


@shiyuan680 shiyuan680 force-pushed the qwen3 branch 9 times, most recently from cc28837 to 0461ef2 Compare June 27, 2025 02:30
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
@shiyuan680 shiyuan680 force-pushed the qwen3 branch 2 times, most recently from 3c8a113 to db4c8c3 Compare June 27, 2025 07:45
Signed-off-by: yangcheng (AJ) <y00806874@china.huawei.com>
2 participants