[v0.9.1][perf] add a switch for enabling NZ layout in weights and enable NZ for GMM. #1409

Merged: 1 commit merged into vllm-project:v0.9.1-dev on Jun 26, 2025

@linfeng-yuan (Contributor) commented Jun 24, 2025

What this PR does / why we need it?

  1. Add a switch for enabling the NZ layout for weights.
  2. Enable the NZ layout for GMM.
  3. Replace the magic number used for the weight layout.
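The first and third items can be illustrated with a minimal sketch. It is not the code from this PR: `maybe_cast_weight` and `fake_format_cast` are hypothetical names, and the constant values (2 for ND, 29 for FRACTAL_NZ) are assumed to follow the ACL format enum. On real hardware the `format_cast` argument would be a format-cast function such as `torch_npu.npu_format_cast`; a stand-in is used here so the example runs anywhere.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Named constants instead of bare integers at the call site.
# ASSUMPTION: these values mirror the ACL format enum; verify before reuse.
ACL_FORMAT_FRACTAL_ND = 2
ACL_FORMAT_FRACTAL_NZ = 29


def maybe_cast_weight(weight: T, enable_weight_nz_layout: bool,
                      format_cast: Callable[[T, int], T]) -> T:
    """Cast the weight to the NZ layout only when the switch is on (hypothetical helper)."""
    if not enable_weight_nz_layout:
        return weight
    return format_cast(weight, ACL_FORMAT_FRACTAL_NZ)


# Stand-in for a device-specific cast; it only records the requested format.
calls: List[int] = []


def fake_format_cast(weight, acl_format):
    calls.append(acl_format)
    return weight


maybe_cast_weight("w13_weight", False, fake_format_cast)  # switch off: no cast
maybe_cast_weight("w13_weight", True, fake_format_cast)   # switch on: cast to NZ
print(calls)
```

Passing the cast function in as a parameter keeps the layout decision in one place and lets the switch be tested without an NPU.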

Does this PR introduce any user-facing change?

Users should set enable_weight_nz_layout to true in --additional-config when they want to enable the NZ layout for weights.
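As a rough illustration of the setting above, the snippet below builds a serve command that carries the flag as JSON. It assumes `--additional-config` accepts a JSON string; `build_serve_command` and the model name `my-org/my-model` are placeholders, not part of this PR.

```python
import json
import shlex
from typing import List


def build_serve_command(model: str, enable_weight_nz_layout: bool) -> List[str]:
    """Return an argv list that passes the switch via --additional-config (hypothetical helper)."""
    additional_config = {"enable_weight_nz_layout": enable_weight_nz_layout}
    return [
        "vllm", "serve", model,
        "--additional-config", json.dumps(additional_config),
    ]


# "my-org/my-model" is a placeholder model name.
cmd = build_serve_command("my-org/my-model", True)
# shlex.join quotes the JSON payload so it can be pasted into a shell.
print(shlex.join(cmd))
```

Serializing with `json.dumps` produces lowercase `true`, which avoids the common mistake of writing a Python-style `True` in the JSON payload.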

How was this patch tested?

  1. CI passed.
  2. Accuracy and performance comparison (gsm8k-lite only).

@@ -41,6 +41,8 @@ def __init__(self, vllm_config):
        self.expert_map_path = additional_config.get("expert_map_path", None)
        self.chunked_prefill_for_mla = additional_config.get(
            "chunked_prefill_for_mla", False)
        self.enable_weight_nz_layout = additional_config.get(
            "enable_weight_nz_layout", False)
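The hunk above reads the new option with a default of False, so the NZ layout stays off unless it is requested. A self-contained sketch of that pattern follows; `AscendConfigSketch` is a hypothetical stand-in for the real config class and takes the dictionary directly rather than a `vllm_config` object.

```python
from typing import Any, Dict, Optional


class AscendConfigSketch:
    """Hypothetical, trimmed-down stand-in for the config class in the diff."""

    def __init__(self, additional_config: Optional[Dict[str, Any]] = None):
        # Treat a missing config as empty so every option falls back to its default.
        additional_config = additional_config or {}
        self.chunked_prefill_for_mla = additional_config.get(
            "chunked_prefill_for_mla", False)
        # New switch: opt-in, disabled by default.
        self.enable_weight_nz_layout = additional_config.get(
            "enable_weight_nz_layout", False)


default_cfg = AscendConfigSketch()
nz_cfg = AscendConfigSketch({"enable_weight_nz_layout": True})
print(default_cfg.enable_weight_nz_layout, nz_cfg.enable_weight_nz_layout)
```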
Collaborator

Once you sync this PR to main, please update the additional config doc as well. If this is a temporary config, a TODO should be added here too.

Contributor Author

> Once you sync this PR to main, please update the additional config doc as well. If this is a temporary config, a TODO should be added here too.

Sure, thanks for reminding me.

@wangxiyuan
Collaborator

commit message should be updated

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: linfeng-yuan <1102311262@qq.com>
@linfeng-yuan
Contributor Author

> commit message should be updated

Done.

@ganyi1996ppo ganyi1996ppo merged commit bc546a9 into vllm-project:v0.9.1-dev Jun 26, 2025
16 checks passed
3 participants