AWQModifier fast resolve mappings, better logging, MoE support #1444
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Many nits, otherwise looks good
https://github.com/vllm-project/llm-compressor/blob/ceffa644072b1d440df3d99b0f98f6416a05bf2f/examples/awq/qwen3_moe_example.py#L55~L56
@Chao-Xue, yes, that is for other Qwen MoE architectures that have a shared expert, which we ignore. Not all Qwen MoE models have one, though.
SUMMARY:
In AWQ, resolving mappings can take a while because it traverses the entire model tree, rather than just the parent module, to find the balance layers. This PR scopes the search to the parent module. For MoE models, the previous implementation found only a single layer for each regex string provided in mappings. This PR updates it to find every matching layer, which is necessary for MoE mappings that have multiple gate_proj and up_proj layers, one for each expert.
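The two changes described above can be illustrated with a minimal sketch. It is not the code from this PR: it works on plain dotted module names rather than torch modules, and `parent_name`, `resolve_balance_layers`, the toy module names, and the regex patterns are all hypothetical, chosen only to show parent-scoped search and multi-match resolution.

```python
import re


def parent_name(name: str) -> str:
    """Dotted name of the parent module ('' for a top-level module)."""
    return name.rpartition(".")[0]


def resolve_balance_layers(module_names, smooth_name, balance_patterns):
    """Hypothetical helper: collect every balance layer under the smooth layer's parent."""
    parent = parent_name(smooth_name)
    prefix = parent + "." if parent else ""
    # Scope the search to the parent subtree instead of the whole model.
    scoped = [n for n in module_names if n.startswith(prefix)]
    compiled = [re.compile(p) for p in balance_patterns]
    # Keep every match, not just the first one per pattern.
    return [n for n in scoped if any(rx.search(n) for rx in compiled)]


# Toy module names for a 2-layer, 2-expert MoE model (illustrative only).
names = [
    f"model.layers.{i}.{suffix}"
    for i in range(2)
    for suffix in (
        "post_attention_layernorm",
        "mlp.experts.0.gate_proj",
        "mlp.experts.0.up_proj",
        "mlp.experts.1.gate_proj",
        "mlp.experts.1.up_proj",
    )
]

resolved = resolve_balance_layers(
    names,
    smooth_name="model.layers.0.post_attention_layernorm",
    balance_patterns=[r"mlp\.experts\.\d+\.gate_proj$", r"mlp\.experts\.\d+\.up_proj$"],
)
print(resolved)
```

In this sketch, the layer-0 layernorm resolves to the four expert projections under `model.layers.0` only; layers under `model.layers.1` are never inspected because they fall outside the parent prefix.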
gsm8k results with the Qwen/Qwen3-30B-A3B MoE model after AWQ W4A16 Symmetric:

TEST PLAN:
Qwen/Qwen3-30B-A3B with the same set of mappings used in AutoAWQ. Example included in this PR in examples/awq/qwen3_moe_example.py. Ran successfully in ~2 hours on a single H100 with ~70GB of 80GB used (additional memory needed during saving).
main for meta-llama/Llama-3.2-3B-Instruct