AWQ Modifier #1177
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
Should we add evals comparing to GPTQ?
Using the latest commit at this time, I am getting the following results via lm-eval.
Comparing AWQ vs. GPTQ vs. RTN for
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Please remove/fix the example, otherwise LGTM
Can we fix quality
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
This PR updates the main README.md to introduce a "New Features" section, improving visibility for recent major additions to LLM Compressor. This section highlights:

- Axolotl Sparse Finetuning Integration (https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor)
- AutoAWQ Integration for low-bit weight quantization (#1177)
- Day 0 Llama 4 support and its use by Meta

This helps users quickly understand the latest capabilities of the library.

---------

Signed-off-by: Rahul Tuli <rtuli@redhat.com>
SUMMARY:
Addition of `AWQModifier`, based on the AutoAWQ implementation. Should be reviewed/merged in conjunction with neuralmagic/compressed-tensors#269
Replaces #181 and #824
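
For readers unfamiliar with the technique being ported, here is a minimal NumPy sketch of the core AWQ idea: scale each input channel of a weight matrix by a power of its mean activation magnitude, quantize the scaled weights group-wise with round-to-nearest (RTN), and pick the exponent that minimizes output error on calibration data. This is not the `AWQModifier` code from this PR. The function names (`rtn_quantize`, `awq_search_scale`), the grid size, and the asymmetric 4-bit scheme are all assumptions made for illustration.

```python
import numpy as np


def rtn_quantize(w, bits=4, group_size=128):
    """Asymmetric round-to-nearest quantize-dequantize, one scale per group.

    w has shape (out_features, in_features); in_features must be divisible
    by group_size. Hypothetical helper, not part of llm-compressor.
    """
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    qmax = 2**bits - 1
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    q = np.clip(np.round((g - w_min) / scale), 0, qmax)
    return (q * scale + w_min).reshape(out_f, in_f)


def awq_search_scale(w, x, bits=4, group_size=128, n_grid=20):
    """Grid-search a per-input-channel scale s = mean(|x|) ** alpha.

    x has shape (tokens, in_features). Returns (scale, alpha, error), where
    error is the mean squared difference between the full-precision and the
    quantized layer outputs. alpha = 0 gives s = 1, which is plain RTN.
    """
    ref = x @ w.T
    act_mag = np.abs(x).mean(axis=0) + 1e-8
    best_err, best_scale, best_alpha = np.inf, None, None
    for i in range(n_grid):
        alpha = i / n_grid
        s = act_mag**alpha
        s = s / np.sqrt(s.max() * s.min())  # keep scales centered around 1
        # Scaling W by s and the activations by 1/s leaves the exact product
        # unchanged; folding 1/s back into the dequantized weights is equivalent.
        w_q = rtn_quantize(w * s, bits, group_size) / s
        err = np.mean((ref - x @ w_q.T) ** 2)
        if err < best_err:
            best_err, best_scale, best_alpha = err, s, alpha
    return best_scale, best_alpha, best_err
```

Because the grid includes `alpha = 0`, the error returned by the search is never worse than plain RTN on the same calibration batch. Clipping is deliberately left out of this sketch.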
TEST PLAN:
Some unit tests included, but as this was mostly a port from AutoAWQ, we validated the code by ensuring we could reproduce the evaluation metrics in Table 4 of the paper. We achieve the following wikitext PPL scores:
Llama-2 7B Group 128:
Llama-2 13B Group 128:
NOTE: We are excluding the clipping logic in this implementation. If we want to add it, we should add it as another modifier: they are mutually exclusive, and the data model for AWQ doesn't align well with clipping. That might be the reason for the slight deviation between the results reported in the paper and those from our implementation.
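
As a reminder of what the PPL metric in the test plan measures, perplexity is the exponential of the average negative log-likelihood per token. The sketch below shows only that formula. The function name `perplexity` is hypothetical, and it does not reproduce how the evaluation harness tokenizes or windows wikitext.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each target token.

    PPL = exp(-(1/N) * sum(log p_i)). Hypothetical helper for illustration.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)
```

A model that assigns probability 0.25 to every target token has a perplexity of 4, so lower values mean the quantized model predicts the text better.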