AWQ Modifier #1177

Merged Apr 21, 2025 (41 commits)

Conversation

@brian-dellabetta (Collaborator) commented Feb 19, 2025

SUMMARY:
Adds AWQModifier, based on the AutoAWQ implementation.

Should be reviewed/merged in conjunction with neuralmagic/compressed-tensors#269

Replaces #181 and #824
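
For reviewers, a minimal usage sketch of the new modifier (`oneshot` and its arguments are the library's existing API; `AWQModifier`'s argument names here are assumptions modeled on the library's other modifiers, not necessarily the final signature from this PR):

```python
# Sketch: applying AWQModifier via llm-compressor's oneshot entrypoint.
# AWQModifier's arguments (scheme, ignore) are assumptions modeled on
# the library's other modifiers.
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

recipe = [AWQModifier(scheme="W4A16_ASYM", ignore=["lm_head"])]

oneshot(
    model="meta-llama/Llama-2-7b-hf",
    dataset="open_platypus",        # small calibration dataset
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```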

TEST PLAN:
Some unit tests are included, but since this was mostly a port from AutoAWQ, we validated the code by reproducing the evaluation metrics in Table 4 of the paper. We achieve the following wikitext PPL scores:

Llama-2 7B, group size 128:

  1. Paper: 5.60
  2. AutoAWQ: 5.615
  3. This implementation: 5.612
  4. RTN only: 5.73, matching what the paper reports.
  5. Channel-wise quantization gives a reasonable 6.788. AutoAWQ errors out for this case (setting "q_group_size": -1 in the quant_config; see the sketch below), and the paper does not report it.
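
For context, a sketch of the AutoAWQ quant_config referenced in item 5 (the keys follow AutoAWQ's documented config format):

```python
# AutoAWQ-style quantization config. q_group_size=128 matches the Table 4
# group-wise setup; -1 requests channel-wise quantization, the setting
# AutoAWQ errors out on.
quant_config = {
    "w_bit": 4,          # 4-bit weights
    "q_group_size": -1,  # -1 => channel-wise; 128 => group size 128
    "zero_point": True,  # asymmetric quantization with zero points
    "version": "GEMM",
}
```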

Llama-2 13B, group size 128:

  1. We match AutoAWQ and the results shown in the paper: 4.97
  2. RTN only: 4.984, matching what the paper reports.

NOTE: We are excluding the clipping logic in this implementation. If we want to add it, it should be a separate modifier: the two are mutually exclusive, and AWQ's data model doesn't align well with clipping. This may explain the slight deviation between the results reported in the paper and those from our implementation.


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch 2 times, most recently from 9273ef3 to 28f8bca on February 20, 2025
@brian-dellabetta changed the title from "Bdellabe/awq modifier v3" to "Bdellabe/Rtuli awq modifier v3" on Mar 10, 2025
@brian-dellabetta marked this pull request as ready for review on March 10, 2025
@dsikka (Collaborator) left a comment:

Should we add evals comparing to GPTQ?

@brian-dellabetta (Collaborator, Author) commented:

Using the latest commit at this time, I am getting the following results via lm-eval.

deepseek-ai/DeepSeek-R1-Distill-Llama-8B:

| Config | gsm8k (flexible-extract, strict-match) | wikitext PPL |
|---|---|---|
| dense | .6619, .6490 | 15.4498 |
| awq+quant sym | .6376, .6217 | 18.8623 |
| quant sym | .6732, .6543 | 16.7398 |

meta-llama/Llama-2-7b-hf:

| Config | gsm8k (flexible-extract, strict-match) | wikitext PPL |
|---|---|---|
| dense | .1342, .1342 | 8.7587 |
| awq+quant sym | .1024, .1001 | 9.194 |
| quant sym | .1183, .1152 | 9.311 |
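
For reproducibility, these metrics can be collected with lm-evaluation-harness's Python API; a sketch (batch size is illustrative):

```python
# Sketch: evaluating gsm8k (flexible-extract / strict-match) and wikitext
# perplexity with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-2-7b-hf",
    tasks=["gsm8k", "wikitext"],
    batch_size=8,  # illustrative
)
print(results["results"])
```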

@dsikka changed the title from "Bdellabe/Rtuli awq modifier v3" to "AWQ Modifier Support" on Mar 25, 2025
@brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch 2 times, most recently from 9168743 to 21fc931 on April 1, 2025
@brian-dellabetta (Collaborator, Author) commented Apr 2, 2025

Comparing AWQ vs. GPTQ vs. RTN for meta-llama/Llama-2-7b-hf, using the example script:

| Type | gsm8k (flexible-extract, strict-match) | wikitext PPL |
|---|---|---|
| FP16 | .1395, .1387 | 8.7521 |
| AWQ ASYM | .1281, .1274 | 9.0281 |
| GPTQ ASYM | .1312, .1296 | 9.1954 |
| AWQ+GPTQ ASYM | .1251, .1221 | 9.1449 |
| RTN ASYM | .1198, .1190 | 9.2098 |
| AWQ SYM | .1069, .1054 | 9.1931 |
| GPTQ SYM | .1046, .1039 | 9.3525 |
| AWQ+GPTQ SYM | .0955, .0925 | 9.4326 |
| RTN SYM | .1183, .1152 | 9.3114 |
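
For reference, the AWQ+GPTQ rows stack both modifiers in a single recipe, AWQ scale smoothing first, then GPTQ quantization; a sketch (AWQModifier's arguments are assumptions, as in the earlier sketch):

```python
# Sketch: AWQ+GPTQ recipe. GPTQModifier is the library's existing modifier;
# AWQModifier's arguments here are assumptions.
from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = [
    AWQModifier(scheme="W4A16_ASYM", ignore=["lm_head"]),
    GPTQModifier(scheme="W4A16_ASYM", ignore=["lm_head"]),
]
```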

@brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from 21fc931 to 03f7546 on April 2, 2025
@brian-dellabetta added the "ready" label (When a PR is ready for review) on Apr 3, 2025
@brian-dellabetta requested a review from dsikka on April 15, 2025
@dsikka previously approved these changes Apr 17, 2025
@kylesayrs (Collaborator) left a comment:

Please remove/fix the example; otherwise LGTM.

@dsikka previously approved these changes Apr 18, 2025
@dsikka (Collaborator) left a comment:

Can we fix quality?

@brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from 4b3325c to dd163b0 on April 18, 2025
@brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from dd163b0 to d1d3766 on April 18, 2025
@dsikka merged commit 549b42a into main on Apr 21, 2025
8 checks passed
@dsikka deleted the bdellabe/awq-modifier-v3 branch on April 21, 2025
rahul-tuli added a commit that referenced this pull request May 2, 2025
This PR updates the main README.md to introduce a "New Features"
section, improving visibility for recent major additions to LLM
Compressor.

This section highlights:

- Axolotl Sparse Finetuning Integration
(https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor)
- AutoAWQ Integration for low-bit weight quantization (#1177)
- Day 0 Llama 4 support and its use by Meta

This helps users quickly understand the latest capabilities of the library.

---------

Signed-off-by: Rahul Tuli <rtuli@redhat.com>
kylesayrs pushed a commit that referenced this pull request May 4, 2025, with the same README update as above.