Awq re implementation #824


Closed
wants to merge 2 commits into from

Conversation

rahul-tuli (Collaborator)

SUMMARY:
"please provide a brief summary"

TEST PLAN:
"please outline how the changes were tested"

github-actions bot commented Oct 7, 2024

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

kylesayrs (Collaborator)

Note that we should update the list of supported algorithms in the README when this lands.

@brian-dellabetta brian-dellabetta mentioned this pull request Feb 19, 2025
dsikka pushed a commit that referenced this pull request Apr 21, 2025
SUMMARY:
Addition of [`AWQModifier`](https://arxiv.org/pdf/2306.00978), based on the [AutoAWQ implementation](https://github.com/casper-hansen/AutoAWQ/blob/main/awq/quantize/quantizer.py#L28).

Should be reviewed/merged in conjunction with
neuralmagic/compressed-tensors#269

Replaces #181 and #824 
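
For reference, a minimal sketch of how the new modifier might be applied through llm-compressor's `oneshot` entry point. The import paths, argument names, and calibration dataset here are assumptions for illustration, not confirmed by this PR:

```python
# Hypothetical usage sketch -- import paths and arguments are assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# AWQ-style activation-aware scaling plus 4-bit weight quantization;
# group-wise W4A16 mirrors the "group size 128" setting evaluated below.
recipe = AWQModifier(scheme="W4A16", targets=["Linear"], ignore=["lm_head"])

oneshot(
    model="meta-llama/Llama-2-7b-hf",
    dataset="open_platypus",          # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=128,
)
```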

TEST PLAN:
Some unit tests are included, but since this was mostly a port from AutoAWQ, we validated the code by reproducing the evaluation metrics in Table 4 of [the paper](https://arxiv.org/pdf/2306.00978). We achieve the following wikitext perplexity (PPL) scores:

Llama-2 7B, group size 128:
1. Paper: 5.60
2. AutoAWQ: 5.615
3. This implementation: 5.612
4. RTN only: 5.73, matching what the paper reports
5. Channel-wise: 6.788, a reasonable result. AutoAWQ errors out for this setting (`"q_group_size": -1` in the quant config), and the paper does not report it.

Llama-2 13B, group size 128:
1. This implementation: 4.97, matching both AutoAWQ and the results shown in the paper
2. RTN only: 4.984, matching what the paper reports
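
The wikitext PPL numbers above can be checked with an evaluation harness. A sketch using lm-evaluation-harness's Python API; the checkpoint path is a placeholder, not from this PR:

```python
# Hypothetical evaluation sketch -- the model path is a placeholder.
import lm_eval

# Score an AWQ-quantized checkpoint on wikitext perplexity,
# comparable to Table 4 of the AWQ paper.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./llama-2-7b-awq-w4-g128",  # placeholder path
    tasks=["wikitext"],
    batch_size=8,
)
print(results["results"]["wikitext"]["word_perplexity,none"])
```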

NOTE: We are excluding the clipping logic in this implementation. If we want to add it, it should be a separate modifier: the two are mutually exclusive, and the data model for AWQ does not align well with clipping. That may explain the slight deviation between the results reported in the paper and those of our implementation.
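
For context on what is being excluded: AutoAWQ's clip search shrinks each output channel's maximum weight magnitude and keeps the shrink ratio that minimizes quantized-output error on calibration activations. A simplified, self-contained sketch of that idea (an illustration, not code from this PR; `fake_quantize` is a toy stand-in for the real quantizer):

```python
# Simplified illustration of AutoAWQ-style weight clipping (not from this PR).
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Toy symmetric per-row fake quantization, standing in for the real kernel.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def search_clip_ratio(w: torch.Tensor, x: torch.Tensor, n_grid: int = 20) -> float:
    # Find the shrink ratio for the per-row max so that quantizing the
    # clipped weight best preserves the layer's output on calibration data x.
    orig_out = x @ w.T
    max_val = w.abs().amax(dim=1, keepdim=True)
    best_err, best_ratio = float("inf"), 1.0
    for i in range(n_grid):
        ratio = 1.0 - i / (2 * n_grid)  # sweep ratios in (0.5, 1.0]
        clipped = w.clamp(-max_val * ratio, max_val * ratio)
        err = (x @ fake_quantize(clipped).T - orig_out).pow(2).mean().item()
        if err < best_err:
            best_err, best_ratio = err, ratio
    return best_ratio
```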

---------

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@dsikka dsikka closed this Apr 22, 2025