
Feature Branch for AWQ Modifier #181

Closed · wants to merge 13 commits

Conversation

rahul-tuli
Collaborator

@rahul-tuli rahul-tuli commented Sep 18, 2024

This PR is a feature branch for the AWQ Modifier, with smaller PRs stacked on top.



👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@rahul-tuli rahul-tuli marked this pull request as ready for review September 18, 2024 14:18
@rahul-tuli rahul-tuli self-assigned this Sep 18, 2024
* Add PileEvalDataset to data/__init__.py and create pile.py dataset module

* Some cleanup

* Update src/llmcompressor/transformers/finetune/data/pile.py

---------

Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
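For context on the first bullet in the squashed commits above: a minimal sketch of what such a pile.py module could look like, assuming the TextGenerationDataset registry pattern used by this repo's other finetune data modules. The registry name, dataset id, and class body are illustrative, not the actual contents of the PR's pile.py.

# Hypothetical sketch of src/llmcompressor/transformers/finetune/data/pile.py;
# class layout assumed from this repo's other dataset modules.
from copy import deepcopy

from llmcompressor.transformers.finetune.data import TextGenerationDataset


@TextGenerationDataset.register(name="pile_eval")
class PileEvalDataset(TextGenerationDataset):
    """Wraps a Pile validation split for calibration and evaluation."""

    def __init__(self, data_args, split, tokenizer):
        data_args = deepcopy(data_args)
        data_args.dataset = "mit-han-lab/pile-val-backup"  # dataset id assumed
        super().__init__(
            text_column="text",
            data_args=data_args,
            split=split,
            tokenizer=tokenizer,
        )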
@markurtz
Collaborator

@rahul-tuli can we finalize the state for the AWQ modifiers and get them into a single PR?

markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024
@@ -1194,3 +1241,69 @@ def swap_modules(
parent.__setattr__(sections[-1], submodule_to_replace)

return cur


def pseudo_quantize_tensor(
Collaborator

can we please add a comment explaining this function's purpose?

return w
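In lieu of the requested comment, a sketch of what AutoAWQ-style pseudo-quantization does: it round-trips the weight tensor through low-bit integer quantization and back to float, so the tensor carries the quantization error while keeping its original shape and dtype. The bit width, group size, and asymmetric scheme below are illustrative defaults, not necessarily this function's exact signature.

import torch


def pseudo_quantize_tensor_sketch(
    w: torch.Tensor, bit_width: int = 4, group_size: int = 128
) -> torch.Tensor:
    """Quantize w per group of `group_size` values, then dequantize,
    returning a float tensor that carries the quantization error."""
    orig_shape = w.shape
    w = w.reshape(-1, group_size)  # assumes numel is divisible by group_size
    # Per-group asymmetric range
    max_val = w.amax(dim=1, keepdim=True)
    min_val = w.amin(dim=1, keepdim=True)
    max_int = 2**bit_width - 1
    scale = (max_val - min_val).clamp(min=1e-5) / max_int
    zero = (-min_val / scale).round()
    # Round-trip: quantize to the integer grid, clamp, then dequantize
    w = (torch.clamp((w / scale).round() + zero, 0, max_int) - zero) * scale
    return w.reshape(orig_shape)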


def clear_memory(value: Optional[Any] = None):
Collaborator

this is super not how the python garbage collector works
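For context on the objection: helpers like this typically combine del, gc.collect(), and a CUDA cache flush, roughly as sketched below (an assumed shape, not the exact code under review). The reviewer's point is that deleting a local name and forcing a collection pass cannot free a tensor that other references still hold.

import gc
from typing import Any, Optional

import torch


def clear_memory(value: Optional[Any] = None):
    if value is not None:
        # Only drops this function's local reference; the object is not
        # freed if any other reference to it is still alive.
        del value
    # A collection pass plus releasing cached CUDA blocks back to the
    # driver. Neither call reclaims tensors still referenced elsewhere.
    gc.collect()
    torch.cuda.empty_cache()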

brian-dellabetta added a commit that referenced this pull request Feb 18, 2025
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@brian-dellabetta brian-dellabetta mentioned this pull request Feb 19, 2025
DEFAULT_AWQ_MAPPINGS = [
[["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
[["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
[["re:.*down_proj"], "re:.*up_proj"],
Collaborator Author
@rahul-tuli rahul-tuli commented Feb 24, 2025
Suggested change
 [["re:.*down_proj"], "re:.*up_proj"],
+[["re:.*o_proj"], "re:.*v_proj"],

brian-dellabetta added a commit that referenced this pull request Mar 19, 2025
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
brian-dellabetta added a commit that referenced this pull request Mar 26, 2025
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
brian-dellabetta added a commit that referenced this pull request Apr 1, 2025
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
brian-dellabetta added a commit that referenced this pull request Apr 2, 2025
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
brian-dellabetta added a commit that referenced this pull request Apr 10, 2025
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
brian-dellabetta added a commit that referenced this pull request Apr 13, 2025
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
dsikka pushed a commit that referenced this pull request Apr 21, 2025
SUMMARY:
Addition of [`AWQModifier`](https://arxiv.org/pdf/2306.00978), based on the [AutoAWQ implementation](https://github.com/casper-hansen/AutoAWQ/blob/main/awq/quantize/quantizer.py#L28).

Should be reviewed/merged in conjunction with neuralmagic/compressed-tensors#269.

Replaces #181 and #824 

TEST PLAN:
Some unit tests are included, but since this was mostly a port from AutoAWQ, we validated the code by reproducing the evaluation metrics in Table 4 of [the paper](https://arxiv.org/pdf/2306.00978). We achieve the following wikitext PPL scores:

Llama-2 7B, group size 128:
1. Paper: 5.60
2. AutoAWQ: 5.615
3. This implementation: 5.612
4. We also match the paper's RTN-only result: 5.73.
5. Channel-wise quantization gives a reasonable 6.788; AutoAWQ errors out in this case (setting "q_group_size": -1 in the quant_config), and the paper does not report it.

Llama-2 13B, group size 128:
1. We match both AutoAWQ and the paper: 4.97.
2. We match the paper's RTN-only result: 4.984.

NOTE: We are excluding the clipping logic in this implementation. If we want to add it, it should be a separate modifier: the two are mutually exclusive, and the AWQ data model doesn't align well with clipping. That may explain the slight deviation between the paper's reported results and ours.

---------

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
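As a rough sketch of how the group-128 runs in the test plan could be driven through llm-compressor's oneshot entrypoint. The AWQModifier argument names (bit width, group size), the calibration dataset, and the entrypoint location are assumptions, not the exact commands behind the numbers above.

from llmcompressor import oneshot  # entrypoint location may vary by version
from llmcompressor.modifiers.awq import AWQModifier  # import path assumed

# Hypothetical W4 / group-size-128 AWQ run on Llama-2 7B; kwarg names
# on AWQModifier are illustrative and may differ from the merged API.
oneshot(
    model="meta-llama/Llama-2-7b-hf",
    dataset="open_platypus",  # calibration set assumed, not from the PR
    recipe=[AWQModifier(num_bits=4, group_size=128)],
    max_seq_length=2048,
    num_calibration_samples=128,
    output_dir="llama2-7b-awq-w4g128",
)
# Wikitext perplexity would then be measured on the saved model (e.g.
# with lm-evaluation-harness) to compare against the 5.612 reported above.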
@brian-dellabetta
Collaborator

We can close this now that #1177 is in
