Feature Branch for AWQ Modifier #181
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.
* Add PileEvalDataset to data/__init__.py and create pile.py dataset module
* Some cleanup
* Update src/llmcompressor/transformers/finetune/data/pile.py

Co-authored-by: Mark Kurtz <mark.kurtz@neuralmagic.com>
@rahul-tuli can we finalize the state for the AWQ modifiers and get them into a single PR?
…ctions to helpers.py
Adds weight clipping to our implementation, similar to AutoAWQ: https://github.com/casper-hansen/AutoAWQ/blob/79547665bdb27768a9b392ef375776b020acbf0c/awq/quantize/quantizer.py#L176

Depends on:
- [ ] #183
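For reference, a minimal sketch of what an AutoAWQ-style clipping search does, assuming symmetric fake quantization and a single calibration batch; the function name and signature here are illustrative, not the code in this branch:

```python
import torch


def clip_search(w: torch.Tensor, x: torch.Tensor, bits: int = 4,
                n_grid: int = 20, max_shrink: float = 0.5) -> torch.Tensor:
    """Grid-search a per-output-channel clipping threshold for weight matrix `w`.

    For each candidate shrink factor, clamp the weights, fake-quantize them
    (symmetric round-to-nearest here for brevity), and keep the clipped
    weights whose quantized output x @ w_q.T is closest to the original
    full-precision output x @ w.T.
    """
    y_ref = x @ w.t()                                # full-precision reference output
    abs_max = w.abs().amax(dim=1, keepdim=True)      # per-output-channel max magnitude
    q_max = 2 ** (bits - 1) - 1
    best_err, best_w = float("inf"), w

    for i in range(int(n_grid * max_shrink)):
        shrink = 1.0 - i / n_grid                    # progressively tighter clip range
        w_clip = w.clamp(-abs_max * shrink, abs_max * shrink)
        scale = (abs_max * shrink).clamp(min=1e-5) / q_max
        w_q = torch.clamp(torch.round(w_clip / scale), -q_max - 1, q_max) * scale
        err = (x @ w_q.t() - y_ref).pow(2).mean().item()
        if err < best_err:
            best_err, best_w = err, w_clip
    return best_w
```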
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@@ -1194,3 +1241,69 @@ def swap_modules(
    parent.__setattr__(sections[-1], submodule_to_replace)

    return cur


def pseudo_quantize_tensor(
Can we please add a comment explaining what this function's purpose is?
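For readers following along, a minimal sketch of what a `pseudo_quantize_tensor` helper typically does in AutoAWQ-style code, assuming asymmetric group-wise quantization; this is an illustration, not the exact implementation in the diff:

```python
import torch


def pseudo_quantize_tensor(w: torch.Tensor, bit_width: int = 4,
                           group_size: int = 128) -> torch.Tensor:
    """Fake-quantize a weight tensor: round it to `bit_width` integers per
    group, then immediately dequantize back to floating point so the caller
    can measure the rounding error without changing the tensor's dtype."""
    orig_shape = w.shape
    # Assumes w.numel() is divisible by group_size; one quantization group per row.
    w = w.reshape(-1, group_size)
    max_int = 2 ** bit_width - 1
    max_val = w.amax(dim=1, keepdim=True)
    min_val = w.amin(dim=1, keepdim=True)
    scales = (max_val - min_val).clamp(min=1e-5) / max_int
    zeros = (-torch.round(min_val / scales)).clamp(0, max_int)
    w_q = torch.clamp(torch.round(w / scales) + zeros, 0, max_int)
    return ((w_q - zeros) * scales).reshape(orig_shape)
```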
    return w


def clear_memory(value: Optional[Any] = None):
this is super not how the python garbage collector works
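For context on the comment above, a sketch of what a `clear_memory` helper along these lines usually looks like (following the AutoAWQ pattern; not necessarily the exact code in this diff). As the reviewer notes, `del` only drops the local name, so the underlying object is freed only once no other references to it remain:

```python
import gc
from typing import Any, Optional

import torch


def clear_memory(value: Optional[Any] = None) -> None:
    """Drop a reference, run a full garbage-collection pass, and release
    cached CUDA blocks back to the driver."""
    if value is not None:
        # Removes only this local name; the object itself is reclaimed
        # only when no other references to it exist elsewhere.
        del value
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```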
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
DEFAULT_AWQ_MAPPINGS = [
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
    [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
    [["re:.*down_proj"], "re:.*up_proj"],
[["re:.*down_proj"], "re:.*up_proj"], | |
[["re:.*down_proj"], "re:.*up_proj"], | |
[["re:.*o_proj"], "re:.*v_proj"], |
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
SUMMARY:
Addition of [`AWQModifier`](https://arxiv.org/pdf/2306.00978), based on the [AutoAWQ implementation](https://github.com/casper-hansen/AutoAWQ/blob/main/awq/quantize/quantizer.py#L28). Should be reviewed/merged in conjunction with neuralmagic/compressed-tensors#269. Replaces #181 and #824.

TEST PLAN:
Some unit tests are included, but as this was mostly a port from AutoAWQ, we validated the code by reproducing the evaluation metrics in Table 4 of [the paper](https://arxiv.org/pdf/2306.00978). We achieve the following wikitext PPL scores:

Llama-2 7B, group size 128:
1. Paper: 5.60
2. AutoAWQ: 5.615
3. This implementation: 5.612
4. We match what the paper reports for RTN alone: 5.73
5. We get reasonable results for channel-wise quantization: 6.788. AutoAWQ errors out for this (setting "q_group_size": -1 in the quant_config), and results are not reported in the paper.

Llama-2 13B, group size 128:
1. We match the results of AutoAWQ and the results shown in the paper: 4.97
2. We match what the paper reports for RTN alone: 4.984

NOTE:
We are excluding the clipping logic in this implementation. If we want to add it, it should be a separate modifier: the two are mutually exclusive, and the data model for AWQ doesn't align well with clipping. That may explain the slight deviation between the results reported in the paper and those of our implementation.

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
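As a usage illustration, a rough sketch of how the modifier might be applied in a one-shot llm-compressor run; the entry point and constructor arguments shown here are assumptions for illustration, not the finalized API of this PR:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# Hypothetical recipe: argument names and values are assumptions.
recipe = [
    AWQModifier(
        ignore=["lm_head"],        # skip the output head
        scheme="W4A16_ASYM",       # 4-bit asymmetric weights, 16-bit activations
        targets=["Linear"],        # apply to all Linear layers not ignored
    ),
]

oneshot(
    model="meta-llama/Llama-2-7b-hf",   # model evaluated in the test plan above
    dataset="open_platypus",            # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```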
We can close this now that #1177 is in.
This PR is a feature branch for the AWQ Modifier, with smaller PRs stacked on top.