GPTQ updates #2235
Open
HDCharles wants to merge 1 commit into main from 098_gptq
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2235
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures as of commit 4b44d67 with merge base 446f07d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
andrewor14 reviewed May 22, 2025
Force-push: 913bf3c to 956e16c (Compare)
Summary:

1) Reorganized GPTQ
a) got rid of the old GPTQ and renamed GPTQ_MT to GPTQ
b) moved the new GPTQ to prototype
c) moved the quantized linear modules from GPTQ.py to linear_quant_modules.py

2) Removed the dependence on lm_eval for input recording
a) created a new input recorder that doesn't depend on lm_eval
b) made the lm_eval input recorder depend on the new generic input recorder
c) made TransformerEvalWrapper the base class and made LMEvalInputRecorder inherit from it instead of vice versa
d) updated the APIs generally to work with the new input recorder

3) Reorganized GPTQ tests
a) moved tests from test_quant_api.py to test_gptq.py
b) added a new test that can run in CI and doesn't depend on lm_eval/llama weights
c) got rid of test_gptq_mt.py

4) Added new documentation for lm_eval

5) GPTQ improvements
a) reimplemented a faster quant
b) tested compilation of the Hessian calculation and parts of faster quant; they were generally slower
c) moved the helper functions out of the class; they're largely generic and this is less cluttered
d) made duplication checking and copying faster where possible
e) fixed some bugs caused by this code not being in CI while things changed for the int4wo tensor subclass

Test Plan:

1) `python test_gptq.py` (note: the skipped test test_gptq_quantizer_int4_weight_only also ran)
2) Verified that all activations match between the old GPTQ and the current GPTQ
3)
```shell
export CHECKPOINT_PATH=../../../checkpoints # path to checkpoints folder
export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-gptq-64 --calibration_limit 10
export MODEL_REPO=meta-llama/Meta-Llama-3-8B
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-gptq-64 --calibration_limit 10
```
See README.md for results; they show GPTQ is working.

Reviewers:
Subscribers:
Tasks:
Tags:
Labels
CLA Signed
topic: improvement
Summary:

1) Reorganized GPTQ
a) got rid of the old GPTQ and renamed GPTQ_MT to GPTQ
b) moved the new GPTQ to prototype
c) moved the quantized linear modules from GPTQ.py to linear_quant_modules.py

2) Removed the dependence on lm_eval for input recording
a) created a new input recorder that doesn't depend on lm_eval
b) made the lm_eval input recorder depend on the new generic input recorder
c) made TransformerEvalWrapper the base class and made LMEvalInputRecorder inherit from it (rather than vice versa, as before)
d) updated the APIs generally to work with the new input recorder (inputs now have to be passed in as though they were being passed to the model, i.e. input_recorder(*args) rather than input_recorder(args))
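The calling-convention change above can be sketched with a toy recorder; `GenericInputRecorder` is a hypothetical name for illustration, not the actual torchao class:

```python
# Hypothetical sketch of the new calling convention: the recorder is
# invoked exactly the way the model would be, so positional and keyword
# arguments are captured via *args / **kwargs rather than a single tuple.
class GenericInputRecorder:
    def __init__(self):
        self.inputs = []

    def __call__(self, *args, **kwargs):
        # store the call exactly as the model would have received it
        self.inputs.append((args, kwargs))
        return self

    def get_recorded_inputs(self):
        return self.inputs

recorder = GenericInputRecorder()
recorder(1, 2, mask=None)  # like model(input_ids, positions, mask=None)
```

The point of mirroring the model's signature is that recorded inputs can later be replayed with `model(*args, **kwargs)` unchanged.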
3) Reorganized GPTQ tests
a) moved tests from test_quant_api.py to test_gptq.py
b) added a new test that can run in CI and doesn't depend on lm_eval/llama weights
c) removed all the 8da4w tests that we never got working (is this fine?)
d) got rid of test_gptq_mt.py (consolidated into test_gptq.py where relevant)

4) Added new documentation
a) new README and eval benchmarks for GPTQ
b) comments in GPTQ.py
5) GPTQ improvements
a) tested compilation of the Hessian calculation and parts of faster quant; they were generally slower or buggy. A speedup was possible but inconsistent, so it was removed.
b) reimplemented faster quant while trying to compile it (improved speed by 2-5%, and the code is clearer)
c) moved the helper functions out of the class; they're largely generic and this is less cluttered. May need to revisit how generic they are if new GPTQQuantizers are added.
d) made duplication checking and copying faster where possible (previously MultiTensor.unpad had its own checks, but by the point it's called we've already done the equality checks, so they aren't needed in unpad)
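The unpad simplification described above can be illustrated with a toy pad/unpad pair (hypothetical pure-Python stand-ins for the MultiTensor logic, not the real implementation):

```python
# Toy stand-ins for the MultiTensor pad/unpad idea: values are padded to
# a common length so branches can be processed in lockstep, then sliced
# back afterwards. The equality checks happen before padding, so unpad
# itself can be a plain slice with no extra checks.
def pad_to_length(values, length):
    # repeat the last element until there are `length` entries
    return values + [values[-1]] * (length - len(values))

def unpad(values, orig_length):
    # no duplicate/equality checks needed here: callers already did them
    return values[:orig_length]

padded = pad_to_length([10, 20], 4)   # [10, 20, 20, 20]
restored = unpad(padded, 2)           # [10, 20]
```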
e) fixed some bugs caused by this code not being in CI while things changed for the int4wo tensor subclass
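For context on the Hessian calculation mentioned above: GPTQ accumulates, per linear layer, a Hessian proportional to the sum of 2·x·xᵀ over calibration inputs. A minimal pure-Python sketch (normalization by sample count omitted; the real code uses batched torch matmuls):

```python
# Minimal sketch of GPTQ-style Hessian accumulation for one linear
# layer: H accumulates 2 * x @ x.T over calibration input vectors x.
# Pure Python for illustration only.
def accumulate_hessian(H, inputs):
    n = len(H)
    for x in inputs:
        for i in range(n):
            for j in range(n):
                H[i][j] += 2.0 * x[i] * x[j]
    return H

H = [[0.0, 0.0], [0.0, 0.0]]
accumulate_hessian(H, [[1.0, 2.0]])
# H is now [[2.0, 4.0], [4.0, 8.0]]
```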
6) BC / import changes
a) got rid of Int8DynActInt4WeightGPTQQuantizer since it was unused; can re-add if desired
b) for other imports, maintained BC: previous imports from quantization/GPTQ.py now go through quantization/GPTQ/__init__.py
c) renamed InputRecorder to LMEvalInputRecorder, but left a BC import in as an option
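The BC rename above boils down to re-exporting the new name under the old one in the package's `__init__`; a tiny illustrative stand-in (the real class lives in torchao, this only shows the pattern):

```python
# Stand-in class; in torchao this would be the real LMEvalInputRecorder
# defined in the GPTQ package.
class LMEvalInputRecorder:
    pass

# Backward-compatible alias so code doing
#   from quantization.GPTQ import InputRecorder
# keeps working after the rename.
InputRecorder = LMEvalInputRecorder
```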
Test Plan:

1) `python test_gptq.py`
note 1: the skipped test test_gptq_quantizer_int4_weight_only also ran
note 2: we now have a CI-ready test in test_gptq using the generic input recorder
2) Verified that all activations match between the old (non-MT) GPTQ and the current GPTQ. This is shown by test_gptq_quantizer_int4_weight_only passing, as mentioned above, and was also verified by comparing debug outputs and printing activation values for the first 3 MultiTensors.
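An activation-matching check like the one described can be sketched as a per-element tolerance comparison (hypothetical helper, not the actual debug code used in the PR):

```python
# Hypothetical helper: two activation traces "match" if they have the
# same length and every pair of values agrees within a tolerance.
def activations_match(acts_a, acts_b, tol=1e-6):
    return len(acts_a) == len(acts_b) and all(
        abs(a - b) <= tol for a, b in zip(acts_a, acts_b)
    )

ok = activations_match([0.5, 1.25], [0.5, 1.25])    # matching traces
bad = activations_match([0.5, 1.25], [0.5, 1.35])   # differs beyond tol
```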
3) eval benchmarks: see README.md for results; they show GPTQ is working
Reviewers:
Subscribers:
Tasks:
Tags: