Bump Int4WeightOnlyConfig version to 2 (#2949)
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2949
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit d2168f2 with merge base c452495. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 2341ca6 to e00fe75
Force-pushed e00fe75 to 0ad98af
Summary: In preparation for the version bump in #2949, this adds `version=1` for both `int4_weight_only` and `Int4WeightOnlyConfig`.

Test Plan: regression tests with CI
Force-pushed edca31d to 5301a7e
Force-pushed 5301a7e to 9364280
```diff
 _int4_quant_code = """
 from torchao.quantization import Int4WeightOnlyConfig
-quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq", version=2)
+quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")
```
It's called int4_packing_format now, no?
Yeah, that's true. I have updated it locally and will push the change together with the other things.
Just found that we also need to update `packing_format` to `int4_packing_format`. I have made the change locally and can push these changes before landing.
Should you import to fbcode to see if you break any internal tests?
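The effect of the version bump discussed in this thread can be sketched with a minimal stand-in. This is not the real torchao class: `Int4ConfigSketch`, `select_impl`, and the path labels are hypothetical, kept only to illustrate how a version field gates which implementation a config selects; the field name `int4_packing_format` follows the rename noted in the review.

```python
from dataclasses import dataclass

# Hypothetical simplified stand-in for torchao's Int4WeightOnlyConfig;
# field names follow the review discussion (int4_packing_format,
# not the older packing_format).
@dataclass
class Int4ConfigSketch:
    group_size: int = 128
    int4_packing_format: str = "tile_packed_to_4d"
    int4_choose_qparams_algorithm: str = "hqq"
    version: int = 2  # default bumped from 1 to 2 by this PR

def select_impl(config: Int4ConfigSketch) -> str:
    # version 1 keeps the legacy path; version 2 routes to the new
    # int4 tensor implementation (labels here are illustrative only)
    return "legacy_v1_path" if config.version == 1 else "int4_tensor_v2_path"

print(select_impl(Int4ConfigSketch()))           # new default -> v2 path
print(select_impl(Int4ConfigSketch(version=1)))  # explicit opt-out -> legacy path
```

Callers that need the old behavior keep it by passing `version=1` explicitly, which is exactly what the companion PR adds at existing call sites.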
Force-pushed 9364280 to 7e8b47d
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this in D81985661.
Looks like there are some conflicts in importing. I'll unlink and merge for now and rely on the diff train.
Summary: Int4WeightOnlyConfig currently has versions 1 and 2, with 1 as the default. This PR changes the default to 2 and updates call sites. Call sites that use the old configuration now pass an explicit `version=1`; these can be migrated to version 2 separately. For READMEs, usage is migrated to version 2 directly.

Deprecation: TODO

Test Plan: Regression tests:
python test/dtypes/test_affine_quantized.py
python test/quantization/test_quant_api.py
python test/quantization/quantize_/workflows/int4/test_int4_marlin_sparse_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_opaque_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py
Force-pushed 7e8b47d to d2168f2
Summary:
Int4WeightOnlyConfig currently has versions 1 and 2, with 1 as the default. This PR changes the default to 2 and updates call sites. Call sites that use the old configuration now pass an explicit `version=1`; these can be migrated to version 2 separately (note this is done in "Add version=1 for calls to int4 weight only config" #2958).

Deprecation Note:
We updated the implementation of the int4 Tensor, so the default version for these two configs is bumped from 1 to 2.
Suggestion: upgrade torchao to 0.14 or later and generate the checkpoint again, or download the checkpoint again (please let us know if the checkpoint has not been updated).
Please see #2948 for more details around the deprecation.
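The regeneration step suggested above might look like the following sketch. This is an assumption-laden illustration, not an official migration script: it assumes torchao 0.14 or later is installed, and the model and checkpoint path are placeholders you would replace with your own.

```python
import torch
from torchao.quantization import Int4WeightOnlyConfig, quantize_

# Placeholder model; substitute your own architecture and load its
# original (unquantized) weights before quantizing.
model = torch.nn.Sequential(torch.nn.Linear(256, 256)).to(torch.bfloat16)

# With the default bumped to 2, no explicit version argument is needed;
# the config produces the new version-2 int4 tensor representation.
quant_config = Int4WeightOnlyConfig(group_size=128)
quantize_(model, quant_config)

# Re-save the checkpoint so it stores the version-2 layout.
torch.save(model.state_dict(), "int4_checkpoint_v2.pt")  # placeholder path
```

Checkpoints produced under the old default would need to keep passing `version=1` explicitly to load with the legacy implementation.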
Test Plan:
Regression tests:
python test/dtypes/test_affine_quantized.py
python test/quantization/test_quant_api.py
python test/quantization/quantize_/workflows/int4/test_int4_marlin_sparse_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_opaque_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py
python test/integration/test_load_and_run_checkpoint.py