Conversation

Contributor

@jerryzh168 jerryzh168 commented Sep 5, 2025

Summary:
Current Int4WeightOnlyConfig has versions 1 and 2, and the default is 1; this PR changes the default to 2 and updates call sites accordingly.

Deprecation Note:

We updated the implementation of the int4 Tensor, so this PR bumps the default version from 1 to 2 for both `int4_weight_only` and `Int4WeightOnlyConfig`.

from transformers import AutoModelForCausalLM

# Loading a checkpoint quantized with version 1 of Int4WeightOnlyConfig
# now emits deprecation warnings:
model_name = "torchao-testing/opt-125m-Int4WeightOnlyConfig-v1-0.14.dev"
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="bfloat16",
    device_map="cuda",
)

/data/users/jerryzh/ao/torchao/core/config.py:250: UserWarning: Stored version is not the same as current default version of the config: stored_version=1, current_default_version=2, please check the deprecation warning
  warnings.warn(
/data/users/jerryzh/ao/torchao/dtypes/uintx/tensor_core_tiled_layout.py:241: UserWarning: Models quantized with version 1 of Int4WeightOnlyConfig is deprecated and will no longer be supported in a future release, please upgrade torchao and quantize again, or download a newer torchao checkpoint, see https://github.com/pytorch/ao/issues/2948 for more details
  warnings.warn(

Suggestion: upgrade torchao to 0.14 or later and regenerate the checkpoint:

from torchao.quantization import quantize_, Int4WeightOnlyConfig

quantize_(model, Int4WeightOnlyConfig(group_size=128))

Or download the checkpoint again (please let us know if the checkpoint has not been updated).
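For reference, an end-to-end re-quantization sketch (a sketch only: the base checkpoint name and output directory are assumptions, and it assumes torchao 0.14+ where version 2 is the default):

import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, Int4WeightOnlyConfig

# Load the original (unquantized) model; "facebook/opt-125m" is an assumed base.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# Quantize in place; with torchao 0.14+ this produces a version-2 checkpoint.
quantize_(model, Int4WeightOnlyConfig(group_size=128))

# torchao tensor subclasses don't serialize via safetensors, so save with
# safe_serialization=False ("opt-125m-int4wo-v2" is a hypothetical output dir).
model.save_pretrained("opt-125m-int4wo-v2", safe_serialization=False)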

Please see #2948 for more details around the deprecation.

Test Plan:
Regression tests:
python test/dtypes/test_affine_quantized.py
python test/quantization/test_quant_api.py
python test/quantization/quantize_/workflows/int4/test_int4_marlin_sparse_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_opaque_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py
python test/integration/test_load_and_run_checkpoint.py

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot bot commented Sep 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2949

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d2168f2 with merge base c452495:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Sep 5, 2025
@jerryzh168 jerryzh168 added the topic: deprecation label Sep 8, 2025
jerryzh168 added a commit that referenced this pull request Sep 8, 2025
Summary:
This is in preparation for version bump in #2949

added version=1 for both `int4_weight_only` and `Int4WeightOnlyConfig`

Test Plan:
regression tests with CI

Reviewers:

Subscribers:

Tasks:

Tags:
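As a sketch of what this prep commit enables (assuming the `version` argument is exposed on both entry points, as the commit message states):

from torchao.quantization import Int4WeightOnlyConfig, int4_weight_only

# Pin the old (version 1) tensor layout explicitly while call sites migrate.
legacy_config = Int4WeightOnlyConfig(group_size=128, version=1)
legacy_config_fn = int4_weight_only(group_size=128, version=1)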
@jerryzh168 jerryzh168 force-pushed the bump-int4-version branch 2 times, most recently from edca31d to 5301a7e on September 8, 2025 23:17
@jerryzh168 jerryzh168 marked this pull request as ready for review September 8, 2025 23:21
@jerryzh168 jerryzh168 changed the title from "Bump int4 weight only config version to 2" to "Bump Int4WeightOnlyConfig version to 2" Sep 8, 2025
_int4_quant_code = """
from torchao.quantization import Int4WeightOnlyConfig
-quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq", version=2)
+quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")
Contributor
It's called int4_packing_format now, no?

Contributor Author
yeah that's true, I have updated locally, will push the change together with other things

_int4_quant_code = """
from torchao.quantization import Int4WeightOnlyConfig
-quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq", version=2)
+quant_config = Int4WeightOnlyConfig(group_size=128, packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")
Contributor Author
just found we also need to update `packing_format` to `int4_packing_format`; I have made the change locally and can push these changes before landing.
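Putting the renames from this thread together, the updated example would read roughly as follows (a sketch based on the review comments, not the final diff):

from torchao.quantization import Int4WeightOnlyConfig

# packing_format -> int4_packing_format, per the review discussion above.
quant_config = Int4WeightOnlyConfig(
    group_size=128,
    int4_packing_format="tile_packed_to_4d",
    int4_choose_qparams_algorithm="hqq",
)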

@metascroy
Contributor

Should you import to fbcode to see if you break any internal tests?

@facebook-github-bot
Contributor

@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this in D81985661.

@jerryzh168
Contributor Author

looks like there are some conflicts in importing, I'll unlink and merge for now, will rely on diff train

Summary:
Current Int4WeightOnlyConfig has versions 1 and 2, and the default is 1; this PR changes the default to 2 and updates call sites accordingly.
For call sites still using the old configuration, we added an explicit `version=1`; these can be migrated to version 2 separately.

For READMEs, we migrated the usage to version 2 directly.
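In code, the two migration paths read roughly as follows (a sketch; the toy models are placeholders):

import torch.nn as nn
from torchao.quantization import quantize_, Int4WeightOnlyConfig

# Placeholder models standing in for real call sites.
model_legacy = nn.Sequential(nn.Linear(128, 128)).cuda().bfloat16()
model_new = nn.Sequential(nn.Linear(128, 128)).cuda().bfloat16()

# Existing call sites: pin the old behavior explicitly during migration.
quantize_(model_legacy, Int4WeightOnlyConfig(group_size=128, version=1))

# READMEs and new code: rely on the new default (version=2).
quantize_(model_new, Int4WeightOnlyConfig(group_size=128))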

Deprecation: TODO

Test Plan:
Regression tests:
python test/dtypes/test_affine_quantized.py
python test/quantization/test_quant_api.py
python test/quantization/quantize_/workflows/int4/test_int4_marlin_sparse_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_opaque_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py

Reviewers:

Subscribers:

Tasks:

Tags:
@jerryzh168 jerryzh168 merged commit b10876b into main Sep 9, 2025
18 checks passed