Fix xnnpack export #2941
Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2941
✅ No failures as of commit aa67cb3 with merge base 9d01b43. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
assert isinstance(weight_tensor, IntxUnpackedToInt8Tensor)

# Apply dynamic activation quant
if weight_tensor.apply_int8_act_asym_per_token_quant:
@jerryzh168 do you think we should make apply_int8_act_asym_per_token_quant an enum instead of a boolean? It would make it easier to extend in the future.
yeah sure, if you anticipate more cases then it makes sense for it to be an enum I think
I'll add it just in case, for future-proofing.
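For context, a minimal sketch of the enum-vs-boolean idea discussed above; the member names and attribute name are illustrative, not the PR's final API.

```python
from enum import Enum, auto

# Hypothetical enum replacing the boolean flag; member names are illustrative.
class ActivationQuantization(Enum):
    NONE = auto()
    INT8_ASYM_PER_TOKEN = auto()

# The boolean check
#     if weight_tensor.apply_int8_act_asym_per_token_quant: ...
# would then become something like
#     if weight_tensor.activation_quantization == ActivationQuantization.INT8_ASYM_PER_TOKEN: ...
# which leaves room for additional activation quantization schemes later.
```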
also apply_int8_act_asym_per_token_quant should be apply_int8_act_asym_per_token_quant_dequant I think
)


def _fake_apply_int8_act_asym_per_token_quant(hp_tensor):
actually maybe a closer name for this is quantize_and_dequantize, to avoid confusion with fake_quant, which does not quantize to the real int values in quantize_affine
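A self-contained sketch in plain PyTorch (not torchao's quant_primitives, and not the PR's helper) of what quantize-and-dequantize means here: values are rounded to real int8 codes and then mapped back to floating point, per token.

```python
import torch

def quantize_dequantize_int8_asym_per_token(hp_tensor: torch.Tensor) -> torch.Tensor:
    """Asymmetric per-token int8 quantize followed by dequantize (illustrative only)."""
    qmin, qmax = -128, 127
    # per-token (last-dim) min/max
    min_val = hp_tensor.amin(dim=-1, keepdim=True)
    max_val = hp_tensor.amax(dim=-1, keepdim=True)
    scale = (max_val - min_val).clamp(min=1e-6) / (qmax - qmin)
    zero_point = (qmin - min_val / scale).round().clamp(qmin, qmax)
    # quantize to real int8 values ...
    q = (hp_tensor / scale + zero_point).round().clamp(qmin, qmax).to(torch.int8)
    # ... then dequantize back to the original dtype
    return ((q.to(hp_tensor.dtype) - zero_point) * scale).to(hp_tensor.dtype)

x = torch.randn(4, 16)
x_qdq = quantize_dequantize_int8_asym_per_token(x)
```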
test/quantization/quantize_/workflows/intx/test_intx_unpacked_to_int8_tensor.py
looks good!
    target_dtype=torch.int8,
    mapping_type=MappingType.ASYMMETRIC,
).dequantize()
input_tensor = _fake_apply_int8_act_asym_per_token_quant(input_tensor)
why is this changed to a function call btw?
choose_qparams returns a rank-0 tensor instead of a rank-1 tensor when there is only one element, and this fails the assert in the constructor of IntxUnpackedToInt8Tensor (scale.shape == n_blocks).
why do you need this assert? if you pass block_size around to the quantize/dequantize ops, this will be handled correctly
So if a user constructs the tensor using from_hp, it's not a concern.
But the user can also construct the tensor directly by supplying qdata, scale, zero_point, and block_size separately, and in that case I think it makes sense to assert.
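To illustrate the shape mismatch described above (a hypothetical example, not the PR's code): reducing a single-token input to one scale value can produce a rank-0 tensor, which a strict scale.shape == n_blocks check would reject.

```python
import torch

x = torch.randn(1, 16)

scale_rank0 = x.abs().max() / 127.0          # torch.Size([])  -- rank 0
scale_rank1 = x.abs().amax(dim=-1) / 127.0   # torch.Size([1]) -- rank 1

expected_n_blocks = torch.Size([1])
assert scale_rank1.shape == expected_n_blocks
assert scale_rank0.shape != expected_n_blocks  # this is the mismatch

# A helper that owns the quantize/dequantize step can reshape the scale
# to the expected block layout before any such check:
scale_fixed = scale_rank0.reshape(expected_n_blocks)
assert scale_fixed.shape == expected_n_blocks
```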
_FLOAT_TYPES: List[torch.dtype] = [torch.float16, torch.bfloat16, torch.float32]


class ActivationQuantization(enum.Enum):
you can use ActivationQuantization(str, Enum) to avoid checking for str btw. Example:
ao/torchao/quantization/quantize_/common/kernel_preference.py, lines 12 to 14 in b34c103:

# can switch to StrEnum (https://docs.python.org/3/library/enum.html#enum.StrEnum)
# after python 3.10 is end of life (https://devguide.python.org/versions/)
class KernelPreference(str, Enum):
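Following that example, a minimal sketch of the suggestion; the member names here are hypothetical, not necessarily the ones used in the PR.

```python
from enum import Enum

class ActivationQuantization(str, Enum):
    NONE = "none"
    INT8_ASYM_PER_TOKEN = "int8_asym_per_token"

# Because each member is also a str, plain string comparison works without
# an explicit isinstance check:
assert ActivationQuantization.INT8_ASYM_PER_TOKEN == "int8_asym_per_token"
```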
from torchao.quantization.quant_primitives import _DTYPE_TO_BIT_WIDTH
from torchao.quantization.quantize_.workflows.intx.intx_unpacked_to_int8_tensor import (
    IntxUnpackedToInt8Tensor,
    ActivationQuantization,
the naming is a bit too general I feel, how about IntxUnpackedToInt8ActivationQuantization? We can move this to the intx folder and rename it to IntxActivationQuantization if this enum can be reused by other intx tensors.
The previous version of IntxUnpackedToInt8Tensor had a reshape operator after export. Although this does not change any numerics, it does interfere with XNNPACK's ability to recognize and lower the dynamic quantization pattern.
This PR removes the reshape operator and adjusts the unit tests to check for 0 reshapes in the exported graph.
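As a rough sketch of how such a check can be written against torch.export (the module, the targeted op, and the expected count below are illustrative; the PR's actual tests target IntxUnpackedToInt8Tensor):

```python
import torch
from torch.export import export

class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)

ep = export(TinyLinear(), (torch.randn(2, 8),))

# Count aten reshape nodes in the exported graph; a test like the PR's
# would assert none remain, so XNNPACK can recognize and lower the
# dynamic quantization pattern.
reshape_count = sum(
    1
    for node in ep.graph.nodes
    if node.op == "call_function" and node.target == torch.ops.aten.reshape.default
)
assert reshape_count == 0
```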