
Conversation

metascroy (Contributor)

dequantize_affine does not work correctly before iOS18, when the op gets translated to constexpr_affine_dequantize instead of constexpr_blockwise_shift_scale.

This is because axis should be None (the default value) instead of -1; with None, the axis gets inferred from the shape of the scales.

This is similar to the change in the ExecuTorch PR: pytorch/executorch#13896
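
Concretely, a hedged sketch of the fix at the call site (the helper name comes from this PR; the variable names and surrounding call are illustrative, not the actual source):

```python
# Illustrative only: pass axis=None so the helper infers the axis from
# the shape of `scale` when lowering to constexpr_affine_dequantize.
output = _utils._construct_constexpr_dequant_op(
    quantized_weights,  # hypothetical names for the op inputs
    zero_point,
    scale,
    axis=None,  # was -1, which is wrong for constexpr_affine_dequantize
)
```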

metascroy (Contributor, Author)

@YifanShenSZ can you have a look?

YifanShenSZ (Collaborator)

Hey Scott, first, to confirm your use case: do you want weight-only quantization or activation quantization?

Some background about Core ML ops that might be helpful (see the sketch after the list):

  1. The constexpr_ ops are for weight-only quantization; that's why they are "const expression"s: the quantized weights are known at compile time
  2. Among the constexpr ops, blockwise shift scale was new in iOS18, while affine dequantize was new in iOS16
  3. Dequantize is for activation quantization, and was new in iOS17
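
As an illustration of point 1, a minimal MIL sketch of the weight-only pattern, assuming the usual builder entry points (shapes and values are arbitrary):

```python
import numpy as np
import coremltools as ct
from coremltools.converters.mil import Builder as mb

# A 4x4 int8 weight is dequantized at compile time (constexpr) and fed
# into linear; only the quantized bytes are stored in the model.
@mb.program(input_specs=[mb.TensorSpec(shape=(1, 4))], opset_version=ct.target.iOS16)
def prog(x):
    w = mb.constexpr_affine_dequantize(
        quantized_data=np.zeros((4, 4), dtype=np.int8),
        zero_point=np.int8(0),
        scale=np.float32(0.1),
        axis=0,
    )
    return mb.linear(x=x, weight=w)
```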

metascroy (Contributor, Author)

@YifanShenSZ this is for weight-only quantization, which is represented as dequantize_affine (applied to the weights) followed by linear/embedding.

The existing registration works for iOS18 (blockwise shift scale), but has a bug in affine_dequantize (iOS16). The bug is that the axis is incorrect. This doesn't affect iOS18 because axis is not used in blockwise shift scale.

> Dequantize is for activation quantization, and was new in iOS17

I assume "quantize" here is also used for activation quantization?

YifanShenSZ (Collaborator)

> @YifanShenSZ this is for weight-only quantization, which is represented as dequantize_affine (applied to the weights) followed by linear/embedding.
>
> The existing registration works for iOS18 (blockwise shift scale), but has a bug in affine_dequantize (iOS16). The bug is that the axis is incorrect. This doesn't affect iOS18 because axis is not used in blockwise shift scale.

I see. So in this case, you will want to (see the sketch after this list):

  1. For iOS 18 and higher, translate to constexpr_blockwise_shift_scale
  2. For iOS 16 and 17, translate to constexpr_affine_dequantize
  3. For iOS 15 and earlier, error out
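
A minimal sketch of that dispatch (illustrative only; the target enum values compare as integers in coremltools):

```python
import coremltools as ct

def weight_dequant_op_for(minimum_deployment_target):
    """Illustrative mapping from deployment target to the constexpr op."""
    if minimum_deployment_target >= ct.target.iOS18:
        return "constexpr_blockwise_shift_scale"
    if minimum_deployment_target >= ct.target.iOS16:
        return "constexpr_affine_dequantize"
    raise ValueError("weight-only quantization requires iOS16 or later")
```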

> Dequantize is for activation quantization, and was new in iOS17

> I assume "quantize" here is also used for activation quantization?

Yeah, I mean quantize and dequantize, which are for activation quantization (i.e. they quantize / dequantize variables) and were introduced in iOS 17.
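
For contrast, a hedged MIL sketch of activation quantization with those iOS17 ops; the builder signatures here are assumed, so treat it as illustrative:

```python
import numpy as np
import coremltools as ct
from coremltools.converters.mil import Builder as mb

# quantize/dequantize operate on runtime variables (activations), unlike
# the constexpr_ ops, which act on compile-time weights.
@mb.program(input_specs=[mb.TensorSpec(shape=(2, 3))], opset_version=ct.target.iOS17)
def prog(x):
    q = mb.quantize(input=x, scale=np.float32(0.5), output_dtype="int8")
    return mb.dequantize(input=q, scale=np.float32(0.5))
```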

metascroy (Contributor, Author) commented Sep 3, 2025

> @YifanShenSZ this is for weight-only quantization, which is represented as dequantize_affine (applied to the weights) followed by linear/embedding.
> The existing registration works for iOS18 (blockwise shift scale), but has a bug in affine_dequantize (iOS16). The bug is that the axis is incorrect. This doesn't affect iOS18 because axis is not used in blockwise shift scale.

> I see. So in this case, you will want to:
>
>   1. For iOS 18 and higher, translate to constexpr_blockwise_shift_scale
>   2. For iOS 16 and 17, translate to constexpr_affine_dequantize
>   3. For iOS 15 and earlier, error out

> Dequantize is for activation quantization, and was new in iOS17

> I assume "quantize" here is also used for activation quantization?

> Yeah, I mean quantize and dequantize, which are for activation quantization (i.e. they quantize / dequantize variables) and were introduced in iOS 17.

Yes, but the function _utils._construct_constexpr_dequant_op already does this translation. I'm just fixing the axis argument passed to it (axis is only used by _utils._construct_constexpr_dequant_op before iOS18, where it translates the op to constexpr_affine_dequantize; it is not used for iOS18, where the op gets translated to constexpr_blockwise_shift_scale). Since CI only tests on iOS18, I fixed the axis argument and added a test for iOS16.
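
For reference, a paraphrase of that behavior (not the actual coremltools source; the version check and axis-inference helper are hypothetical stand-ins):

```python
# Paraphrase of the described behavior, not the real implementation.
def _construct_constexpr_dequant_op(quantized_weights, zero_point, scale, axis=None):
    if opset_is_at_least_ios18():  # hypothetical version check
        # iOS18+: axis is ignored; the block structure comes from scale's shape
        return mb.constexpr_blockwise_shift_scale(
            data=quantized_weights, scale=scale, offset=zero_point
        )
    # iOS16/17: axis matters; with axis=None it is inferred from scale's
    # shape (the bug was hard-coding axis=-1 at the call site)
    return mb.constexpr_affine_dequantize(
        quantized_data=quantized_weights,
        zero_point=zero_point,
        scale=scale,
        axis=axis if axis is not None else infer_axis_from(scale),  # hypothetical helper
    )
```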

YifanShenSZ (Collaborator) commented Sep 4, 2025

Got it, then yes this is correct. Let's hear what CI has to say

CI 🔴 https://gitlab.com/coremltools1/coremltools/-/commit/c47235b01c27d101c373ce06b03f1758d3457557/pipelines

```python
@pytest.mark.skipif(not _HAS_TORCHAO, reason=MSG_TORCHAO_NOT_FOUND)
@pytest.mark.parametrize(
    "compute_unit, has_zeros",
    itertools.product(compute_units, [True, False], [ct.target.IOS16, ct.target.IOS17]),
)
```

nit: lower case "i", i.e. iOS

YifanShenSZ (Collaborator) commented Sep 5, 2025


```python
@pytest.mark.skipif(not _HAS_TORCHAO, reason=MSG_TORCHAO_NOT_FOUND)
@pytest.mark.parametrize(
    "compute_unit, has_zeros",
```

nit: you might have missed minimum_deployment_target?
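
Putting both nits together, the parametrization would presumably become:

```python
@pytest.mark.skipif(not _HAS_TORCHAO, reason=MSG_TORCHAO_NOT_FOUND)
@pytest.mark.parametrize(
    "compute_unit, has_zeros, minimum_deployment_target",
    itertools.product(compute_units, [True, False], [ct.target.iOS16, ct.target.iOS17]),
)
```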

YifanShenSZ (Collaborator)

@metascroy please rebase on top of latest main + address comment, then let's try CI again

YifanShenSZ (Collaborator)

CI 🔴 https://gitlab.com/coremltools1/coremltools/-/commit/30b74a15f3f3d6fef6b7ffbbfb1b79b2fcb0bfb6/pipelines

Because we support torch up to 2.7.1 😮‍💨, please rebase on top of latest main to pick up #2591.

metascroy force-pushed the fix-torchao-dequant-ios16 branch from 9721717 to fb4eb52 on September 25, 2025 at 20:33
metascroy (Contributor, Author)

> CI 🔴 https://gitlab.com/coremltools1/coremltools/-/commit/30b74a15f3f3d6fef6b7ffbbfb1b79b2fcb0bfb6/pipelines
>
> Because we support torch up to 2.7.1 😮‍💨, please rebase on top of latest main to pick up #2591.

Rebased.
