Fix dequantize_affine before iOS18 #2589
base: main
Conversation
@YifanShenSZ can you have a look?
Hey Scott, first to confirm your use case: do you want weight-only quantization, or activation quantization? Some background about Core ML ops that might be helpful:
@YifanShenSZ this is for weight-only quantization, which is represented as dequantize_affine (applied to the weights) followed by linear/embedding. The existing registration works for iOS18 (constexpr_blockwise_shift_scale), but has a bug in the constexpr_affine_dequantize path (pre-iOS18): the axis is incorrect. This doesn't affect iOS18 because axis is not used by constexpr_blockwise_shift_scale.
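To make the pattern concrete, here is a toy illustration (plain Python, not coremltools or torchao code) of weight-only quantization as described above: the stored weight is integer, it is dequantized per output channel, and the result feeds a linear op.

```python
# Toy sketch of "dequantize_affine on the weights, then linear".
# All names and values here are illustrative, not library code.

def dequantize_affine(qweight, scales, zero_points):
    """Per-channel affine dequantize: w[i][j] = (q[i][j] - zp[i]) * scale[i].

    One scale/zero_point per output channel (row), i.e. axis 0.
    """
    return [
        [(q - zp) * s for q in row]
        for row, s, zp in zip(qweight, scales, zero_points)
    ]

def linear(x, weight):
    """y = x @ weight.T for a single input vector x."""
    return [sum(xi * wi for xi, wi in zip(x, w_row)) for w_row in weight]

qweight = [[1, 3], [2, 4]]   # integer stored weights
scales = [0.5, 0.25]         # one scale per output channel
zero_points = [0, 2]         # one zero point per output channel

w = dequantize_affine(qweight, scales, zero_points)
# w == [[0.5, 1.5], [0.0, 0.5]]
y = linear([1.0, 2.0], w)
# y == [3.5, 1.0]
```

The key point for this PR is that the scales/zero points live on one particular axis of the weight, so the lowered op must know (or correctly infer) which axis that is.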
I assume "quantize" here means activation quantization?
I see. So in this case, you will want
Yeah, I mean
Yes, but the function _utils._construct_constexpr_dequant_op already does this translation. I'm just fixing the axis arg passed to it. axis is only used by _utils._construct_constexpr_dequant_op before iOS18, where it translates the op to constexpr_affine_dequantize; it is not used for iOS18, where the op gets translated to constexpr_blockwise_shift_scale. Since the CI only tests on iOS18, I fixed the axis arg and added a test for iOS16.
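A hedged sketch of the dispatch being described (the real logic lives in coremltools' `_utils._construct_constexpr_dequant_op`; the names, signature, and structure below are simplified assumptions, not the actual implementation):

```python
# Simplified model of the opset-version dispatch discussed above.
IOS16, IOS17, IOS18 = 16, 17, 18

def construct_constexpr_dequant_op(opset_version, scales_shape, axis=None):
    """Return the name of the op the dequantize would lower to.

    - iOS18+: constexpr_blockwise_shift_scale, which encodes the block
      layout in the scales' shape and ignores `axis` entirely.
    - pre-iOS18: constexpr_affine_dequantize, which needs an explicit
      `axis`; when axis is None, infer it from the scales' shape.
    """
    if opset_version >= IOS18:
        return "constexpr_blockwise_shift_scale"  # axis unused here
    if axis is None:
        # infer: the dimension that carries more than one scale
        axis = next(i for i, n in enumerate(scales_shape) if n > 1)
    return f"constexpr_affine_dequantize(axis={axis})"

# Per-channel scales on dim 0 of a [4, 8] weight (scales shape [4, 1]):
construct_constexpr_dequant_op(IOS18, (4, 1))           # axis irrelevant
construct_constexpr_dequant_op(IOS16, (4, 1))           # axis inferred as 0
construct_constexpr_dequant_op(IOS16, (4, 1), axis=-1)  # the old, wrong arg
```

This also shows why iOS18-only CI never caught the bug: the hard-coded `axis=-1` is dead weight on the iOS18 path and only produces a wrong op attribute pre-iOS18.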
Got it, then yes, this is correct. Let's hear what CI has to say.
```python
@pytest.mark.skipif(not _HAS_TORCHAO, reason=MSG_TORCHAO_NOT_FOUND)
@pytest.mark.parametrize(
    "compute_unit, has_zeros",
    itertools.product(compute_units, [True, False], [ct.target.IOS16, ct.target.IOS17]),
```
nit: lower case "i", i.e. iOS
```python
@pytest.mark.skipif(not _HAS_TORCHAO, reason=MSG_TORCHAO_NOT_FOUND)
@pytest.mark.parametrize(
    "compute_unit, has_zeros",
```
nit: you might have missed minimum_deployment_target?
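The reviewer's point can be sketched as follows: `itertools.product` over three iterables yields 3-tuples, so the `parametrize` name string must list three parameter names, presumably `"compute_unit, has_zeros, minimum_deployment_target"`. The placeholder values below stand in for coremltools' `compute_units` and `ct.target.*`, which are not assumed here.

```python
import itertools

compute_units = ["CPU_ONLY"]   # placeholder for the real compute_units list
targets = ["IOS16", "IOS17"]   # placeholder for ct.target.IOS16 / IOS17

combos = list(itertools.product(compute_units, [True, False], targets))
# Each combo is a 3-tuple, e.g. ('CPU_ONLY', True, 'IOS16'),
# so pytest needs three names to unpack it into:
#
# @pytest.mark.parametrize(
#     "compute_unit, has_zeros, minimum_deployment_target",
#     itertools.product(compute_units, [True, False],
#                       [ct.target.IOS16, ct.target.IOS17]),
# )
```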
@metascroy please rebase on top of latest main + address comment, then let's try CI again
because we support torch up to 2.7.1 😮‍💨 please rebase on top of latest main to pick up #2591
Force-pushed from 9721717 to fb4eb52
rebased
dequantize_affine does not work correctly before iOS18, where the op gets translated to constexpr_affine_dequantize instead of constexpr_blockwise_shift_scale.
This is because the axis argument should be None (the default) instead of -1; when it is None, the axis is inferred from the shape of the scales.
This is similar to the change in the ExecuTorch PR: pytorch/executorch#13896
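A numeric toy example (plain Python, not coremltools code) of why the wrong axis is a real bug rather than a cosmetic one: with one scale per row, dequantizing along the last axis instead of axis 0 applies each scale to the wrong elements.

```python
def dequant(qweight, scales, axis):
    """Affine-dequantize a 2-D int matrix with a 1-D scale vector placed
    on `axis` (zero points omitted for brevity)."""
    rows, cols = len(qweight), len(qweight[0])
    if axis in (0, -2):   # one scale per row
        return [[qweight[i][j] * scales[i] for j in range(cols)] for i in range(rows)]
    else:                 # axis in (1, -1): one scale per column
        return [[qweight[i][j] * scales[j] for j in range(cols)] for i in range(rows)]

q = [[2, 4], [6, 8]]
scales = [1.0, 0.5]   # intended layout: one scale per row (axis 0)

right = dequant(q, scales, axis=0)    # [[2.0, 4.0], [3.0, 4.0]]
wrong = dequant(q, scales, axis=-1)   # [[2.0, 2.0], [6.0, 4.0]]
assert right != wrong                 # hard-coding axis=-1 changes the weights
```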