skip quant/dequant decomposed #2299
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2299
Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Cancelled Job as of commit 0eedbf3 with merge base 1239842.

NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @shiyang-weng! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
```python
_quantize_affine, _quantize_affine_meta = register_custom_op_with_meta(
    _quantize_affine, _quantize_affine_meta)
```
Can we separate the registrations for the meta and non-meta device, and also use them as decorators instead of calling them? Then it would be better aligned with the rest of the file.
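For illustration, the decorator-style split might look roughly like this (a sketch only; `_register_custom_op_meta` is a hypothetical helper name, not the file's actual API):

```python
# Sketch only: separate decorator-based registrations for the regular and
# meta implementations, matching the decorator style used elsewhere in
# the file. `_register_custom_op_meta` is a hypothetical helper name.
register_custom_op = _register_custom_op(quant_lib)
register_custom_op_meta = _register_custom_op_meta(quant_lib)  # hypothetical

@register_custom_op
def _quantize_affine(*args, **kwargs):
    ...  # real (non-meta) implementation

@register_custom_op_meta
def _quantize_affine_meta(*args, **kwargs):
    ...  # shape/dtype-only meta implementation
```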
> Can we separate the registrations for the meta and non-meta device, and also use them as decorators instead of calling them? Then it would be better aligned with the rest of the file.

Done
```diff
 lib_namespace = lib.ns
 op = getattr(getattr(torch.ops, lib_namespace), op_name)
-register_decomposition([op])(fn)
+if dispatch_key == "CompositeImplicitAutograd":
+    register_decomposition([op])(fn)
```
I feel that adding a flag in `_register_custom_op` to indicate whether this op should be decomposed would be clearer. cc @jerryzh168, what's your suggestion?
Yeah, I feel adding a flag might be easier here.
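For illustration, the flag-based variant might look roughly like this (a sketch only; the `decompose` parameter name and the reduced body are assumptions):

```python
import torch
from torch._decomp import register_decomposition


# Sketch only: a simplified decorator factory with a `decompose` flag.
# The real _register_custom_op also defines the op schema on `lib` and
# registers the device implementations; that part is elided here.
def _register_custom_op(lib, decompose: bool = True):
    def decorator(fn):
        op_name = fn.__name__[1:]  # e.g. "_quantize_affine" -> "quantize_affine"
        op = getattr(getattr(torch.ops, lib.ns), op_name)
        if decompose:
            # Register the Python body as a decomposition so the op gets
            # broken down during compilation.
            register_decomposition([op])(fn)
        return op

    return decorator
```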
…ay is to register torchao.dequantize_affine.default but fail
```python
@implements(torch.ops.torchao.dequantize_affine)
def _(func, types, args, kwargs):
    return return_and_correct_aliasing(
        func.default,  # work around
```
A better way may be:

```python
@implements(torch.ops.torchao.dequantize_affine.default)
def _(func, types, args, kwargs):
    return return_and_correct_aliasing(
        func,
        args,
        kwargs,
        _dequantize_affine_impl(*args),
    )
```

But then the following issue occurs:

```
NotImplementedError: UintxTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.to', overload='dtype')>, types=(<class 'torchao.dtypes.uintx.uintx_layout.UintxTensor'>,)
```
Are you using uintx layout for float8? Why is this changed?
> Are you using uintx layout for float8? Why is this changed?

`dequantize_affine`/`quantize_affine` are also used in uintx. Without this patch, dequant would be decomposed, so uintx was fine; now it is no longer decomposed, so the op has to be handled in `__torch_function__` dispatch. That is why I used `_implements` to add it.
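For context, this is roughly why a non-decomposed op needs an explicit handler (a generic sketch of tensor-subclass dispatch; the handler-table layout below is an assumption, not torchao's actual `_implements` machinery):

```python
import torch


class MySubclassTensor(torch.Tensor):
    # Table of explicit handlers for ops that reach __torch_function__
    # without being decomposed first (generic sketch, names assumed).
    _HANDLERS = {}

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in cls._HANDLERS:
            # torch.ops.torchao.dequantize_affine would need an entry here
            # once it is no longer decomposed into plain aten ops.
            return cls._HANDLERS[func](func, types, args, kwargs)
        return super().__torch_function__(func, types, args, kwargs)
```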
> But then the following issue occurs: NotImplementedError: UintxTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.to', overload='dtype')>, types=(<class 'torchao.dtypes.uintx.uintx_layout.UintxTensor'>,)
For this, you could implement `to` in the `UintxTensor` as well, I think.
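For illustration, such a handler might look roughly like this (a sketch only; the `implements` decorator usage and the `get_plain()` accessor on `UintxTensor` are assumptions):

```python
# Sketch only: handle aten.to.dtype for the subclass by unpacking to a
# plain tensor and converting that. The UintxTensor API details are assumed.
@implements(torch.ops.aten.to.dtype)
def _(func, types, args, kwargs):
    tensor, dtype = args[0], args[1]
    return tensor.get_plain().to(dtype)
```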
Even with this PR removed, this accuracy error still occurs:

```
$ conda list | grep torch
pytorch-triton           3.3.1+gitc8757738          pypi_0    pypi
torch                    2.8.0.dev20250608+cu126    pypi_0    pypi
torchao                  0.12.0+git3aa9361          dev_0     <develop>
torchaudio               2.8.0.dev20250609+cu126    pypi_0    pypi
torchvision              0.23.0.dev20250609+cu126   pypi_0    pypi
$ git log -1
commit 3aa93619466739c9d9845e1db3bfb2ff0f464857 (HEAD, origin/main, origin/HEAD, main)
$ rm -rf /tmp/torchinductor_pt-gpu/ && /data/wengshiy/conda/envs/ao/bin/pytest -s test/integration/test_integration.py::TestSubclass
```

```
    if q.is_Number and p.is_Number:
>       assert p >= 0, p
E       torch._inductor.exc.InductorError: AssertionError: -420769046757011/1000000000000000
E
../conda/envs/ao/lib/python3.10/site-packages/torch/utils/_sympy/functions.py:488: InductorError
```
Could you help review this PR, @jerryzh168? Also, I need to fill out the CLA; do you know whom I should list as the "point of contact"?
```python
quant_min=torch.finfo(float8_dtype).min,
quant_max=torch.finfo(float8_dtype).max,
```
I remember we have `quantize_affine_float8` now?
Without this patch, `dequantize_affine`/`quantize_affine` would be decomposed for both int8 and fp8. If you think it's better to handle the fp8 operators separately, I will create another PR to register the fp8 operators separately.
We should use `quantize_affine_float8` / `dequantize_affine_float8` for float8, I think.
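For illustration, the call sites might then look roughly like this (a sketch only; whether these primitives are exposed as `torch.ops.torchao.*` custom ops and their exact signatures are assumptions):

```python
# Sketch only: call the float8-specific primitives directly instead of
# passing torch.finfo(float8_dtype) bounds to the generic quantize_affine.
# The op paths and argument order below are assumptions.
q = torch.ops.torchao.quantize_affine_float8(x, scale, float8_dtype)
dq = torch.ops.torchao.dequantize_affine_float8(q, scale, output_dtype=x.dtype)
```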
For the CLA, I think you can just follow the process in the CI:
```diff
@@ -179,7 +179,7 @@ def find_multiple(n: int, *args: int) -> int:
     return n + k - (n % k)


-def _register_custom_op(lib):
+def _register_custom_op(lib, dispatch_key="CompositeImplicitAutograd"):
```
Are you planning to change the dispatch key to a flag for whether to decompose or not?
Fixed in #2379.
Fix #2228