
Remove input_quant_func from AffineQuantizedTensor subclass #243

Merged
4 commits merged May 16, 2024

Conversation

Contributor

@jerryzh168 jerryzh168 commented May 15, 2024

Summary:
Currently we have an `input_quant_func` in `AffineQuantizedTensor`, which is a bit convoluted. We want to use a separate `LinearActAffineQuantizedTensor` subclass for activation quantization (dynamic quantization) instead.

This PR also adds dispatch for int8-activation / int8-weight dynamic quantization, which ultimately calls the `int_scaled_matmul` kernel.
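To make the refactor concrete, here is a minimal sketch of the pattern being described: the activation quant function moves out of the base quantized-tensor class and into a wrapper that intercepts `F.linear`. All names and the quantization scheme below are illustrative stand-ins, not the actual torchao implementation.

```python
import torch
import torch.nn.functional as F

class LinearActQuantizedWeight:
    """Illustrative stand-in for the LinearActAffineQuantizedTensor idea:
    the weight wrapper carries the activation quant function, so the base
    quantized-tensor class no longer needs an input_quant_func field."""

    def __init__(self, weight, input_quant_func):
        self.weight = weight
        self.input_quant_func = input_quant_func

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is F.linear:
            x, w = args[0], args[1]
            bias = args[2] if len(args) > 2 else kwargs.get("bias")
            xq = w.input_quant_func(x)  # activation quantized at dispatch time
            return F.linear(xq, w.weight, bias)
        return NotImplemented

def fake_int8_quant(x):
    # symmetric per-tensor int8 fake-quant (illustrative only)
    scale = x.abs().amax() / 127.0
    return (x / scale).round().clamp(-128, 127) * scale

w = LinearActQuantizedWeight(torch.randn(4, 8), fake_int8_quant)
y = F.linear(torch.randn(2, 8), w)
print(y.shape)  # torch.Size([2, 4])
```

The point of the split is that the same weight-only quantized tensor can be reused, with dynamic activation quantization layered on top only where wanted.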

Test Plan:
python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_8da4w
python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int8_dyn_quant
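For reference, the int8-activation / int8-weight dynamic path can be hand-rolled roughly as follows. This is illustrative only; the real dispatch calls torchao's optimized `int_scaled_matmul` kernel rather than this naive integer matmul.

```python
import torch

torch.manual_seed(0)

def dyn_int8_linear(x, w_int8, w_scale):
    # dynamic per-row symmetric quantization of the activation
    x_scale = x.abs().amax(dim=-1, keepdim=True) / 127.0
    x_int8 = torch.clamp(torch.round(x / x_scale), -128, 127).to(torch.int8)
    # integer matmul accumulated in int32, then rescale back to float
    acc = torch.matmul(x_int8.to(torch.int32), w_int8.to(torch.int32).t())
    return acc.to(torch.float32) * x_scale * w_scale  # row scale x per-channel scale

# per-channel symmetric int8 quantization of a float weight (illustrative)
w = torch.randn(4, 8)
w_scale = w.abs().amax(dim=1) / 127.0                # shape (4,)
w_int8 = torch.clamp(torch.round(w / w_scale[:, None]), -128, 127).to(torch.int8)

x = torch.randn(2, 8)
out = dyn_int8_linear(x, w_int8, w_scale)
ref = x @ w.t()
print(out.shape)  # torch.Size([2, 4]), close to the float reference
```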



pytorch-bot bot commented May 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/243

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 166353f with merge base cae3d82:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 15, 2024
Contributor

@cpuhrsch cpuhrsch left a comment


Great :) Let's move AffineQuantizedTensor into dtypes next and create a PyTorch-style conversion function. We also shouldn't need to use __torch_function__ to override linear, but it makes sense to do that as a follow-up, since it will require adding support for detach, view, addmm, etc. to AffineQuantizedTensor.
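For context, a minimal sketch of what that follow-up implies (illustrative names, not torchao code): once a wrapper subclass relies on `__torch_dispatch__` instead of overriding linear via `__torch_function__`, plumbing ops like detach and view must be handled explicitly so the subclass survives them.

```python
import torch

class MinimalWrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, inner):
        return torch.Tensor._make_wrapper_subclass(cls, inner.shape, dtype=inner.dtype)

    def __init__(self, inner):
        self.inner = inner

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Plumbing ops must unwrap, run on the inner tensor, and re-wrap,
        # otherwise the subclass is lost (or the op errors out).
        if func in (torch.ops.aten.detach.default, torch.ops.aten.view.default):
            inner_args = [a.inner if isinstance(a, cls) else a for a in args]
            return cls(func(*inner_args, **kwargs))
        raise NotImplementedError(f"{func} not supported yet")

t = MinimalWrapperTensor(torch.randn(2, 3))
print(t.detach().inner.shape)  # detach is handled and re-wrapped
```

Each aten op that nn.Module or autograd machinery routes through the subclass (addmm, transpose, copy_, ...) needs a branch like this, which is the bulk of the follow-up work.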

@jerryzh168
Contributor Author

Great :) Let's move AffineQuantizedTensor into dtypes next and create a PyTorch style conversion function? We should also not need to use torch_function to overwrite linear, but it makes sense to do it as a follow up because it'll require us to add support for detach, view, addmm, etc. to AffineQuantizedTensor

Sounds good. The main thing is transpose; we need to think about how to support that with the scales/zero_point and the block_size arg.
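To make the transpose concern concrete, here is a hypothetical 2D illustration: with blockwise affine quantization, the scale/zero_point grids have one entry per block, so transposing the integer data also requires transposing those grids and swapping the block_size entries.

```python
import torch

def transpose_blockwise_quant(int_data, scale, zero_point, block_size):
    # 2D case: swap the data dims, the scale-grid dims, and block_size entries
    return (
        int_data.t().contiguous(),
        scale.t().contiguous(),
        zero_point.t().contiguous(),
        (block_size[1], block_size[0]),
    )

int_data = torch.randint(-128, 127, (4, 8), dtype=torch.int8)
block_size = (2, 4)                 # one scale per 2x4 block
scale = torch.rand(4 // 2, 8 // 4)  # scale grid of shape (2, 2)
zp = torch.zeros_like(scale)
d_t, s_t, zp_t, bs_t = transpose_blockwise_quant(int_data, scale, zp, block_size)
print(d_t.shape, bs_t)  # torch.Size([8, 4]) (4, 2)
```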

@jerryzh168 jerryzh168 merged commit cda787c into pytorch:main May 16, 2024
13 checks passed
@jerryzh168 jerryzh168 deleted the dyn_quant branch May 16, 2024 00:45
lancerts pushed a commit to lancerts/ao that referenced this pull request May 17, 2024
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024