Relax dtype requirements for int4 and float8 quants in autoquant #1571
Conversation
Summary: Some of the int4 quants only work with bfloat16/float16. Previously we required the model to already be in a compatible dtype to apply them in autoquant; this PR relaxes that constraint by converting the weight and activation to compatible dtypes.
Test Plan: python test/integration/test_integration.py -k test_autoquant_int4wo
Reviewers: Subscribers: Tasks: Tags:
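For context, here is a minimal usage sketch of what the relaxation enables: a model kept in float32 can now go through autoquant with the int4 options, since dtype conversion happens internally. The class-list name and the toy model below are assumptions for illustration, not taken from this PR; check the installed torchao version for the actual exports.

```python
import torch
import torchao
# assumed import path/name for the int4 autoquant class list; adjust to your torchao version
from torchao.quantization import DEFAULT_INT4_AUTOQUANT_CLASS_LIST

# toy float32 model: previously the int4 autoquant options required bfloat16/float16 weights
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda()

# autoquant wraps the model; shapes are recorded on the first call and the best
# quantization option (or none) is picked per layer
model = torchao.autoquant(
    torch.compile(model, mode="max-autotune"),
    qtensor_class_list=DEFAULT_INT4_AUTOQUANT_CLASS_LIST,
)
model(torch.randn(16, 1024, device="cuda"))
```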
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1571
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 3 Pending as of commit c03760d with merge base e1cb44a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jcaip I can't run the sparse marlin kernel locally:
@@ -227,6 +227,11 @@ def from_plain(
# Linear layers are (in_features, out_features) but the int_data that is reaching this point
# is (out_features, in_features). We need to transpose it to match the expected shape in the marlin code.
q_w_24 = int_data.t()
# addressing the case when scale has dimension 1, happens when
# weight_shape[-1] == group_size == 128
if scale.ndim == 1:
Hey Jerry, I haven't been actively working on this repo lately as I've been more focused on work, so please keep in mind that my input might not fully reflect the current state of things.
My two cents: this should work well, since the condition generalizes to other scenarios that could produce the same corner case in the future, e.g. weight_shape[-1] == group_size == 64 would also yield a scale with dimension 1.
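For readers following along, here is a minimal sketch of the corner case under discussion. It is an illustration under the assumption that the fix reshapes the scale back to 2-D; it is not the exact body of the truncated diff above. When weight_shape[-1] == group_size there is a single quantization group per row, so the per-group scale can collapse to a 1-D tensor, while the marlin packing code expects a 2-D layout.

```python
import torch

out_features, in_features, group_size = 256, 128, 128

# with one group per row (in_features == group_size), a per-group scale can
# collapse to shape (out_features,) instead of (out_features, n_groups)
scale = torch.rand(out_features, dtype=torch.bfloat16)

# promote the 1-D scale to 2-D so downstream packing sees (out_features, n_groups);
# this reshape is an assumption about the fix, not the PR's exact code
if scale.ndim == 1:
    scale = scale.reshape(scale.shape[0], -1)

assert scale.shape == (out_features, 1)
```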
You need to build with USE_CPP=1 and it should show up.
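(For reference, that would be something along the lines of `USE_CPP=1 pip install -e .` from the repo root; the exact command is an assumption based on the usual torchao build flow, so check the README for the current instructions.)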
Summary:
Some of the int4 and fp8 quants only work with bfloat16/float16. Previously we required the model to be in the correct dtype to apply them in autoquant; this PR relaxes that constraint by converting the weight, bias, and activation to compatible dtypes.
Test Plan:
python test/integration/test_integration.py -k test_autoquant_int4wo
Reviewers:
Subscribers:
Tasks:
Tags:
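To make the summary concrete, here is a minimal sketch of the kind of dtype handling the description implies. The function name and structure are hypothetical, not the PR's actual code: the idea is that the activation and bias are cast to a dtype the int4/fp8 kernel supports, and the output is cast back to the caller's dtype so the rest of the model is unaffected.

```python
import torch
import torch.nn.functional as F

def _linear_with_dtype_relaxation(act: torch.Tensor, weight: torch.Tensor,
                                  bias: torch.Tensor = None,
                                  kernel_dtype: torch.dtype = torch.bfloat16):
    # Hypothetical sketch of the relaxation described above: the quantized
    # kernel only supports bfloat16/float16, so cast the inputs to the kernel
    # dtype, run the op, and cast the result back to the original activation
    # dtype so a float32 model can still use the int4/fp8 options.
    orig_dtype = act.dtype
    out = F.linear(
        act.to(kernel_dtype),
        weight.to(kernel_dtype),  # stands in for the dequantized int4/fp8 weight
        bias.to(kernel_dtype) if bias is not None else None,
    )
    return out.to(orig_dtype)

# usage on float32 inputs
y = _linear_with_dtype_relaxation(torch.randn(4, 8), torch.randn(16, 8), torch.randn(16))
```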