Allow Int4WeightOnlyQuantizer to set different dtype for scales_and_zeros #479
Conversation
As titled. Currently `Int4WeightOnlyQuantizer` is hardcoded to return `scales_and_zeros` with dtype `torch.bfloat16`. This PR adds a `dtype` argument to the flow so that a different dtype can be used.
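A minimal usage sketch of what the change enables. The import path, constructor keywords, and the exact name of the new argument (`dtype` here, taken from the PR description) are assumptions and may differ from the merged API:

```python
import torch
import torch.nn as nn
from torchao.quantization.GPTQ import Int4WeightOnlyQuantizer  # assumed import path

# A toy module; int4 weight-only quantization targets nn.Linear layers
# and generally expects a CUDA device.
model = nn.Sequential(nn.Linear(4096, 4096)).to(device="cuda", dtype=torch.bfloat16)

# `dtype=torch.float16` is the hypothetical new argument from this PR;
# previously scales_and_zeros were always returned as torch.bfloat16.
quantizer = Int4WeightOnlyQuantizer(groupsize=128, inner_k_tiles=8, dtype=torch.float16)
model = quantizer.quantize(model)
```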
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/479
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit f3c320a with merge base a35a1cd. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
) -> None:
    super().__init__()
    self.padding = not _check_linear_int4_k(in_features, groupsize, inner_k_tiles)
    if self.padding:
        from model import find_multiple
```
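For context, the padding flag is set when `in_features` does not satisfy the int4 kernel's shape constraints. A sketch of what such a check looks like, modeled on the gpt-fast helper of the same name (the exact divisibility constants are an assumption):

```python
def _check_linear_int4_k(k: int, groupsize: int = 1, inner_k_tiles: int = 1) -> bool:
    # True when in_features (k) is compatible with the packed int4 kernel:
    # divisible by the quantization group size and by the inner tile width.
    return k % groupsize == 0 and k % (inner_k_tiles * 16) == 0
```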
I don't think there's a module called `model`
Thanks, I think this is a relic of when gptq was more deeply coupled with gpt-fast.
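The likely fix is to stop importing from the stale gpt-fast `model` module and use a local (or torchao-provided) helper instead. A self-contained sketch of the gpt-fast-style `find_multiple`, which computes the padded `in_features`:

```python
def find_multiple(n: int, k: int) -> int:
    # Round n up to the nearest multiple of k, e.g. find_multiple(4000, 1024) == 4096.
    # Used to pad in_features when the int4 kernel's shape constraints aren't met.
    if n % k == 0:
        return n
    return n + k - (n % k)
```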
This seems fine to merge, although I do worry that most of our gptq tests are disabled right now.
Mostly looks fine, but FYI we don't really have anyone maintaining the gptq example, so if there's a use case for it please let me know.
I'm migrating torchchat to use these APIs, to be prepared for shared kernels across ET and PyTorch eager/compile.