HFQuantizer implementation for compressed-tensors library #31704
Is there an issue with bfloat16? We should try to allow this for Llama models.
No issue with bfloat16; we just recommend float16 as the default since that is what vLLM expects for the quantization scale/zero-point.
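
For context, here is a minimal sketch (not taken from this PR's diff) of loading a compressed-tensors quantized checkpoint with the recommended float16 default. The model ID is a placeholder, and the choice of `torch_dtype` is the only point being illustrated; bfloat16 should also work, it just diverges from the scale/zero-point dtype vLLM expects.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical compressed-tensors quantized checkpoint (placeholder ID).
model_id = "org/llama-3-8b-w8a8-compressed"

# float16 is the recommended default so the quantization scale/zero-point
# dtype stays aligned with what vLLM expects; bfloat16 is also usable.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```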