[Quant Args] Clean-up #513
Conversation
We want the
We can go either way. Original config.json includes
This looks like it does not include
I mean they include
kylesayrs left a comment
There are still many places in the code base where scale_dtype is read but not checked for nullability, right?
https://github.com/search?q=repo%3Avllm-project%2Fcompressed-tensors%20scale_dtype&type=code
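To make the nullability concern concrete, here is a minimal, hypothetical sketch (not the library's actual API; dtype values are plain strings to keep it dependency-free): any site that reads `scale_dtype` must tolerate `None`, since the scheme may leave the field unset.

```python
# Hypothetical sketch of the nullability concern: every read of scale_dtype
# should fall back to a default when the field was never set.

def resolve_scale_dtype(scale_dtype, observed_dtype="torch.float32"):
    """Return scale_dtype when set, otherwise fall back to the observed dtype."""
    return scale_dtype if scale_dtype is not None else observed_dtype

# A read like `args.scale_dtype.itemsize` would raise AttributeError when the
# field is None; the guarded form below does not.
print(resolve_scale_dtype(None))              # → torch.float32
print(resolve_scale_dtype("torch.bfloat16"))  # → torch.bfloat16
```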
@brian-dellabetta TorchDtype annotation will accept either on read, and could be modified to remove the torch prefix on write
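The read/write behavior described here can be sketched as follows. This is a hypothetical simplification of what such an annotation would do, not the actual `TorchDtype` implementation: accept either `"float16"` or `"torch.float16"` when parsing, and emit the bare name when serializing.

```python
# Hypothetical sketch: normalize dtype strings on read, strip the
# "torch." prefix on write.

def parse_dtype(value: str) -> str:
    """Accept a dtype name with or without the 'torch.' prefix."""
    return value[len("torch."):] if value.startswith("torch.") else value

def serialize_dtype(value: str) -> str:
    """Write the bare dtype name, without the 'torch.' prefix."""
    return parse_dtype(value)

print(parse_dtype("torch.float16"))      # → float16
print(parse_dtype("float16"))            # → float16
print(serialize_dtype("torch.float16"))  # → float16
```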
No. Did you look through the link you sent? I can take a look if there's anything specific, but sending a generic look-up isn't helpful here.
oh, i thought that's the deprecated field name from transformers that causes all the warnings like
either way, i think it's fine. they'll be loaded up the same anyway
@dsikka I misread one of the checks. Here we assert that

The other issue I'd like to call out is that maintaining the
If the weight isn't one of the dense dtypes, we likely have a bug, as all weights remain dense until compression.
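The invariant stated above could be guarded with a check along these lines. This is a hypothetical sketch (the dtype set and helper name are assumptions, not code from the repo): before compression, every weight should still carry a dense dtype.

```python
# Hypothetical guard for the invariant: weights stay dense until compression.
DENSE_DTYPES = {"torch.float32", "torch.float16", "torch.bfloat16"}  # assumed set

def check_dense(weight_dtype: str) -> None:
    """Raise if a pre-compression weight carries a non-dense dtype."""
    assert weight_dtype in DENSE_DTYPES, f"unexpected non-dense dtype: {weight_dtype}"

check_dense("torch.bfloat16")  # passes silently for a dense dtype
```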
@dsikka Yeah, base models with non-full-precision weights are definitely an edge case we can ignore for now. But syntactically, we should remove this case altogether if we're not going to support it. Something for the future.
Summary
`scale_dtype` is no longer being overwritten with the `observed_dtype` if not already set. By doing this, we do not touch the `quantization_args` once initialized and assume `None` in our scheme corresponds to the default weight / observed dtype. With this, `scale_dtype` shows up as `None` if not set by the scheme in the `config.json`.

Testing:
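The summary's behavior can be illustrated with a small sketch (hypothetical field names and dtype strings, not the actual serialization code): an unset `scale_dtype` serializes as `null` in `config.json`, and loading code treats `None` as "use the observed weight dtype".

```python
import json

# Hypothetical illustration: scale_dtype is left as None rather than being
# overwritten with the observed dtype at initialization.
quantization_args = {"num_bits": 4, "scale_dtype": None}
config = json.dumps({"quantization_args": quantization_args})
print(config)  # scale_dtype appears as null in the serialized config

# On load, None falls back to the observed (weight) dtype.
loaded = json.loads(config)["quantization_args"]
observed_dtype = "torch.bfloat16"  # assumed weight dtype for this sketch
effective = loaded["scale_dtype"] or observed_dtype
print(effective)  # → torch.bfloat16
```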
Asym:
NVFp4:
KV Cache: