🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨
#26761
Conversation
The documentation is not available anymore as the PR was closed or merged.
ArthurZucker
left a comment
Let's make sure we prevent people from casting an already quantised model, WDYT? It should not be a recommended / desirable use case.
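A minimal sketch of what such a guard could look like (illustrative only, not the actual transformers implementation; the mixin and the way `is_quantized` is set here are assumptions):

```python
import torch
from torch import nn


class QuantizedModelMixin(nn.Module):
    """Illustrative mixin that refuses dtype casting once a model is quantized."""

    is_quantized = True  # would be set by the loading code after quantization

    def to(self, *args, **kwargs):
        # Allow pure device moves, but raise if a dtype cast is requested
        # on an already-quantized model.
        dtype_requested = any(isinstance(a, torch.dtype) for a in args) or "dtype" in kwargs
        if self.is_quantized and dtype_requested:
            raise ValueError(
                "Casting a quantized model to a new dtype is not supported; "
                "reload the model with the desired `torch_dtype` instead."
            )
        return super().to(*args, **kwargs)
```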
src/transformers/modeling_utils.py
Outdated
```python
# once the weights have been quantized
# Note that once you have loaded a quantized model, you can't change its dtype so this will
# remain a single source of truth
config._quantization_original_dtype = torch_dtype
```
```diff
- config._quantization_original_dtype = torch_dtype
+ config._pre_quantization_dtype = torch_dtype
```
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
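For context, a hedged usage sketch of what the renamed attribute enables after this PR; the model id and the 8-bit loading path are illustrative and assume a GPU plus a bitsandbytes install:

```python
import torch
from transformers import AutoModelForCausalLM

# Quantize at load time; `torch_dtype` is the dtype the weights had *before*
# quantization.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_8bit=True,
    torch_dtype=torch.float16,
)

# After this PR the config remembers that dtype as a private attribute.
print(model.config._pre_quantization_dtype)  # torch.float16
```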
ArthurZucker
left a comment
In general, LGTM. Let's make sure we don't break the workflow for others, as this is a breaking change (not being able to cast to a dtype after init), and add a 🚨!
```python
# pop the `_pre_quantization_dtype` as torch.dtypes are not serializable.
_ = serializable_config_dict.pop("_pre_quantization_dtype", None)
```
We pop it because it should not be saved, no?
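To illustrate the reason (a standalone sketch, not code from the PR): `torch.dtype` objects cannot go through JSON serialization, so the attribute has to be dropped before the config is dumped.

```python
import json

import torch

config_dict = {"model_type": "opt", "_pre_quantization_dtype": torch.float16}

try:
    json.dumps(config_dict)
except TypeError as err:
    print(f"cannot serialize: {err}")  # Object of type dtype is not JSON serializable

# Popping the attribute, as the PR does, makes the dict serializable again.
_ = config_dict.pop("_pre_quantization_dtype", None)
print(json.dumps(config_dict))  # {"model_type": "opt"}
```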
```python
# Note that once you have loaded a quantized model, you can't change its dtype so this will
# remain a single source of truth
```
Might be needed in the quantizer config?
I think it is OK, since users can always load quantized models back with a new torch_dtype, making that _pre_quantization_dtype obsolete.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Changed the title: [Quantization] Store the original dtype in the config as a private attribute → 🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨

🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 (huggingface#26761)

* First step
* fix
* add adjustements for gptq
* change to `_pre_quantization_dtype`
* Update src/transformers/modeling_utils.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix serialization
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
First step of an alternative design of #26560
For quantized models, instead of introducing complex logic to retrieve the original weights' dtype, I propose to simply add a private attribute `_quantization_original_dtype` in the config object. The `to` method does not need to be touched here, as `to` cannot be called on quantized models (for GPTQ models you can call `to` to perform device placement only, not dtype casting). That way we could adapt #26560 to simply check whether the config has the attribute `_quantization_original_dtype`, which is the case only for quantized models, and otherwise retrieve the dtype from the linear layer weights in the classic manner.

cc @LysandreJik
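For illustration, a hedged sketch of how a downstream utility (along the lines of #26560) could consume the attribute, using the final `_pre_quantization_dtype` name; the helper is hypothetical and not part of the transformers API:

```python
import torch
from torch import nn


def get_pre_quantization_dtype(model) -> torch.dtype:
    # Quantized models carry the private attribute on their config (this PR),
    # so it acts as the single source of truth for the original precision.
    pre_quant_dtype = getattr(model.config, "_pre_quantization_dtype", None)
    if pre_quant_dtype is not None:
        return pre_quant_dtype
    # Classic path for non-quantized models: read the dtype off a linear layer.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            return module.weight.dtype
    return torch.get_default_dtype()
```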