Save and load dynamically quantized model #99
Comments
Hi, thanks a lot for your interest in the INSTRUCTOR model! The following works for me:
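(The original snippet was not preserved in this copy of the thread; below is a minimal sketch of the standard PyTorch dynamic-quantization round trip. The checkpoint name `hkunlp/instructor-large` and the file path are assumptions. The key step is re-applying `quantize_dynamic` to a freshly built model before `load_state_dict`, so the saved packed quantized tensors have matching modules to load into.)

```python
import torch
from InstructorEmbedding import INSTRUCTOR

# Build and dynamically quantize the model (Linear layers -> qint8).
model = INSTRUCTOR("hkunlp/instructor-large")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized.state_dict(), "instructor_quantized.pt")

# To load, quantize a fresh instance FIRST so the state_dict's packed
# quantized tensors match the target modules' layout.
fresh = INSTRUCTOR("hkunlp/instructor-large")
fresh_quantized = torch.quantization.quantize_dynamic(
    fresh, {torch.nn.Linear}, dtype=torch.qint8
)
fresh_quantized.load_state_dict(torch.load("instructor_quantized.pt"))
```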
Hope this helps!
@hongjin-su
Yeah, this seems to work:
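(The confirming snippet was also lost; a hypothetical sanity check, assuming `fresh_quantized` from the sketch above. INSTRUCTOR's `encode` takes `[instruction, text]` pairs; the instruction and text here are illustrative.)

```python
# Check that the reloaded quantized model produces embeddings as expected.
embeddings = fresh_quantized.encode(
    [["Represent the Science title:", "Dynamic quantization of sentence encoders"]]
)
print(embeddings.shape)  # instructor-large produces 768-dim embeddings
```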
@hongjin-su
Hello! First of all, great work on instructor.
I'd like to load an already-quantized model, to avoid the CPU/memory spikes at script startup that quantization itself causes.
I tried static quantization first, but it isn't supported for SentenceTransformers with float16 or qint8.
For dynamic quantization I get the following errors when trying to load a saved state_dict:
I tried two save methods (both sketched below): saving the state_dict directly with
torch.save(model.state_dict(), path)
and saving a traced version with torch.jit.trace,
but both result in the same error.
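(For reference, a runnable toy reconstruction of those two save attempts; the stand-in model, paths, and input shape are illustrative, not from the original post.)

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model, just to illustrate the two save paths.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Method 1: save the quantized state_dict directly.
torch.save(quantized.state_dict(), "quantized_state.pt")
# Loading it back into an *unquantized* model fails, because the saved
# tensors use the packed quantized layout that plain nn.Linear rejects:
# model.load_state_dict(torch.load("quantized_state.pt"))  # -> error

# Method 2: trace the quantized model and save the TorchScript module.
traced = torch.jit.trace(quantized, torch.randn(1, 8))
torch.jit.save(traced, "quantized_traced.pt")
```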
So, is there a way to save/load a quantized model?