🚨 Support dequantization for most GGML types #32625
Conversation
cc @SunMarc
Wow, that's a nice cleanup! Great that this has been added to the gguf package! We should also advise users to install the latest version of gguf when it comes out, otherwise they will get an error.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for adding this @Isotr0py! This is a nice cleanup! LGTM!
Nice! Make sure to fix the CI with
Nice! Thanks for enabling this
The min version requirement is technically a breaking change as older versions previously worked, however I think this is OK. Could you prefix the PR title with 🚨 to highlight this?
Overall, very nice, just a small thing to update and we're good to go!
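The minimum-version requirement discussed above could be implemented as a small runtime check. A minimal sketch, assuming a `packaging`-based comparison; the names `is_gguf_usable` and the `0.10.0` floor are illustrative, not the PR's actual constants:

```python
from importlib.metadata import PackageNotFoundError, version

from packaging.version import parse

# Hypothetical floor; the PR pins the real value as GGUF_MIN_VERSION.
GGUF_MIN_VERSION = "0.10.0"


def is_gguf_usable(min_version: str = GGUF_MIN_VERSION) -> bool:
    """Return True only if gguf is installed at or above min_version."""
    try:
        return parse(version("gguf")) >= parse(min_version)
    except PackageNotFoundError:
        # gguf is not installed at all.
        return False
```

Gating on a minimum version (rather than just on import success) is what makes older, previously working gguf installs start failing, which is why the 🚨 breaking-change prefix is warranted.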
* use gguf internal dequantize
* add Q5_0 test
* add iq1 test
* add remained test
* remove duplicated test
* update docs
* add gguf version limit
* make style
* update gguf import catch
* revert vocab_size patch
* make style
* use GGUF_MIN_VERSION everywhere
Great job on this PR @Isotr0py 🙌
What does this PR do?
This PR needs to wait for a `gguf` package version update and is still work in progress. It supports dequantization for most `ggml_types` and cleans up the current ggml dequantization implementation. Since `llama.cpp` has added a numpy dequantization implementation on `gguf-py` in gguf-py : Numpy dequantization for most types ggerganov/llama.cpp#8939, we can dequantize most ggml tensors easily in one line.

Before submitting
Did you read the contributor guideline, Pull Request section?
Was it discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
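To illustrate what the one-line `gguf-py` dequantization path replaces, here is a minimal hand-rolled Q8_0 dequantizer in numpy. This is only a sketch of the Q8_0 block layout (one float16 scale followed by 32 int8 weights per block); the PR itself delegates to `gguf`'s own dequantization code rather than reimplementing formats like this:

```python
import numpy as np

QK8_0 = 32  # Q8_0 packs weights in blocks of 32


def dequantize_q8_0(data: bytes, n_blocks: int) -> np.ndarray:
    """Dequantize Q8_0 blocks: each block is one float16 scale + 32 int8 weights."""
    raw = np.frombuffer(data, dtype=np.uint8).reshape(n_blocks, 2 + QK8_0)
    scale = raw[:, :2].copy().view(np.float16).astype(np.float32)  # (n_blocks, 1)
    qs = raw[:, 2:].copy().view(np.int8).astype(np.float32)       # (n_blocks, 32)
    return (scale * qs).reshape(-1)                               # w = d * q


# One block with scale 0.5 and weights -16..15:
block = np.float16(0.5).tobytes() + np.arange(-16, 16, dtype=np.int8).tobytes()
weights = dequantize_q8_0(block, n_blocks=1)
```

With `gguf-py` after the linked llama.cpp PR, per-format code like this is no longer needed on the transformers side; the package exposes the dequantization for most quantization types directly.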