Attributed in huggingface/transformers #1
Comments
Very cool! I am glad that you found my code useful! But I am also a bit worried about potential bugs. I've only tested with TinyLlama so far, so it might totally break for other models; for example, I am not sure about the transposed shapes. I am also not sure whether this is the best way forward for the transformers library. Not having to add extra dependencies is certainly nice, but NumPy is significantly slower than writing the bit-wrangling code in C, because of all the copying from NumPy array to NumPy array.
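(For context, this is roughly what NumPy block dequantization looks like, sketched here for Q8_0, the simplest GGUF quantization format: each 34-byte block is a float16 scale followed by 32 int8 quants. The function is illustrative, not pygguf's actual code; note the intermediate copies, which are exactly the array-to-array copying mentioned above.)

```python
import numpy as np

def dequantize_q8_0(data: bytes, n_elements: int) -> np.ndarray:
    """Illustrative sketch: dequantize a GGUF Q8_0 buffer with NumPy."""
    block_size = 32
    bytes_per_block = 2 + block_size  # float16 scale + 32 int8 quants
    n_blocks = n_elements // block_size

    blocks = np.frombuffer(data, dtype=np.uint8).reshape(n_blocks, bytes_per_block)
    # The slices below are non-contiguous, so .copy() (and .astype()) each
    # materialize a fresh array -- the copying that makes pure NumPy slower
    # than a tight C loop.
    scales = blocks[:, :2].copy().view(np.float16).astype(np.float32)  # (n_blocks, 1)
    quants = blocks[:, 2:].copy().view(np.int8).astype(np.float32)     # (n_blocks, 32)
    return (scales * quants).reshape(-1)
```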
Anyway, it might be nice to have a NumPy implementation to fall back on. For completeness, I have implemented the missing quantization formats.
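(The more tightly packed formats are where the bit wrangling comes in. As a flavor, here is a sketch for Q4_0, where each 18-byte block is a float16 scale followed by 16 bytes of packed 4-bit quants stored with a +8 offset; again illustrative rather than pygguf's actual implementation.)

```python
import numpy as np

def dequantize_q4_0(data: bytes, n_elements: int) -> np.ndarray:
    """Illustrative sketch: dequantize a GGUF Q4_0 buffer with NumPy."""
    block_size = 32
    bytes_per_block = 2 + block_size // 2  # float16 scale + 16 bytes of nibbles
    n_blocks = n_elements // block_size

    blocks = np.frombuffer(data, dtype=np.uint8).reshape(n_blocks, bytes_per_block)
    scales = blocks[:, :2].copy().view(np.float16).astype(np.float32)
    packed = blocks[:, 2:]
    # Low nibbles hold the first 16 values of each block, high nibbles the
    # last 16; quants are stored with a +8 offset.
    lo = (packed & 0x0F).astype(np.int8) - 8
    hi = (packed >> 4).astype(np.int8) - 8
    quants = np.concatenate([lo, hi], axis=1).astype(np.float32)
    return (scales * quants).reshape(-1)
```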
I've just heard that llama.cpp implements dequantization now, so you might want to consider switching to it instead of pygguf, since it supports more quantization formats: ggerganov/llama.cpp#8939
Thanks for the heads-up @99991, really appreciate it! We already have a PR opened to make the switch: huggingface/transformers#32625
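(For reference, a sketch of what loading through llama.cpp's gguf-py package looks like, using its GGUFReader together with the NumPy dequantization that the linked PR relates to; treat the exact field and function names as a best-effort sketch of that API.)

```python
from gguf import GGUFReader
from gguf.quants import dequantize

reader = GGUFReader("model.gguf")  # illustrative path
for tensor in reader.tensors:
    # tensor.data is the raw quantized buffer; tensor.tensor_type identifies
    # the quantization format (F16, Q4_K, Q6_K, ...).
    weights = dequantize(tensor.data, tensor.tensor_type)
    print(tensor.name, weights.shape)
```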
Hello!
FYI, we've been using your code to offer support for GGUF files within the Python ecosystem, by making it possible to load them within `transformers`. We're doing so here; we've credited you in the documentation, and I've added you as a co-author: https://github.com/LysandreJik/transformers/pull/2/files
We'll open a PR on the main fork in the coming days, so I wanted to give you the opportunity to take a look beforehand.
Thanks a lot for your work 🤗
cc @younesbelkada