
Attributed in huggingface/transformers #1

Open

LysandreJik opened this issue Apr 19, 2024 · 5 comments

@LysandreJik commented Apr 19, 2024

Hello!

FYI, we've been using your code to add support for GGUF files in the Python ecosystem by making them loadable within transformers.
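
For context, loading a GGUF checkpoint would then look roughly like this (a sketch of the intended usage; the `gguf_file` argument follows the transformers GGUF integration, and the repository and file names are purely illustrative):

```python
# Sketch of loading a GGUF checkpoint via transformers; the model id and
# file name below are illustrative placeholders, not tested values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"  # hypothetical repo
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"   # hypothetical file

# The GGUF tensors are dequantized to float when the model is loaded.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
```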

We're doing so here; we've credited you in the documentation, and I've added you as a co-author: https://github.com/LysandreJik/transformers/pull/2/files

We'll open a PR on the main fork in the coming days, so I wanted to give you a chance to take a look beforehand.

Thanks a lot for your work 🤗

cc @younesbelkada

@99991 (Owner) commented Apr 19, 2024

Very cool! I am glad that you found my code useful!

But I am also a bit worried about potential bugs. I've only tested with TinyLlama so far, so it might break completely for other models. For example, I am not sure about the transposed shapes.

In addition, I am not sure this is the best way forward for the transformers library. Not having to add extra dependencies is certainly nice, but NumPy is significantly slower than bit-wrangling code written in C, because of all the copying from one NumPy array to another.
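
To illustrate the copying overhead, here is a minimal sketch of Q8_0 dequantization in that NumPy style, assuming ggml's standard block layout (one float16 scale followed by 32 int8 weights); this is not pygguf's exact code:

```python
import numpy as np

def dequantize_q8_0(data: bytes, num_blocks: int) -> np.ndarray:
    # Q8_0 block: float16 scale (2 bytes) + 32 int8 weights = 34 bytes.
    blocks = np.frombuffer(data, dtype=np.uint8).reshape(num_blocks, 34)
    # Each copy()/view()/astype() below allocates a fresh array -- this is
    # the array-to-array copying that makes NumPy slower than C here.
    scales = blocks[:, :2].copy().view(np.float16).astype(np.float32)
    qs = blocks[:, 2:].copy().view(np.int8).astype(np.float32)
    return (scales * qs).reshape(-1)
```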

@99991 (Owner) commented Apr 21, 2024

Anyway, it might be nice to have a NumPy implementation to fall back on. For completeness, I have implemented the missing quantization formats Q2_K, Q3_K and Q5_K. I have not implemented the other formats, since they are expected to perform worse than the existing ones.

a417edb
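
For reference, these K-quant formats pack 256 weights per super-block; the summary below follows ggml's block structs and is worth double-checking against the current headers:

```python
# K-quant super-block layouts per ggml's block_q{2,3,5}_K structs;
# each block covers QK_K = 256 weights. Sizes taken from the ggml
# reference headers -- verify against the current source.
QK_K = 256
BLOCK_LAYOUT = {
    #  name:  (bytes, fields in struct order)
    "Q2_K": (84,  "scales[16] (4-bit pairs), qs[64] (2-bit), d f16, dmin f16"),
    "Q3_K": (110, "hmask[32], qs[64] (low 2 bits), scales[12], d f16"),
    "Q5_K": (176, "d f16, dmin f16, scales[12], qh[32] (high bits), qs[128] (4-bit)"),
}
```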

@99991 (Owner) commented Aug 15, 2024

I've just heard that llama.cpp now implements dequantization, so you might want to consider switching to it from pygguf, since it supports more quantization formats: ggerganov/llama.cpp#8939
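
For anyone making the switch, the call would look roughly like this (a sketch assuming the gguf-py API added in that PR; verify the names against the released package):

```python
# Hedged sketch: dequantizing one raw tensor with gguf-py, the Python
# package that ships with llama.cpp (API per ggerganov/llama.cpp#8939).
import numpy as np
from gguf import GGMLQuantizationType, quants

raw = np.fromfile("tensor_q4_k.bin", dtype=np.uint8)  # hypothetical raw tensor dump
weights = quants.dequantize(raw, GGMLQuantizationType.Q4_K)
print(weights.dtype, weights.shape)
```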

@LysandreJik (Author)

Thanks for the heads-up @99991!

cc @SunMarc for your information

@SunMarc commented Aug 26, 2024

Thanks for the heads-up @99991, really appreciate it! We already have a PR open to make the switch: huggingface/transformers#32625
