
Int 8 / FP 8 quantization support similar to bnb #24

Closed
alexconstant9108 opened this issue Feb 22, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@alexconstant9108

Hi there @ggerganov, great work! The performance on CPU is just amazing.
Would it be possible in the future to also implement int8 / fp8 loading of models (a few layers still must be loaded with their original fp16 or fp32 weights), similar to the bitsandbytes library: https://github.com/TimDettmers/bitsandbytes
This would allow loading bigger models on systems with a limited amount of CPU RAM, or perhaps even faster inference for models like GPT-J.
In theory, on a Mac or x64 (AVX2 or AVX512) system with 128 GB of CPU RAM, you would be able to load a 120B model this way... Wouldn't that be amazing :)))
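For readers unfamiliar with the idea: the core of int8 weight compression is absmax quantization with a per-row scale. Below is a minimal illustrative sketch of that idea in numpy — this is not the bitsandbytes API (which also handles outlier columns in fp16), just the basic math behind the 4x memory saving over fp32.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Quantize each row of an fp32 matrix to int8 with a per-row scale."""
    absmax = np.abs(w).max(axis=1, keepdims=True)  # per-row max magnitude
    scale = absmax / 127.0                         # int8 range is [-127, 127]
    q = np.round(w / scale).astype(np.int8)        # quantized weights
    return q, scale.astype(np.float32)

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

# int8 storage is 4x smaller than fp32 (plus one fp32 scale per row).
assert q.nbytes == w.nbytes // 4
# Reconstruction error is at most half a quantization step per row.
assert np.abs(w - w_hat).max() <= scale.max() / 2 + 1e-6
```

A 120B-parameter model stored this way would need roughly 120 GB for the int8 weights, versus ~240 GB in fp16 — which is why it could plausibly fit in 128 GB of RAM.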

@ggerganov
Owner

ggerganov commented Feb 22, 2023

Hi, these days I actually started working on 4-bit / n-bit quantization support.
There are some optimistic results already, but I'm not 100% sure yet if I will be able to make it work efficiently and accurately.
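To make the 4-bit idea concrete, here is an illustrative numpy sketch of symmetric block-wise 4-bit quantization — the general scheme, not ggml's actual on-disk format. Weights are split into small fixed-size blocks, each stored as one fp32 scale plus 4-bit integers; the 32-element block size is an assumption for the example.

```python
import numpy as np

BLOCK = 32  # elements per quantization block (assumed for this sketch)

def quantize_q4(w: np.ndarray):
    """Quantize a flat fp32 array block-wise to 4-bit values in [-8, 7]."""
    blocks = w.reshape(-1, BLOCK)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = absmax / 7.0              # map the max magnitude onto the 4-bit grid
    scale[scale == 0] = 1.0           # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_q4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an fp32 approximation from 4-bit values and per-block scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(64).astype(np.float32)
q, scale = quantize_q4(w)
w_hat = dequantize_q4(q, scale)

# Two 4-bit values pack into one byte, so storage is ~8x smaller than fp32
# (plus one scale per block). Here q stays as int8 for clarity.
assert np.abs(w - w_hat).max() <= scale.max() / 2 + 1e-6
```

The trade-off is clear from the code: a bigger block amortizes the scale's overhead but forces more weights to share one quantization step, increasing error.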

@alexconstant9108
Author

4-bit! Wow! This reminds me of https://github.com/THUDM/GLM-130B/blob/main/docs/quantization.md
It would be a total game changer if it works. Some degradation in output accuracy is expected for some models, of course, but that's still much better than not being able to run the model at all due to hardware limitations :))))
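The expected accuracy degradation can be seen directly in the weights: fewer bits means a coarser quantization grid and a larger reconstruction error. A quick back-of-the-envelope check (illustrative only, using simple symmetric absmax quantization):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)

def quant_error(w: np.ndarray, bits: int) -> float:
    """Mean absolute reconstruction error of symmetric absmax quantization."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for int8, 7 for int4
    scale = np.abs(w).max() / qmax
    q = np.round(w / scale)
    return float(np.abs(w - q * scale).mean())

err8 = quant_error(w, 8)
err4 = quant_error(w, 4)

# Fewer bits -> coarser grid -> larger average error.
assert err4 > err8 > 0
```

How much that per-weight error actually hurts model outputs varies by architecture, which is why some models tolerate 4-bit better than others.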

@ggerganov ggerganov added the enhancement New feature or request label Feb 26, 2023
@ggerganov ggerganov pinned this issue Feb 26, 2023
@ggerganov ggerganov unpinned this issue May 20, 2023
PABannier added a commit to PABannier/ggml that referenced this issue Oct 20, 2024