Hi there @ggerganov, and great work. The performance on CPU is just amazing.
Would it be possible in the future to also implement int8 / fp8 loading of models (a few layers would still have to be loaded with their original fp16 or fp32 weights), similar to the bitsandbytes library: https://github.com/TimDettmers/bitsandbytes
This would allow loading bigger models on systems with a limited amount of CPU RAM, and perhaps even enable faster inference for models like GPT-J.
In theory, on a Mac or x64 (AVX2 or AVX-512) system with 128 GB of CPU RAM, you would be able to load a 120B model this way... Wouldn't that be amazing :)))
Hi, these days I have actually started working on 4-bit / n-bit quantization support.
There are some promising results already, but I'm not 100% sure yet whether I will be able to make it work efficiently and accurately.
4 bit! Wow! This reminds me of https://github.com/THUDM/GLM-130B/blob/main/docs/quantization.md
It would be a total game changer if it works. Some degradation in output accuracy is expected for some models, of course, but that is still much better than not being able to run the model at all due to hardware limitations :))))