Hi there @ggerganov, and great work. The performance on CPU is just amazing.
Would it be possible in the future to also implement int8 / fp8 loading of models (a few layers would still have to be loaded with their original fp16 or fp32 weights), similar to the bitsandbytes library: https://github.com/TimDettmers/bitsandbytes
This would allow loading bigger models on systems with a limited amount of CPU RAM, and perhaps even enable faster inference for models like GPT-J.
In theory, on a Mac or x64 (AVX2 or AVX-512) system with 128 GB of CPU RAM, you would be able to load a 120B model this way... Wouldn't that be amazing :)))
Hi, these days I have actually started working on 4-bit / n-bit quantization support.
There are some promising results already, but I'm not 100% sure yet whether I will be able to make it work efficiently and accurately.
4 bit! Wow! This reminds me of https://github.com/THUDM/GLM-130B/blob/main/docs/quantization.md
It would be a total game changer if it works. Some degradation in output accuracy is expected for some models, of course, but that is still much better than not being able to run the model at all due to hardware limitations :))))