Releases: MDK8888/GPTFast
GPTFast-0.3.1
GPTFast 0.3.1 is here 🚀🚀🚀!
- Stabilized GPTQ for all models, both with and without bias.
- Custom W4A16 (4-bit weight, 16-bit activation) matmul kernels with tiling that outperform nn.Linear by 30% on an RTX 3050.
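
For readers unfamiliar with the W4A16 format, here is a minimal plain-Python sketch of the arithmetic involved: two 4-bit weight codes are packed per byte, and each row is dequantized on the fly while the activations stay in floating point. This is not GPTFast's kernel and does no tiling; the function names (`pack_int4`, `unpack_int4`, `w4a16_matvec`), the per-row scale/zero layout, and the nibble order are assumptions chosen for illustration.

```python
# Illustrative only: plain-Python W4A16 arithmetic, not GPTFast's tiled kernel.
# All names and the storage layout below are hypothetical.

def pack_int4(values):
    """Pack integers in [0, 15] into bytes, two values per byte."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of 4-bit values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    # Low nibble holds the even-indexed value, high nibble the odd-indexed one.
    return bytes(values[i] | (values[i + 1] << 4) for i in range(0, len(values), 2))


def unpack_int4(packed):
    """Inverse of pack_int4: recover the list of 4-bit codes."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out


def w4a16_matvec(packed_rows, scales, zeros, x):
    """Compute y = W @ x where each row of W is stored as packed 4-bit codes.

    Row r is dequantized as (code - zeros[r]) * scales[r]. The input x stays
    in floating point, standing in for the 16-bit activations.
    """
    y = []
    for packed, scale, zero in zip(packed_rows, scales, zeros):
        codes = unpack_int4(packed)
        y.append(sum((c - zero) * scale * xi for c, xi in zip(codes, x)))
    return y


codes = [[0, 15, 8, 8], [8, 9, 7, 8]]
packed_rows = [pack_int4(row) for row in codes]
y = w4a16_matvec(packed_rows, scales=[0.5, 0.25], zeros=[8, 8], x=[1.0, 2.0, 3.0, 4.0])
```

Each packed row takes half a byte per weight, which is where the memory-bandwidth saving comes from; a real kernel would fuse the unpacking and the multiply-accumulate on the GPU.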
GPTFast 0.3.0
- GPTQ INT4 quantization available for all HF models
- Accelerates inference by 7.6x-9x
- Integrates optimized INT4 matrix multiplication kernels from the PyTorch team for all HF models
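
As a rough picture of what INT4 weight quantization does, the sketch below applies group-wise round-to-nearest quantization in plain Python. Note that this is a simpler stand-in and not GPTQ itself: GPTQ additionally adjusts the not-yet-quantized weights to compensate for the rounding error. The function names, the group size of 4, and the (codes, scale, minimum) storage layout are assumptions for illustration only.

```python
# Illustrative only: round-to-nearest INT4 quantization, a simpler technique
# than GPTQ. Names and storage layout are hypothetical.

def quantize_int4(weights, group_size=4):
    """Quantize a flat list of floats to 4-bit codes, one (scale, min) per group."""
    groups = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # 16 levels span [lo, hi]; fall back to 1.0 if the group is constant.
        scale = (hi - lo) / 15 or 1.0
        codes = [min(15, max(0, round((w - lo) / scale))) for w in group]
        groups.append((codes, scale, lo))
    return groups


def dequantize_int4(groups):
    """Map codes back to approximate float weights."""
    return [lo + c * scale for codes, scale, lo in groups for c in codes]


weights = [-1.0, -0.5, 0.0, 0.5, 0.1, 0.2, 0.3, 0.4]
groups = quantize_int4(weights)
restored = dequantize_int4(groups)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

With round-to-nearest, the error per weight is bounded by half of that group's scale, so smaller groups trade extra metadata for lower error.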
GPTFast 0.2.1
- Minor fixes for PyYAML
GPTFast 0.2.0
- Inference is now accelerated by 6x-8.5x
- Static key-value caching is now enabled for all Hugging Face models
- Support for generic sampling functions in addition to argmax
- Fixed bugs in speculative decoding
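
To illustrate the idea behind static key-value caching, here is a minimal plain-Python sketch: the key and value buffers are allocated once at a fixed maximum length and overwritten in place, so their shape never changes during generation. This is not GPTFast's implementation; the `StaticKVCache` class, its methods, and the single-head list-based attention are assumptions made for the example.

```python
# Illustrative only: a fixed-size KV cache in plain Python.
# The class and method names are hypothetical, not GPTFast's API.
import math


class StaticKVCache:
    """Fixed-capacity key/value buffers allocated once and overwritten in place."""

    def __init__(self, max_len, dim):
        self.dim = dim
        self.keys = [[0.0] * dim for _ in range(max_len)]
        self.values = [[0.0] * dim for _ in range(max_len)]
        self.length = 0  # number of filled positions

    def append(self, key, value):
        """Write one token's key and value into the next free slot."""
        if self.length >= len(self.keys):
            raise IndexError("cache is full")
        if len(key) != self.dim or len(value) != self.dim:
            raise ValueError("key/value must have length dim")
        self.keys[self.length][:] = key
        self.values[self.length][:] = value
        self.length += 1

    def attend(self, query):
        """Softmax attention of one query over the filled prefix of the cache."""
        if self.length == 0:
            raise ValueError("cache is empty")
        scale = 1.0 / math.sqrt(self.dim)
        scores = [
            scale * sum(q * k for q, k in zip(query, self.keys[i]))
            for i in range(self.length)
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        return [
            sum(w * self.values[i][d] for i, w in enumerate(weights)) / total
            for d in range(self.dim)
        ]


cache = StaticKVCache(max_len=4, dim=2)
cache.append([1.0, 0.0], [1.0, 0.0])
cache.append([1.0, 0.0], [0.0, 1.0])
out = cache.attend([1.0, 0.0])
```

Because the buffers keep the same size at every decoding step, a compiled model can see identical tensor shapes throughout generation, which is the usual motivation for a static cache over one that grows per token.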