Releases · MDK8888/GPTFast

22 Aug 04:11

MDK8888

GPTFast 0.3.1 (Latest)

GPTFast 0.3.1 is here 🚀🚀🚀!

Stabilized GPTQ quantization for all models, both with and without bias terms.
Custom tiled W4A16 (4-bit weights, 16-bit activations) matmul kernels that outperform nn.Linear by 30% on an RTX 3050.
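The notes do not describe the kernel internals. To illustrate what "W4A16 with tiling" refers to, here is a minimal pure-Python sketch. It assumes asymmetric per-group quantization (one scale and zero point per group of `group` input rows, per output column). The names `w4a16_matmul_tiled` and `reference_matmul`, the group size, and the tile sizes are all hypothetical. This is not GPTFast's GPU kernel, and it will not be fast. It only shows the access pattern: weights stay as 4-bit codes, activations stay in floating point, and each weight tile is dequantized on the fly while an output tile accumulates across K.

```python
from typing import List

Matrix = List[List[float]]


def dequant(code: int, scale: float, zero: int) -> float:
    """Map a 4-bit code (0..15) back to a float weight: (code - zero) * scale."""
    return (code - zero) * scale


def w4a16_matmul_tiled(
    x: Matrix,                  # activations, shape (M, K), kept in float
    w_codes: List[List[int]],   # 4-bit weight codes, shape (K, N)
    scales: List[List[float]],  # per-(group, column) scales, shape (K // group, N)
    zeros: List[List[int]],     # per-(group, column) zero points, shape (K // group, N)
    group: int = 4,
    tile_m: int = 2,
    tile_n: int = 2,
    tile_k: int = 4,
) -> Matrix:
    """Compute x @ dequant(W) tile by tile, dequantizing weights on the fly."""
    m, k, n = len(x), len(w_codes), len(w_codes[0])
    out = [[0.0] * n for _ in range(m)]
    for m0 in range(0, m, tile_m):
        for n0 in range(0, n, tile_n):
            # Each (m0, n0) output tile is accumulated across K in tile_k chunks.
            for k0 in range(0, k, tile_k):
                for kk in range(k0, min(k0 + tile_k, k)):
                    g = kk // group  # quantization group of this weight row
                    for nn in range(n0, min(n0 + tile_n, n)):
                        w = dequant(w_codes[kk][nn], scales[g][nn], zeros[g][nn])
                        for mm in range(m0, min(m0 + tile_m, m)):
                            out[mm][nn] += x[mm][kk] * w
    return out


def reference_matmul(x: Matrix, w_codes, scales, zeros, group: int = 4) -> Matrix:
    """Untiled baseline: dequantize the whole weight matrix, then multiply."""
    k, n = len(w_codes), len(w_codes[0])
    w = [[dequant(w_codes[r][c], scales[r // group][c], zeros[r // group][c])
          for c in range(n)] for r in range(k)]
    return [[sum(row[r] * w[r][c] for r in range(k)) for c in range(n)] for row in x]


# K = 8 with group = 4 gives 2 groups; code 9 with zero point 8 decodes to 1 * scale.
x = [[1.0] * 8]
codes = [[9, 9] for _ in range(8)]
scales = [[0.5, 0.5], [0.5, 0.5]]
zeros = [[8, 8], [8, 8]]
print(w4a16_matmul_tiled(x, codes, scales, zeros))  # [[4.0, 4.0]]
```

On a GPU the same idea keeps the packed weights in memory and expands only one tile at a time into registers or shared memory. This cuts memory traffic, which is why such kernels can beat a full-precision `nn.Linear`.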

21 Jun 01:51

MDK8888

GPTFast 0.3.0

GPTQ INT4 quantization is now available for all Hugging Face (HF) models
Inference is now 7.6x-9x faster
Integrates optimized INT4 matrix multiplication kernels from the PyTorch team for all HF models
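GPTQ decides each weight's 4-bit value column by column. It uses second-order (Hessian) information from calibration data to update the not-yet-quantized weights so they compensate for each rounding error. The sketch below does not implement that step. It shows only the simpler storage side that INT4 schemes share: round-to-nearest asymmetric quantization of one weight group, followed by packing two 4-bit codes per byte. All function names and the low-nibble-first packing order are assumptions. This does not reproduce the packed layout that the PyTorch INT4 kernels expect.

```python
from typing import List, Tuple


def quantize_group_int4(ws: List[float]) -> Tuple[List[int], float, int]:
    """Round-to-nearest asymmetric INT4 quantization of one weight group.

    Returns (codes in 0..15, scale, zero) with w approximately (code - zero) * scale.
    This is plain rounding, not GPTQ's error-compensating update.
    """
    # Widen the range to include 0.0 so the zero point lands in 0..15
    # and a weight of exactly 0.0 is represented exactly.
    lo, hi = min(min(ws), 0.0), max(max(ws), 0.0)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    zero = round(-lo / scale)
    codes = [min(15, max(0, round(w / scale) + zero)) for w in ws]
    return codes, scale, zero


def dequantize_group_int4(codes: List[int], scale: float, zero: int) -> List[float]:
    """Recover approximate float weights from 4-bit codes."""
    return [(c - zero) * scale for c in codes]


def pack_int4(codes: List[int]) -> bytes:
    """Pack two 4-bit codes per byte, first code in the low nibble (assumed layout)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(
        (codes[i] & 0xF) | ((codes[i + 1] & 0xF) << 4) for i in range(0, len(codes), 2)
    )


def unpack_int4(packed: bytes, n: int) -> List[int]:
    """Inverse of pack_int4; n drops the padding code, if any."""
    out: List[int] = []
    for b in packed:
        out.extend((b & 0xF, b >> 4))
    return out[:n]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, -0.25, 0.75]
codes, scale, zero = quantize_group_int4(weights)
packed = pack_int4(codes)  # 8 codes -> 4 bytes
restored = dequantize_group_int4(unpack_int4(packed, len(codes)), scale, zero)
print(len(packed), max(abs(a - b) for a, b in zip(weights, restored)))
```

With plain rounding, each restored weight is within `scale / 2` of the original. GPTQ keeps the same storage format but chooses the codes so that the layer's output error on calibration inputs stays small.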

02 Apr 12:50

MDK8888

GPTFast 0.2.1

Minor fixes for PyYAML

02 Apr 04:16

MDK8888

GPTFast 0.2.0

Inference is now 6x-8.5x faster
Static key-value (KV) caching is now enabled for all Hugging Face models
Added support for generic sampling functions in addition to argmax
Fixed bugs in speculative decoding
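The notes do not document GPTFast's decoding API. As a hypothetical illustration of how a pluggable sampling function fits into speculative decoding, here is a pure-Python sketch of the greedy variant. The names `Model`, `Sampler`, `speculative_decode`, and the toy models are assumptions, not GPTFast's interface. A cheap draft model proposes up to `k` tokens. The target model keeps the longest prefix it agrees with and then adds one token of its own. With a deterministic sampler such as argmax, the output is identical to decoding with the target alone.

```python
from typing import Callable, List, Sequence

Logits = List[float]
Model = Callable[[Sequence[int]], Logits]  # hypothetical: prefix -> next-token logits
Sampler = Callable[[Logits], int]          # hypothetical: logits -> chosen token id


def argmax_sampler(logits: Logits) -> int:
    """Greedy choice: index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


def greedy_decode(model: Model, prompt: List[int], n_new: int,
                  sample: Sampler = argmax_sampler) -> List[int]:
    """Plain autoregressive decoding with a pluggable sampling function."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(sample(model(tokens)))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int], n_new: int,
                       k: int = 4, sample: Sampler = argmax_sampler) -> List[int]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(sample(draft(tokens + proposal)))
        # 2. The target keeps the longest prefix it agrees with. (A real
        #    implementation scores all proposed positions in one forward pass.)
        accepted = 0
        for i, tok in enumerate(proposal):
            if sample(target(tokens + proposal[:i])) != tok:
                break
            accepted += 1
        tokens += proposal[:accepted]
        # 3. The target then contributes one token of its own: the correction at
        #    the first mismatch, or a bonus token if every proposal was accepted.
        if len(tokens) < goal:
            tokens.append(sample(target(tokens)))
    return tokens


# Toy models over a 5-token vocabulary: the target always predicts last + 1;
# the draft agrees except after tokens divisible by 3, where it guesses last + 2.
VOCAB = 5


def one_hot(tok: int) -> Logits:
    return [1.0 if t == tok else 0.0 for t in range(VOCAB)]


def toy_target(prefix: Sequence[int]) -> Logits:
    return one_hot((prefix[-1] + 1) % VOCAB)


def toy_draft(prefix: Sequence[int]) -> Logits:
    last = prefix[-1]
    return one_hot((last + 2) % VOCAB if last % 3 == 0 else (last + 1) % VOCAB)


print(speculative_decode(toy_target, toy_draft, [0], 6))  # [0, 1, 2, 3, 4, 0, 1]
```

This exact-match acceptance rule is only correct for deterministic samplers. With a stochastic sampler (e.g., temperature sampling), preserving the target model's output distribution requires the rejection-sampling acceptance rule used in speculative sampling.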
