GPTQModel v0.9.5
What's Changed
Another large update, adding support for Intel QBits quantization/inference on CPU. CUDA kernels have been fully deprecated in favor of the better-performing Exllama (v1/v2), Marlin, and Triton kernels.
- 🚀🚀 [KERNEL] Added Intel QBits support with [2, 3, 4, 8] bits quantization/inference on CPU by @CSY-ModelCloud in #137
- ✨ [CORE] BaseQuantLinear add SUPPORTED_DEVICES by @ZX-ModelCloud in #174
- ✨ [DEPRECATION] Remove Backend.CUDA and Backend.CUDA_OLD by @ZX-ModelCloud in #165
- 👾 [CI] FIX test perplexity by @ZYC-ModelCloud in #160
Full Changelog: v0.9.4...v0.9.5