GPTQModel v0.9.6
What's Changed
Intel/AutoRound QUANT_METHOD support added for potentially higher-quality quantization, along with lm_head module quantization support for even more VRAM reduction. Models export to FORMAT.GPTQ for maximum inference compatibility.
- 🚀 [CORE] Add AutoRound as Quantizer option by @LRL-ModelCloud in #166
- 👾 [FIX] [CI] Update test by @CSY-ModelCloud in #177
- 👾 Cleanup Triton by @Qubitium in #178
Full Changelog: v0.9.5...v0.9.6