GPTQModel v0.9.10

@Qubitium Qubitium released this 30 Jul 19:04
233548b

What's Changed

Ported the vllm/nm gptq_marlin inference kernel, which extends support to 8-bit quantization, group_size 64 and 32, and desc_act for all GPTQ models with format = FORMAT.GPTQ. The auto-round nsamples/seqlen parameters are now calculated automatically from the calibration dataset. Fixed save_quantized() when called on pre-quantized models loaded with unsupported backends. Updated the HF transformers dependency so that the Llama 3.1 fixes are applied to both the quantization and inference stages.
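As a rough illustration of the expanded kernel coverage described above, the following Python sketch checks whether a set of quantization settings falls inside the range this release mentions. It does not call GPTQModel. `QuantSettings`, `marlin_eligible`, and the two `ASSUMED_*` sets are hypothetical names made up for this example, and the baseline values (4-bit, group_size 128 and -1) are an assumption about what the kernel covered before the 8-bit and 64/32 additions.

```python
from dataclasses import dataclass

# Assumed coverage, inferred from the release note rather than read from
# GPTQModel's source: 4-bit and group_size 128 / -1 as the prior baseline,
# plus the 8-bit and group_size 64 / 32 additions named in this release.
ASSUMED_BITS = {4, 8}
ASSUMED_GROUP_SIZES = {-1, 128, 64, 32}


@dataclass(frozen=True)
class QuantSettings:
    """Hypothetical stand-in for a quantization config (not a GPTQModel class)."""
    bits: int
    group_size: int
    desc_act: bool
    format: str = "gptq"  # mirrors format = FORMAT.GPTQ in the note


def marlin_eligible(settings: QuantSettings) -> bool:
    """Return True if the settings fall inside the assumed gptq_marlin range."""
    if settings.format != "gptq":
        return False
    if settings.bits not in ASSUMED_BITS:
        return False
    if settings.group_size not in ASSUMED_GROUP_SIZES:
        return False
    # desc_act is accepted either way, per the release note.
    return True


if __name__ == "__main__":
    for candidate in (
        QuantSettings(bits=8, group_size=32, desc_act=True),
        QuantSettings(bits=4, group_size=128, desc_act=False),
        QuantSettings(bits=3, group_size=128, desc_act=False),
    ):
        print(candidate, "->", marlin_eligible(candidate))
```

A check like this could run before picking a backend, so that unsupported combinations fall back to another kernel instead of failing at load time.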

Full Changelog: v0.9.9...v0.9.10