GPTQModel v1.0.4
What's Changed
Liger Kernel support added for ~50% VRAM reduction in the quantization stage for some models. Added a toggle to disable parallel packing to avoid OOM on larger models. Transformers dependency updated to 4.45.0 for Llama 3.2 support. A usage sketch follows the change list below.
- [FEATURE] add a parallel_packing toggle by @LRL-ModelCloud in #393
- [FEATURE] add liger_kernel support by @LRL-ModelCloud in #394
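
A minimal sketch of how the new options might be wired into a quantization run. The parameter names `parallel_packing` and `use_liger_kernel` are assumptions inferred from the PR titles above, not a verified API; check the linked PRs for the exact signatures and defaults.

```python
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B"  # Llama 3.2 needs transformers>=4.45.0
calibration_dataset = [
    "GPTQModel quantizes large language models with minimal accuracy loss.",
    "Calibration data should reflect the model's intended workload.",
]

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    # Assumed toggle from PR #393: disable parallel packing to avoid OOM on larger models.
    parallel_packing=False,
)

# Assumed flag from PR #394: Liger Kernel can reduce VRAM use during quantization
# for some models; the flag name and placement here are illustrative.
model = GPTQModel.from_pretrained(model_id, quant_config, use_liger_kernel=True)

model.quantize(calibration_dataset)
model.save_quantized("Llama-3.2-1B-gptq-4bit")
```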
Full Changelog: v1.0.3...v1.0.4