GPTQModel v0.9.3

Qubitium released this 02 Jul 18:05


What's Changed

🚀 [MODEL] Add Gemma 2 support by @LRL-ModelCloud in #131
🚀 [OTHER] Calculate ppl on GPU by @ZYC-ModelCloud in #135
✨ [REFACTOR] BaseQuantLinear and avoid using shared QuantLinear cls name by @PZS-ModelCloud in #116
✨ [KERNEL] Bitblas cache stability by @Qubitium in #129
👾 [FIX] Export TORCH_CUDA_ARCH_LIST in install.sh by @LeiWang1999 in #133
👾 [FIX] Limit Bitblas numexpr thread usage by @Qubitium in #125
👾 [FIX] Revert "Skip opt fc1/fc2 for quantization due to inference regressions (#118)" by @Qubitium in #149
✨ [REFACTOR] Remove max_memory arg by @CL-ModelCloud in #144
🤖 [CI] Fix test that was skipped by @CSY-ModelCloud in #145
🤖 [CI] Add GPU selector for runner by @CSY-ModelCloud in #148

New Contributors

@LeiWang1999 made their first contribution in #133

Full Changelog: v0.9.2...v0.9.3

Contributors

Qubitium, LeiWang1999, and 5 other contributors
