GPTQModel v0.9.3
What's Changed
- 🚀 [MODEL] Add Gemma 2 support by @LRL-ModelCloud in #131
- 🚀 [OTHER] Calculate PPL on GPU by @ZYC-ModelCloud in #135
- ✨ [REFACTOR] BaseQuantLinear and avoid using shared QuantLinear cls name by @PZS-ModelCloud in #116
- ✨ [KERNEL] BitBLAS cache stability by @Qubitium in #129
- 👾 [FIX] Export TORCH_CUDA_ARCH_LIST in install.sh by @LeiWang1999 in #133
- 👾 [FIX] Limit BitBLAS numexpr thread usage by @Qubitium in #125
- 👾 [FIX] Revert "Skip opt fc1/fc2 for quantization due to inference regressions (#118)" by @Qubitium in #149
- ✨ [REFACTOR] Remove max_memory arg by @CL-ModelCloud in #144
- 🤖 [CI] Fix test that was skipped by @CSY-ModelCloud in #145
- 🤖 [CI] Add GPU selector for runner by @CSY-ModelCloud in #148
New Contributors
- @LeiWang1999 made their first contribution in #133
Full Changelog: v0.9.2...v0.9.3