From 6b3923e704c72bb3b60b1b4f604c45d4045ef3a0 Mon Sep 17 00:00:00 2001
From: Qubitium-ModelCloud
Date: Sat, 29 Jun 2024 20:09:07 +0800
Subject: [PATCH] Update README.md (#122)

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 1652b7cc..65af7067 100644
--- a/README.md
+++ b/README.md
@@ -45,6 +45,8 @@ We will backport bug fixes to AutoGPTQ on a case-by-case basis.
 * 🚀 Model weights sharding support
 * 🚀 Security: hash check of model weights on load
 * ✨ Alert users of sub-optimal calibration data. Most new users get this part horribly wrong.
+* ✨ Increased compatibility with the newest models via auto-padding of in/out-features for the [ Exllama, Exllama V2, Marlin ] backends.
+* 👾 Fixed OPT quantization. The original OPT model code produced unusable quantized models.
 * 👾 Removed non-working, partially working, or fully deprecated features: Peft, ROCM, AWQ Gemm inference, Triton v1 (replaced by v2), Fused Attention (Replaced by Marlin/Exllama).
 * 👾 Fixed packing Performance regression on high core-count systems. Backported to AutoGPTQ
 * 👾 Fixed crash on H100. Backported to AutoGPTQ