Merge pull request #9 from SidaZh/main
[docs] Update readme
NetEase-FuXi authored Feb 1, 2024
2 parents 8877e96 + cb80ba2 commit d8cd881
Showing 4 changed files with 15 additions and 10 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -14,6 +14,7 @@ Easy & Efficient Quantization for Transformers
- [Performance](#performance)

## Features
+- **New**🔥: [Implemented a GEMV kernel](https://github.com/huggingface/text-generation-inference/pull/1502) for w8a16, improving performance by 10~30%.
- INT8 weight-only PTQ (a reference sketch of the idea follows this list)
  * High-performance GEMM kernels from FasterTransformer, [original code](https://github.com/NVIDIA/FasterTransformer/tree/main/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm)
  * No need for quantization training
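To make the w8a16 idea above concrete: weights are stored in INT8 with per-channel scales and dequantized inside the matmul, while activations stay in FP16. The following is a minimal PyTorch reference sketch of that scheme, not EETQ's fused CUTLASS kernel; all names here are illustrative:

```python
import torch

def quantize_weight_int8(w: torch.Tensor):
    """Symmetric per-output-channel INT8 quantization of a 2D FP16 weight."""
    # One scale per output channel (row), chosen so the max magnitude maps to 127.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-5) / 127.0
    w_int8 = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return w_int8, scale

def w8a16_linear(x: torch.Tensor, w_int8: torch.Tensor, scale: torch.Tensor):
    """Reference w8a16 matmul: dequantize INT8 weights, keep FP16 activations."""
    w_fp16 = w_int8.to(torch.float16) * scale.to(torch.float16)
    return x @ w_fp16.t()

w = torch.randn(4096, 4096, dtype=torch.float16)
x = torch.randn(1, 4096, dtype=torch.float16)
w_q, s = quantize_weight_int8(w)
y = w8a16_linear(x, w_q, s)  # approximates x @ w.t() at half the weight memory
```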
@@ -70,10 +71,9 @@ model.to("cuda:0")
res = model.generate(...)
```
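Read in context, the snippet above is the tail of EETQ's Python usage example. A fuller, self-contained version might look like the sketch below; the `eet_accelerator` import path and the example model id are assumptions, so check the EETQ repository for the exact API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumption: eet_accelerator is EETQ's one-call quantize-and-fuse helper;
# verify the exact import path against the EETQ repository.
from eetq.utils import eet_accelerator

model_id = "meta-llama/Llama-2-13b-hf"  # hypothetical example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Quantize the weights to INT8 in place, then move the model to the GPU.
eet_accelerator(model, quantize=True, fused_attn=True, dev="cuda:0")
model.to("cuda:0")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
res = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(res[0], skip_special_tokens=True))
```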

-3. Use EETQ in TGI([text-generation-inference](https://github.com/huggingface/text-generation-inference))
-see [this](https://github.com/huggingface/text-generation-inference/pull/1018)
+3. Use EETQ in [TGI](https://github.com/huggingface/text-generation-inference). See [this PR](https://github.com/huggingface/text-generation-inference/pull/1068).
```bash
---quantize eetq
+text-generation-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize eetq ...
```
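Once the launcher is running, generation requests go through TGI's standard `/generate` HTTP endpoint. A minimal client sketch follows; the host and port are assumptions, so use whatever the launcher was configured with:

```python
import requests

# Minimal TGI client sketch; assumes the server above is listening on port 8080.
resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "Explain INT8 weight-only quantization in one sentence.",
        "parameters": {"max_new_tokens": 50},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```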

4. Use EETQ in [LoRAX](https://github.com/predibase/lorax); see the [docs](https://predibase.github.io/lorax/guides/quantization/#eetq) and the launcher invocation below.
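For reference, the corresponding launcher invocation, as shown in the Chinese README in this same commit, is:

```bash
lorax-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize eetq ...
```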
@@ -89,5 +89,5 @@ Model:
## Performance

- llama-13b (tested on an RTX 3090)
-
-<img src="./docs/images/benchmark.png" style="zoom:50%;" />
+prompt=1024, max_new_tokens=50
+<img src="./docs/images/benchmark.jpg" style="zoom:50%;" />
15 changes: 10 additions & 5 deletions README_zh.md
@@ -15,6 +15,7 @@ EETQ (Easy & Efficient Quantization for Transformers) is a quantization tool for transformer
- [Performance tests](#性能测试)

## Features
+- **New**🔥: [Introduced a GEMV kernel](https://github.com/huggingface/text-generation-inference/pull/1502), improving performance by 10%~30%.

- High-performance INT8 weight post-training quantization (PTQ) kernels

@@ -68,10 +69,14 @@ res = model.generate(...)

```

-3. Use eetq for quantized acceleration in TGI
-[PR link](https://github.com/huggingface/text-generation-inference/pull/1018)
+3. Use eetq for quantized acceleration in [TGI](https://github.com/huggingface/text-generation-inference); [PR link](https://github.com/huggingface/text-generation-inference/pull/1068)
```bash
---quantize eetq
+text-generation-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize eetq ...
```

+4. Use EETQ in [LoRAX](https://github.com/predibase/lorax); see the [docs](https://predibase.github.io/lorax/guides/quantization/#eetq).
```bash
+lorax-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize eetq ...
```

## Reference examples
@@ -81,5 +86,5 @@ res = model.generate(...)
## Performance tests

- llama-13b (tested on an RTX 3090)
-
-<img src="./docs/images/benchmark.png" style="zoom:50%;" />
+prompt=1024, max_new_tokens=50
+<img src="./docs/images/benchmark.jpg" style="zoom:50%;" />
Binary file added docs/images/benchmark.jpg
Binary file removed docs/images/benchmark.png
