Merge pull request #9 from SidaZh/main
[docs] Update readme
NetEase-FuXi authored Feb 1, 2024
2 parents 8877e96 + cb80ba2 commit d8cd881
Showing 4 changed files with 15 additions and 10 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -14,6 +14,7 @@ Easy & Efficient Quantization for Transformers
- [Performance](#performance)

## Features
+- **New**🔥: [Implemented a GEMV kernel](https://github.com/huggingface/text-generation-inference/pull/1502) for w8a16, improving performance by 10~30%.
- INT8 weight-only PTQ (a reference sketch of the idea follows this list)
  * High-performance GEMM kernels from FasterTransformer, [original code](https://github.com/NVIDIA/FasterTransformer/tree/main/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm)
  * No need for quantization training
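To make the w8a16 idea above concrete: weights are stored in INT8 with per-channel scales and dequantized inside the matmul, while activations stay in FP16. The following is a minimal PyTorch reference sketch of that scheme, not EETQ's fused CUTLASS kernel; all names here are illustrative:

```python
import torch

def quantize_weight_int8(w: torch.Tensor):
    """Symmetric per-output-channel INT8 quantization of a 2D FP16 weight."""
    # One scale per output channel (row), chosen so the max magnitude maps to 127.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-5) / 127.0
    w_int8 = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return w_int8, scale

def w8a16_linear(x: torch.Tensor, w_int8: torch.Tensor, scale: torch.Tensor):
    """Reference w8a16 matmul: dequantize INT8 weights, keep FP16 activations."""
    w_fp16 = w_int8.to(torch.float16) * scale.to(torch.float16)
    return x @ w_fp16.t()

w = torch.randn(4096, 4096, dtype=torch.float16)
x = torch.randn(1, 4096, dtype=torch.float16)
w_q, s = quantize_weight_int8(w)
y = w8a16_linear(x, w_q, s)  # approximates x @ w.t() at half the weight memory
```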
@@ -70,10 +71,9 @@ model.to("cuda:0")
res = model.generate(...)
```
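Read in context, the snippet above is the tail of EETQ's Python usage example. A fuller, self-contained version might look like the sketch below; the `eet_accelerator` import path and the example model id are assumptions, so check the EETQ repository for the exact API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumption: eet_accelerator is EETQ's one-call quantize-and-fuse helper;
# verify the exact import path against the EETQ repository.
from eetq.utils import eet_accelerator

model_id = "meta-llama/Llama-2-13b-hf"  # hypothetical example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Quantize the weights to INT8 in place, then move the model to the GPU.
eet_accelerator(model, quantize=True, fused_attn=True, dev="cuda:0")
model.to("cuda:0")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
res = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(res[0], skip_special_tokens=True))
```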

-3. Use EETQ in TGI([text-generation-inference](https://github.com/huggingface/text-generation-inference))
-see [this](https://github.com/huggingface/text-generation-inference/pull/1018)
+3. Use EETQ in [TGI](https://github.com/huggingface/text-generation-inference). See [this PR](https://github.com/huggingface/text-generation-inference/pull/1068).
```bash
---quantize eetq
+text-generation-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize eetq ...
```
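Once the launcher is running, generation requests go through TGI's standard `/generate` HTTP endpoint. A minimal client sketch follows; the host and port are assumptions, so use whatever the launcher was configured with:

```python
import requests

# Minimal TGI client sketch; assumes the server above is listening on port 8080.
resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "Explain INT8 weight-only quantization in one sentence.",
        "parameters": {"max_new_tokens": 50},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```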

4. Use EETQ in [LoRAX](https://github.com/predibase/lorax); see the [docs](https://predibase.github.io/lorax/guides/quantization/#eetq) and the launcher invocation below.
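For reference, the corresponding launcher invocation, as shown in the Chinese README in this same commit, is:

```bash
lorax-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize eetq ...
```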
@@ -89,5 +89,5 @@ Model:
## Performance

- llama-13b (tested on an RTX 3090)
-
-<img src="./docs/images/benchmark.png" style="zoom:50%;" />
+prompt=1024, max_new_tokens=50
+<img src="./docs/images/benchmark.jpg" style="zoom:50%;" />
15 changes: 10 additions & 5 deletions README_zh.md
@@ -15,6 +15,7 @@ EETQ (Easy & Efficient Quantization for Transformers) is a quantization tool for transformer
- [Performance tests](#性能测试)

## Features
+- **New**🔥: [Introduced a GEMV kernel](https://github.com/huggingface/text-generation-inference/pull/1502), improving performance by 10%~30%.

- High-performance INT8 weight post-training quantization (PTQ) kernels

@@ -68,10 +69,14 @@ res = model.generate(...)

```

-3. Use eetq for quantized acceleration in TGI
-[PR link](https://github.com/huggingface/text-generation-inference/pull/1018)
+3. Use eetq for quantized acceleration in [TGI](https://github.com/huggingface/text-generation-inference); [PR link](https://github.com/huggingface/text-generation-inference/pull/1068)
```bash
---quantize eetq
+text-generation-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize eetq ...
```

+4. Use EETQ in [LoRAX](https://github.com/predibase/lorax); see the [docs](https://predibase.github.io/lorax/guides/quantization/#eetq).
```bash
+lorax-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize eetq ...
```

## Reference examples
@@ -81,5 +86,5 @@ res = model.generate(...)
## Performance tests

- llama-13b (tested on an RTX 3090)
-
-<img src="./docs/images/benchmark.png" style="zoom:50%;" />
+prompt=1024, max_new_tokens=50
+<img src="./docs/images/benchmark.jpg" style="zoom:50%;" />
Binary file added docs/images/benchmark.jpg
Binary file removed docs/images/benchmark.png
