Commit 505a7ed: Update README.md

Parent: e944610

File tree: 1 file changed (+5 lines, -0 lines)


README.md

Lines changed: 5 additions & 0 deletions
@@ -2,6 +2,9 @@
 
 **ATTENTION: AutoFP8 has been deprecated in favor of [`llm-compressor`](https://github.com/vllm-project/llm-compressor), a library that covers a wide range of model compression techniques in addition to FP8. Check out the [FP8 example here](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).**
 
+<details>
+<summary>Old content</summary>
+
 Open-source FP8 quantization library for producing compressed checkpoints for running in vLLM - see https://github.com/vllm-project/vllm/pull/4332 for details on the inference implementation. This library focuses on providing quantized weight, activation, and KV cache scales for FP8_E4M3 precision.
 
 [FP8 Model Collection from Neural Magic](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127) with many accurate (<1% accuracy drop) FP8 checkpoints ready for inference with vLLM.
@@ -103,3 +106,5 @@ If config has `"activation_scheme": "dynamic"`:
 model.layers.0.mlp.down_proj.weight < F8_E4M3
 model.layers.0.mlp.down_proj.weight_scale < F32
 ```
+
+</details>
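To make the "weight, activation, and kv cache scales" part of the old description concrete, here is a rough sketch of how a per-tensor FP8_E4M3 weight scale is typically produced. This is a generic illustration of the standard amax/448 recipe, not AutoFP8's actual implementation; it assumes a PyTorch build with `torch.float8_e4m3fn` support.

```python
# Generic sketch of per-tensor FP8_E4M3 weight quantization (not AutoFP8's
# exact code): scale the tensor so its absolute maximum maps to E4M3's
# largest finite value, then cast to float8.
import torch

FP8_E4M3_MAX = 448.0  # largest finite value of float8_e4m3fn

def quantize_weight_fp8(weight: torch.Tensor):
    """Return (fp8_weight, weight_scale) with weight ~= fp8_weight * weight_scale."""
    weight_scale = weight.abs().max().float().clamp(min=1e-12) / FP8_E4M3_MAX
    fp8_weight = (weight.float() / weight_scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX)
    return fp8_weight.to(torch.float8_e4m3fn), weight_scale

w = torch.randn(1024, 1024, dtype=torch.float16)
w_fp8, w_scale = quantize_weight_fp8(w)
print(w_fp8.dtype, w_scale.dtype)  # torch.float8_e4m3fn torch.float32
```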

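The second hunk above shows the on-disk layout for the `"dynamic"` activation scheme: each quantized weight is an FP8_E4M3 tensor stored next to an F32 `weight_scale`. Under per-tensor scaling the effective weight is simply the FP8 payload times its scale. The snippet below is a sketch of that dequantization; the safetensors shard name is a placeholder and recent-PyTorch float8 support is assumed.

```python
# Sketch: recovering an approximate fp16 weight from the tensors listed in the
# diff above. Assumes per-tensor scaling (weight ~= fp8_weight * weight_scale);
# "model.safetensors" is a placeholder shard name.
import torch
from safetensors.torch import load_file

state = load_file("model.safetensors")

w_fp8 = state["model.layers.0.mlp.down_proj.weight"]          # F8_E4M3
w_scale = state["model.layers.0.mlp.down_proj.weight_scale"]  # F32

# Cast the FP8 payload up and apply its per-tensor scale.
w_fp16 = w_fp8.to(torch.float16) * w_scale.to(torch.float16)
print(w_fp16.shape, w_fp16.dtype)
```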
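For anyone landing here from the deprecation notice, the replacement workflow with `llm-compressor` looks roughly like the sketch below. The `oneshot` entry point, `QuantizationModifier`, and the `FP8_DYNAMIC` scheme name are assumptions based on the linked FP8 example, and the model and output paths are placeholders; consult that example for the current API.

```python
# Assumption-based sketch of producing an FP8 (dynamic activation) checkpoint
# with llm-compressor; verify names against the linked example before use.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"   # placeholder model
SAVE_DIR = "Meta-Llama-3-8B-Instruct-FP8-Dynamic"  # placeholder output dir

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 weights with per-tensor scales plus dynamic per-token activation scales;
# keep the lm_head in higher precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Dynamic activation scales need no calibration data.
oneshot(model=model, recipe=recipe)

model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```

The resulting directory should be loadable by vLLM in the same way as the FP8 checkpoints in the model collection linked above.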