
Commit 87e5bbb

update sq doc (#1301)
Signed-off-by: Lu, Yintong <yintong.lu@intel.com>
1 parent: 6cad500

1 file changed: docs/source/smooth_quant.md (+43 additions, −17 deletions)
@@ -318,27 +318,53 @@ conv2d/linear->conv2d/linear/layernorm/batchnorm/instancenorm/t5norm/llamanorm/g
 ## Validated Models
 Neural Compressor: 2.1
 
-IPEX (Intel Extension for PyTorch): 2.0
+IPEX (Intel Extension for PyTorch): 2.0/2.1
 
-Dataset: lambada
+Dataset: lambada_openai
 
 Task: text-generation
 
-alpha [0.4, 0.6] is sweet spot region in SmoothQuant paper
-
-| Model\Last token accuracy | FP32 | INT8 (w/o SmoothQuant) | INT8 (w/ SmoothQuant) | INT8 (w/ SmoothQuant auto tuning) |
-|---------------------|:------:|:----------------------:|-----------------------|-----------------------------------|
-| bigscience/bloom-560m | 65.20% | 63.44% | 66.48% (alpha=0.5) | 64.76% (alpha: 95.9% over 0.6, 4.1% in [0.4, 0.6]) |
-| bigscience/bloom-1b7 | 71.43% | 67.78% | 72.56% (alpha=0.5) | 72.58% (alpha: 55.1% over 0.6, 30.6% in [0.4, 0.6], 14.3% under 0.4) |
-| bigscience/bloom-3b | 73.97% | 69.99% | 74.02% (alpha=0.5) | 74.16% (alpha: 100% over 0.6) |
-| bigscience/bloom-7b1 | 77.44% | 75.46% | 77.02% (alpha=0.5) | 77.45% (alpha: 91.8% over 0.6, 4.9% in [0.4, 0.6], 3.3% under 0.4) |
-| bigscience/bloom-176b | 84.17% | 82.13% | 83.52% (alpha=0.6) | - |
-| facebook/opt-125m | 63.89% | 63.48% | 63.44% (alpha=0.5) | 64.14% (alpha: 59.4% over 0.6, 8.1% in [0.4, 0.6], 32.4% under 0.4) |
-| facebook/opt-1.3b | 75.41% | 73.59% | 70.94% (alpha=0.5) | 74.80% (alpha: 69.9% over 0.6, 24.7% in [0.4, 0.6], 5.5% under 0.4) |
-| facebook/opt-2.7b | 77.79% | 78.57% | 78.60% (alpha=0.5) | 78.25% (alpha: 73.2% over 0.6, 21.6% in [0.4, 0.6], 5.2% under 0.4) |
-| facebook/opt-6.7b | 81.26% | 76.65% | 81.58% (alpha=0.5) | 81.39% (alpha: 68.0% over 0.6, 26.8% in [0.4, 0.6], 5.2% under 0.4) |
-| EleutherAI/gpt-j-6B | 79.17% | 78.82% | 78.84% (alpha=0.6) | 79.29% (alpha: 96.4% over 0.6, 3.6% in [0.4, 0.6]) |
-
+Per the SmoothQuant paper, alpha in [0.4, 0.6] is the sweet-spot region.
+
+Models that achieved an accuracy drop of less than 1% are listed below.
+
+| Model \ Last-token accuracy | FP32 | INT8 (w/ SmoothQuant) | Notes |
+|:----------:|:------:|:------:|-----------------------------------|
+| bigscience/bloom-560m | 0.354 | 0.3542 | alpha=0.5, IPEX 2.1 |
+| bigscience/bloom-1b7 | 0.4634 | 0.4936 | alpha=0.5, IPEX 2.0 |
+| bigscience/bloom-3b | 0.518 | 0.5185 | alpha=0.8, IPEX 2.1 |
+| bigscience/bloom-7b1 | 0.5764 | 0.5977 | alpha=0.5, IPEX 2.0 |
+| bigscience/bloomz-560m | 0.3947 | 0.3930 | alpha=0.8, IPEX 2.1 |
+| bigscience/bloomz-1b7 | 0.4828 | 0.4906 | alpha=0.5, IPEX 2.1 |
+| bigscience/bloomz-3b | 0.5018 | 0.4980 | alpha=0.5, IPEX 2.1 |
+| bigscience/bloomz-7b1 | 0.5593 | 0.5552 | alpha=0.5, IPEX 2.1 |
+| facebook/opt-125m | 0.379 | 0.3757 | alpha=0.5, IPEX 2.1 |
+| facebook/opt-350m | 0.4516 | 0.4533 | alpha=0.8, IPEX 2.1 |
+| facebook/opt-1.3b | 0.5789 | 0.5742 | alpha=0.8, IPEX 2.0 |
+| facebook/opt-2.7b | 0.6365 | 0.6404 | alpha=0.5, IPEX 2.0 |
+| facebook/opt-6.7b | 0.6769 | 0.6804 | alpha=0.5, IPEX 2.0 |
+| facebook/opt-13b | 0.6872 | 0.6814 | alpha=0.5, IPEX 2.1 |
+| facebook/opt-30b | 0.7149 | 0.7128 | alpha=0.5, IPEX 2.1 |
+| facebook/opt-66b | 0.7398 | 0.7326 | alpha=0.5, IPEX 2.1 |
+| LLaMa-7b | 0.7361 | 0.7357 | alpha=0.8, IPEX 2.1 |
+| LLaMa-13b | 0.7627 | 0.7590 | alpha=0.7, IPEX 2.1 |
+| LLaMa-30b | 0.7759 | 0.7840 | alpha=0.7, IPEX 2.1 |
+| LLaMa-65b | 0.7908 | 0.7957 | alpha=0.9, IPEX 2.1 |
+| LLaMa-2-7b | 0.7369/0.7262 | 0.7330 | alpha=Auto, IPEX 2.1/PyTorch |
+| EleutherAI/gpt-j-6B | 0.6831 | 0.6821 | alpha=1.0, IPEX 2.1 |
+| MBZUAI/LaMini-GPT-124m | 0.3804 | 0.3887 | alpha=0.5, IPEX 2.1 |
+| MBZUAI/LaMini-GPT-774m | 0.5048 | 0.5057 | alpha=0.5, IPEX 2.1 |
+| MBZUAI/LaMini-GPT-1.5b | 0.5443 | 0.5436 | alpha=0.5, IPEX 2.1 |
+| mosaicml/mpt-7b-chat | 0.655 | 0.6499 | alpha=0.7, IPEX 2.1 |
+| stabilityai/stablelm-base-alpha-3b | 0.4172 | 0.4149 | alpha=0.6, IPEX 2.1 |
+| togethercomputer/RedPajama-INCITE-Base-3B-v1 | 0.6542 | 0.6735 | alpha=0.5, IPEX 2.1 |
+| togethercomputer/RedPajama-INCITE-Chat-3B-v1 | 0.6718 | 0.6740 | alpha=0.5, IPEX 2.0 |
+| togethercomputer/RedPajama-INCITE-Instruct-3B-v1 | 0.6569 | 0.6621 | alpha=0.5, IPEX 2.0 |
+| togethercomputer/RedPajama-INCITE-Base-7B-v0.1 | 0.7143 | 0.7221 | alpha=0.5, IPEX 2.0 |
+| togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1 | 0.6895 | 0.6953 | alpha=0.5, IPEX 2.0 |
+| databricks/dolly-v1-6b | 0.6866 | 0.6895 | alpha=0.8, IPEX 2.1 |
+| databricks/dolly-v2-3b | 0.6297 | 0.6247 | alpha=0.5, IPEX 2.1 |
+| tiiuae/falcon-7b-instruct | 0.6437 | 0.6392 | alpha=0.7, PyTorch |
 
 ## Example
 
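For context on the alpha values in the Notes column: in Neural Compressor 2.x, SmoothQuant is enabled through the `smooth_quant` recipe of the post-training quantization config, with alpha given as a fixed float or as "auto" for per-layer tuning. A minimal sketch under those assumptions follows; `my_model` and `calib_dataloader` are hypothetical placeholders to be supplied by the caller, not names from this commit:

```python
# Minimal sketch: quantizing a model with SmoothQuant via Neural Compressor,
# assuming the neural-compressor 2.x PyTorch API with the IPEX backend.
# `my_model` and `calib_dataloader` are hypothetical placeholders.
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(
    backend="ipex",  # run quantization through Intel Extension for PyTorch
    recipes={
        "smooth_quant": True,
        # A fixed alpha such as 0.5 (the most common value in the table above);
        # "auto" instead requests layer-wise alpha tuning.
        "smooth_quant_args": {"alpha": 0.5},
    },
)
q_model = quantization.fit(my_model, conf, calib_dataloader=calib_dataloader)
```

Swapping `{"alpha": 0.5}` for `{"alpha": "auto"}` would correspond to the alpha=Auto entry (LLaMa-2-7b) in the table above.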