@@ -318,27 +318,53 @@ conv2d/linear->conv2d/linear/layernorm/batchnorm/instancenorm/t5norm/llamanorm/g
 ## Validated Models
 Neural Compressor: 2.1
 
-IPEX (Intel Extension for PyTorch): 2.0
+IPEX (Intel Extension for PyTorch): 2.0/2.1
 
-Dataset: lambada
+Dataset: lambada_openai
 
 Task: text-generation
 
-alpha [ 0.4, 0.6] is sweet spot region in SmoothQuant paper
-
-| Model\Last token accuracy | FP32 | INT8 (w/o SmoothQuant) | INT8 (w/ SmoothQuant) | INT8 (w/ SmoothQuant auto tuning) |
-| ---------------------| :------:| :----------------------:| -----------------------| -----------------------------------|
-| bigscience/bloom-560m | 65.20% | 63.44% | 66.48% (alpha=0.5) | 64.76% (alpha: 95.9% over 0.6, 4.1% in [ 0.4, 0.6] ) |
-| bigscience/bloom-1b7 | 71.43% | 67.78% | 72.56% (alpha=0.5) | 72.58% (alpha: 55.1% over 0.6, 30.6% in [ 0.4, 0.6] , 14.3% under 0.4) |
-| bigscience/bloom-3b | 73.97% | 69.99% | 74.02% (alpha=0.5) | 74.16% (alpha: 100% over 0.6) |
-| bigscience/bloom-7b1 | 77.44% | 75.46% | 77.02% (alpha=0.5) | 77.45% (alpha: 91.8% over 0.6, 4.9% in [ 0.4, 0.6] , 3.3% under 0.4) |
-| bigscience/bloom-176b | 84.17% | 82.13% | 83.52% (alpha=0.6) | - |
-| facebook/opt-125m | 63.89% | 63.48% | 63.44% (alpha=0.5) | 64.14% (alpha: 59.4% over 0.6, 8.1% in [ 0.4, 0.6] , 32.4% under 0.4) |
-| facebook/opt-1.3b | 75.41% | 73.59% | 70.94% (alpha=0.5) | 74.80% (alpha: 69.9% over 0.6, 24.7% in [ 0.4, 0.6] , 5.5% under 0.4) |
-| facebook/opt-2.7b | 77.79% | 78.57% | 78.60% (alpha=0.5) | 78.25% (alpha: 73.2% over 0.6, 21.6% in [ 0.4, 0.6] , 5.2% under 0.4) |
-| facebook/opt-6.7b | 81.26% | 76.65% | 81.58% (alpha=0.5) | 81.39% (alpha: 68.0% over 0.6, 26.8% in [ 0.4, 0.6] , 5.2% under 0.4) |
-| EleutherAI/gpt-j-6B | 79.17% | 78.82% | 78.84% (alpha=0.6) | 79.29% (alpha: 96.4% over 0.6, 3.6% in [ 0.4, 0.6] ) |
-
+According to the SmoothQuant paper, alpha in [0.4, 0.6] is the sweet-spot region.
+
+Models that achieved less than a 1% accuracy drop are listed below; a configuration sketch follows the table.
+
+| Model | Last-token accuracy (FP32) | Last-token accuracy (INT8 w/ SmoothQuant) | Notes |
+| :--- | :---: | :---: | :--- |
+| bigscience/bloom-560m | 0.354 | 0.3542 | alpha=0.5, IPEX 2.1 |
+| bigscience/bloom-1b7 | 0.4634 | 0.4936 | alpha=0.5, IPEX 2.0 |
+| bigscience/bloom-3b | 0.518 | 0.5185 | alpha=0.8, IPEX 2.1 |
+| bigscience/bloom-7b1 | 0.5764 | 0.5977 | alpha=0.5, IPEX 2.0 |
+| bigscience/bloomz-560m | 0.3947 | 0.3930 | alpha=0.8, IPEX 2.1 |
+| bigscience/bloomz-1b7 | 0.4828 | 0.4906 | alpha=0.5, IPEX 2.1 |
+| bigscience/bloomz-3b | 0.5018 | 0.4980 | alpha=0.5, IPEX 2.1 |
+| bigscience/bloomz-7b1 | 0.5593 | 0.5552 | alpha=0.5, IPEX 2.1 |
+| facebook/opt-125m | 0.379 | 0.3757 | alpha=0.5, IPEX 2.1 |
+| facebook/opt-350m | 0.4516 | 0.4533 | alpha=0.8, IPEX 2.1 |
+| facebook/opt-1.3b | 0.5789 | 0.5742 | alpha=0.8, IPEX 2.0 |
+| facebook/opt-2.7b | 0.6365 | 0.6404 | alpha=0.5, IPEX 2.0 |
+| facebook/opt-6.7b | 0.6769 | 0.6804 | alpha=0.5, IPEX 2.0 |
+| facebook/opt-13b | 0.6872 | 0.6814 | alpha=0.5, IPEX 2.1 |
+| facebook/opt-30b | 0.7149 | 0.7128 | alpha=0.5, IPEX 2.1 |
+| facebook/opt-66b | 0.7398 | 0.7326 | alpha=0.5, IPEX 2.1 |
+| LLaMa-7b | 0.7361 | 0.7357 | alpha=0.8, IPEX 2.1 |
+| LLaMa-13b | 0.7627 | 0.7590 | alpha=0.7, IPEX 2.1 |
+| LLaMa-30b | 0.7759 | 0.7840 | alpha=0.7, IPEX 2.1 |
+| LLaMa-65b | 0.7908 | 0.7957 | alpha=0.9, IPEX 2.1 |
+| LLaMa-2-7b | 0.7369/0.7262 | 0.7330 | alpha=auto, IPEX 2.1/PyTorch |
+| EleutherAI/gpt-j-6B | 0.6831 | 0.6821 | alpha=1.0, IPEX 2.1 |
+| MBZUAI/LaMini-GPT-124m | 0.3804 | 0.3887 | alpha=0.5, IPEX 2.1 |
+| MBZUAI/LaMini-GPT-774m | 0.5048 | 0.5057 | alpha=0.5, IPEX 2.1 |
+| MBZUAI/LaMini-GPT-1.5b | 0.5443 | 0.5436 | alpha=0.5, IPEX 2.1 |
+| mosaicml/mpt-7b-chat | 0.655 | 0.6499 | alpha=0.7, IPEX 2.1 |
+| stabilityai/stablelm-base-alpha-3b | 0.4172 | 0.4149 | alpha=0.6, IPEX 2.1 |
+| togethercomputer/RedPajama-INCITE-Base-3B-v1 | 0.6542 | 0.6735 | alpha=0.5, IPEX 2.1 |
+| togethercomputer/RedPajama-INCITE-Chat-3B-v1 | 0.6718 | 0.6740 | alpha=0.5, IPEX 2.0 |
+| togethercomputer/RedPajama-INCITE-Instruct-3B-v1 | 0.6569 | 0.6621 | alpha=0.5, IPEX 2.0 |
+| togethercomputer/RedPajama-INCITE-Base-7B-v0.1 | 0.7143 | 0.7221 | alpha=0.5, IPEX 2.0 |
+| togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1 | 0.6895 | 0.6953 | alpha=0.5, IPEX 2.0 |
+| databricks/dolly-v1-6b | 0.6866 | 0.6895 | alpha=0.8, IPEX 2.1 |
+| databricks/dolly-v2-3b | 0.6297 | 0.6247 | alpha=0.5, IPEX 2.1 |
+| tiiuae/falcon-7b-instruct | 0.6437 | 0.6392 | alpha=0.7, PyTorch |
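+
+As a minimal sketch of applying a validated setting, assuming the Neural Compressor 2.x `PostTrainingQuantConfig`/`quantization.fit` API: the alpha from the Notes column is passed through the `smooth_quant_args` recipe, and `"alpha": "auto"` requests the auto-tuning used for LLaMa-2-7b. The toy model and calibration data below are placeholders, not part of the validated runs:
+
+```python
+import torch
+from torch.utils.data import DataLoader, TensorDataset
+from neural_compressor import PostTrainingQuantConfig, quantization
+
+# Toy stand-in for an FP32 LLM; in practice this would be a transformers model.
+model = torch.nn.Sequential(
+    torch.nn.Linear(16, 16), torch.nn.ReLU(), torch.nn.Linear(16, 16)
+)
+
+# Tiny (input, label) calibration set; real runs calibrate on text batches.
+calib_dataloader = DataLoader(
+    TensorDataset(torch.randn(8, 16), torch.zeros(8)), batch_size=4
+)
+
+# Fixed alpha taken from the table above (e.g. 0.5 for facebook/opt-6.7b);
+# use {"alpha": "auto"} to enable auto-tuning instead of a fixed value.
+# Add backend="ipex" to match the IPEX rows (requires intel-extension-for-pytorch).
+conf = PostTrainingQuantConfig(
+    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}}
+)
+
+q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
+```
+
+Only the `smooth_quant_args` value needs to change per model; the rest of the recipe stays the same across the table.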
 
 ## Example
 