examples/pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_weight_only/README.md
19 additions & 2 deletions
@@ -37,7 +37,7 @@ sh run_quant.sh --topology=gpt_j_wikitext_weight_only --input_model=EleutherAI/g
 >
-> `weight_only_bits`, `weight_only_group`, `weight_only_scheme`, and `weight_only_algorithm` can be modified by user. For details, please refer to [README](../../../../../../../docs/source/quantization_weight_only.md).
+> Note: for per-channel quantization, set `group_size` to **-1**; otherwise use 32, 128, etc. More comprehensive details about the user-defined arguments are available in our [weight-only quantization documentation](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#quantization-capability).
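For reference, a per-channel 4-bit invocation might look like the sketch below. The flag spellings mirror the user-modifiable arguments named in this README (`weight_only_bits`, `weight_only_group`, `weight_only_scheme`, `weight_only_algorithm`) and, like the model id, are assumptions; verify them against `run_quant.sh`.

```bash
# Sketch only: flag names are assumed from the arguments this README names,
# and the model id is assumed -- check run_quant.sh for the real interface.
# weight_only_group=-1 selects per-channel quantization; a positive value
# such as 32 or 128 selects per-group quantization with that group size.
sh run_quant.sh \
    --topology=gpt_j_wikitext_weight_only \
    --input_model=EleutherAI/gpt-j-6B \
    --weight_only_bits=4 \
    --weight_only_group=-1 \
    --weight_only_scheme=sym \
    --weight_only_algorithm=RTN
```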
+
+### Run general examples on a wide variety of LLMs using GPTQ
+
+We also support the GPTQ algorithm on a variety of language models (OPTs, BLOOMs, LLaMAs, MPTs, Falcons, ChatGLMs, etc.) in generalized code. Please refer to the script *run-gptq-llm.py* for more information.
+
+You can simply use the following command to run quantization (please refer to *run-gptq-llm.sh*); a sketch follows this diff.
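A minimal sketch of such a run is shown below, assuming *run-gptq-llm.py* exposes the same weight-only arguments described earlier in this README; the model id and exact flag spellings are illustrative, so consult *run-gptq-llm.sh* for the authoritative interface.

```bash
# Illustrative sketch only -- the flag names and model id are assumptions;
# run-gptq-llm.sh / run-gptq-llm.py define the real arguments.
python run-gptq-llm.py \
    --model_name_or_path facebook/opt-1.3b \
    --weight_only_bits 4 \
    --weight_only_group 128 \
    --weight_only_scheme sym \
    --weight_only_algorithm GPTQ
```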
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_weight_only/run_gptj_mlperf_int4.py
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_weight_only/run_gptj_mlperf_int4.sh