Update NPU GenAI guide #27788
base: releases/2024/5
Conversation
- fix optimum-cli typo
- update optimum install instructions
- clarify model caching
- other small updates
@@ -44,7 +43,7 @@ You select one of the methods by setting the ``--group-size`` parameter to eithe
 .. code-block:: console
    :name: group-quant

-   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group_size 128 TinyLlama-1.1B-Chat-v1.0
+   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0
@TolyaTalamanov 👀👀👀
-pip install nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
-pip install openvino==2024.5 openvino-tokenizers==2024.5 openvino-genai==2024.5
+pip install --upgrade --upgrade-strategy eager optimum[openvino] openvino-genai>=2024.5
Unfortunately, models converted without the aforementioned components pinned to these specific versions are not guaranteed to work.
A downside of these pinnings is that the docs usually do not get updated. With optimum-intel 1.19.0 some models that are supported on NPU cannot be exported. Optimum Intel 1.19 does not officially support OpenVINO 2024.5, and it only supports transformers up to 4.44. We can never completely guarantee that commands don't break, but in other documentation we do not pin versions, because newer versions usually have more upsides (fixes) than downsides. Optimum-intel has a bunch of dependencies, and as time goes by the chances increase that with older versions one of these dependencies breaks (a dependency of a dependency not supporting Python 3.12, for example), and there can be security issues with older versions.
What about keeping the instructions more general, but also mentioning that models are verified with these specific versions?
Since it's targeted to the 2024/5 branch it should be openvino-genai==2024.5, shouldn't it? I assume we still need to specify nncf==2.12 (CC: @dmatveev).
P.S. I'd prefer not to change this part at all...
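For illustration, a rough sketch of the compromise suggested above — keep the install general, but record the versions the models were verified with. The version numbers are simply the ones quoted in this thread, not a new recommendation (quotes added to guard against shell globbing/redirection):

# General install, picking up the latest compatible versions:
pip install --upgrade --upgrade-strategy eager "optimum[openvino]" "openvino-genai>=2024.5"

# Note for readers: the models in this guide were verified with
#   nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
#   openvino==2024.5 openvino-tokenizers==2024.5 openvino-genai==2024.5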
-1. Update NNCF: ``pip install nncf==2.13``
+1. Update NNCF: ``pip install --upgrade nncf``
We have exact component versions here for a reason
Is the idea that users must absolutely install NNCF 2.12 for everything (as mentioned in the prerequisites), but then switch to 2.13 for channel-wise quantization? And then switch back to 2.12 if they want to use group quantization? If the reason for these specific versions is that they were tested at some point, then see my comment above.
--scale_estimation has been added in the 2.13 release. So yes, the general recommendation was to use 2.12 and upgrade to 2.13 when scale_estimation is needed.
Also wouldn't change this part...
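To make that sequence concrete, a hedged sketch of the scale-estimation path. It assumes optimum-cli exposes --scale-estimation and --dataset options and that NNCF 2.13+ is installed; the 7B model ID and output directory are placeholders, so treat it as illustrative rather than guide text:

# Scale estimation was added in NNCF 2.13, so upgrade first:
pip install --upgrade nncf

# Channel-wise (--group-size -1) symmetric INT4 export with scale estimation over a calibration dataset:
optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --ratio 1.0 --group-size -1 --scale-estimation --dataset wikitext2 Llama-2-7b-chat-hf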
@@ -27,7 +26,7 @@ such as Llama-2-7B, Mistral-0.2-7B, and Qwen-2-7B.
 Export an LLM model via Hugging Face Optimum-Intel
 ##################################################

-Since **symmetrically-quantized 4-bit (INT4) models are preffered for inference on NPU**, make
+Since **symmetrically-quantized 4-bit (INT4) models are supported for inference on NPU**, make
 sure to export the model with the proper conversion and optimization settings.

 | You may export LLMs via Optimum-Intel, using one of two compression methods:
So let's please rework this text as:
**group quantization** - recommended for smaller models (<4B parameters)
**channel-wise quantization** - recommended for larger models (>4B parameters)
Thank you, updated.
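For reference, a sketch of how the two recommendations could look side by side. The command shape is taken from the group-quant example in this diff, channel-wise quantization uses --group-size -1, and the 7B model ID is only a placeholder:

# Smaller models (<4B parameters): group quantization
optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0

# Larger models (>4B parameters): channel-wise quantization
optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --ratio 1.0 --group-size -1 Llama-2-7b-chat-hf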
Assigned this PR to @TolyaTalamanov now as he's back.
@helena-intel could you add this, please?
For the optimum install I chose to use --upgrade even though that is not necessary in a clean env, but many people will not create a clean env, and then using --upgrade-strategy eager prevents issues.
I renamed "preferred" to "supported" for symmetric mode, because "preferred" may give the impression that asym is merely not ideal rather than a hard requirement, while many models don't work at all with asym mode.
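To put the sym/asym point in terms of flags (a sketch, not guide text): to my understanding, optimum-cli's INT4 weight compression defaults to asymmetric unless --sym is passed, so omitting the flag produces the variant that many models cannot run with on NPU (the output directory name here is just an example):

# Asymmetric INT4 (no --sym): the export succeeds, but many models then fail to run on NPU
optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0-asym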