Update NPU GenAI guide #27788

Open
wants to merge 2 commits into base: releases/2024/5

Conversation

helena-intel (Contributor)

  • fix optimum-cli typo
  • update optimum install instructions
  • clarify model caching
  • other small updates

For the optimum install I chose to use --upgrade even though that is not necessary in a clean environment; many people will not create a clean environment, and using --upgrade-strategy eager then prevents issues.
I renamed "preferred" to "supported" for symmetric mode, because "preferred" may give the impression that asymmetric mode is not ideal but still an option, whereas many models do not work at all in asymmetric mode.
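
To make the intent concrete, here is a minimal sketch of both cases (assuming a bash-like shell; the environment name npu-env is just an illustration, and the quotes only guard against shell expansion):

# clean environment: a plain install is enough
python -m venv npu-env
source npu-env/bin/activate
pip install "optimum[openvino]" "openvino-genai>=2024.5"

# existing environment: also force already-installed dependencies up to date
pip install --upgrade --upgrade-strategy eager "optimum[openvino]" "openvino-genai>=2024.5"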

@helena-intel helena-intel requested a review from a team as a code owner November 28, 2024 09:59
@helena-intel helena-intel requested review from akopytko and removed request for a team November 28, 2024 09:59
@github-actions github-actions bot added the category: docs OpenVINO documentation label Nov 28, 2024
@@ -44,7 +43,7 @@ You select one of the methods by setting the ``--group-size`` parameter to eithe
.. code-block:: console
:name: group-quant

- optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group_size 128 TinyLlama-1.1B-Chat-v1.0
+ optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0
Contributor

@TolyaTalamanov 👀👀👀

- pip install nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
- pip install openvino==2024.5 openvino-tokenizers==2024.5 openvino-genai==2024.5
+ pip install --upgrade --upgrade-strategy eager optimum[openvino] openvino-genai>=2024.5
Contributor

Unfortunately, models converted without the aforementioned components pinned to specific versions are not guaranteed to work.

Contributor Author

A downside of these pinnings is that the docs usually do not get updated. With optimum-intel 1.19.0, some models that are supported on NPU cannot be exported. Optimum Intel 1.19 does not officially support OpenVINO 2024.5, and it only supports transformers up to 4.44. We can never completely guarantee that commands don't break, but in other documentation we do not limit versions, because newer versions usually have more upsides (fixes) than downsides. Optimum-intel has a bunch of dependencies, and as time goes by the chances increase that with older versions one of these dependencies breaks (for example, a dependency of a dependency does not support Python 3.12), and there can be security issues with older versions.

What about keeping the instructions more general, but also mentioning that models are verified with these specific versions?
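
As a rough sketch of how that could read in the RST source (the note wording is only a suggestion; the versions are the ones currently pinned in the guide):

.. code-block:: console

   pip install --upgrade --upgrade-strategy eager optimum[openvino] openvino-genai>=2024.5

.. note::

   The export commands in this guide were verified with optimum-intel 1.19.0, nncf 2.12,
   onnx 1.16.1, and openvino 2024.5. Newer versions are expected to work but are not
   individually validated.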

Contributor

Since it's targeted at the 2024/5 branch, it should be openvino-genai==2024.5, shouldn't it?

I assume we still need to specify nncf==2.12 (CC: @dmatveev )

P.S. I'd prefer not to change this part at all...

Comment on lines -65 to +64
- 1. Update NNCF: ``pip install nncf==2.13``
+ 1. Update NNCF: ``pip install --upgrade nncf``
Contributor

We have exact component versions here for a reason

Contributor Author

Is the idea that users must absolutely install NNCF 2.12 for everything (as mentioned in the prerequisites), but then switch to 2.13 for channel-wise quantization? And then switch back to 2.12 if they want to use group quantization? If the reason for these specific versions is that they were tested at some point, then see my comment above.

Contributor

--scale_estimation was added in the NNCF 2.13 release. So yes, the general recommendation was to use 2.12 and upgrade to 2.13 when scale estimation is needed.

I also wouldn't change this part...
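
For reference, the scale-estimation path discussed here would look roughly like this (a sketch only; the flag spellings follow optimum-cli, and the model and dataset names are just examples):

pip install --upgrade "nncf>=2.13"
optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --ratio 1.0 --group-size -1 --scale-estimation --dataset wikitext2 Llama-2-7b-chat-hf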

@@ -27,7 +26,7 @@ such as Llama-2-7B, Mistral-0.2-7B, and Qwen-2-7B.
Export an LLM model via Hugging Face Optimum-Intel
##################################################

- Since **symmetrically-quantized 4-bit (INT4) models are preffered for inference on NPU**, make
+ Since **symmetrically-quantized 4-bit (INT4) models are supported for inference on NPU**, make
sure to export the model with the proper conversion and optimization settings.

| You may export LLMs via Optimum-Intel, using one of two compression methods:
Contributor

So let's please rework this text as:

**group quantization** - recommended for smaller models (<4B parameters)
**channel-wise quantization** - recommended for larger models (>4B parameters)
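
By way of illustration, the two recommendations then map onto export commands that differ only in the --group-size value (the -1 per-channel convention and the 7B model name are assumptions based on the existing guide, not part of this diff):

optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0
optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --ratio 1.0 --group-size -1 Llama-2-7b-chat-hf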

Contributor Author

Thank you, updated.

@dmatveev (Contributor) commented Dec 9, 2024

Assigned this PR to @TolyaTalamanov now as he's back.
One more item I'd suggest is to have the "channel-wise" quantization tab opened first, as it is still the recommended option:

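Assuming the page uses sphinx-design tabs (where the first tab listed is the one opened by default), the reordering would look roughly like this in the RST source:

.. tab-set::

   .. tab-item:: Channel-wise quantization

      (existing channel-wise instructions move here so they are shown first)

   .. tab-item:: Group quantization

      (existing group-quantization instructions)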

@TolyaTalamanov (Contributor)

> Assigned this PR to @TolyaTalamanov now as he's back. One more item I'd suggest is to have the "channel-wise" quantization tab opened first, as it is still the recommended option:


@helena-intel could you add this, please?
