From 6fea405e95e61be7eceb0560c57ffe4ac16e7bf1 Mon Sep 17 00:00:00 2001
From: Chen Peter
Date: Sun, 20 Oct 2024 20:18:42 +0800
Subject: [PATCH] Apply suggestions from code review

Co-authored-by: Tatiana Savina
---
 .../llm_inference_guide/llm-inference-hf.rst | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
index fb509f806ece95..a26b670b5314d0 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
@@ -304,16 +304,16 @@ mentioned above.
 Execution on CPU device
 ##########################
 
-As mentioned on :ref:`Composability of different threading runtimes `, OpenVINO default threading runtime
-oneTBB keeps CPU cores actively for a while after inference done. When using Optimum Intel Python API,
-it will call Torch (via HF transformers) for postprocessing (for example beam search or gready search).
-Torch uses OpenMP for threading, OpenMP will need to wait for CPU cores which are being kept actively by
-oneTBB. OpenMP by default has the `busy-wait `__ which can delay the next OpenVINO inference as well.
+As mentioned in the :ref:`Composability of different threading runtimes ` section, OpenVINO's default threading runtime,
+oneTBB, keeps CPU cores active for a while after inference is done. When using the Optimum Intel Python API,
+it calls Torch (via HF transformers) for postprocessing, such as beam search or greedy search.
+Torch uses OpenMP for threading, and OpenMP needs to wait for CPU cores that are kept active by
+oneTBB. By default, OpenMP uses the `busy-wait `__ policy, which can delay the next OpenVINO inference as well.
 
-The recommendation
+It is recommended to:
 
-* Limit the CPU thread number of Torch. `torch.set_num_threads `__
-* Set environment variable `OMP_WAIT_POLICY `__ to PASSIVE which will disable OpenMP `busy-wait `__
+* Limit the number of CPU threads used by Torch with `torch.set_num_threads `__.
+* Set the environment variable `OMP_WAIT_POLICY `__ to ``PASSIVE``, which disables OpenMP `busy-wait `__.
 
 Additional Resources
 #####################
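
A minimal sketch of the two recommendations the patch adds. `OMP_WAIT_POLICY`, `PASSIVE`, and `torch.set_num_threads` come from the patch text itself; the specific thread count chosen below is an arbitrary illustration, not a tuned value:

```python
import os

# OpenMP reads OMP_WAIT_POLICY once, when the runtime is loaded, so set
# it before the first import of torch (or any other OpenMP-backed library).
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"

try:
    import torch

    # Cap Torch's intra-op thread pool so postprocessing (beam search,
    # greedy search) does not compete for cores still held by oneTBB.
    # Halving the core count is an arbitrary example value.
    torch.set_num_threads(max(1, (os.cpu_count() or 2) // 2))
except ImportError:
    # PyTorch is not installed in this environment; the environment
    # variable alone still disables busy-wait for other OpenMP users.
    pass
```

Setting the environment variable inside the script only works if it runs before OpenMP initializes; exporting it in the shell before launching Python is the safer option.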