From 6fea405e95e61be7eceb0560c57ffe4ac16e7bf1 Mon Sep 17 00:00:00 2001
From: Chen Peter
Date: Sun, 20 Oct 2024 20:18:42 +0800
Subject: [PATCH] Apply suggestions from code review

Co-authored-by: Tatiana Savina
---
 .../llm_inference_guide/llm-inference-hf.rst | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
index fb509f806ece95..a26b670b5314d0 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
@@ -304,16 +304,16 @@ mentioned above.
 Execution on CPU device
 ##########################
 
-As mentioned on :ref:`Composability of different threading runtimes `, OpenVINO default threading runtime
-oneTBB keeps CPU cores actively for a while after inference done. When using Optimum Intel Python API,
-it will call Torch (via HF transformers) for postprocessing (for example beam search or gready search).
-Torch uses OpenMP for threading, OpenMP will need to wait for CPU cores which are being kept actively by
-oneTBB. OpenMP by default has the `busy-wait `__ which can delay the next OpenVINO inference as well.
+As mentioned in the :ref:`Composability of different threading runtimes ` section, OpenVINO's default threading runtime,
+oneTBB, keeps CPU cores active for a while after inference is done. When using the Optimum Intel Python API,
+it calls Torch (via HF transformers) for postprocessing, such as beam search or greedy search.
+Torch uses OpenMP for threading, and OpenMP needs to wait for CPU cores that are kept active by
+oneTBB. By default, OpenMP uses the `busy-wait `__ policy, which can delay the next OpenVINO inference as well.
 
-The recommendation
+It is recommended to:
 
-* Limit the CPU thread number of Torch. `torch.set_num_threads `__
-* Set environment variable `OMP_WAIT_POLICY `__ to PASSIVE which will disable OpenMP `busy-wait `__
+* Limit the number of CPU threads used by Torch with `torch.set_num_threads `__.
+* Set the environment variable `OMP_WAIT_POLICY `__ to ``PASSIVE``, which disables OpenMP `busy-wait `__.
 
 Additional Resources
 #####################
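
A minimal sketch of the two recommendations the patch adds. `OMP_WAIT_POLICY`, `PASSIVE`, and `torch.set_num_threads` come from the patch text itself; the specific thread count chosen below is an arbitrary illustration, not a tuned value:

```python
import os

# OpenMP reads OMP_WAIT_POLICY once, when the runtime is loaded, so set
# it before the first import of torch (or any other OpenMP-backed library).
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"

try:
    import torch

    # Cap Torch's intra-op thread pool so postprocessing (beam search,
    # greedy search) does not compete for cores still held by oneTBB.
    # Halving the core count is an arbitrary example value.
    torch.set_num_threads(max(1, (os.cpu_count() or 2) // 2))
except ImportError:
    # PyTorch is not installed in this environment; the environment
    # variable alone still disables busy-wait for other OpenMP users.
    pass
```

Setting the environment variable inside the script only works if it runs before OpenMP initializes; exporting it in the shell before launching Python is the safer option.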