[OVQuantizer] Apply Fixes and Integrate into the Llama Example Workflow #9
Conversation
:param target_node: FX node representing a weighted operation (e.g., Linear, Conv).
:param nncf_graph: NNCFGraph used to determine weight port indices.
Done
def _get_weight_edge(
    target_node: torch.fx.Node,
    nncf_graph: NNCFGraph,
):
Suggested change: replace
    ):
with
    ) -> tuple[torch.fx.Node, torch.fx.Node]:
Done
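For reference, a minimal sketch of how the annotated helper could look. Only the signature and return annotation come from the snippet above; the import paths, the _weight_port_id helper, and the body are illustrative assumptions, not the PR's actual implementation.

```python
import torch.fx
from nncf.common.graph import NNCFGraph  # import path assumed


def _weight_port_id(target_node: torch.fx.Node, nncf_graph: NNCFGraph) -> int:
    # Hypothetical stand-in for the NNCFGraph-based weight-port lookup;
    # linear/conv weights usually sit on input port 1.
    return 1


def _get_weight_edge(
    target_node: torch.fx.Node,
    nncf_graph: NNCFGraph,
) -> tuple[torch.fx.Node, torch.fx.Node]:
    """Return the (weight node, consumer node) pair for a weighted FX op."""
    port_id = _weight_port_id(target_node, nncf_graph)
    weight_node = target_node.args[port_id]
    return weight_node, target_node
```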
:param graph: The underlying FX graph.
:param nncf_graph: The corresponding NNCF graph.
:param node_vs_torch_annotation: A mapping of FX nodes to quantization annotations.
Done
    model: torch.fx.GraphModule,
    graph: torch.fx.Graph,
    nncf_graph: NNCFGraph,
    node_vs_torch_annotation: DefaultDict[torch.fx.Node, QuantizationAnnotation],
Could you please create the defaultdicts in each function separately and remove the node_vs_torch_annotation parameter?
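A rough sketch of the requested refactor, with the annotation map created locally and returned rather than passed in; the function name _annotate_weights and the elided body are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict

import torch.fx
from torch.ao.quantization.quantizer import QuantizationAnnotation


def _annotate_weights(
    model: torch.fx.GraphModule,
    graph: torch.fx.Graph,
) -> DefaultDict[torch.fx.Node, QuantizationAnnotation]:
    # Build the node -> annotation map locally instead of threading a shared
    # defaultdict through every helper as a parameter.
    annotations: DefaultDict[torch.fx.Node, QuantizationAnnotation] = defaultdict(
        QuantizationAnnotation
    )
    # ... populate annotations for the weighted nodes found in `graph` ...
    return annotations
```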
    else:
        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
Suggested change: drop the redundant else and return directly:
    return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
Done
    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
) -> BaseWeightsDecompressor:
    if zero_point is not None:
What if we invert the condition here? IMHO "is None" is clearer than "is not None" :)
Suggested change: replace
    if zero_point is not None:
with
    if zero_point is None:
Done
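Putting the two review points together (test for "is None" first, and no redundant else), the decompressor selection could look roughly as below. The symmetric branch is taken from the snippet above; the asymmetric class name, its constructor arguments, and the import path are assumptions about NNCF's API rather than the PR's exact code.

```python
from typing import Optional

import torch

# Import path, asymmetric class name, and its signature are assumptions.
from nncf.torch.quantization.layers import (
    INT8AsymmetricWeightsDecompressor,
    INT8SymmetricWeightsDecompressor,
)


def _build_decompressor(
    scale: torch.Tensor,
    zero_point: Optional[torch.Tensor],
    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
):
    # Symmetric quantization has no zero point, so handle it first and
    # return directly, avoiding the redundant else.
    if zero_point is None:
        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
    return INT8AsymmetricWeightsDecompressor(scale, zero_point, original_weight.dtype)
```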
    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
) -> BaseWeightsDecompressor:
    if zero_point is not None:
The same comment as above regarding the condition
Done
    observer: Type[UniformQuantizationObserverBase]

    extra_args: Dict[str, Any] = {}
Let's use the wc_param as an actual keyword here. A dict is not needed here.
Suggested change: replace
    observer: Type[UniformQuantizationObserverBase]
    extra_args: Dict[str, Any] = {}
with
    observer: Type[WeightObserverBase]
Alright, Done
    )
    return QuantizationSpec(
        dtype=dtype,
        observer_or_fake_quant_ctr=observer.with_args(**extra_args),
Can we call the constructor directly here?
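For the two threads above, a sketch of how the spec construction could look once wc_param is passed as a real keyword. QuantizationSpec and with_args are standard torch.ao.quantization pieces; WeightObserverBase, wc_param, and the enclosing function name are taken from the snippets or assumed. Whether the observer constructor can be referenced directly, without with_args, depends on whether any extra arguments remain, which is not visible from the diff.

```python
from typing import Type

import torch
from torch.ao.quantization.observer import ObserverBase
from torch.ao.quantization.quantizer import QuantizationSpec


def _weight_qspec(
    dtype: torch.dtype,
    observer: Type[ObserverBase],  # stand-in for the PR's WeightObserverBase
    wc_param,  # weight-compression parameters object from the PR (assumed)
) -> QuantizationSpec:
    # Bind wc_param as an explicit keyword instead of unpacking an extra_args dict.
    return QuantizationSpec(
        dtype=dtype,
        observer_or_fake_quant_ctr=observer.with_args(wc_param=wc_param),
    )
```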
    return qnn_quantizer, quant_dtype


def get_ov_quantizer(
The ignored scope in this function is very model specific. I suggest naming this function get_ov_quantizer_for_modelname and adding a small docstring to it.
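A sketch of the renamed, documented helper along the lines of this suggestion. The stub class, the mode strings handled, and the example ignored-scope entries are placeholders; only the idea of a Llama-specific ignored scope for mixed precision comes from this PR.

```python
class _OVQuantizerStub:
    """Placeholder for the real OpenVINO quantizer class, defined here only
    so the sketch is self-contained."""

    def __init__(self, ignored_scope=None):
        self.ignored_scope = ignored_scope or []


def get_ov_quantizer_for_llama(pt2e_quantize: str) -> _OVQuantizerStub:
    """Build the OpenVINO quantizer used by the Llama example workflow.

    The ignored scope set here is Llama specific: with openvino_8da4w, most
    weights are compressed to INT4 while the listed nodes stay at INT8
    (mixed precision); with openvino_8da8w, all weights go to INT8 and no
    ignored scope is needed. That model specificity is why the helper carries
    the model name rather than a generic one.
    """
    ignored_scope = []
    if pt2e_quantize == "openvino_8da4w":
        # Hypothetical node patterns; the real list is hand-picked per model.
        ignored_scope = ["output_linear", "token_embedding"]
    return _OVQuantizerStub(ignored_scope=ignored_scope)
```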
Co-authored-by: Daniil Lyakhov <daniil.lyakhov@intel.com>
Merged commit 21c43fe into cavusmustafa:openvino_llama_support
Summary
The OpenVINO quantizer is refactored, and mixed precision via a manually set ignored scope is added. To use the OpenVINO quantizer path, pass --pt2e_quantize openvino_8da4w for INT4 weight compression or --pt2e_quantize openvino_8da8w for INT8 weight compression.