
Conversation

@anzr299 anzr299 commented Aug 26, 2025

Summary

The OpenVINO Quantizer is refactored, and mixed precision via a manually set ignored scope is added.

To use this OpenVINO quantizer path, pass `--pt2e_quantize openvino_8da4w` for INT4 weight compression or `--pt2e_quantize openvino_8da8w` for INT8 weight compression.

:param target_node: FX node representing a weighted operation (e.g., Linear, Conv).
:param nncf_graph: NNCFGraph used to determine weight port indices.


@anzr299 (Author): Done

def _get_weight_edge(
    target_node: torch.fx.Node,
    nncf_graph: NNCFGraph,
):

@daniil-lyakhov commented Sep 8, 2025


Suggested change
-):
+) -> tuple[torch.fx.Node, torch.fx.Node]:

@anzr299 (Author): Done
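The suggested annotation pattern, sketched generically with stand-in types (the real helper returns `torch.fx.Node` objects, which are not assumed available here):

```python
# Hypothetical stand-in for the reviewed helper: annotating the return type
# as a tuple makes the two-value contract explicit to readers and checkers.
def get_pair(items: list[str]) -> tuple[str, str]:
    """Return the first two items as a pair."""
    return items[0], items[1]

first, second = get_pair(["weight_node", "bias_node"])
```

With the annotation in place, a type checker can flag callers that unpack the result into the wrong number of variables.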

:param graph: The underlying FX graph.
:param nncf_graph: The corresponding NNCF graph.
:param node_vs_torch_annotation: A mapping of FX nodes to quantization annotations.


@anzr299 (Author): Done

model: torch.fx.GraphModule,
graph: torch.fx.Graph,
nncf_graph: NNCFGraph,
node_vs_torch_annotation: DefaultDict[torch.fx.Node, QuantizationAnnotation],


Could you please create the defaultdicts in each function separately and remove the node_vs_torch_annotation parameter?
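The refactor the reviewer asks for can be sketched with stand-in types (the real code maps `torch.fx.Node` to `QuantizationAnnotation`; plain strings and lists stand in here):

```python
from collections import defaultdict

# Before (sketch): the shared mapping is threaded through as a parameter.
def annotate_with_shared_map(nodes, node_vs_annotation):
    for n in nodes:
        node_vs_annotation[n].append("annotated")
    return node_vs_annotation

# After (sketch): each function owns its own defaultdict, as suggested,
# so the parameter disappears and no caller can leak state between calls.
def annotate(nodes):
    node_vs_annotation = defaultdict(list)  # created locally, not passed in
    for n in nodes:
        node_vs_annotation[n].append("annotated")
    return node_vs_annotation
```

The local `defaultdict` keeps each annotation pass independent, which is the point of removing the shared parameter.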

Comment on lines 196 to 197
    else:
        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)


Suggested change
-    else:
-        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
+    return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)

@anzr299 (Author): Done
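The suggestion drops a redundant `else` after a `return`. A minimal sketch of the pattern, with illustrative names rather than the PR's actual decompressor classes:

```python
def pick_decompressor_before(zero_point):
    if zero_point is not None:
        return "asymmetric"
    else:                    # redundant: the if-branch already returned
        return "symmetric"

def pick_decompressor_after(zero_point):
    if zero_point is not None:
        return "asymmetric"
    return "symmetric"       # same behavior, one less indentation level
```

Both functions are behaviorally identical; the second form simply avoids nesting that carries no information.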

    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
) -> BaseWeightsDecompressor:
    if zero_point is not None:


What if we invert the condition here? IMHO `is None` is clearer than `is not None` :)

Suggested change
-    if zero_point is not None:
+    if zero_point is None:

@anzr299 (Author): Done
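The inverted condition the reviewer prefers can be sketched as a guard clause (the tuple return values here are illustrative stand-ins for the PR's decompressor objects):

```python
def make_decompressor(zero_point):
    # Handle the `is None` case first: it reads as "no zero point means
    # symmetric quantization", then the asymmetric case follows naturally.
    if zero_point is None:
        return ("symmetric",)
    return ("asymmetric", zero_point)
```

Leading with the positive `is None` check keeps the simpler branch first and avoids a double negative when reading the condition.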

    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
) -> BaseWeightsDecompressor:
    if zero_point is not None:


The same comment as above regarding the condition

@anzr299 (Author): Done

Comment on lines +424 to +426
observer: Type[UniformQuantizationObserverBase]

extra_args: Dict[str, Any] = {}


Let's use `wc_param` as an actual keyword argument here; a dict is not needed.

Suggested change
-observer: Type[UniformQuantizationObserverBase]
-extra_args: Dict[str, Any] = {}
+observer: Type[WeightObserverBase]

@anzr299 (Author): Alright, done
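The reviewer's point, sketched with a stand-in class and the hypothetical `wc_param` keyword (the real class is an NNCF `WeightObserverBase` subclass, not assumed here):

```python
class WeightObserver:  # stand-in for the PR's observer class
    def __init__(self, wc_param=None):
        self.wc_param = wc_param

# Before (sketch): building a dict only to splat it into the call.
extra_args = {"wc_param": "int4_config"}
obs_a = WeightObserver(**extra_args)

# After (sketch): pass wc_param as an actual keyword; no dict needed.
obs_b = WeightObserver(wc_param="int4_config")
```

Both calls construct the same observer; the direct keyword is shorter and lets static tooling check the parameter name.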

    )
    return QuantizationSpec(
        dtype=dtype,
        observer_or_fake_quant_ctr=observer.with_args(**extra_args),


Can we call the constructor directly here?
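For context, a `with_args`-style helper is essentially partial application; this sketch uses a stand-in class and `functools.partial` (the thread's exact intent, likely passing the class directly once `extra_args` is gone, is an assumption here):

```python
import functools

class Observer:  # stand-in for a torch-style observer class
    def __init__(self, dtype="int8"):
        self.dtype = dtype

# with_args-style factory: arguments are bound now, instantiation later,
# when the framework actually inserts the observer into the graph.
ctr = functools.partial(Observer, dtype="int4")
obs = ctr()

# With no extra arguments to bind, the class itself already is a
# constructor, so it can be passed directly with no factory wrapper.
direct_ctr = Observer
```

When every bound argument disappears, wrapping the class in a factory adds indirection without changing what gets constructed.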

    return qnn_quantizer, quant_dtype


def get_ov_quantizer(


The ignored scope in this function is very model-specific. I suggest naming this function get_ov_quantizer_for_modelname and adding a small docstring to it.
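The rename the reviewer suggests could look like the sketch below. Everything here is hypothetical: `IgnoredScope` is a stand-in rather than NNCF's actual class, and the pattern names are invented for illustration:

```python
class IgnoredScope:  # stand-in, not NNCF's IgnoredScope
    def __init__(self, patterns):
        self.patterns = patterns

def get_ov_quantizer_for_llama():
    """Build an ignored scope specific to the Llama architecture.

    The skipped patterns below are placeholders; the point is that the
    function name and docstring declare which model the scope targets,
    so other models get their own helper instead of reusing this one.
    """
    return IgnoredScope(patterns=["lm_head", "embedding"])
```

Encoding the model name in the function makes the hard-coded scope discoverable instead of looking like a general-purpose default.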

@suryasidd suryasidd merged commit 21c43fe into cavusmustafa:openvino_llama_support Sep 11, 2025
2 of 109 checks passed