[OVQuantizer] Apply Fixes and Integrate into the Llama Example Workflow #9
Conversation
:param target_node: FX node representing a weighted operation (e.g., Linear, Conv).
:param nncf_graph: NNCFGraph used to determine weight port indices.
Done
def _get_weight_edge(
    target_node: torch.fx.Node,
    nncf_graph: NNCFGraph,
):
Suggested change: replace
    ):
with
    ) -> tuple[torch.fx.Node, torch.fx.Node]:
Done
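For reference, a minimal sketch of how the annotated helper could look. Only the signature and return annotation come from the snippet above; the import paths, the _weight_port_id helper, and the body are illustrative assumptions, not the PR's actual implementation.

```python
import torch.fx
from nncf.common.graph import NNCFGraph  # import path assumed


def _weight_port_id(target_node: torch.fx.Node, nncf_graph: NNCFGraph) -> int:
    # Hypothetical stand-in for the NNCFGraph-based weight-port lookup;
    # linear/conv weights usually sit on input port 1.
    return 1


def _get_weight_edge(
    target_node: torch.fx.Node,
    nncf_graph: NNCFGraph,
) -> tuple[torch.fx.Node, torch.fx.Node]:
    """Return the (weight node, consumer node) pair for a weighted FX op."""
    port_id = _weight_port_id(target_node, nncf_graph)
    weight_node = target_node.args[port_id]
    return weight_node, target_node
```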
:param graph: The underlying FX graph.
:param nncf_graph: The corresponding NNCF graph.
:param node_vs_torch_annotation: A mapping of FX nodes to quantization annotations.
Done
    model: torch.fx.GraphModule,
    graph: torch.fx.Graph,
    nncf_graph: NNCFGraph,
    node_vs_torch_annotation: DefaultDict[torch.fx.Node, QuantizationAnnotation],
Could you please create the defaultdicts in each function separately and remove the node_vs_torch_annotation parameter?
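A rough sketch of the requested refactor, with the annotation map created locally and returned rather than passed in; the function name _annotate_weights and the elided body are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict

import torch.fx
from torch.ao.quantization.quantizer import QuantizationAnnotation


def _annotate_weights(
    model: torch.fx.GraphModule,
    graph: torch.fx.Graph,
) -> DefaultDict[torch.fx.Node, QuantizationAnnotation]:
    # Build the node -> annotation map locally instead of threading a shared
    # defaultdict through every helper as a parameter.
    annotations: DefaultDict[torch.fx.Node, QuantizationAnnotation] = defaultdict(
        QuantizationAnnotation
    )
    # ... populate annotations for the weighted nodes found in `graph` ...
    return annotations
```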
    else:
        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
Suggested change: drop the redundant else and return directly:
    return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
Done
    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
) -> BaseWeightsDecompressor:
    if zero_point is not None:
What if we invert the condition here? IMHO "is None" is clearer than "is not None" :)
Suggested change: replace
    if zero_point is not None:
with
    if zero_point is None:
Done
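Putting the two review points together (test for "is None" first, and no redundant else), the decompressor selection could look roughly as below. The symmetric branch is taken from the snippet above; the asymmetric class name, its constructor arguments, and the import path are assumptions about NNCF's API rather than the PR's exact code.

```python
from typing import Optional

import torch

# Import path, asymmetric class name, and its signature are assumptions.
from nncf.torch.quantization.layers import (
    INT8AsymmetricWeightsDecompressor,
    INT8SymmetricWeightsDecompressor,
)


def _build_decompressor(
    scale: torch.Tensor,
    zero_point: Optional[torch.Tensor],
    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
):
    # Symmetric quantization has no zero point, so handle it first and
    # return directly, avoiding the redundant else.
    if zero_point is None:
        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
    return INT8AsymmetricWeightsDecompressor(scale, zero_point, original_weight.dtype)
```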
    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
) -> BaseWeightsDecompressor:
    if zero_point is not None:
The same comment as above regarding the condition
Done
    observer: Type[UniformQuantizationObserverBase]

    extra_args: Dict[str, Any] = {}
Let's use the wc_param as an actual keyword here. A dict is not needed here.
Suggested change: replace
    observer: Type[UniformQuantizationObserverBase]
    extra_args: Dict[str, Any] = {}
with
    observer: Type[WeightObserverBase]
Alright, Done
    )
    return QuantizationSpec(
        dtype=dtype,
        observer_or_fake_quant_ctr=observer.with_args(**extra_args),
Can we call the constructor directly here?
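For the two threads above, a sketch of how the spec construction could look once wc_param is passed as a real keyword. QuantizationSpec and with_args are standard torch.ao.quantization pieces; WeightObserverBase, wc_param, and the enclosing function name are taken from the snippets or assumed. Whether the observer constructor can be referenced directly, without with_args, depends on whether any extra arguments remain, which is not visible from the diff.

```python
from typing import Type

import torch
from torch.ao.quantization.observer import ObserverBase
from torch.ao.quantization.quantizer import QuantizationSpec


def _weight_qspec(
    dtype: torch.dtype,
    observer: Type[ObserverBase],  # stand-in for the PR's WeightObserverBase
    wc_param,  # weight-compression parameters object from the PR (assumed)
) -> QuantizationSpec:
    # Bind wc_param as an explicit keyword instead of unpacking an extra_args dict.
    return QuantizationSpec(
        dtype=dtype,
        observer_or_fake_quant_ctr=observer.with_args(wc_param=wc_param),
    )
```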
    return qnn_quantizer, quant_dtype


def get_ov_quantizer(
The ignored scope in this function is very model specific. I suggest naming this function get_ov_quantizer_for_modelname and adding a small docstring to it.
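A sketch of the renamed, documented helper along the lines of this suggestion. The stub class, the mode strings handled, and the example ignored-scope entries are placeholders; only the idea of a Llama-specific ignored scope for mixed precision comes from this PR.

```python
class _OVQuantizerStub:
    """Placeholder for the real OpenVINO quantizer class, defined here only
    so the sketch is self-contained."""

    def __init__(self, ignored_scope=None):
        self.ignored_scope = ignored_scope or []


def get_ov_quantizer_for_llama(pt2e_quantize: str) -> _OVQuantizerStub:
    """Build the OpenVINO quantizer used by the Llama example workflow.

    The ignored scope set here is Llama specific: with openvino_8da4w, most
    weights are compressed to INT4 while the listed nodes stay at INT8
    (mixed precision); with openvino_8da8w, all weights go to INT8 and no
    ignored scope is needed. That model specificity is why the helper carries
    the model name rather than a generic one.
    """
    ignored_scope = []
    if pt2e_quantize == "openvino_8da4w":
        # Hypothetical node patterns; the real list is hand-picked per model.
        ignored_scope = ["output_linear", "token_embedding"]
    return _OVQuantizerStub(ignored_scope=ignored_scope)
```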
Co-authored-by: Daniil Lyakhov <daniil.lyakhov@intel.com>
Merged commit 21c43fe into cavusmustafa:openvino_llama_support
Summary
The OpenVINO quantizer is refactored, and mixed precision via a manually set ignored scope is added. To use the OpenVINO quantizer path, pass --pt2e_quantize openvino_8da4w for INT4 weight compression or --pt2e_quantize openvino_8da8w for INT8 weight compression.