
[ONNX] Compress quantize weights transformation#3662

Merged
andrey-churkin merged 12 commits into openvinotoolkit:develop from andrey-churkin:ac/compress_quantize_weights_transformation
Oct 6, 2025

Conversation


@andrey-churkin andrey-churkin commented Sep 19, 2025

Changes

Added the compress_quantize_weights_transformation() method that transforms the model by folding QuantizeLinear nodes with constant inputs into precomputed, quantized initializers.
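The folding amounts to evaluating the QuantizeLinear formula ahead of time on the constant weight tensor. A minimal NumPy sketch of that computation (the real transformation operates on ONNX initializers in the graph; `fold_quantize_linear` below is a hypothetical helper for illustration only):

```python
import numpy as np

def fold_quantize_linear(weights, scale, zero_point, qmin=-128, qmax=127):
    """Precompute the INT8 initializer that a QuantizeLinear node with
    constant inputs would produce: saturating round-half-to-even, per
    the ONNX QuantizeLinear operator semantics."""
    q = np.round(weights / scale) + zero_point  # np.round is half-to-even, matching ONNX
    return np.clip(q, qmin, qmax).astype(np.int8)

w = np.array([0.1, -0.5, 1.27, 200.0], dtype=np.float32)
q = fold_quantize_linear(w, scale=np.float32(0.01), zero_point=0)
print(q)  # the out-of-range 200.0 saturates to 127
```

After this precomputation, the float weight initializer and the QuantizeLinear node can be replaced by the INT8 tensor, leaving only the DequantizeLinear node in the graph.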

Reason for changes

  • Models after NNCF PTQ should be saved with INT8 weights for the ONNX backend.

Related tickets

Ref: 101733

Tests

  • tests/cross_fw/examples/test_examples.py[post_training_quantization_onnx_mobilenet_v2]
  • tests/onnx/test_passes.py

@andrey-churkin andrey-churkin requested a review from a team as a code owner September 19, 2025 08:09
@andrey-churkin andrey-churkin marked this pull request as draft September 19, 2025 08:09
@andrey-churkin andrey-churkin marked this pull request as ready for review September 23, 2025 07:34
@github-actions github-actions bot added the NNCF ONNX Pull requests that updates NNCF ONNX label Sep 23, 2025

andrey-churkin commented Sep 23, 2025

@github-actions github-actions bot added the NNCF PTQ Pull requests that updates NNCF PTQ label Sep 25, 2025

@daniil-lyakhov daniil-lyakhov left a comment


Minor

        copied_parameters = AdvancedQuantizationParameters()
    else:
        copied_parameters = deepcopy(advanced_quantization_parameters)
    copied_parameters.backend_params[BackendParameters.COMPRESS_WEIGHTS] = False
Collaborator


Why should we update this parameter here?

Contributor Author


We need to disable COMPRESS_WEIGHTS here to properly remove the Quantize-Dequantize pairs during the quantize_with_accuracy_control() pipeline. For reference, we do the same for the OpenVINO backend.

    copied_parameters.backend_params[BackendParameters.COMPRESS_WEIGHTS] = False
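To make the discussed pattern concrete, here is a runnable sketch. AdvancedQuantizationParameters and BackendParameters are real NNCF classes; the stubs and the `prepare_inner_params` helper below are simplified stand-ins for illustration, not the actual implementation:

```python
from copy import deepcopy
from dataclasses import dataclass, field

# Stand-in for NNCF's AdvancedQuantizationParameters; the real class
# carries many more fields.
@dataclass
class AdvancedQuantizationParameters:
    backend_params: dict = field(default_factory=dict)

# Stand-in key for BackendParameters.COMPRESS_WEIGHTS (actual value assumed).
COMPRESS_WEIGHTS = "compress_weights"

def prepare_inner_params(advanced_quantization_parameters):
    """Copy the user's parameters (or create defaults) and force weight
    compression off, so Quantize-Dequantize pairs stay removable inside
    the accuracy-control pipeline."""
    if advanced_quantization_parameters is None:
        copied_parameters = AdvancedQuantizationParameters()
    else:
        copied_parameters = deepcopy(advanced_quantization_parameters)
    copied_parameters.backend_params[COMPRESS_WEIGHTS] = False
    return copied_parameters

user_params = AdvancedQuantizationParameters(backend_params={COMPRESS_WEIGHTS: True})
inner_params = prepare_inner_params(user_params)
# The user's object is untouched; only the internal deep copy is modified.
print(user_params.backend_params[COMPRESS_WEIGHTS], inner_params.backend_params[COMPRESS_WEIGHTS])
```

The deepcopy matters here: mutating the user-supplied parameters object in place would leak an internal implementation detail back to the caller.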

    check_operation_count(quantized_model, {"QuantizeLinear": 2, "DequantizeLinear": 2})
    compress_quantize_weights_transformation(quantized_model)
Collaborator


This test covers the transformation, but I would suggest additionally testing nncf.quantize with COMPRESS_WEIGHTS: True to check that the API works as expected.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is necessary here. We already have an end-to-end test (tests/cross_fw/examples/test_examples.py[post_training_quantization_onnx_mobilenet_v2]) where we compare the model's compression rate against a reference, so we would catch it there if COMPRESS_WEIGHTS: True didn't work as expected.

@andrey-churkin andrey-churkin merged commit ff287ac into openvinotoolkit:develop Oct 6, 2025
34 of 36 checks passed
andrey-churkin added a commit that referenced this pull request Oct 6, 2025
### Changes

Revert the temporary test changes that were introduced in PR
#3662