[ONNX] Compress quantize weights transformation (#3662)
Conversation
        copied_parameters = AdvancedQuantizationParameters()
    else:
        copied_parameters = deepcopy(advanced_quantization_parameters)
    copied_parameters.backend_params[BackendParameters.COMPRESS_WEIGHTS] = False
Why should we update this parameter here?
We need to disable COMPRESS_WEIGHTS here to properly remove Quantize-Dequantize pairs during the quantize_with_accuracy_control() pipeline. For reference, we do the same for the OpenVINO backend.
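The parameter handling in the snippet above can be sketched as follows. This is a hedged illustration, not the NNCF implementation: the dataclass is a minimal stand-in for `AdvancedQuantizationParameters`, the string key stands in for `BackendParameters.COMPRESS_WEIGHTS`, and `disable_weight_compression` is a hypothetical helper name.

```python
from copy import deepcopy
from dataclasses import dataclass, field

# Stand-in for nncf's AdvancedQuantizationParameters (illustration only).
@dataclass
class AdvancedQuantizationParameters:
    backend_params: dict = field(default_factory=dict)

# Stand-in for BackendParameters.COMPRESS_WEIGHTS.
COMPRESS_WEIGHTS = "compress_weights"

def disable_weight_compression(advanced_quantization_parameters=None):
    """Copy the user-supplied parameters and force weight compression off,
    so Quantize-Dequantize pairs stay removable during accuracy control.
    The deepcopy keeps the caller's object untouched."""
    if advanced_quantization_parameters is None:
        copied_parameters = AdvancedQuantizationParameters()
    else:
        copied_parameters = deepcopy(advanced_quantization_parameters)
    copied_parameters.backend_params[COMPRESS_WEIGHTS] = False
    return copied_parameters

user_params = AdvancedQuantizationParameters(backend_params={COMPRESS_WEIGHTS: True})
patched = disable_weight_compression(user_params)
# Only the copy is patched; the user's object still requests compression.
print(user_params.backend_params[COMPRESS_WEIGHTS], patched.backend_params[COMPRESS_WEIGHTS])
```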
    check_operation_count(quantized_model, {"QuantizeLinear": 2, "DequantizeLinear": 2})
    compress_quantize_weights_transformation(quantized_model)
This test covers the transformation, but I would suggest additionally testing nncf.quantize with COMPRESS_WEIGHTS: True to check that the API works as expected.
I don't think that is necessary here. We already have an end-to-end test (tests/cross_fw/examples/test_examples.py[post_training_quantization_onnx_mobilenet_v2]) where we compare the model's compression rate against the reference, so we would catch the error there if COMPRESS_WEIGHTS: True doesn't work as expected.
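The `check_operation_count` helper used in the test snippet above can be sketched like this. This is a stand-in, assuming the helper compares per-op-type node counts against expected values; the real helper walks an ONNX model's graph rather than a plain list of op types.

```python
from collections import Counter

def check_operation_count(node_op_types, expected):
    """Stand-in for the test helper: count nodes per op type and check
    each expected op type's count against the actual graph."""
    counts = Counter(node_op_types)
    for op_type, expected_count in expected.items():
        assert counts[op_type] == expected_count, (
            f"{op_type}: expected {expected_count}, found {counts[op_type]}"
        )

# A toy graph with two weight QDQ pairs, mirroring the counts in the test.
ops = ["QuantizeLinear", "DequantizeLinear", "Conv",
       "QuantizeLinear", "DequantizeLinear", "Gemm"]
check_operation_count(ops, {"QuantizeLinear": 2, "DequantizeLinear": 2})
```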
Merged ff287ac into openvinotoolkit:develop
Revert the temporary test changes that were introduced in PR #3662.
### Changes

Added the `compress_quantize_weights_transformation()` method, which transforms the model by folding `QuantizeLinear` nodes with constant inputs into precomputed, quantized initializers.

### Reason for changes
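The folding described above can be sketched with NumPy. This is a hedged illustration rather than the NNCF implementation: `fold_quantize_linear` is a hypothetical name, and it applies the standard int8 QuantizeLinear formula, q = clip(round(w / scale) + zero_point, -128, 127), to precompute what the node would produce for a constant weight.

```python
import numpy as np

def fold_quantize_linear(weight, scale, zero_point):
    """Precompute the output of a QuantizeLinear node whose input is a
    constant initializer (int8 case). The resulting tensor can replace
    the float weight, so the QuantizeLinear node itself can be removed,
    leaving only DequantizeLinear in the graph."""
    q = np.round(weight / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

w = np.array([[0.5, -0.25], [1.0, -1.0]], dtype=np.float32)
q = fold_quantize_linear(w, scale=np.float32(0.01), zero_point=0)
# q holds [[50, -25], [100, -100]] as int8.
print(q)
```

Storing the int8 tensor instead of the float32 original is what shrinks the serialized model, which is the effect the end-to-end compression-rate test observes.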
### Related tickets

Ref: 101733

### Tests