Description
Context
The current PaddlePaddle quantization implementation differs from the ONNX one.
Same
- PaddlePaddle translates quantize_linear and dequantize_linear in the Paddle frontend.
- ONNX translates quantize_linear and dequantize_linear without any transformation in the ONNX frontend.
Difference
- PaddlePaddle fuses quantize_linear and dequantize_linear into FakeQuantize using a custom pass (https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/paddle/src/internal/pass/transform_fakequantize.cpp), but the ONNX FE does not.
Maintaining two copies of almost the same logic is hard, so the PaddlePaddle quantization should be refactored to follow the ONNX approach. Also, having more quantization patterns in the model degrades transformation performance.
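To make the fusion concrete, here is a minimal sketch (not OpenVINO code; function names and the int8 range are illustrative assumptions) of why a quantize_linear followed by dequantize_linear can be collapsed into a single FakeQuantize-style operation: the pair simulates quantization while keeping the data in floating point end to end.

```python
import numpy as np

def quantize_linear(x, scale, zero_point, qmin=-128, qmax=127):
    # q = clamp(round(x / scale) + zero_point, qmin, qmax)
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax)

def dequantize_linear(q, scale, zero_point):
    # x' = (q - zero_point) * scale
    return (q - zero_point) * scale

def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # The fused equivalent: quantize then immediately dequantize,
    # so the output stays in floating point.
    q = quantize_linear(x, scale, zero_point, qmin, qmax)
    return dequantize_linear(q, scale, zero_point)

x = np.array([0.1, -0.55, 1.3])
fused = fake_quantize(x, 0.05, 0)
chained = dequantize_linear(quantize_linear(x, 0.05, 0), 0.05, 0)
assert np.allclose(fused, chained)
```

The fused form is what the custom pass produces today; translating the two ops separately (as the ONNX FE does) leaves this collapsing to later transformations instead.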
What needs to be done?
- Ignore the HALF_AWAY_FROM_ZERO round mode directly and aggressively; this is purely a performance optimization.
- Remove or refactor the custom pass. LPT (Low Precision Transformations) already does what the custom pass (https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/paddle/src/internal/pass/transform_fakequantize.cpp) does, so the pass can be removed or refactored; prepare the quantization pattern for LPT.
- Refactor the PDPD FE to reduce the number of quantization patterns if needed.
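For context on the round-mode item above, the two modes only disagree at exact .5 ties, which is why ignoring HALF_AWAY_FROM_ZERO is a performance trade-off rather than a large accuracy change. A small sketch (illustrative only; not taken from the PDPD FE):

```python
import numpy as np

def round_half_away_from_zero(x):
    # PaddlePaddle-style rounding: .5 ties move away from zero,
    # e.g. 2.5 -> 3 and -2.5 -> -3.
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

def round_half_to_even(x):
    # Banker's rounding, as implemented by np.round:
    # .5 ties go to the nearest even integer, e.g. 2.5 -> 2.
    return np.round(x)

# The results differ only on exact .5 ties; every other value
# rounds identically under both modes.
ties = np.array([0.5, 1.5, 2.5, -1.5])
away = round_half_away_from_zero(ties)
even = round_half_to_even(ties)
```

Under these definitions, away is [1, 2, 3, -2] while even is [0, 2, 2, -2], so the modes diverge on half of these tie values.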
Example Pull Requests
Please refer to #14834 for more comments and background.
Test case: #20689
Resources
- Contribution guide - start here!
- Intel DevHub Discord channel - engage in discussions, ask questions and talk to OpenVINO developers
- What is OpenVINO?
- User documentation
- Paddle Slim Model: https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/cn/quantize.md
Contact points
Ticket
104434
Status
In Review