Qualcomm AI Engine Direct - Backend awareness quantizer#17665
shewu-quic wants to merge 2 commits into pytorch:main
Conversation
Summary:
- Refactor QnnQuantizer
- Add support for backend-specific annotation by implementing lazy
loading of `xxx_rules.py` in `registry_loader.py`.
- Enable validation for quantization annotation with `BackendOpInfo`
- Add the `backend` and `soc_model` parameters to allow
configuration of `BackendOpInfo`.
- Introduce a `strict` parameter. By default, it is enabled, causing
the validation stage to `raise ValueError` if quantization
constraints are not met. In this mode, all quantization constraints must
be satisfied to fully delegate to the QNN Backend. If disabled, the
process will only log warnings instead.
- Validation items include:
  - Verify `htp_arch` for LPBQ support on ops such as conv2d.
  - Verify `htp_arch` for 16a16w support on ops such as matmul.
  - Ensure `SharedQuantizationSpec` is used for `is_math_invariant` ops such as view.
  - Check `scale` and `zero_point` constraints for certain ops, such as requiring `scale = 1 / (q_max - q_min + 1)` and `zero_point = 0` for sigmoid.
  - Confirm `qscheme` meets symmetric constraints.
  - Validate that the `dtype` of input and output is supported.
- Add a file `backend_opinfo_adapter.py` which adapts `BackendOpInfo`
  from the QNN SDK for use with ExecuTorch.
- The `BackendOpInfo` API is supported starting from QNN SDK 2.41 and
  above; it is not available in earlier versions.
- The `BackendOpInfo` pybind library contains a list of quantization
  constraints for each operator. These quantization constraints refer
  to the [operator definitions in the QNN
  documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/operations.html#backend-supplements).
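The lazy loading of backend rule modules described above could be sketched as follows. This is a hypothetical illustration, not the PR's actual `registry_loader.py`: the package name `rules`, the `<backend>_rules` module naming, and the `load_rules` helper are all assumptions made for the example.

```python
import importlib
from typing import Dict

# Hypothetical sketch: backend-specific annotation rules live in modules
# named "<backend>_rules" and are imported only the first time that
# backend is requested, so unused backends incur no import cost.
_RULES_CACHE: Dict[str, object] = {}

def load_rules(backend: str, package: str = "rules"):
    """Import and cache <package>.<backend>_rules on first use."""
    if backend not in _RULES_CACHE:
        module_name = f"{package}.{backend}_rules"
        _RULES_CACHE[backend] = importlib.import_module(module_name)
    return _RULES_CACHE[backend]
```

Repeated calls for the same backend return the cached module instead of re-importing it.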
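The `strict`-mode behavior and the sigmoid `scale`/`zero_point` constraint could look roughly like this. A minimal sketch only: the function name `validate_sigmoid_qparams` and its signature are invented for illustration and do not reflect the PR's actual validation code, though the constraint itself (`scale = 1 / (q_max - q_min + 1)`, `zero_point = 0`) is taken from the summary above.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical sketch of the strict-mode constraint check: for a
# fixed-output-range op such as sigmoid, the backend expects
# scale = 1 / (q_max - q_min + 1) and zero_point = 0. With strict=True
# a violation raises ValueError; otherwise only a warning is logged.
def validate_sigmoid_qparams(scale, zero_point, q_min, q_max, strict=True):
    expected_scale = 1.0 / (q_max - q_min + 1)
    ok = abs(scale - expected_scale) < 1e-12 and zero_point == 0
    if not ok:
        msg = (f"sigmoid requires scale={expected_scale} and zero_point=0, "
               f"got scale={scale}, zero_point={zero_point}")
        if strict:
            raise ValueError(msg)
        logger.warning(msg)
    return ok
```

For 16-bit activations (`q_min = 0`, `q_max = 65535`) this pins the sigmoid scale to `1 / 65536`, matching the op's fixed [0, 1) output range.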
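The math-invariant check can be pictured as follows. A math-invariant op such as view does not change tensor values, so its output must reuse the input's quantization parameters (which is what `SharedQuantizationSpec` expresses in the PT2E quantizer). The dict-based `shares_qparams` helper below is a simplified stand-in invented for this sketch, not the PR's implementation.

```python
# Hypothetical sketch: for a math-invariant op (e.g. view/reshape),
# the output's quantization parameters must match the input's exactly,
# rather than being annotated independently.
def shares_qparams(in_spec: dict, out_spec: dict) -> bool:
    keys = ("scale", "zero_point", "dtype")
    return all(in_spec.get(k) == out_spec.get(k) for k in keys)
```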
Hi @cccclai,
Test plan
Successfully tested `test_qnn_delegate.py` and static llama with QNN version 2.41 and above.