
Commit d09a1a6

metascroy authored and kirklandsign committed
Update backends-xnnpack.md (#10024)
Update XNNPACK docs
1 parent ec0b16f commit d09a1a6

1 file changed: +13 −4 lines changed

docs/source/backends-xnnpack.md

Lines changed: 13 additions & 4 deletions
@@ -28,6 +28,7 @@ the core ExecuTorch runtime.
 To target the XNNPACK backend during the export and lowering process, pass an instance of the `XnnpackPartitioner` to `to_edge_transform_and_lower`. The example below demonstrates this process using the MobileNet V2 model from torchvision.
 
 ```python
+import torch
 import torchvision.models as models
 from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
 from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
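
This hunk adds only the missing `import torch`; the body of the export example is elided between this hunk and the next. For orientation, a minimal sketch of the flow the paragraph describes, assuming the standard `to_edge_transform_and_lower` call and taking the `mv2_xnnpack.pte` filename from the next hunk's header (the file's actual elided lines may differ):

```python
import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Load a pretrained MobileNet V2 and a sample input matching its expected shape.
model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Export the model, then delegate as much of the graph as possible to XNNPACK.
et_program = to_edge_transform_and_lower(
    torch.export.export(model, sample_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

# Serialize the program; filename taken from the next hunk's header.
with open("mv2_xnnpack.pte", "wb") as file:
    file.write(et_program.buffer)
```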
@@ -47,10 +48,10 @@ with open("mv2_xnnpack.pte", "wb") as file:
 
 ### Partitioner API
 
-The XNNPACK partitioner API allows for configuration of the model delegation to XNNPACK. Passing an `XnnpackPartitioner` instance with no additional parameters will run as much of the model as possible on the XNNPACK backend. This is the most common use-case. For advanced use cases, the partitioner exposes the following options via the [constructor](https://github.com/pytorch/executorch/blob/14ff52ff89a89c074fc6c14d3f01683677783dcd/backends/xnnpack/partition/xnnpack_partitioner.py#L31):
+The XNNPACK partitioner API allows for configuration of the model delegation to XNNPACK. Passing an `XnnpackPartitioner` instance with no additional parameters will run as much of the model as possible on the XNNPACK backend. This is the most common use case. For advanced use cases, the partitioner exposes the following options via the [constructor](https://github.com/pytorch/executorch/blob/release/0.6/backends/xnnpack/partition/xnnpack_partitioner.py#L31):
 
-- `configs`: Control which operators are delegated to XNNPACK. By default, all available operators all delegated. See [../config/\_\_init\_\_.py](https://github.com/pytorch/executorch/blob/14ff52ff89a89c074fc6c14d3f01683677783dcd/backends/xnnpack/partition/config/__init__.py#L66) for an exhaustive list of available operator configs.
-- `config_precisions`: Filter operators by data type. By default, delegate all precisions. One or more of `ConfigPrecisionType.FP32`, `ConfigPrecisionType.STATIC_QUANT`, or `ConfigPrecisionType.DYNAMIC_QUANT`. See [ConfigPrecisionType](https://github.com/pytorch/executorch/blob/14ff52ff89a89c074fc6c14d3f01683677783dcd/backends/xnnpack/partition/config/xnnpack_config.py#L24).
+- `configs`: Control which operators are delegated to XNNPACK. By default, all available operators are delegated. See [../config/\_\_init\_\_.py](https://github.com/pytorch/executorch/blob/release/0.6/backends/xnnpack/partition/config/__init__.py#L66) for an exhaustive list of available operator configs.
+- `config_precisions`: Filter operators by data type. By default, delegate all precisions. One or more of `ConfigPrecisionType.FP32`, `ConfigPrecisionType.STATIC_QUANT`, or `ConfigPrecisionType.DYNAMIC_QUANT`. See [ConfigPrecisionType](https://github.com/pytorch/executorch/blob/release/0.6/backends/xnnpack/partition/config/xnnpack_config.py#L24).
 - `per_op_mode`: If true, emit individual delegate calls for every operator. This is an advanced option intended to reduce memory overhead in some contexts at the cost of a small amount of runtime overhead. Defaults to false.
 - `verbose`: If true, print additional information during lowering.
 
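As a usage sketch of the options listed above (the `ConfigPrecisionType` import path is assumed from the `xnnpack_config.py` file linked in the second bullet):

```python
from executorch.backends.xnnpack.partition.config.xnnpack_config import ConfigPrecisionType
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# Delegate only FP32 operators, emit one delegate call per operator,
# and print extra information during lowering.
partitioner = XnnpackPartitioner(
    config_precisions=ConfigPrecisionType.FP32,
    per_op_mode=True,
    verbose=True,
)
```

The configured instance is then passed to `to_edge_transform_and_lower` exactly as in the default case.
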
@@ -87,15 +88,23 @@ To perform 8-bit quantization with the PT2E flow, perform the following steps pr
 The output of `convert_pt2e` is a PyTorch model which can be exported and lowered using the normal flow. As it is a regular PyTorch model, it can also be used to evaluate the accuracy of the quantized model using standard PyTorch techniques.
 
 ```python
+import torch
+import torchvision.models as models
+from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
 from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import XNNPACKQuantizer
+from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
+from executorch.exir import to_edge_transform_and_lower
 from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
 from torch.ao.quantization.quantizer.xnnpack_quantizer import get_symmetric_quantization_config
 
+model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
+sample_inputs = (torch.randn(1, 3, 224, 224), )
+
 qparams = get_symmetric_quantization_config(is_per_channel=True) # (1)
 quantizer = XNNPACKQuantizer()
 quantizer.set_global(qparams)
 
-training_ep = torch.export.export_for_training(model, sample_inputs).module(), # (2)
+training_ep = torch.export.export_for_training(model, sample_inputs).module() # (2)
 prepared_model = prepare_pt2e(training_ep, quantizer) # (3)
 
 for cal_sample in [torch.randn(1, 3, 224, 224)]: # Replace with representative model inputs
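
The hunk ends at the calibration loop. A hedged sketch of the steps that follow, per the paragraph above (calibrate, convert with `convert_pt2e`, then export and lower through the normal flow; the output filename is illustrative):

```python
# Calibrate the prepared model, then convert it to a quantized model.
for cal_sample in [torch.randn(1, 3, 224, 224)]:  # Replace with representative model inputs
    prepared_model(cal_sample)
quantized_model = convert_pt2e(prepared_model)

# Export and lower through the normal flow, reusing sample_inputs from above.
et_program = to_edge_transform_and_lower(
    torch.export.export(quantized_model, sample_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("mv2_xnnpack_quantized.pte", "wb") as file:  # illustrative filename
    file.write(et_program.buffer)
```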
