Commit d6e14fc

Arm backend: Add ethos_u_minimal_example jupyter notebook (#9543)
This example provides a good base understanding of all steps needed to lower and run a PyTorch module through the EthosUBackend. It also works as a starting point for creating scripts for lowering a custom network.
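The notebook walks a module through a fixed sequence of lowering stages (core ATen lowering, partitioning, TOSA lowering, TOSA serialization, NPU compilation). As a rough illustration of how such a linear pass pipeline composes, here is a self-contained Python sketch; the `Stage` and `run_pipeline` helpers are invented for illustration and are not part of executorch:

```python
from typing import Callable, Dict, List, Tuple

# A stage is a named transform over a graph representation.
# Here the "graph" is just a dict placeholder; in the real flow each stage
# is a graph transform applied by to_edge_transform_and_lower.
Stage = Tuple[str, Callable[[Dict], Dict]]

def run_pipeline(graph: Dict, stages: List[Stage]) -> Dict:
    """Apply each lowering stage in order, recording a trace of what ran."""
    for name, transform in stages:
        graph = transform(graph)
        graph.setdefault("trace", []).append(name)
    return graph

# Stage names follow the notebook's five lowering steps; each stage here is a
# no-op copy, standing in for the real graph transforms.
stages: List[Stage] = [
    ("lower to core ATen", lambda g: {**g}),
    ("partition Ethos-U subgraphs", lambda g: {**g}),
    ("lower to TOSA-compatible ops", lambda g: {**g}),
    ("serialize to TOSA", lambda g: {**g}),
    ("compile to NPU command stream", lambda g: {**g}),
]

result = run_pipeline({"module": "Add"}, stages)
print(result["trace"])
```

The point of the sketch is only the shape of the flow: each stage consumes the previous stage's output, so the order of the list is the order of the lowering.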
1 parent 730a4d8 commit d6e14fc

File tree

2 files changed: +294 −0 lines changed


examples/arm/README.md

Lines changed: 10 additions & 0 deletions
@@ -32,6 +32,16 @@ $ source executorch/examples/arm/ethos-u-scratch/setup_path.sh
 $ executorch/examples/arm/run.sh --model_name=mv2 --target=ethos-u85-128 [--scratch-dir=same-optional-scratch-dir-as-before]
 ```
 
+### Ethos-U minimal example
+
+See the Jupyter notebook `ethos_u_minimal_example.ipynb` for an explained minimal example of the full flow for running a
+PyTorch module on the EthosUDelegate. The notebook runs directly in some IDEs such as VS Code; otherwise it can be run in
+your browser using
+```
+pip install jupyter
+jupyter notebook ethos_u_minimal_example.ipynb
+```
+
### Online Tutorial

We also have a [tutorial](https://pytorch.org/executorch/stable/executorch-arm-delegate-tutorial.html) explaining the steps performed in these
examples/arm/ethos_u_minimal_example.ipynb

Lines changed: 284 additions & 0 deletions
@@ -0,0 +1,284 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2025 Arm Limited and/or its affiliates.\n",
    "#\n",
    "# This source code is licensed under the BSD-style license found in the\n",
    "# LICENSE file in the root directory of this source tree."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ethos-U delegate flow example\n",
    "\n",
    "This guide demonstrates the full flow for running a module on Arm Ethos-U using ExecuTorch.\n",
    "Tested on Linux x86_64 and macOS aarch64. If something is not working for you, please raise a GitHub issue and tag Arm.\n",
    "\n",
    "Before you begin:\n",
    "1. (In a clean virtual environment with a compatible Python version) Install executorch using `./install_executorch.sh`\n",
    "2. Install the Arm cross-compilation toolchain and simulators using `examples/arm/setup.sh --i-agree-to-the-contained-eula`\n",
    "3. Add the Arm cross-compilation toolchain and simulators to PATH using `examples/arm/ethos-u-scratch/setup_path.sh`\n",
    "\n",
    "All commands should be executed from the base `executorch` folder.\n",
    "\n",
    "*Some scripts in this notebook produce long output logs. Configuring the 'Customizing Notebook Layout' settings to enable 'Output: scrolling' and setting 'Output: Text Line Limit' makes this more manageable.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## AOT Flow\n",
    "\n",
    "The first step is creating the PyTorch module and exporting it. Exporting converts the Python code in the module into a graph structure. The result is still runnable Python code, which can be displayed by printing the `graph_module` of the exported program."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "class Add(torch.nn.Module):\n",
    "    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n",
    "        return x + y\n",
    "\n",
    "example_inputs = (torch.ones(1, 1, 1, 1), torch.ones(1, 1, 1, 1))\n",
    "\n",
    "model = Add()\n",
    "model = model.eval()\n",
    "exported_program = torch.export.export_for_training(model, example_inputs)\n",
    "graph_module = exported_program.module()\n",
    "\n",
    "_ = graph_module.print_readable()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To run on Ethos-U the `graph_module` must be quantized using the `arm_quantizer`. Quantization can be done in multiple ways and can be customized for different parts of the graph; shown here is the recommended path for the EthosUBackend. Quantization also requires calibrating the module with example inputs.\n",
    "\n",
    "Printing the module again shows that quantization wraps the node in quantization/dequantization nodes which contain the computed quantization parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from executorch.backends.arm.arm_backend import ArmCompileSpecBuilder\n",
    "from executorch.backends.arm.quantizer.arm_quantizer import (\n",
    "    EthosUQuantizer,\n",
    "    get_symmetric_quantization_config,\n",
    ")\n",
    "from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e\n",
    "\n",
    "target = \"ethos-u55-128\"\n",
    "\n",
    "# Create a compilation spec describing the target for configuring the quantizer.\n",
    "# Some args are used by the Arm Vela graph compiler later in the example. Refer to Arm Vela documentation for an\n",
    "# explanation of its flags: https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md\n",
    "spec_builder = ArmCompileSpecBuilder().ethosu_compile_spec(\n",
    "    target,\n",
    "    system_config=\"Ethos_U55_High_End_Embedded\",\n",
    "    memory_mode=\"Shared_Sram\",\n",
    "    extra_flags=\"--output-format=raw --debug-force-regor\",\n",
    ")\n",
    "compile_spec = spec_builder.build()\n",
    "\n",
    "# Create and configure the quantizer to use a symmetric quantization config globally on all nodes\n",
    "quantizer = EthosUQuantizer(compile_spec)\n",
    "operator_config = get_symmetric_quantization_config(is_per_channel=False)\n",
    "quantizer.set_global(operator_config)\n",
    "\n",
    "# Post-training quantization\n",
    "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n",
    "quantized_graph_module(*example_inputs)  # Calibrate the graph module with the example input\n",
    "quantized_graph_module = convert_pt2e(quantized_graph_module)\n",
    "\n",
    "_ = quantized_graph_module.print_readable()\n",
    "\n",
    "# Create a new exported program using the quantized_graph_module\n",
    "quantized_exported_program = torch.export.export_for_training(quantized_graph_module, example_inputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The quantization nodes created in the previous cell are not built into ExecuTorch by default, but they must be included in the .pte-file, so they need to be built separately. `backends/arm/scripts/build_quantized_ops_aot_lib.sh` is a utility script which does this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import subprocess\n",
    "\n",
    "# Set up paths\n",
    "cwd_dir = os.getcwd()\n",
    "et_dir = os.path.join(cwd_dir, \"..\", \"..\")\n",
    "et_dir = os.path.abspath(et_dir)\n",
    "script_dir = os.path.join(et_dir, \"backends\", \"arm\", \"scripts\")\n",
    "\n",
    "# Run build_quantized_ops_aot_lib.sh\n",
    "subprocess.run(os.path.join(script_dir, \"build_quantized_ops_aot_lib.sh\"), shell=True, cwd=et_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The lowering in the EthosUBackend happens in five steps:\n",
    "\n",
    "1. **Lowering to core ATen operator set**: Transform the module to use a subset of operators applicable to edge devices.\n",
    "2. **Partitioning**: Find subgraphs which are supported for running on Ethos-U.\n",
    "3. **Lowering to TOSA-compatible operator set**: Perform transforms to make the Ethos-U subgraph(s) compatible with TOSA.\n",
    "4. **Serialization to TOSA**: Compile the graph module into a TOSA graph.\n",
    "5. **Compilation to NPU**: Compile the TOSA graph into an Ethos-U command stream using the Arm Vela graph compiler. This makes use of the `compile_spec` created earlier.\n",
    "\n",
    "Step 5 also prints a network summary for each processed subgraph.\n",
    "\n",
    "All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import platform\n",
    "\n",
    "from executorch.backends.arm.ethosu_partitioner import EthosUPartitioner\n",
    "from executorch.exir import (\n",
    "    EdgeCompileConfig,\n",
    "    ExecutorchBackendConfig,\n",
    "    to_edge_transform_and_lower,\n",
    ")\n",
    "from executorch.extension.export_util.utils import save_pte_program\n",
    "\n",
    "# Create a partitioner from the compile spec\n",
    "partitioner = EthosUPartitioner(compile_spec)\n",
    "\n",
    "# Lower the exported program to the Ethos-U backend\n",
    "edge_program_manager = to_edge_transform_and_lower(\n",
    "    quantized_exported_program,\n",
    "    partitioner=[partitioner],\n",
    "    compile_config=EdgeCompileConfig(\n",
    "        _check_ir_validity=False,\n",
    "    ),\n",
    ")\n",
    "\n",
    "# Load the quantization ops library\n",
    "os_aot_lib_names = {\n",
    "    \"Darwin\": \"libquantized_ops_aot_lib.dylib\",\n",
    "    \"Linux\": \"libquantized_ops_aot_lib.so\",\n",
    "    \"Windows\": \"libquantized_ops_aot_lib.dll\",\n",
    "}\n",
    "aot_lib_name = os_aot_lib_names[platform.system()]\n",
    "\n",
    "libquantized_ops_aot_lib_path = os.path.join(et_dir, \"cmake-out-aot-lib\", \"kernels\", \"quantized\", aot_lib_name)\n",
    "torch.ops.load_library(libquantized_ops_aot_lib_path)\n",
    "\n",
    "# Convert the edge program to an executorch program\n",
    "executorch_program_manager = edge_program_manager.to_executorch(\n",
    "    config=ExecutorchBackendConfig(extract_delegate_segments=False)\n",
    ")\n",
    "\n",
    "_ = executorch_program_manager.exported_program().module().print_readable()\n",
    "\n",
    "# Save the .pte file\n",
    "pte_base_name = \"simple_example\"\n",
    "pte_name = pte_base_name + \".pte\"\n",
    "pte_path = os.path.join(cwd_dir, pte_name)\n",
    "save_pte_program(executorch_program_manager, pte_name)\n",
    "assert os.path.exists(pte_path), \"Build failed; no .pte-file found\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build executor runtime\n",
    "\n",
    "After the AOT compilation flow is done, the runtime can be cross-compiled and linked to the produced .pte-file using the Arm cross-compilation toolchain. This is done in three steps:\n",
    "1. Build the executorch library and EthosUDelegate.\n",
    "2. Build any external kernels required. In this example this is not needed as the graph is fully delegated, but it is included for completeness.\n",
    "3. Build and link the `arm_executor_runner`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build executorch\n",
    "subprocess.run(os.path.join(script_dir, \"build_executorch.sh\"), shell=True, cwd=et_dir)\n",
    "\n",
    "# Build portable kernels\n",
    "subprocess.run(os.path.join(script_dir, \"build_portable_kernels.sh\"), shell=True, cwd=et_dir)\n",
    "\n",
    "# Build the executorch runner\n",
    "args = f\"--pte={pte_path} --target={target}\"\n",
    "subprocess.run(os.path.join(script_dir, \"build_executorch_runner.sh\") + \" \" + args, shell=True, cwd=et_dir)\n",
    "\n",
    "elf_path = os.path.join(cwd_dir, pte_base_name, \"cmake-out\", \"arm_executor_runner\")\n",
    "assert os.path.exists(elf_path), \"Build failed; no .elf-file found\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run on simulated model\n",
    "\n",
    "We can finally use the `backends/arm/scripts/run_fvp.sh` utility script to run the .elf-file on simulated Arm hardware. This script runs the model with an input of ones, so the expected result of the addition should be close to 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "args = f\"--elf={elf_path} --target={target}\"\n",
    "subprocess.run(os.path.join(script_dir, \"run_fvp.sh\") + \" \" + args, shell=True, cwd=et_dir)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
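A side note on the quantization step shown in the notebook: the quantize/dequantize nodes inserted by the quantizer carry affine quantization parameters, a scale and a zero point. The arithmetic those nodes perform can be sketched in plain Python; this is an illustrative sketch of symmetric int8 quantization, not the executorch implementation:

```python
def symmetric_int8_qparams(values):
    """Compute (scale, zero_point) for symmetric int8 quantization.

    Symmetric quantization fixes the zero point at 0 and picks the scale so
    the largest observed magnitude maps to the edge of the int8 range.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return scale, 0

def quantize(x, scale, zero_point):
    """Map a float to int8: round(x / scale) + zero_point, clamped to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Map an int8 value back to float: (q - zero_point) * scale."""
    return (q - zero_point) * scale

# Calibrate on the example input range (ones, as in the notebook), then round-trip.
scale, zp = symmetric_int8_qparams([1.0, 1.0])
q = quantize(1.0, scale, zp)   # 127: the calibrated maximum hits the top of the range
x = dequantize(q, scale, zp)   # round-trips back to a value close to 1.0
```

With the zero point fixed at 0 and both inputs calibrated to ones, each operand dequantizes close to 1.0, which is why the FVP run of the delegated add is expected to print a result close to 2.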
