Arm backend: Add ethos_u_minimal_example jupyter notebook #9543

Merged · 1 commit · Apr 4, 2025
10 changes: 10 additions & 0 deletions examples/arm/README.md
@@ -32,6 +32,16 @@ $ source executorch/examples/arm/ethos-u-scratch/setup_path.sh
$ executorch/examples/arm/run.sh --model_name=mv2 --target=ethos-u85-128 [--scratch-dir=same-optional-scratch-dir-as-before]
```

### Ethos-U minimal example

See the Jupyter notebook `ethos_u_minimal_example.ipynb` for an explained minimal example of the full flow for running a
PyTorch module on the EthosUDelegate. The notebook runs directly in some IDEs such as VS Code; otherwise it can be run in
your browser using
```
pip install jupyter
jupyter notebook ethos_u_minimal_example.ipynb
```
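Before launching the notebook, it can be useful to verify that the tools installed in the setup steps above are actually on PATH. A minimal sketch; the tool names listed below are illustrative examples, not an authoritative list:

```python
import shutil

def missing_tools(tools):
    """Return the subset of the given executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

# Example names only -- adjust to match your toolchain installation.
required = ["arm-none-eabi-gcc", "jupyter"]
missing = missing_tools(required)
if missing:
    print(f"Missing from PATH: {missing}")
```

If anything is reported missing, re-run `setup_path.sh` in the shell you launch the notebook from.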

### Online Tutorial

We also have a [tutorial](https://pytorch.org/executorch/stable/executorch-arm-delegate-tutorial.html) explaining the steps performed in these
284 changes: 284 additions & 0 deletions examples/arm/ethos_u_minimal_example.ipynb
@@ -0,0 +1,284 @@
{
Contributor:

A couple of high-level comments:

  • Without CI this will be broken.
  • Why not Google Colab, to keep it live?
  • How is it going to stay in sync with the docs, given there is a large-ish overlap?

Collaborator Author:

Good points, here are my thoughts:

  • We are going to test this in our internal CI; it could definitely be worth adding it to upstream CI as well.
  • What's the added benefit of using Google Colab? I am not that familiar with it, but I haven't had any problems using a regular notebook.
  • I don't think we have a settled "strategy" on documentation. My personal view is that it makes sense to have two types of slightly overlapping documentation: the detailed main docs, and user guides such as this one. Any other docs, READMEs etc. should refer to these rather than repeating info. There will be some work needed to keep the two in sync, but an interactive notebook user guide should be easier to keep in sync than a doc, since it can be tested in CI, in addition to being a better user experience.

Collaborator Author:

@digantdesai What do you think about this answer?

Contributor:

> What's the added benefit of using google colab? I am not that familiar with it but I haven't had any problems with using a regular notebook.

It can be live on the docs page. Checking in a notebook is also fine, it is just an extra step to actually run it.

> to have two types of slightly overlapping documentation

It would be nice if they could originate from the same source, preferably code run in CI.

Collaborator Author:

I see two advantages in leaving it as it is: first, it is nice not to be reliant on an external service, and second, it works as a sanity check that you have set up your environment correctly. I am open to having it on Google Colab as well, but we would probably need to have an internal discussion about it first.

I now see what you mean about overlapping documentation with Erik's pull request #9712, since that is also a type of user guide. I was thinking of the general docs about ExecuTorch concepts such as quantization/dialects as the main docs. Since that PR mentions following an official backend template, I assume that is something we will have to conform to.

Running code in CI and then generating both a .md and a Jupyter notebook from it seems like a bit of work to get right, but it might be worth it if it is used more broadly. Alternatively, we could run the notebook in CI and generate only the .md using some tool (for example https://github.com/jupyter/nbconvert, which I just found by googling), although that would probably not follow the template exactly. Yet another solution could be to run the code from both the .md and the notebook in CI, which would be a little easier but require double updating. Finally, we could of course drop the notebook idea completely, though I think it adds value compared to a .md alone in that it is interactive.
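To make the idea of generating a .md from the notebook concrete, here is a rough stdlib-only sketch that flattens a v4 notebook's cells into a single markdown document (an illustration only, not what nbconvert actually does internally):

```python
import json

def notebook_to_markdown(nb_json: str) -> str:
    """Flatten a v4 notebook's markdown and code cells into one markdown string."""
    fence = "`" * 3  # built dynamically to avoid nesting literal fences here
    nb = json.loads(nb_json)
    parts = []
    for cell in nb.get("cells", []):
        source = "".join(cell.get("source", []))
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            parts.append(f"{fence}python\n{source}\n{fence}")
    return "\n\n".join(parts)

# Tiny inline example notebook (hypothetical content).
sample = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Ethos-U example"]},
        {"cell_type": "code", "source": ["x = 1 + 1"]},
    ],
})
print(notebook_to_markdown(sample))
```

A real pipeline would presumably build on nbconvert's exporters instead; the point is only that the notebook could remain the single source of truth, with the .md derived from it in CI.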

Contributor:

I think you have better idea about pros and cons than me on this now. My job here is done. :)

I am OK with this landing. This is not an easy problem, so hopefully we will converge to a better state.

"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Copyright 2025 Arm Limited and/or its affiliates.\n",
"#\n",
"# This source code is licensed under the BSD-style license found in the\n",
"# LICENSE file in the root directory of this source tree."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ethos-U delegate flow example\n",
"\n",
"This guide demonstrates the full flow for running a module on Arm Ethos-U using ExecuTorch. \n",
"Tested on Linux x86_64 and macOS aarch64. If something is not working for you, please raise a GitHub issue and tag Arm.\n",
"\n",
"Before you begin:\n",
"1. In a clean virtual environment with a compatible Python version, install ExecuTorch using `./install_executorch.sh`\n",
"2. Install the Arm cross-compilation toolchain and simulators using `examples/arm/setup.sh --i-agree-to-the-contained-eula`\n",
"3. Add the toolchain and simulators to PATH using `examples/arm/ethos-u-scratch/setup_path.sh`\n",
"\n",
"All commands are executed from the base `executorch` folder.\n",
"\n",
"\n",
"\n",
"*Some scripts in this notebook produce long output logs: configuring the 'Customizing Notebook Layout' settings to enable 'Output:scrolling' and setting 'Output:Text Line Limit' makes this more manageable.*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## AOT Flow\n",
"\n",
"The first step is creating the PyTorch module and exporting it. Exporting converts the Python code in the module into a graph structure. The result is still runnable Python code, which can be displayed by printing the `graph_module` of the exported program. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"class Add(torch.nn.Module):\n",
" def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n",
" return x + y\n",
"\n",
"example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))\n",
"\n",
"model = Add()\n",
"model = model.eval()\n",
"exported_program = torch.export.export_for_training(model, example_inputs)\n",
"graph_module = exported_program.module()\n",
"\n",
"_ = graph_module.print_readable()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run on Ethos-U, the `graph_module` must be quantized using the `arm_quantizer`. Quantization can be done in multiple ways and can be customized for different parts of the graph; shown here is the recommended path for the EthosUBackend. Quantization also requires calibrating the module with example inputs.\n",
"\n",
"Printing the module again, it can be seen that quantization wraps the nodes in quantization/dequantization nodes which contain the computed quantization parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from executorch.backends.arm.arm_backend import ArmCompileSpecBuilder\n",
"from executorch.backends.arm.quantizer.arm_quantizer import (\n",
" EthosUQuantizer,\n",
" get_symmetric_quantization_config,\n",
")\n",
"from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e\n",
"\n",
"target = \"ethos-u55-128\"\n",
"\n",
"# Create a compilation spec describing the target for configuring the quantizer\n",
"# Some args are used by the Arm Vela graph compiler later in the example. Refer to Arm Vela documentation for an \n",
"# explanation of its flags: https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md\n",
"spec_builder = ArmCompileSpecBuilder().ethosu_compile_spec(\n",
" target,\n",
" system_config=\"Ethos_U55_High_End_Embedded\",\n",
" memory_mode=\"Shared_Sram\",\n",
" extra_flags=\"--output-format=raw --debug-force-regor\"\n",
" )\n",
"compile_spec = spec_builder.build()\n",
"\n",
"# Create and configure quantizer to use a symmetric quantization config globally on all nodes\n",
"quantizer = EthosUQuantizer(compile_spec) \n",
"operator_config = get_symmetric_quantization_config(is_per_channel=False)\n",
"quantizer.set_global(operator_config)\n",
"\n",
"# Post training quantization\n",
"quantized_graph_module = prepare_pt2e(graph_module, quantizer) \n",
"quantized_graph_module(*example_inputs) # Calibrate the graph module with the example input\n",
"quantized_graph_module = convert_pt2e(quantized_graph_module)\n",
"\n",
"_ = quantized_graph_module.print_readable()\n",
"\n",
"# Create a new exported program using the quantized_graph_module\n",
"quantized_exported_program = torch.export.export_for_training(quantized_graph_module, example_inputs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The quantization nodes created in the previous cell are not built into ExecuTorch by default, so the quantized operators need to be built separately. `backends/arm/scripts/build_quantized_ops_aot_lib.sh` is a utility script which does this. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess \n",
"import os \n",
"\n",
"# Setup paths\n",
"cwd_dir = os.getcwd()\n",
"et_dir = os.path.join(cwd_dir, \"..\", \"..\")\n",
"et_dir = os.path.abspath(et_dir)\n",
"script_dir = os.path.join(et_dir, \"backends\", \"arm\", \"scripts\")\n",
"\n",
"# Run build_quantized_ops_aot_lib.sh\n",
"subprocess.run(os.path.join(script_dir, \"build_quantized_ops_aot_lib.sh\"), shell=True, cwd=et_dir)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lowering in the EthosUBackend happens in five steps:\n",
"\n",
"1. **Lowering to the Core ATen operator set**: Transform the module to use a subset of operators applicable to edge devices. \n",
"2. **Partitioning**: Find subgraphs which are supported for running on Ethos-U. \n",
"3. **Lowering to a TOSA-compatible operator set**: Perform transforms to make the Ethos-U subgraph(s) compatible with TOSA. \n",
"4. **Serialization to TOSA**: Compile the graph module into a TOSA graph. \n",
"5. **Compilation to NPU**: Compile the TOSA graph into an Ethos-U command stream using the Arm Vela graph compiler. This makes use of the `compile_spec` created earlier.\n",
"\n",
"Step 5 also prints a network summary for each processed subgraph.\n",
"\n",
"All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from executorch.backends.arm.ethosu_partitioner import EthosUPartitioner\n",
"from executorch.exir import (\n",
" EdgeCompileConfig,\n",
" ExecutorchBackendConfig,\n",
" to_edge_transform_and_lower,\n",
")\n",
"from executorch.extension.export_util.utils import save_pte_program\n",
"import platform \n",
"\n",
"# Create partitioner from compile spec \n",
"partitioner = EthosUPartitioner(compile_spec)\n",
"\n",
"# Lower the exported program to the Ethos-U backend\n",
"edge_program_manager = to_edge_transform_and_lower(\n",
" quantized_exported_program,\n",
" partitioner=[partitioner],\n",
" compile_config=EdgeCompileConfig(\n",
" _check_ir_validity=False,\n",
" ),\n",
" )\n",
"\n",
"# Load quantization ops library\n",
"os_aot_lib_names = {\"Darwin\" : \"libquantized_ops_aot_lib.dylib\", \n",
" \"Linux\" : \"libquantized_ops_aot_lib.so\", \n",
" \"Windows\": \"libquantized_ops_aot_lib.dll\"}\n",
"aot_lib_name = os_aot_lib_names[platform.system()]\n",
"\n",
"libquantized_ops_aot_lib_path = os.path.join(et_dir, \"cmake-out-aot-lib\", \"kernels\", \"quantized\", aot_lib_name)\n",
"torch.ops.load_library(libquantized_ops_aot_lib_path)\n",
"\n",
"# Convert edge program to executorch\n",
"executorch_program_manager = edge_program_manager.to_executorch(\n",
" config=ExecutorchBackendConfig(extract_delegate_segments=False)\n",
" )\n",
"\n",
"executorch_program_manager.exported_program().module().print_readable()\n",
"\n",
"# Save pte file\n",
"pte_base_name = \"simple_example\"\n",
"pte_name = pte_base_name + \".pte\"\n",
"pte_path = os.path.join(cwd_dir, pte_name)\n",
"save_pte_program(executorch_program_manager, pte_name)\n",
"assert os.path.exists(pte_path), \"Build failed; no .pte-file found\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build executor runtime\n",
"\n",
"After the AOT compilation flow is done, the runtime can be cross-compiled and linked to the produced .pte-file using the Arm cross-compilation toolchain. This is done in three steps:\n",
"1. Build the executorch library and EthosUDelegate.\n",
"2. Build any external kernels required. In this example this is not needed as the graph is fully delegated, but it's included for completeness.\n",
"3. Build and link the `arm_executor_runner`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build executorch \n",
"subprocess.run(os.path.join(script_dir, \"build_executorch.sh\"), shell=True, cwd=et_dir)\n",
"\n",
"# Build portable kernels\n",
"subprocess.run(os.path.join(script_dir, \"build_portable_kernels.sh\"), shell=True, cwd=et_dir)\n",
"\n",
"# Build executorch runner\n",
"args = f\"--pte={pte_path} --target={target}\"\n",
"subprocess.run(os.path.join(script_dir, \"build_executorch_runner.sh\") + \" \" + args, shell=True, cwd=et_dir)\n",
"\n",
"elf_path = os.path.join(cwd_dir, pte_base_name, \"cmake-out\", \"arm_executor_runner\")\n",
"assert os.path.exists(elf_path), \"Build failed; no .elf-file found\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run on simulated hardware\n",
"\n",
"Finally, the `backends/arm/scripts/run_fvp.sh` utility script can be used to run the .elf-file on simulated Arm hardware (an FVP). The script runs the model with an input of ones, so the result of the addition should be close to 2."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"args = f\"--elf={elf_path} --target={target}\"\n",
"subprocess.run(os.path.join(script_dir, \"run_fvp.sh\") + \" \" + args, shell=True, cwd=et_dir)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.15"
}
},
"nbformat": 4,
"nbformat_minor": 4
}