Arm backend: Add ethos_u_minimal_example jupyter notebook #9543
Merged
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2025 Arm Limited and/or its affiliates.\n",
    "#\n",
    "# This source code is licensed under the BSD-style license found in the\n",
    "# LICENSE file in the root directory of this source tree."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ethos-U delegate flow example\n",
    "\n",
    "This guide demonstrates the full flow for running a module on Arm Ethos-U using ExecuTorch.\n",
    "Tested on Linux x86_64 and macOS aarch64. If something is not working for you, please raise a GitHub issue and tag Arm.\n",
    "\n",
    "Before you begin:\n",
    "1. (In a clean virtual environment with a compatible Python version) Install ExecuTorch using `./install_executorch.sh`\n",
    "2. Install the Arm cross-compilation toolchain and simulators using `examples/arm/setup.sh --i-agree-to-the-contained-eula`\n",
    "3. Add the Arm cross-compilation toolchain and simulators to PATH using `examples/arm/ethos-u-scratch/setup_path.sh`\n",
    "\n",
    "All commands should be executed from the base `executorch` folder.\n",
    "\n",
    "*Some scripts in this notebook produce long output logs: configuring the 'Customizing Notebook Layout' settings to enable 'Output: scrolling' and setting 'Output: Text Line Limit' makes this more manageable.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## AOT Flow\n",
    "\n",
    "The first step is creating the PyTorch module and exporting it. Exporting converts the Python code in the module into a graph structure. The result is still runnable Python code, which can be displayed by printing the `graph_module` of the exported program."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "class Add(torch.nn.Module):\n",
    "    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n",
    "        return x + y\n",
    "\n",
    "example_inputs = (torch.ones(1, 1, 1, 1), torch.ones(1, 1, 1, 1))\n",
    "\n",
    "model = Add()\n",
    "model = model.eval()\n",
    "exported_program = torch.export.export_for_training(model, example_inputs)\n",
    "graph_module = exported_program.module()\n",
    "\n",
    "_ = graph_module.print_readable()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To run on Ethos-U, the `graph_module` must be quantized using the `arm_quantizer`. Quantization can be done in multiple ways and can be customized for different parts of the graph; shown here is the recommended path for the EthosUBackend. Quantization also requires calibrating the module with example inputs.\n",
    "\n",
    "Printing the module again shows that quantization has wrapped the node in quantization/dequantization nodes which contain the computed quantization parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from executorch.backends.arm.arm_backend import ArmCompileSpecBuilder\n",
    "from executorch.backends.arm.quantizer.arm_quantizer import (\n",
    "    EthosUQuantizer,\n",
    "    get_symmetric_quantization_config,\n",
    ")\n",
    "from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e\n",
    "\n",
    "target = \"ethos-u55-128\"\n",
    "\n",
    "# Create a compilation spec describing the target for configuring the quantizer.\n",
    "# Some args are used by the Arm Vela graph compiler later in the example. Refer to Arm Vela documentation for an\n",
    "# explanation of its flags: https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md\n",
    "spec_builder = ArmCompileSpecBuilder().ethosu_compile_spec(\n",
    "    target,\n",
    "    system_config=\"Ethos_U55_High_End_Embedded\",\n",
    "    memory_mode=\"Shared_Sram\",\n",
    "    extra_flags=\"--output-format=raw --debug-force-regor\",\n",
    ")\n",
    "compile_spec = spec_builder.build()\n",
    "\n",
    "# Create and configure the quantizer to use a symmetric quantization config globally on all nodes\n",
    "quantizer = EthosUQuantizer(compile_spec)\n",
    "operator_config = get_symmetric_quantization_config(is_per_channel=False)\n",
    "quantizer.set_global(operator_config)\n",
    "\n",
    "# Post-training quantization\n",
    "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n",
    "quantized_graph_module(*example_inputs)  # Calibrate the graph module with the example input\n",
    "quantized_graph_module = convert_pt2e(quantized_graph_module)\n",
    "\n",
    "_ = quantized_graph_module.print_readable()\n",
    "\n",
    "# Create a new exported program using the quantized_graph_module\n",
    "quantized_exported_program = torch.export.export_for_training(quantized_graph_module, example_inputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The quantization nodes created in the previous cell are not built with ExecuTorch by default but must be included in the .pte-file, so they need to be built separately. `backends/arm/scripts/build_quantized_ops_aot_lib.sh` is a utility script which does this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import subprocess\n",
    "\n",
    "# Set up paths\n",
    "cwd_dir = os.getcwd()\n",
    "et_dir = os.path.join(cwd_dir, \"..\", \"..\")\n",
    "et_dir = os.path.abspath(et_dir)\n",
    "script_dir = os.path.join(et_dir, \"backends\", \"arm\", \"scripts\")\n",
    "\n",
    "# Run build_quantized_ops_aot_lib.sh\n",
    "subprocess.run(os.path.join(script_dir, \"build_quantized_ops_aot_lib.sh\"), shell=True, cwd=et_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The lowering in the EthosUBackend happens in five steps:\n",
    "\n",
    "1. **Lowering to core Aten operator set**: Transform the module to use a subset of operators applicable to edge devices.\n",
    "2. **Partitioning**: Find subgraphs which are supported for running on Ethos-U.\n",
    "3. **Lowering to TOSA-compatible operator set**: Perform transforms to make the Ethos-U subgraph(s) compatible with TOSA.\n",
    "4. **Serialization to TOSA**: Compile the graph module into a TOSA graph.\n",
    "5. **Compilation to NPU**: Compile the TOSA graph into an Ethos-U command stream using the Arm Vela graph compiler. This makes use of the `compile_spec` created earlier, and also prints a network summary for each processed subgraph.\n",
    "\n",
    "All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import platform\n",
    "\n",
    "from executorch.backends.arm.ethosu_partitioner import EthosUPartitioner\n",
    "from executorch.exir import (\n",
    "    EdgeCompileConfig,\n",
    "    ExecutorchBackendConfig,\n",
    "    to_edge_transform_and_lower,\n",
    ")\n",
    "from executorch.extension.export_util.utils import save_pte_program\n",
    "\n",
    "# Create partitioner from compile spec\n",
    "partitioner = EthosUPartitioner(compile_spec)\n",
    "\n",
    "# Lower the exported program to the Ethos-U backend\n",
    "edge_program_manager = to_edge_transform_and_lower(\n",
    "    quantized_exported_program,\n",
    "    partitioner=[partitioner],\n",
    "    compile_config=EdgeCompileConfig(\n",
    "        _check_ir_validity=False,\n",
    "    ),\n",
    ")\n",
    "\n",
    "# Load the quantized ops library built earlier\n",
    "os_aot_lib_names = {\n",
    "    \"Darwin\": \"libquantized_ops_aot_lib.dylib\",\n",
    "    \"Linux\": \"libquantized_ops_aot_lib.so\",\n",
    "    \"Windows\": \"libquantized_ops_aot_lib.dll\",\n",
    "}\n",
    "aot_lib_name = os_aot_lib_names[platform.system()]\n",
    "\n",
    "libquantized_ops_aot_lib_path = os.path.join(et_dir, \"cmake-out-aot-lib\", \"kernels\", \"quantized\", aot_lib_name)\n",
    "torch.ops.load_library(libquantized_ops_aot_lib_path)\n",
    "\n",
    "# Convert edge program to executorch\n",
    "executorch_program_manager = edge_program_manager.to_executorch(\n",
    "    config=ExecutorchBackendConfig(extract_delegate_segments=False)\n",
    ")\n",
    "\n",
    "_ = executorch_program_manager.exported_program().module().print_readable()\n",
    "\n",
    "# Save pte file\n",
    "pte_base_name = \"simple_example\"\n",
    "pte_name = pte_base_name + \".pte\"\n",
    "pte_path = os.path.join(cwd_dir, pte_name)\n",
    "save_pte_program(executorch_program_manager, pte_name)\n",
    "assert os.path.exists(pte_path), \"Build failed; no .pte-file found\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build executor runtime\n",
    "\n",
    "After the AOT compilation flow is done, the runtime can be cross-compiled and linked to the produced .pte-file using the Arm cross-compilation toolchain. This is done in three steps:\n",
    "1. Build the executorch library and EthosUDelegate.\n",
    "2. Build any external kernels required. In this example this is not needed, as the graph is fully delegated, but it is included for completeness.\n",
    "3. Build and link the `arm_executor_runner`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build executorch\n",
    "subprocess.run(os.path.join(script_dir, \"build_executorch.sh\"), shell=True, cwd=et_dir)\n",
    "\n",
    "# Build portable kernels\n",
    "subprocess.run(os.path.join(script_dir, \"build_portable_kernels.sh\"), shell=True, cwd=et_dir)\n",
    "\n",
    "# Build executorch runner\n",
    "args = f\"--pte={pte_path} --target={target}\"\n",
    "subprocess.run(os.path.join(script_dir, \"build_executorch_runner.sh\") + \" \" + args, shell=True, cwd=et_dir)\n",
    "\n",
    "elf_path = os.path.join(cwd_dir, pte_base_name, \"cmake-out\", \"arm_executor_runner\")\n",
    "assert os.path.exists(elf_path), \"Build failed; no .elf-file found\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run on simulated model\n",
    "\n",
    "We can finally use the `backends/arm/scripts/run_fvp.sh` utility script to run the .elf-file on simulated Arm hardware. This script runs the model with an input of ones, so the expected result of the addition should be close to 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "args = f\"--elf={elf_path} --target={target}\"\n",
    "subprocess.run(os.path.join(script_dir, \"run_fvp.sh\") + \" \" + args, shell=True, cwd=et_dir)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
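The notebook looks up the platform-dependent file name of the quantized-ops AOT library (`.dylib`/`.so`/`.dll`) via a dict keyed on `platform.system()`. A minimal sketch of that lookup factored into a helper — the base name `libquantized_ops_aot_lib` comes from the notebook, while the function name and error handling here are illustrative additions:

```python
import platform

# Suffix per OS, matching the mapping used in the notebook
_SUFFIXES = {"Darwin": ".dylib", "Linux": ".so", "Windows": ".dll"}


def quantized_ops_lib_name(system: str = "") -> str:
    """Return the platform-specific file name of the quantized-ops library.

    Defaults to the current platform; pass `system` explicitly to override.
    """
    system = system or platform.system()
    if system not in _SUFFIXES:
        raise ValueError(f"Unsupported platform: {system}")
    return "libquantized_ops_aot_lib" + _SUFFIXES[system]


print(quantized_ops_lib_name("Linux"))  # libquantized_ops_aot_lib.so
```

Failing fast with a clear error on an unknown platform is friendlier than the bare `KeyError` the notebook's inline dict lookup would raise.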
A couple of high-level comments:
Good points, here are my thoughts:
@digantdesai What do you think about this answer?
It can live on the docs page. Checking in a notebook is also fine; it is just an extra step to actually run it.
It would be nice if they could originate from the same source, preferably code that runs in CI.
I see two advantages in leaving it as it is: first, it is nice not to be reliant on an external service, and second, it works as a sanity check that you have set up your environment correctly. I am open to having it on Google Colab as well, but we would probably need to have an internal discussion about that first.
I see now what you mean about overlapping documentation with Erik's pull request #9712, since that is also a type of user guide; I was thinking about the general docs on ExecuTorch concepts such as quantization/dialects, or similarly the main docs. Since that PR mentions following an official backend template, I assume that is something we will have to conform to.
Running code in CI and then generating both a .md and a Jupyter notebook from it seems like a bit of work to get correct, but it might be worth it if it is something used more broadly. Alternatively, we could run the notebook in CI and generate only the .md using a tool such as https://github.com/jupyter/nbconvert, although that will probably not follow the template exactly. Yet another solution could be to run the code from both the .md and the notebook in CI, which would be a little easier but would require double updating. Finally, we could of course skip the notebook idea entirely; I do think it adds value compared to a .md alone, however, in that it is interactive.
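As a rough sketch of the "generate the .md from the notebook" option (this helper is illustrative only, not part of this PR or of nbconvert), a notebook's cells can be flattened to Markdown with nothing but the standard library, since .ipynb is plain JSON:

```python
def notebook_to_markdown(nb: dict) -> str:
    """Flatten a parsed .ipynb dict into a Markdown string.

    Markdown cells pass through as-is; code cells become fenced Python
    blocks. Cell outputs and metadata are dropped.
    """
    fence = "`" * 3
    parts = []
    for cell in nb.get("cells", []):
        body = "".join(cell.get("source", []))
        if cell["cell_type"] == "markdown":
            parts.append(body)
        elif cell["cell_type"] == "code":
            parts.append(f"{fence}python\n{body}\n{fence}")
    return "\n\n".join(parts)


# Minimal in-memory notebook with one markdown and one code cell
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some text."]},
        {"cell_type": "code", "source": ["print('hello')"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 4,
}

md = notebook_to_markdown(nb)
print(md)
```

In practice `json.load(open("example.ipynb"))` would supply `nb`; nbconvert does the same job with far more fidelity (outputs, attachments, templates), so this is only a fallback if avoiding the extra dependency matters.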
I think you have a better idea of the pros and cons than me on this now. My job here is done. :)
I am OK with this landing; this is not an easy problem, so hopefully we will converge to a better state.