Clean up documentation placeholders and links, add top-level docs for C++ APIs, Android, and troubleshooting #8618
Changes from all commits
0715d32
43d3717
e62bc67
8752431
e450f84
7d72c5a
198fecd
backends-overview.md
@@ -0,0 +1,20 @@
# Backend Overview

ExecuTorch backends provide hardware acceleration for a specific hardware target. To achieve maximum performance on target hardware, ExecuTorch optimizes the model for a specific backend during the export and lowering process. This means that the resulting .pte file is specialized for that hardware. To deploy to multiple backends, such as Core ML on iOS and Arm CPU on Android, it is common to generate a dedicated .pte file for each.

The choice of hardware backend is informed by the hardware that the model is intended to be deployed on. Each backend has specific hardware requirements and levels of model support. See the documentation for each hardware backend for more details.

As part of the .pte file creation process, ExecuTorch identifies portions of the model (partitions) that are supported by the given backend. These sections are processed by the backend ahead of time to support efficient execution. Portions of the model that are not supported by the delegate, if any, are executed using the portable fallback implementation on CPU. This allows for partial model acceleration when not all model operators are supported on the backend, but it may have negative performance implications. In addition, multiple partitioners can be specified in order of priority. This allows operators not supported on the GPU to run on the CPU via XNNPACK, for example (see the sketch after the backend list below).
### Available Backends

Commonly used hardware backends are listed below. For mobile, consider using XNNPACK for Android and XNNPACK or Core ML for iOS. To create a .pte file for a specific backend, pass the appropriate partitioner class to `to_edge_transform_and_lower`. See the appropriate backend documentation for more information.

- [XNNPACK (Mobile CPU)](backends-xnnpack.md)
- [Core ML (iOS)](backends-coreml.md)
- [Metal Performance Shaders (iOS GPU)](backends-mps.md)
- [Vulkan (Android GPU)](backends-vulkan.md)
- [Qualcomm NPU](backends-qualcomm.md)
- [MediaTek NPU](backends-mediatek.md)
- [Arm Ethos-U NPU](backends-arm-ethos-u.md)
- [Cadence DSP](backends-cadence.md)
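
To make the partitioner-priority behavior above concrete, here is a minimal sketch that prefers the Vulkan GPU backend and falls back to XNNPACK on CPU. The partitioner import paths are assumptions based on the ExecuTorch source tree and may vary between versions.

```python
# Minimal sketch of partitioner priority: partitioners are applied in list
# order, so operators the Vulkan backend cannot handle fall back to XNNPACK.
# Import paths are assumptions and may differ across ExecuTorch versions.
import torch
import executorch.exir
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

class MyModel(torch.nn.Module):  # stand-in for any exportable model
    def forward(self, x):
        return torch.nn.functional.relu(x)

model = MyModel().eval()
example_inputs = (torch.randn(1, 3, 64, 64),)

et_program = executorch.exir.to_edge_transform_and_lower(
    torch.export.export(model, example_inputs),
    partitioner=[VulkanPartitioner(), XnnpackPartitioner()],  # priority order
).to_executorch()

with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```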
getting-started.md
@@ -14,7 +14,7 @@ Pip is the recommended way to install the ExecuTorch python package. This packag
pip install executorch
```

To build the framework from source, see [Building From Source](TODO).
To build the framework from source, see [Building From Source](using-executorch-building-from-source.md).

Backend delegates may require additional dependencies. See the appropriate backend documentation for more information.

@@ -29,7 +29,9 @@ The following are required to install the ExecuTorch host libraries, needed to e
<hr/>

## Preparing the Model
Exporting is the process of taking a PyTorch model and converting it to the .pte file format used by the ExecuTorch runtime. This is done using Python APIs. PTE files for common models can be found on HuggingFace (TODO add link).
Exporting is the process of taking a PyTorch model and converting it to the .pte file format used by the ExecuTorch runtime. This is done using Python APIs. PTE files for common models, such as Llama 3.2, can be found on HuggingFace under [ExecuTorch Community](https://huggingface.co/executorch-community). These models have been exported and lowered for ExecuTorch and can be deployed directly without going through the lowering process.
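
As a sketch of how one of these pre-exported files might be fetched, the standard `huggingface_hub` client can download a .pte directly. The repository and file names below are hypothetical placeholders; browse the ExecuTorch Community page for real ones.

```python
# Sketch: download a pre-lowered .pte from the Hugging Face Hub.
# repo_id and filename are hypothetical placeholders, not real artifacts.
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

pte_path = hf_hub_download(
    repo_id="executorch-community/some-model",  # placeholder repository
    filename="model.pte",                       # placeholder file name
)
print(pte_path)  # local path to the cached .pte, ready for deployment
```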
A complete example of exporting, lowering, and verifying MobileNet V2 is available as a [Colab notebook](https://colab.research.google.com/drive/1qpxrXC3YdJQzly3mRg-4ayYiOjC6rue3?usp=sharing).

### Requirements
- A PyTorch model.

@@ -39,7 +41,7 @@ Exporting is the process of taking a PyTorch model and converting it to the .pte
### Selecting a Backend
ExecuTorch provides hardware acceleration for a wide variety of hardware. The most commonly used backends are XNNPACK (for Arm and x86 CPU), Core ML (for iOS), Vulkan (for Android GPUs), and Qualcomm (for Qualcomm-powered Android phones).

For mobile use cases, consider using XNNPACK for Android and Core ML or XNNPACK for iOS as a first step. See [Delegates](/TODO.md) for a description of available backends.
For mobile use cases, consider using XNNPACK for Android and Core ML or XNNPACK for iOS as a first step. See [Hardware Backends](backends-overview.md) for more information.

### Exporting
Exporting is done using Python APIs. ExecuTorch provides a high degree of customization during the export process, but the typical flow is as follows:

@@ -50,13 +52,13 @@ model = MyModel() # The PyTorch model to export
example_inputs = (torch.randn(1,3,64,64),) # A tuple of inputs

et_program = executorch.exir.to_edge_transform_and_lower(
    torch.export.export(model, example_inputs),
    partitioner=[XnnpackPartitioner()]
).to_executorch()

with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```

Review comment: formatting still looks a bit off on the website https://docs-preview.pytorch.org/pytorch/executorch/8618/getting-started.html

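Once model.pte is written, it can be sanity-checked from Python before deployment. A minimal sketch, assuming the `executorch.runtime` bindings are included in your installed package:

```python
# Sketch: load the lowered program back into the Python runtime bindings and
# run it once to verify the export. Assumes executorch.runtime is available
# in your installed ExecuTorch version.
import torch
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("model.pte")
method = program.load_method("forward")
outputs = method.execute([torch.randn(1, 3, 64, 64)])  # same shape as export inputs
print(outputs[0].shape)
```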

If the model requires varying input sizes, you will need to specify the varying dimensions and bounds as part of the `export` call, as in the sketch below. See [Model Export and Lowering](using-executorch-export.md) for more information.
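
A minimal sketch of a dynamic-shape export, assuming a single-input model whose `forward` argument is named `x`:

```python
# Sketch: declare a dynamic batch dimension with torch.export.Dim.
# Assumes the model's forward() takes one argument named "x".
import torch
from torch.export import Dim

class MyModel(torch.nn.Module):  # stand-in single-input model
    def forward(self, x):
        return torch.nn.functional.avg_pool2d(x, 2)

batch = Dim("batch", min=1, max=8)  # batch size may vary from 1 to 8
exported = torch.export.export(
    MyModel().eval(),
    (torch.randn(2, 3, 64, 64),),      # example input within the bounds
    dynamic_shapes={"x": {0: batch}},  # dimension 0 of "x" is dynamic
)
```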

@@ -96,7 +98,7 @@ Quick Links:

#### Installation
ExecuTorch provides Java bindings for Android usage, which can be consumed from both Java and Kotlin.
To add the library to your app, download the AAR, and add it to the gradle build rule. TODO Replace with Maven/Gradle package management when available.
To add the library to your app, download the AAR and add it to the Gradle build rule.

```
mkdir -p app/libs

@@ -113,39 +115,39 @@ dependencies {
#### Runtime APIs
Models can be loaded and run using the `Module` class:
```java
import org.pytorch.executorch.EValue
import org.pytorch.executorch.Module
import org.pytorch.executorch.Tensor
import org.pytorch.executorch.EValue;
import org.pytorch.executorch.Module;
import org.pytorch.executorch.Tensor;

// ...

Module model = Module.load("/path/to/model.pte")
// TODO Add input setup
EValue output = model.forward(input_evalue);
Module model = Module.load("/path/to/model.pte");

Tensor input_tensor = Tensor.fromBlob(float_data, new long[] { 1, 3, height, width });
EValue input_evalue = EValue.from(input_tensor);
EValue[] output = model.forward(input_evalue);
float[] scores = output[0].toTensor().getDataAsFloatArray();
```

For more information on Android development, including building from source, a full description of the Java APIs, and information on using ExecuTorch from Android native code, see [Using ExecuTorch on Android](/TODO.md).
For a full example of running a model on Android, see the [ExecuTorch Android Demo App](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/ExecuTorchDemo/app/src/main/java/com/example/executorchdemo/ClassificationActivity.java). For more information on Android development, including building from source, a full description of the Java APIs, and information on using ExecuTorch from Android native code, see [Using ExecuTorch on Android](using-executorch-android.md).

### iOS

#### Installation
ExecuTorch supports both iOS and MacOS via C++ and Objective-C bindings, as well as hardware backends for CoreML, MPS, and CPU. The iOS runtime library is provided as a collection of .xcframework targets and are made available as a Swift PM package.
ExecuTorch supports both iOS and MacOS via C++, as well as hardware backends for CoreML, MPS, and CPU. The iOS runtime library is provided as a collection of .xcframework targets and is made available as a Swift PM package.

To get started with Xcode, go to File > Add Package Dependencies. Paste the URL of the ExecuTorch repo into the search bar and select it. Make sure to change the branch name to the desired ExecuTorch version in format “swiftpm-”, (e.g. “swiftpm-0.5.0”). The ExecuTorch dependency can also be added to the package file manually. See [Using ExecuTorch on iOS](/TODO.md) for more information.
To get started with Xcode, go to File > Add Package Dependencies. Paste the URL of the ExecuTorch repo into the search bar and select it. Make sure to change the branch name to the desired ExecuTorch version in the format “swiftpm-” (e.g. “swiftpm-0.5.0”). The ExecuTorch dependency can also be added to the package file manually. See [Using ExecuTorch on iOS](using-executorch-ios.md) for more information.

#### Runtime APIs
Models can be loaded and run from Swift as follows:
```swift
// TODO Code sample
```
Models can be loaded and run from Objective-C using the C++ APIs.

For more information on iOS integration, including an API reference, logging setup, and building from source, see [Using ExecuTorch on iOS](/TODO.md).
For more information on iOS integration, including an API reference, logging setup, and building from source, see [Using ExecuTorch on iOS](using-executorch-ios.md).

### C++
ExecuTorch provides C++ APIs, which can be used to target embedded or mobile devices. The C++ APIs provide a greater level of control compared to other language bindings, allowing for advanced memory management, data loading, and platform integration.

#### Installation
CMake is the preferred build system for the ExecuTorch C++ runtime. To use with CMake, clone the ExecuTorch repository as a subdirectory of your project, and use CMake's `add_subdirectory("executorch")` to include the dependency. The `executorch` target, as well as kernel and backend targets will be made available to link against. The runtime can also be built standalone to support diverse toolchains. See [Using ExecuTorch with C++](/TODO.md) for a detailed description of build integration, targets, and cross compilation.
CMake is the preferred build system for the ExecuTorch C++ runtime. To use it with CMake, clone the ExecuTorch repository as a subdirectory of your project and use CMake's `add_subdirectory("executorch")` to include the dependency. The `executorch` target, as well as kernel and backend targets, will be made available to link against. The runtime can also be built standalone to support diverse toolchains. See [Using ExecuTorch with C++](using-executorch-cpp.md) for a detailed description of build integration, targets, and cross compilation.

```
git clone -b release/0.5 https://github.com/pytorch/executorch.git

@@ -199,9 +201,9 @@ For more information on the C++ APIs, see [Running an ExecuTorch Model Using the
ExecuTorch provides a high degree of customizability to support diverse hardware targets. Depending on your use cases, consider exploring one or more of the following pages:

- [Export and Lowering](using-executorch-export.md) for advanced model conversion options.
- [Delegates](/TODO.md) for available backends and configuration options.
- [Using ExecuTorch on Android](/TODO.md) and [Using ExecuTorch on iOS](TODO.md) for mobile runtime integration.
- [Using ExecuTorch with C++](/TODO.md) for embedded and mobile native development.
- [Troubleshooting, Profiling, and Optimization](/TODO.md) for developer tooling and debugging.
- [API Reference](/TODO.md) for a full description of available APIs.
- [Examples](https://github.com/pytorch/executorch/tree/main/examples) for demo apps and example code.
- [Backend Overview](backends-overview.md) for available backends and configuration options.
- [Using ExecuTorch on Android](using-executorch-android.md) and [Using ExecuTorch on iOS](using-executorch-ios.md) for mobile runtime integration.
- [Using ExecuTorch with C++](using-executorch-cpp.md) for embedded and mobile native development.
- [Profiling and Debugging](using-executorch-troubleshooting.md) for developer tooling and debugging.
- [API Reference](export-to-executorch-api-reference.md) for a full description of available APIs.
- [Examples](https://github.com/pytorch/executorch/tree/main/examples) for demo apps and example code.