A unified ML software stack within the PyTorch platform for edge devices. It defines new compiler entry points as well as a state-of-the-art runtime.
Compared to the legacy Lite Interpreter, there are some major benefits:
- Performance wins compared to Lite Interpreter
  - Faster: orders of magnitude lower framework tax on both [DSP](https://fb.workplace.com/notes/156263446923296) and CPU
  - Much smaller binary size: ~30 KB (ExecuTorch) vs. ~1.5 MB (Lite Interpreter), not counting operators
  - Smaller memory footprint, because ExecuTorch does ahead-of-time memory planning and gives granular control over where runtime allocations happen
- Long-term alignment with the direction of PyTorch infrastructure
  - Lite Interpreter relies on TorchScript, which is being phased out; ExecuTorch is the planned replacement for Lite Interpreter
- Model Authoring & Productivity gains
  - More and better-defined entry points to perform model-, device-, and/or use-case-specific optimizations (e.g. backend delegation, user-defined compiler transformations, default or user-defined memory planning)
  - Ability to lower constructs like dynamic control flow to run on device
See the Using PyTorch > Executorch wiki for pointers to internal workplace groups, how-tos, and other resources.
- Executorch stack diagram
- High-level design doc
- Planning docs
- H2 2022 roadmap
- H1 2022 roadmap, runtime summary, EXIR summary
- Coding guidelines
- BE Tasks -- Please add "[executorch][BE]" in the task title
- Minimal binary size (< 50KB not including kernels)
- Minimal framework tax: loading program, initializing executor, kernel and backend-delegate dispatch, runtime memory utilization
- Portable (cross-compile across many toolchains)
- Executes ATen kernels (or ATen custom kernels)
- Executes custom op kernels
- Supports inter-op asynchronous execution
- Supports static memory allocation (heapless)
- Supports custom allocation across memory hierarchies (see the sketch after this list)
- Supports control flow needed by models
- Allows selective build of kernels
- Allows backend delegation with lightweight interface
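To make the "static memory allocation (heapless)" and "custom allocation across memory hierarchies" goals concrete, below is a minimal sketch of how a client could hand pre-allocated buffers to the runtime instead of letting it touch the heap. The header path, constructor argument order, and the memory-manager hand-off are assumptions modeled on torch::executor::MemoryAllocator; consult the executorch headers for the exact API.

```cpp
// Minimal sketch of the "heapless" goal: the client owns all memory up front
// and wraps it in allocators for the runtime, so the core runtime never calls
// new/malloc itself. Header path and constructor argument order below are
// assumptions; check the executorch headers for the real API.
#include <executorch/runtime/core/memory_allocator.h>  // assumed header path

#include <cstdint>

using torch::executor::MemoryAllocator;

// Client-owned pools. On an embedded target these could be placed in specific
// memory regions (e.g. SRAM vs. DRAM), which is how custom allocation across
// memory hierarchies is expressed.
static uint8_t runtime_pool[4 * 1024];   // runtime bookkeeping
static uint8_t planned_pool[64 * 1024];  // ahead-of-time planned tensor memory

int main() {
  // Assumed constructor: (size in bytes, base address).
  MemoryAllocator runtime_allocator(sizeof(runtime_pool), runtime_pool);
  MemoryAllocator planned_allocator(sizeof(planned_pool), planned_pool);

  // The runtime (and kernels that need scratch space) allocate out of the
  // client's pools rather than the heap.
  void* scratch = runtime_allocator.allocate(256);
  (void)scratch;

  // In a real program these allocators would be handed to the runtime's
  // memory manager when loading a program and initializing the executor.
  (void)planned_allocator;
  return 0;
}
```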
ATen mode uses the ATen (PyTorch core) implementation of Tensor (`at::Tensor`), along with related types (`ScalarType`, etc.).
- `at::Tensor` is big and complex, and often allocates memory with new/malloc.
- The ATen kernels, which rely on the full `at::Tensor` API, are usable in this configuration.
- Those kernels also tend to do dynamic memory allocation, and often have extra flexibility (and thus overhead) to handle cases not needed by mobile/embedded clients: e.g., CUDA support, sparse tensor support, and dtype promotion.
Lean mode uses ExecuTorch's smaller `torch::executor::Tensor` (aka ETensor) implementation, along with related types (`torch::executor::ScalarType`, etc.).
- ETensor's API is a source-compatible subset of `at::Tensor`. Code that is written against ETensor can also build against `at::Tensor`.
- "Lean mode kernels" are any operator implementations that are written to be compatible with ETensor. That means they can also build against `at::Tensor` if desired, and be used in the same model as ATen kernels.
- ETensor does not own or allocate memory on its own (see the sketch after this list).
- (TODO(T133200526): NOTE: Dynamic shapes are not yet supported. Remove this warning when they are.) To support dynamic shapes, kernels can allocate Tensor data using the MemoryAllocator provided by the client.
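To illustrate the last two points, here is a sketch of constructing an ETensor over memory the client already owns: both the sizes array and the data buffer live in the client, and the tensor is just a view over them. The header path and the TensorImpl constructor shown (dtype, rank, sizes, data) are assumptions for illustration; see the lean tensor headers for the exact signature.

```cpp
// Sketch: ETensor does not own or allocate memory; the client provides both
// the metadata and the data buffer. Header path and the TensorImpl
// constructor argument order are assumptions for illustration.
#include <executorch/runtime/core/exec_aten/exec_aten.h>  // assumed header path

using torch::executor::ScalarType;
using torch::executor::Tensor;
using torch::executor::TensorImpl;

// Client-owned storage for a 2x3 float tensor.
static float data[6] = {1, 2, 3, 4, 5, 6};
static TensorImpl::SizesType sizes[2] = {2, 3};

Tensor make_client_owned_tensor() {
  // Assumed constructor: (dtype, rank, sizes, data). The TensorImpl must
  // outlive the Tensor, which is only a thin handle over it.
  static TensorImpl impl(ScalarType::Float, /*dim=*/2, sizes, data);
  return Tensor(&impl);
}
```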
See //executorch/kernels/portable/README.md for technical details.
Portable kernels, which live under //executorch/kernels/portable, are:
- Lean mode kernels
- Compatible with ATen operator signatures
- Written in portable C++ so that they can build for any target
- Written as reference implementations, prioritizing clarity and simplicity over optimization (see the sketch after this list)
- Generally much smaller in code size than ATen kernels
- Written to avoid dynamically allocating memory using new/malloc
- (TODO(T133200526): NOTE: Dynamic shapes are not yet supported. Remove this warning when they are.) To support dynamic shapes, some kernels may allocate Tensor data using the MemoryAllocator provided by the client.
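For a sense of what the above looks like in code, here is a simplified sketch of a portable, reference-style kernel: an out-variant written as a plain C++ loop against the lean Tensor API, with no dynamic allocation. The signature is trimmed down for illustration (the real ATen-compatible add.out signature also includes an alpha scalar, and real kernels validate shapes and dispatch on dtype); the header path and accessor names are assumptions about the lean Tensor subset.

```cpp
// Sketch of a portable reference kernel: clarity over speed, out-variant
// signature so the kernel never allocates, plain C++ loop so it builds for
// any target. Simplified for illustration (float-only, no shape/dtype checks);
// header path and accessor names are assumptions about the lean Tensor API.
#include <executorch/runtime/core/exec_aten/exec_aten.h>  // assumed header path

#include <cstddef>

using torch::executor::Tensor;

// out = a + b, writing into the caller-provided `out` tensor.
Tensor& add_out(const Tensor& a, const Tensor& b, Tensor& out) {
  const float* a_data = a.const_data_ptr<float>();
  const float* b_data = b.const_data_ptr<float>();
  float* out_data = out.mutable_data_ptr<float>();
  for (size_t i = 0; i < static_cast<size_t>(out.numel()); ++i) {
    out_data[i] = a_data[i] + b_data[i];
  }
  return out;
}
```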
buck2 test fbcode//executorch/...
- Uses the lean ExecuTorch `Tensor` class and related types
- Uses the kernels under //executorch/kernels/portable instead of the ATen kernels
buck2 run fbcode//executorch/test:executor_runner -- --model_path=fbcode/executorch/test/models/linear_out.ff
- Instead of the lean ExecuTorch `Tensor`, uses the ATen Tensor so that all ATen kernels can be leveraged
- Note that there can be a significant size regression in ATen mode
buck2 run fbcode//executorch/test:executor_runner_aten -- --model_path=fbcode/executorch/test/models/linear_out.ff
In xplat:
buck2 build @fbandroid/mode/opt @fbandroid/mode/ndk_libcxx -c user.ndk_cxxflags="-frtti -fexceptions" fbsource//xplat/executorch/test:executor_runner
In xplat:
buck2 build @arvr/mode/android/linux/opt-stripped -c ndk.custom_libcxx=false fbsource//xplat/executorch/test:executor_runner