Full Changelog: v0.1.0...v0.2.0
Foundational Improvements
Large generative AI model support
- Support generative AI models like Meta Llama 3 8B and Llama 2 7B on Android and iOS phones
- 4-bit group-wise weight quantization
- XNNPACK delegate and kernels for best performance on CPU (work in progress on other backends)
- KV cache support through PyTorch mutable buffers (see the sketch after this list)
- Custom ops for SDPA, with KV cache and multi-query attention
- ExecuTorch Runtime + tokenizer and sampler
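A minimal sketch of the mutable-buffer KV cache mechanism, using a toy module rather than the actual Llama implementation; the `to_edge` location assumes the v0.2 Python package layout:

```python
import torch
from executorch.exir import to_edge

class ToyKVCache(torch.nn.Module):
    """Toy cache: registered buffers are mutated in place on each decode step."""

    def __init__(self, max_seq_len: int = 16, dim: int = 8):
        super().__init__()
        self.register_buffer("k_cache", torch.zeros(max_seq_len, dim))
        self.register_buffer("v_cache", torch.zeros(max_seq_len, dim))

    def forward(self, pos: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
        # In-place buffer updates are captured by torch.export and carried
        # through to the runtime by ExecuTorch's mutable-buffer support.
        self.k_cache.index_put_((pos,), k)
        self.v_cache.index_put_((pos,), v)
        # Clone so the graph outputs do not alias the mutated buffers.
        return self.k_cache.clone(), self.v_cache.clone()

ep = torch.export.export(
    ToyKVCache(), (torch.tensor([0]), torch.randn(1, 8), torch.randn(1, 8))
)
with open("toy_kv_cache.pte", "wb") as f:
    f.write(to_edge(ep).to_executorch().buffer)
```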
Core ExecuTorch improvements
- Simplified setup experience
- Support for PyTorch mutable buffers
- Support for multi-gigabyte models
- Constant data moved to its own .pte segment for more efficient serialization
- Better kernel coverage in portable lib, XNNPACK, Arm, Core ML, MPS, and HTP delegates
- SDK: better profiling and debugging within delegates
- API improvements/simplification
- Dozens of fixes to fuzzer-identified .pte file-parsing issues
- Vulkan delegate for mobile GPU
- Data-type based selective build for optimizing binary size
- Compatibility with torchtune
- More models supported across different backends
- Python code now available as the "executorch" pip package on PyPI (see the sketch after this list)
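The pip package plus the XNNPACK delegate give a short path from `nn.Module` to a `.pte` file; a minimal sketch, assuming the v0.2-era module path for `XnnpackPartitioner`:

```python
# pip install executorch
import torch
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
    XnnpackPartitioner,
)

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.fc(x))

ep = torch.export.export(MLP().eval(), (torch.randn(1, 16),))
edge = to_edge(ep)
# Hand the supported subgraphs to the XNNPACK delegate for CPU performance.
edge = edge.to_backend(XnnpackPartitioner())
with open("mlp_xnnpack.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```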
Hardware Acceleration Improvements
Arm
- Significant boost in operator test coverage through the use of the TOSA reference model, as well as improved CI coverage
- Added support for quantization with the ArmQuantizer (see the sketch after this list)
- Added support for MobileNet v2 TOSA generation
- Working towards MobileNet v2 execution on Ethos-U
- Added support for multiple new operators in the Ethos-U compiler
- Added NCHW/NHWC conversion for Ethos-U targets until NHWC is supported by ExecuTorch
- Arm backend example now works on macOS
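A sketch of the PT2E quantization flow with the ArmQuantizer; the import path and the `get_symmetric_quantization_config` helper are assumptions based on `backends/arm`, and the capture step has moved between releases (older ones used `capture_pre_autograd_graph`):

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
# Import path is an assumption; check backends/arm in your checkout.
from executorch.backends.arm.quantizer.arm_quantizer import (
    ArmQuantizer,
    get_symmetric_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# The standard PT2E flow: capture, annotate, calibrate, convert.
graph = torch.export.export_for_training(model, example_inputs).module()
quantizer = ArmQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(graph, quantizer)
prepared(*example_inputs)  # calibrate with representative data
quantized = convert_pt2e(prepared)
```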
Apple Core ML
- [SDK] ExecuTorch SDK integration for a better debugging and profiling experience, built on the new MLComputePlan API released in iOS 17.4 and macOS 14.4.
- [SDK] A model lowered to the Core ML backend can be profiled using the ExecuTorch Inspector without additional setup.
- [SDK] Profiling surfaces Core ML-specific information for each operation in the model, including the supported compute devices, the preferred compute device, and the estimated cost on each compute device.
- [SDK] The Core ML delegate backend also supports logging intermediate tensors for model debugging.
- [Partitioner] Enables a developer to lower a model even if Core ML doesn’t support all the operations in the model.
- [Partitioner] Developers can now specify the operations that the Core ML backend should skip when lowering the model (see the sketch after this list).
- [Quantizer] Leverages PyTorch 2.0 export-based quantization APIs.
- [Quantizer] Encodes specific quantization rules in order to optimize the model for execution on Apple silicon.
- [Quantizer] Integrated with the ExecuTorch Core ML delegate conversion pipeline.
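A sketch of skipping selected ops during Core ML lowering; the partitioner import path and the `skip_ops_for_coreml_delegation` argument name are assumptions based on `backends/apple/coreml`:

```python
import torch
from executorch.exir import to_edge
# Import path and argument name are assumptions; verify against your checkout.
from executorch.backends.apple.coreml.partition.coreml_partitioner import (
    CoreMLPartitioner,
)

class Net(torch.nn.Module):
    def forward(self, x):
        return torch.mul(torch.add(x, x), x)

edge = to_edge(torch.export.export(Net(), (torch.randn(4),)))
# Ops named here fall back to the portable CPU path instead of Core ML.
edge = edge.to_backend(
    CoreMLPartitioner(skip_ops_for_coreml_delegation=["aten.mul.Tensor"])
)
program = edge.to_executorch()
```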
Apple MPS
- Support for over 100 ops (parity with the ops supported by the PyTorch MPS backend)
- Support for iOS/iPadOS 14.4+ and macOS 12.4+
- Support for the MPSPartitioner
- Support for the following dtypes: fp16, fp32, bfloat16, int8, int16, int32, int64, uint8, bool
- Support for profiling (ETRecord, ETDump) through the Inspector API (see the sketch after this list)
- Full unit testing coverage for AOT and runtime for all supported operators
- Enabled storiesllama (floating point) on MPS
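A sketch of the Inspector flow referenced above, assuming the v0.2 `executorch.sdk` module path; the file names are placeholders for an ETRecord produced at export time and an ETDump produced by an instrumented runtime run:

```python
# Module path reflects the v0.2-era SDK; later releases relocated it.
from executorch.sdk import Inspector

# Placeholder paths: the ETDump comes from an instrumented runtime run,
# the ETRecord from the export pipeline.
inspector = Inspector(etdump_path="etdump.etdp", etrecord="etrecord.bin")
inspector.print_data_tabular()  # per-event timings, including delegate events
```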
Qualcomm
- Added support for Snapdragon 8 Gen 3
- Enabled on-device compilation (a.k.a. QNN online-prepare); see the sketch after this list
- Enabled 4-bit and 16-bit quantization
- Integrated Qualcomm AI Studio QNN profiling into the ExecuTorch flow
- Enabled storiesllama in fp16 on HTP (thanks to Chen Lai from Meta, the main contributor to this effort)
- Added support for more operators
- Additional models validated since v0.1.0:
  - FbNet
  - W2l (Wav2LetterModel)
  - SSD300_VGG16
  - ViT
  - Quantized MobileBert (contribution submitted before the v0.1.0 cutoff, but merged afterwards)
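A heavily hedged sketch of lowering to HTP with online-prepare enabled; every import path, helper name, and flag below is an assumption based on `backends/qualcomm` and may differ between releases:

```python
import torch
from executorch.exir import to_edge
# All names below are assumptions; verify against backends/qualcomm.
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import (
    QcomChipset,
)
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)

edge = to_edge(torch.export.export(
    torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval(),
    (torch.randn(1, 8),),
))
compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8650,  # Snapdragon 8 Gen 3
    backend_options=generate_htp_compiler_spec(use_fp16=True),
    online_prepare=True,  # compile on device (QNN online-prepare)
)
edge = edge.to_backend(QnnPartitioner(compiler_specs))
```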
Cadence HiFi
- Expanded operator support for Cadence HiFi targets
- Added first small model (RNNT-emformer predictor) to the Cadence HiFi examples
Model Support
Validated with one or more delegates
|  |  |  |
| --- | --- | --- |
| Meta Llama 2 7B | LearningToPaint | resnet50 |
| Meta Llama 3 8B | lennard_jones | shufflenet_v2_x1_0 |
| Conformer | LSTM | squeezenet1_1 |
| dcgan | maml_omniglot | SqueezeSAM |
| Deeplab_v3 | mnasnet1_0 | timm_efficientnet |
| Edsr | Mobilebert | Torchvision_vit |
| Emformer_rnnt | Mobilenet_v2 | Wav2letter |
| functorch_dp_cifar10 | Mobilenet_v3 | Yolo v5 |
| Inception_v3 | phlippe_resnet |  |
| Inception_v4 | resnet18 |  |
Tested with torch.export but not optimized for performance

|  |  |  |
| --- | --- | --- |
| Aquila 1 7B | GPT-2 | PLaMo 13B |
| Aquila 2 7B | GPT-J 6B | Qwen 1.5 7B |
| Baichuan 1 7B | InternLM2 7B | Refact |
| BioGPT | Koala | RWKV 5 world 1B5 |
| BLOOM 7B1 | MiniCPM 2B sft | Stable LM 2 1.6B |
| Chinese Alpaca 2 7B | Mistral 7B | Stable LM 3B |
| Chinese LLaMA 2 7B | Mixtral 8x7B MoE | Starcoder |
| CodeShell | Persimmon 8B chat | Starcoder 2 |
| Deepseek | Phi 1 | Vigogne (French) |
| GPT Neo 1.3B | Phi 1.5 | Yi 6B |
| GPT NeoX 20B | Phi 2 |  |