🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16718
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 Cancelled Jobs — As of commit 240b241 with merge base 3319157, the following jobs were cancelled. Please retry.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary
This PR adds an MLX backend for ExecuTorch, enabling Metal-accelerated inference on Apple Silicon. It runs Llama, Qwen, Gemma, Whisper, Voxtral, and Parakeet models end-to-end, with 637 passing op tests and support for multithreaded execution. For many models it delivers the best performance of any ExecuTorch backend on Apple Silicon, with 2-6x speedups over what was previously possible in ExecuTorch, and up to 30% smaller model sizes than XNNPACK thanks to BF16 support and tied quantized embeddings.
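For context on the size claim, here is a rough back-of-envelope sketch of where BF16 and embedding tying save bytes. All parameter counts are invented for illustration — real savings depend on which tensors stay in higher precision and on the quantization scheme, so this is not a measurement from the PR:

```python
# Illustrative only: BF16 halves per-weight storage relative to FP32, and
# tying the output projection to the input embedding avoids storing that
# table twice. Parameter counts below are hypothetical.
EMBED_PARAMS = 250_000_000  # hypothetical embedding / output-projection size
OTHER_PARAMS = 750_000_000  # hypothetical remaining transformer weights

# FP32 with an untied output projection: 4 bytes/weight, table stored twice.
fp32_untied = (OTHER_PARAMS + 2 * EMBED_PARAMS) * 4

# BF16 with tied embeddings: 2 bytes/weight, one shared table.
bf16_tied = (OTHER_PARAMS + EMBED_PARAMS) * 2

print(f"{fp32_untied / 2**30:.2f} GiB vs {bf16_tied / 2**30:.2f} GiB")
```

The same mechanism explains why the savings vary by model: the larger the (tied) embedding table relative to the rest of the network, the bigger the win.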
The PR is large due to extensive op coverage, testing, and documentation, but almost all changes are confined to backends/mlx/. The design is described in backends/mlx/README.md.

Suggested review approach:
- Review the changes outside backends/mlx/ carefully — these integrate with ExecuTorch's build system and are the most likely to need changes.
- Within backends/mlx/, focus on the structural design (see the README) and test coverage (the CI job is .github/workflows/mlx.yml).

Prerequisite PRs
These fixes were developed alongside the MLX backend. Once merged, this PR can be rebased to remove the duplicated changes:
- remove_noop_pass

Tests
CI is defined in .github/workflows/mlx.yml. The one failing operator test is a test-side issue being fixed in #17539. The three failing model tests will be addressed in follow-ups — they are not an initial focus compared to the GenAI models above.