disclaimer #3376
Commits on Apr 9, 2024
Summary: Pull Request resolved: #2944 Need change from D55354487 to get mutable buffer + pt2e working Reviewed By: JacobSzwejbka Differential Revision: D55922254 fbshipit-source-id: 5ea4471eb0e22149a0dbb4e921fe447cceb13bf1
Commit: cb6ddae
aten.convolution (Transpose) (#2883)
Summary: Pull Request resolved: #2883 ## Summary (cases handled) We introduce support for the convolution cases covered by ATen-VK's transpose implementation. This is achieved by - reusing the existing [`conv_transpose2d.glsl`](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/glsl/conv_transpose2d.glsl), and - [moving special weights prepacking from CPU](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/ops/Convolution.cpp#L134-L235) to the GPU in `conv_transpose2d_prepack_weights.glsl`. We also include resizing support for dynamic shapes. Note that only height and width of the input can vary. ## Cases not handled The implementation is on-par with ATen-VK's Transpose. This means the following cases are missing: 1. **Groups G > 1.** 2. **Batch (input) N > 1.** 3. **Dilation > 1.** ghstack-source-id: 221721754 exported-using-ghexport bypass-github-export-checks Reviewed By: copyrightly, SS-JIA Differential Revision: D55667336 fbshipit-source-id: 3b7b7c912ef947610624e2e1c5b753de393234a0
Commit: 8a6427e
aten.convolution (Depthwise) (#2884)
Summary: Pull Request resolved: #2884 ## Summary We introduce support for the convolution cases covered by [ATen-VK's default Depthwise implementation](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/ops/Convolution.cpp#L68). This is achieved by - reusing the [existing `conv2d_dw.glsl`](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/glsl/conv2d_dw.glsl), and - [moving special weights prepacking from CPU](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/ops/Convolution.cpp#L80-L132) to the GPU in `conv2d_dw_prepack_weights.glsl`. The implementation is on-par with ATen-VK's Depthwise. This means it only covers: - `in_channels == groups`, `out_channels == groups` A full implementation would cover, for any positive integer K: - `in_channels == groups`, `out_channels == groups * K` ghstack-source-id: 221721752 exported-using-ghexport bypass-github-export-checks Reviewed By: SS-JIA Differential Revision: D55813511 fbshipit-source-id: c0726798bd36cc5ff2326836c28a5f7d23494f5e
Commit: c4ac14c
Fix Validation Layer warnings about wrong image layout (#2854)
Summary: Pull Request resolved: #2854 ## Context Currently, when executing a `ComputeGraph` with prepacked tensors with [Vulkan Validation Layers](https://github.com/KhronosGroup/Vulkan-ValidationLayers) turned on, the following Validation Errors can be observed. Note that Validation Layers can be turned on by running Vulkan binaries on Mac with the `vkconfig` app opened. ``` UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x7fb76dbbf988, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x7fb76dbbf988[] expects VkImage 0xd79c8a0000000f09[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_GENERAL--instead, current layout is VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. Objects: 1 [0] 0x7fb76dbbf988, type: 6, name: NULL ``` The reason for this is that prepacked textures are written to with `WRITE` memory access during packing, which means they will be in the `VK_IMAGE_LAYOUT_GENERAL` layout. However, they will subsequently be read from during `graph.execute()`, meaning the texture will have transitioned to `VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL`, but will be bound using the `VK_IMAGE_LAYOUT_GENERAL` layout. Subsequent calls to `execute()` will therefore see that the prepacked texture has been bound with the wrong layout, since after the first graph execution the texture will have the `VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL` layout. The solution is to submit a no-op shader dispatch during prepacking to trigger a transition to the `READ_ONLY_OPTIMAL` layout. ghstack-source-id: 221871426 bypass-github-pytorch-ci-checks Reviewed By: jorgep31415 Differential Revision: D55772003 fbshipit-source-id: f9c69e6e571ca0d0d28a6c25716766af98e82d41
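For reference, a sketch of the layout transition that the no-op dispatch effectively triggers, written as an explicit Vulkan image barrier. This is an illustrative equivalent for understanding the fix, not the code the commit adds:
```
#include <vulkan/vulkan.h>

// Transition a storage image from GENERAL (written during prepacking) to
// SHADER_READ_ONLY_OPTIMAL so later executions bind it in the right layout.
void transition_to_read_only(VkCommandBuffer cmd, VkImage image) {
  VkImageMemoryBarrier barrier{};
  barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
  barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
  barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
  barrier.oldLayout = VK_IMAGE_LAYOUT_GENERAL;
  barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
  barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
  barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
  barrier.image = image;
  barrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};
  vkCmdPipelineBarrier(
      cmd,
      VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, // after the prepacking writes
      VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, // before the execute() reads
      0, 0, nullptr, 0, nullptr, 1, &barrier);
}
```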
Commit: 4599650
Introduce convenience constexpr for `StorageType`s and `GPUMemoryLayout`s (#2948)
Summary: Pull Request resolved: #2948 ## Context Introduce the following convenience `constexpr`:
* `api::kBuffer`, `api::kTexture3D`, and `api::kTexture2D`
* `api::kWidthPacked`, `api::kHeightPacked`, and `api::kChannelsPacked`
Also remove the `api::StorageType::UNKNOWN` enum entry as it doesn't really serve any purpose. ghstack-source-id: 221871428 bypass-github-pytorch-ci-checks Reviewed By: copyrightly, jorgep31415 Differential Revision: D55811278 fbshipit-source-id: 26dc1706ac2605c13f247d08a21863ff3ef94488
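A minimal sketch of the convenience-constexpr pattern this commit describes; the enum entries below are illustrative assumptions, not the exact ExecuTorch definitions:
```
namespace api {

enum class StorageType { BUFFER, TEXTURE_3D, TEXTURE_2D };

// Call sites can now write api::kBuffer instead of api::StorageType::BUFFER.
constexpr StorageType kBuffer = StorageType::BUFFER;
constexpr StorageType kTexture3D = StorageType::TEXTURE_3D;
constexpr StorageType kTexture2D = StorageType::TEXTURE_2D;

} // namespace api
```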
Commit: b26eee8
Use __ET_UNLIKELY in assertion macros (#2949)
Summary: Pull Request resolved: #2949 It is supposed to be unlikely for assert/check conditions to fail; let's tell the compiler about that. Reviewed By: mergennachin Differential Revision: D55929730 fbshipit-source-id: 5677c19cd8342cbd77a9c0b973059ed3d5ee800b
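A minimal sketch of how an assertion macro can carry this hint, assuming a GCC/Clang-style `__builtin_expect`; the macro names are illustrative, not ExecuTorch's actual ones:
```
#include <cstdio>
#include <cstdlib>

#if defined(__GNUC__) || defined(__clang__)
#define MY_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define MY_UNLIKELY(x) (x)
#endif

// The failure branch is marked unlikely, so the compiler lays out the
// happy path for better branch prediction and code locality.
#define MY_CHECK(cond)                                   \
  do {                                                   \
    if (MY_UNLIKELY(!(cond))) {                          \
      std::fprintf(stderr, "Check failed: %s\n", #cond); \
      std::abort();                                      \
    }                                                    \
  } while (0)

int main() {
  MY_CHECK(1 + 1 == 2);
  return 0;
}
```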
Commit: 6cb6051
s/heirarchies/hierarchies/ (#2772)
Summary: Pull Request resolved: #2772 Just a spelling mistake. Reviewed By: JacobSzwejbka Differential Revision: D55542731 fbshipit-source-id: c12bcab53661561bf0d8223d5cae9ed92b39e599
Commit: 3661a11
Fix indentation in selective build example code (#2773)
Summary: Pull Request resolved: #2773 Noticed this page didn't line up right. Now it does. Reviewed By: mergennachin, kirklandsign Differential Revision: D55542836 fbshipit-source-id: a25a376ce9e77f3bc360e9ab6cf15c9ae9ecc7bf
Commit: 02f565e
aten.convolution (Depthwise Output-Tile) (#2885)
Summary: Pull Request resolved: #2885 We port an optimization from ATen-VK for specific weight sizes: [`conv2d_dw_output_tile.glsl`](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/glsl/conv2d_dw_output_tile.glsl) ghstack-source-id: 221887576 exported-using-ghexport bypass-github-export-checks Reviewed By: SS-JIA Differential Revision: D55814588 fbshipit-source-id: 86a85d122abbcebfed41466bc0a4907a6ddc80f9
Commit: f00afe7
aten.convolution (Pointwise) (#2886)
Summary: Pull Request resolved: #2886 We port an optimization from ATen-VK for specific weight sizes: [`conv2d_pw.glsl`](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/glsl/conv2d_pw.glsl) ghstack-source-id: 221887670 exported-using-ghexport bypass-github-export-checks Reviewed By: SS-JIA Differential Revision: D55814587 fbshipit-source-id: 419d82ddcf2dce59b2d1ec5abf313356fce074e6
Commit: 99c4f4e
Commits on Apr 10, 2024
Make minor updates to LLM guide setup instructions (#2940)
Summary: Minor updates to the prerequisite section of the LLM getting started guide. Passing -s to pyenv install prevents a prompt if python 3.10 is already installed (it will just silently continue in this case when the flag is passed). Additionally, under pyenv, we should be using python, not python3. I also added a little bit of wording on env management. Pull Request resolved: #2940 Test Plan: Ran LLM guide prerequisite section on an m1 mac with pyenv-virtualenv. Reviewed By: byjlw Differential Revision: D55913382 Pulled By: GregoryComer fbshipit-source-id: 7f04262b025db83b8621c972c90d3cdc3f029377
Commit: 218f643
resolve_buck.py: Add an entry for darwin-x86_64 (#2868)
Summary: Version hash reported by https://github.com/facebook/buck2/releases/download/2024-02-15/buck2-x86_64-apple-darwin.zst Pull Request resolved: #2868 Reviewed By: Olivia-liu Differential Revision: D55914146 Pulled By: GregoryComer fbshipit-source-id: b9882900acfd4cb6f74eda90a7c99bdb119ec122
Commit: de7fdaa
Compute graph print readable (#2825)
Summary: Pull Request resolved: #2825 Add capability to print the node list with arguments to allow better debugging. Reviewed By: SS-JIA Differential Revision: D55510335 fbshipit-source-id: 151e3a6f249417dfe644172c1b5f0e83a3b110dd
Commit: 564c276
aten.convolution (Bias=False) (#2887)
Summary: Pull Request resolved: #2887 The final touches to get ET-VK convolution on-par with ATen-VK's convolution. ## Idea In our shaders, we add the bias to our sum. ``` ${VEC4_T[DTYPE]} sum = texelFetch(bias_in, ivec2(pos.z, 0), 0); ``` To keep our shaders as is, we implement having no bias by allocating a buffer of zeros. Then, our shader adds zero to our sum. ## Issue If `Bias=False`, dummy buffer of zeros is not serialized with the graph. The bias ValueRef is deserialized in the runtime as `TypeTag::NONE`, not `TypeTag::TENSORREF`. ## Solution If `TypeTag::NONE` is given, (1) create the `vTensor` using the `out_channels` value from the weights, (2) allocate a StagingBuffer of that size, and (3) `memset` its data to zero. Failure to do (3) will result in undefined behavior. ghstack-source-id: 221926167 exported-using-ghexport bypass-github-export-checks Reviewed By: SS-JIA Differential Revision: D55814589 fbshipit-source-id: ce7b82c31bb11540ed2d98ab14131841fcee93e4
Commit: 8aaf2c5
Add convolution cases to codegen (#2920)
Summary: Pull Request resolved: #2920 TSIA ghstack-source-id: 221926168 exported-using-ghexport bypass-github-export-checks Reviewed By: SS-JIA Differential Revision: D55829466 fbshipit-source-id: 48b4f15c41141093dd061c43e6b769eb4c25c81b
Commit: f0bfc3c
Summary: Pull Request resolved: #2807 The operator `aten.sum.dim_IntList` could take an empty list as the parameter for `dims`. We modify `vulkan_graph_builder.py` to accommodate the empty list. Moreover, the op `aten.sum.default` is implemented as a [decomposition](https://www.internalfb.com/code/fbsource/[96e496f9db8f92967b4394bd4f60e39ab916740b]/xplat/caffe2/torch/_decomp/decompositions.py?lines=4676) into `aten.sum.dim_IntList` with empty `dims`. So we will support `aten.sum.default` with the changes. Context: `torch.sum(x, ())` and `torch.sum(x)` are two ways to compute the sum of all elements in tensor `x`. Reviewed By: SS-JIA, jorgep31415 Differential Revision: D55630993 fbshipit-source-id: 923d276118e893ff6885b92eb7b4c7cb7a95b374
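A sketch of the equivalence described above, using the ATen C++ API (an assumption for illustration; the summary's `torch.sum(x, ())` / `torch.sum(x)` forms are the canonical statement):
```
#include <ATen/ATen.h>

void sum_equivalence_demo() {
  at::Tensor x = at::rand({2, 3, 4});
  at::Tensor a = at::sum(x);                    // aten.sum.default
  at::Tensor b = at::sum(x, at::IntArrayRef{}); // aten.sum.dim_IntList, dims = ()
  TORCH_CHECK(at::allclose(a, b));              // both reduce over every element
}
```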
Commit: b145701
Fix failing CI jobs caused by #2934 (#2961)
Summary: Pull Request resolved: #2961 Fix these 3 CI job failures caused by #2934 (D55907752): * Apple / build-frameworks-ios / macos-job * trunk / test-arm-backend-delegation / linux-job * trunk / test-coreml-delegate / macos-job Reviewed By: kirklandsign Differential Revision: D55950023 fbshipit-source-id: 6166d9112e6d971d042df1400442395d8044c3b3
Commit: d993797
Replace `std::stringstream` with `std::string` for Shader names (#2964)
Summary: Pull Request resolved: #2964 ## Context Some research into efficient string concatenation suggests that streams in C++ are not quite efficient. The best way to concatenate strings seems to be creating a `std::string` and reserving sufficient capacity for the `std::string`. This diff deprecates the usage of `std::stringstream` when constructing kernel names in favor of using `std::string` directly. Reviewed By: copyrightly Differential Revision: D55951475 fbshipit-source-id: a1a584669e80984b85d11b7d6d4f7593290e562b
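A sketch of the reserve-then-append pattern the diff favors (function and variable names are illustrative):
```
#include <string>

std::string make_kernel_name(
    const std::string& base,
    const std::string& dtype_suffix) {
  std::string name;
  // One allocation up front instead of std::stringstream's internal buffering.
  name.reserve(base.size() + 1 + dtype_suffix.size());
  name += base;
  name += '_';
  name += dtype_suffix;
  return name;
}
```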
Commit: a983ebc
Refine the LLM manual (focus on the debugging and profiling part) (#2952)
Summary: Pull Request resolved: #2952
* Some auto-formatting by my VSCode (remove extra spaces)
* Remove imports that have been imported in previous part of the doc
* Other minor changes to keep consistency across the doc
* Link a screenshot instead of using the raw table because the original table is illegible: {F1482781056}
Reviewed By: GregoryComer Differential Revision: D55938344 fbshipit-source-id: 699abb9ebe1196ab73d90a3d08d60be7aa0d8688
Commit: e733f2d
Android demo app tutorial fix for XNNPACK and QNN (#2962)
Summary: * Update tutorial due to recent changes. * Clean up setup.sh for app helper lib build. Pull Request resolved: #2962 Reviewed By: cccclai Differential Revision: D55951189 Pulled By: kirklandsign fbshipit-source-id: 2c95e8580145b039f503e7cd99a4003867f8dbb0
Commit: 26365f1
Qualcomm AI Engine Direct - Enable per channel linear op (#2822)
Summary:
- Add per channel weight quantization for linear op
- Bias quantization for per channel weight Linear op is not supported yet
Pull Request resolved: #2822 Reviewed By: kirklandsign Differential Revision: D55731629 Pulled By: cccclai fbshipit-source-id: 831a47c897b34e1a749325df56a8bbd0acda80e1
Commit: 554cd27
Custom ops API small fixes (#2936)
Summary: Pull Request resolved: #2936 Fix the way we use `at::from_blob()` and add proper namespace to `CompileTimeFunctionPointer` so it is not confused with `at::CompileTimeFunctionPointer`. bypass-github-pytorch-ci-checks bypass-export-ci-checks Reviewed By: lucylq Differential Revision: D55907751 fbshipit-source-id: ad793e30ec72f48e7300d75820209035d42cae6c
Commit: 8f8d969
Consolidate EXECUTORCH_BUILD_CUSTOM option (#2935)
Summary: Pull Request resolved: #2935 Currently `EXECUTORCH_BUILD_CUSTOM` is not being respected properly. If this option is false, we should not build `llama2/custom_ops` anywhere. If this option is true, we should build `llama2/custom_ops` in both llama runner binary and pybind. This PR consolidates it. bypass-github-pytorch-ci-checks bypass-export-ci-checks Reviewed By: lucylq Differential Revision: D55907750 fbshipit-source-id: 03a7a8cbd499c734060de385d6edb193cf35470d
Commit: d209e41
Consolidate tokenizer interface (#2954)
Summary: Pull Request resolved: #2954 Change the tokenizer APIs to:
```
Result<std::vector<uint64_t>> encode(const std::string& input, int8_t bos, int8_t eos);
Result<std::string> decode(uint64_t prev_token, uint64_t token);
```
Notice that we use `uint64_t` for token ids just to be safe, and the encode() API returns a std::vector of tokens. Reviewed By: lucylq Differential Revision: D55944780 fbshipit-source-id: 9b44437e7061424526f4e0b049a3449129f0ba53
Commit: 948760a
Summary: Pull Request resolved: #2033 Update the OSS Xtensa repo with more up to date compiler and quantizer things. Introduce a test folder and a conv1d test. Reviewed By: tarun292, cccclai Differential Revision: D54034581 fbshipit-source-id: c2bf0c43897a2ef7dff291698370d2583433a6ba
Commit: 859e924
Add the missing import generate_etrecord to doc Getting Started with …
Commit: cb9caa3
Commits on Apr 11, 2024
Summary: Pull Request resolved: #2981 As titled, a quick follow up of D55907750 Reviewed By: lucylq Differential Revision: D55996735 fbshipit-source-id: f535b013b7b900c5a2c2ed79f6b6738dcf1f91ec
Commit: 75c27c3
Forward fix macOS job after test-infra #5086 (#2980)
Summary: After pytorch/test-infra#5086, the working directory is now set correctly, so `pushd` isn't needed anymore. More importantly, trying to change the directory ends up failing all macOS CI jobs because that subdirectory doesn't exist. Pull Request resolved: #2980 Reviewed By: larryliu0820 Differential Revision: D55996299 Pulled By: huydhn fbshipit-source-id: 05758603d7628cc0a01fd577a49202d45c84e6c5
Commit: 2fc99b0
Add a mock perf test for llama2 on Android (#2963)
Summary: I'm trying to set up a simple perf test when running llama2 on Android. It naively sends a prompt and records the TPS. Open for comment about the test here before setting this up on CI. ### Testing Copy the exported model and the tokenizer as usual, then cd to the app and run `./gradlew :app:connectAndroidTest`. The test will fail if the model fails to load or if the TPS is lower than 7 as measured by https://github.com/pytorch/executorch/tree/main/examples/models/llama2 Pull Request resolved: #2963 Reviewed By: kirklandsign Differential Revision: D55951637 Pulled By: huydhn fbshipit-source-id: 34c189aefd7e31514fcf49103352ef3cf8e5b2c9
Commit: d761f99
Core ML Has Added `Index_Put` Support, No Need to Skip Anymore (#2975)
Summary: It was a workaround to skip the `aten.index_put` op in Core ML delegation, at the cost of partitioning the Llama model into 13 pieces. For better performance, we prefer to delegate the whole model to Core ML. Since Core ML has added the [necessary support](apple/coremltools#2190), it is time to revert this workaround. Pull Request resolved: #2975 Reviewed By: kirklandsign Differential Revision: D56002979 Pulled By: cccclai fbshipit-source-id: e7a7c8c43706cb57eba3e6f720b3d713bec5065b
Commit: 7d4bafc
Summary: It's not obvious that there are two different versions of the documentation. Reviewed By: iseeyuan Differential Revision: D56018543 fbshipit-source-id: 09e5facf3c2f2faaf216ebc76cd5c21697dbcb37
Commit: 7c71970
Add llama2 readme in examples/README (#2992)
Summary: Pull Request resolved: #2992 We should promote the llama2 page more in https://github.com/pytorch/executorch/tree/main/examples/ bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: iseeyuan Differential Revision: D56018978 fbshipit-source-id: cbbc7bd2ea4ce55e564bd6b4a2900f623599dde6
Commit: e641ffc
Use new API to register custom ops for llama model (#2916)
Summary: Pull Request resolved: #2916 Retry of D55713944 Use `EXECUTORCH_LIBRARY` to register custom kernel to ExecuTorch runtime. Reviewed By: lucylq Differential Revision: D55856491 fbshipit-source-id: 0e17ea18a7cd0b0b45a8e56e9d09ab5e2f8eb95e
Commit: 6e43135
Commit: c7fd394
Update name from xtensa to cadence (#2982)
Summary: Pull Request resolved: #2982 As titled. Reviewed By: cccclai Differential Revision: D55998135 fbshipit-source-id: a57bd233afe170290c7def4406d6d6e769d467ed
Commit: 7b8343b
Use new API to register custom ExecuTorch kernels into ATen (#2937)
Summary: Pull Request resolved: #2937 Retry of D55713944 Use `WRAP_TO_ATEN` to register custom ExecuTorch kernel to PyTorch. This PR added installation logic to `libcustom_ops_aot_lib.so` in `setup.py`. This is to make sure we can build `libcustom_ops_aot_lib.so` and install it to the correct position (`<site-packages>/executorch/examples/models/llama2/custom_ops/libcustom_ops_aot_lib.so`) and then it can be loaded by `torch.ops.load_library`. Reviewed By: lucylq Differential Revision: D55907749 fbshipit-source-id: 6b7f9af3c68b31f6df780a041291684eb6ddd90f
Commit: c322685
Summary: Pull Request resolved: #2843 et-view should always copy the data pointer. Reviewed By: JacobSzwejbka Differential Revision: D55715318 fbshipit-source-id: 9745cfc3a84e40cfc29fe6c6a4cbe4151d14d68c
Commit: 1f6f711
Replace view copy with view (3/3) (#2463)
Summary: Pull Request resolved: #2463 Design: https://docs.google.com/document/d/1l9x925EOrE8mHFJdRCC59nBJXyqBdnoeK-EgNQScXD0/edit#heading=h.kocb2mvchnib This stack replaces view_copy nodes with memory.view nodes. In the first diff (D54816555), I write a pass to normalize view_copy nodes by making their base point to the upstream non-view node. This means if we have something like op -> view_copy1 -> view_copy2, then after normalization, both view copies will point to op in their base (assuming op is not a view node). Note that this pass combined with dead-code elimination removes redundant view copies. This is because a redundant view copy will have no users after this pass. In the second diff (D54827305), I write a pass to convert view_copy nodes to memory.view nodes. A memory.view is similar to torch.ops.aten.view.default, but it is its own function so that we can handle it specially during memory planning and emission. A memory.view node has a special TensorSpec of type _MemoryViewSpec. This spec is immutable and dynamically looks up non-size related fields from its base's TensorSpec. Because it is immutable, fields on a _MemoryViewSpec cannot be set, but if a field is updated on the base spec, this update is reflected in the memory.view node's _MemoryViewSpec. Not all view_copy nodes are converted to memory.view nodes. Only static nodes that are memory planned are converted. Not all static nodes are memory planned in ExecuTorch. For example, there is an option to turn off memory planning for input nodes, and outputs from some higher order ops like cond are not memory planned. Which nodes are memory planned is not easily available, and I did not try to cover all cases of nodes that can be converted. We can expand this list over time. In the third diff (D54827438), I implement the actual view_copy elimination. In the ExecutorchBackendConfig, there is a new option remove_static_view_copy. If remove_static_view_copy = True, the memory planning passes are [NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass]; if remove_static_view_copy = False, the memory planning passes are [config.to_out_var_pass, config.memory_planning_pass] (state today). Let's look at the flow when remove_static_view_copy = True: NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass. The first two steps are just the first and second diffs described above. In config.to_out_var_pass, the memory.view nodes are skipped. In config.memory_planning_pass, when a spec is requested for a memory.view node (e.g., to update the lifetime), we return the spec of its base. Returning the spec for the base means that whenever we see a memory.view node, we actually update the lifetime of the base to cover it. Moreover, the memory.view node's special _MemoryViewSpec sees this update reflected. (Note that an exception would be thrown if we kept the usual flow and returned the spec for the memory.view node. This is because the special _MemoryViewSpec is immutable and would not allow the memory_planning_pass to update its lifetime.) Finally, during emission the memory.view is emitted as an evalue. There are two more diffs on the stack D54866523 and D54866539. The first of these replaces the old RemoveRedundantViewCopy pass with a NormalizeViewCopyBasePass + dead code elimination. The second converts view-like ops (squeeze, unsqueeze, slice) to view ops when safe to do so to take advantage of the view_copy elimination. Reviewed By: larryliu0820 Differential Revision: D54827438 fbshipit-source-id: ed29b9b2653f512ef3b4006e159d225f835ebbf6
Commit: 62a4dd3
Skip annotate boolean input (#2957)
Summary: Pull Request resolved: #2957 ghstack-source-id: 222200589 exported-using-ghexport It only makes sense to quantize fp tensors, but not booleans. Add a check to make sure only fp tensors are annotated in the quantizer. Reviewed By: jerryzh168 Differential Revision: D55946526 fbshipit-source-id: d94bfee38ab2d29fc9672ab631b4d5d0c5239d25
Commit: ce344bc
Fix build-framework-ios CI job (#2996)
Summary: As titled. `build_apple_frameworks.sh` is copying all the exported headers out, and in #2934 `//executorch/schema:program` is being moved to `exported_deps`, causing `build_apple_frameworks.sh` to not be able to copy the generated headers `program_generated.h` and `scalar_type_generated.h`. This PR fixes it by moving it back to `deps`. Pull Request resolved: #2996 Reviewed By: kirklandsign Differential Revision: D56028952 Pulled By: larryliu0820 fbshipit-source-id: 2cd4999154877b0ac7b49cd1f54d518cba34b2f2
Commit: 3b727a7
Extend constant prop pass to work with int/float/etc scalars and fix input specs. (#2950)
Summary: Pull Request resolved: #2950
1. Cleanup / Refactor constant prop pass.
2. Enable constant propagation for ops with constant scalar arguments -- int/float/dtype/bool/str. Nodes of type `Op(constant_tensor, some_int, some_float, some_dtype, ...)` can now be constant propagated.
3. Fix order of input spec to match the expected spec in `ExportGraphSignature` class: parameters->buffers->constants->user_inputs. Before this diff, input_specs for the newly added constant tensors were appended to graph_signature, which would cause failures.
Reviewed By: dulinriley Differential Revision: D55891278 fbshipit-source-id: fe1867cb6a99d0140d6a2e076027688cb1ddc0cd
Commit: 5ef8427
Commits on Apr 12, 2024
Introduce `vTensorPtr` to prevent reference invalidation and remove `get_val()` API (#2978)
Summary: Pull Request resolved: #2978 ## Context Currently when writing operators developers will save a reference to a `vTensor` retrieved from a `ComputeGraph`'s list of `values_` like so:
```
vTensor& vten = graph.get_val(vref).toTensor();
```
However, this is dangerous since if any values are added once the reference has been stored, `values_` which is a `std::vector` may have been resized and therefore have its contents moved, meaning the reference is now invalid. To protect against this, this changeset introduces the `vTensorPtr` class which is a wrapper around a `vTensor*`. When constructed, it will increment a counter in the `ComputeGraph` instance, and when destroyed it will decrement the counter. `ComputeGraph` cannot add any values while the counter is not zero. Since `Value` can be converted to other non-trivial types, this changeset also removes the `get_val` function entirely to guard against unsafe behaviour. ghstack-source-id: 222224052 exported-using-ghexport Reviewed By: jorgep31415 Differential Revision: D55984187 fbshipit-source-id: 22c619f651b5b3783c7626263694ca46b9f84723
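An illustrative sketch of the guard pattern described above, with all names simplified: a pointer wrapper bumps a counter on construction and decrements it on destruction, and the graph refuses to grow its value list while any pointer is live (since growing the vector could invalidate outstanding references):
```
#include <cassert>
#include <cstddef>
#include <vector>

struct Graph;

class ValuePtr {
 public:
  ValuePtr(Graph* g, size_t i);
  ~ValuePtr();
  ValuePtr(const ValuePtr&) = delete;
  ValuePtr& operator=(const ValuePtr&) = delete;
  int& operator*();

 private:
  Graph* graph_;
  size_t index_;
};

struct Graph {
  std::vector<int> values;
  int live_ptrs = 0;

  ValuePtr get(size_t i) { return ValuePtr(this, i); }
  void add_value(int v) {
    // Adding values may resize the vector and move its contents, so it is
    // only allowed while no ValuePtr is outstanding.
    assert(live_ptrs == 0 && "cannot add values while pointers are live");
    values.push_back(v);
  }
};

ValuePtr::ValuePtr(Graph* g, size_t i) : graph_(g), index_(i) { ++g->live_ptrs; }
ValuePtr::~ValuePtr() { --graph_->live_ptrs; }
int& ValuePtr::operator*() { return graph_->values[index_]; }
```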
Commit: 76d8513
Add Tiktoken in python (#2986)
Summary: Tiktoken by OpenAI is a popular tokenizer. Pull Request resolved: #2986 Reviewed By: lucylq Differential Revision: D56004355 Pulled By: larryliu0820 fbshipit-source-id: 5656eba6fc6e550fc1d7356162da1d1897e43e78
Commit: 46cf1c7
Summary: Pull Request resolved: #2442 Only need to look at the tester.py file for the tester changes. The change is from `.run_method().compare_outputs()` to `.run_method_and_compare_outputs()`. Now if the Tester is initialized with dynamic inputs, we will generate random dynamic inputs (according to the specification of the dynamic shapes) to run on the model. This allows us to test that the inputs fed into the model can be dynamic. We add a num_runs to run_method_and_compare_outputs so that we can choose to run a number of different dynamic inputs with dynamic shapes. Reviewed By: digantdesai, kirklandsign Differential Revision: D54650121 fbshipit-source-id: a813816cf19850219ec0962aaf6592f1047e85c8
Commit: 65be9b4
dynamic qd8-fc test with 2 batch dims (#2441)
Summary: Pull Request resolved: #2441 Adding the first dynamic input test, in which we test DQ Linear where its inputs have rank = 3. Reviewed By: digantdesai, kirklandsign Differential Revision: D54665767 fbshipit-source-id: 3c6c7eb0a10b32f390effeb9ae88b74df21e823f
Commit: bf59da6
Summary: Pull Request resolved: #2440 adding dynamism to mobilenetv2 and testing Reviewed By: kirklandsign Differential Revision: D54666427 fbshipit-source-id: 5699636bbd18598ab26adb5054824c5a38534396
Commit: 1f5a833
Summary: Pull Request resolved: #2475 Test to verify dynamic mv3 Reviewed By: digantdesai, kirklandsign Differential Revision: D54972684 fbshipit-source-id: c3573f17bd26dc391d249b7c15217b7e500e9adf
Commit: fec9c2f
Summary: Pull Request resolved: #2474 Test for dynamic resnet. ResNet has some restrictions on the input shape, so we create a dynamic version by bilinear resizing the input to resnet's fixed shape. Thus we test that dynamic bilinear resize correctly resizes to fixed shape Reviewed By: digantdesai, kirklandsign Differential Revision: D54972682 fbshipit-source-id: f8a1128437ca9c562ccc3eb5ff03545455b548fa
Commit: 33f41bd
Summary: Pull Request resolved: #2476 Tests for Dynamic ViT. We make ViT dynamic by bilinear resizing the input before feeding to ViT. Reviewed By: digantdesai, kirklandsign Differential Revision: D54972681 fbshipit-source-id: 626195d07d45c05112dfd251005c407a6444a87b
Commit: d1bc794
Summary: Pull Request resolved: #2965 Reviewed By: larryliu0820 Differential Revision: D55953027 fbshipit-source-id: 1e5f60e46daf3591167b8c703e5452b3125b7904
Commit: ab323a5
Add exir.save and exir.load with export_serialize (#3000)
Summary: Pull Request resolved: #3000 Adding exir.save and exir.load similar to torch.export.save and torch.export.load for saving and loading edge exported programs. Reviewed By: cccclai Differential Revision: D56037593 fbshipit-source-id: dc2a11b836baf479fcf6e23f33b345cb239f3ac5
Commit: 6acc86f
Summary:
* Apple / build-frameworks-ios / macos-job: We removed libcustom_ops_lib.a in #2916 so we need to remove it from `build_apple_frameworks.sh`.
* Lint / lintrunner / linux-job: Remove extra line in backends/qualcomm/quantizer/utils.py.
* pull / unittest / macos (buck2) / macos-job: Fix it by using `executorch_no_prim_ops` instead of `executorch` in MPS and CoreML.
Pull Request resolved: #3006 Reviewed By: lucylq Differential Revision: D56048430 Pulled By: larryliu0820 fbshipit-source-id: 9dcb476eea446ea3aba566d595167c691fb00eec
Commit: 5b7c4ba
Add util to print out ops and frequency (#2983)
Summary: Pull Request resolved: #2983 As titled. Reviewed By: cccclai Differential Revision: D56001227 fbshipit-source-id: cefef12662e03171136f03138fb814d61a28a0f3
Commit: b1edc3d
Decouple custom ops in llama_transformer.py Part 1/N (#3005)
Summary: This is a no-op. Pull Request resolved: #3005 Test Plan: CI Run with `python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -kv --use_sdpa_with_kv_cache -X` and with `python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -kv -X` Make sure both work. Reviewed By: cccclai Differential Revision: D56048177 Pulled By: mergennachin fbshipit-source-id: 3ac9ac5c34f6fe215de1cfe8b5ddc7aae3635359
Commit: 488afc5
Decouple custom ops in llama_transformer.py Part 2/N (#3007)
Summary: Pull Request resolved: #3007 Keep llama_transformer.py to look like stock implementation, so that it can be reused everywhere. Do module swap Reviewed By: cccclai Differential Revision: D56048640 fbshipit-source-id: 76de1b09b7f5d79422bb3b32bc830a9a7ecd935c
Commit: 74eb8b3
Summary: Pull Request resolved: #3012 Reviewed By: mergennachin Differential Revision: D56074130 Pulled By: jerryzh168 fbshipit-source-id: 53e8a1db6ef802789469f1e5ba6c79c03a16e5e1
Commit: 0f379ba
add more instructions and examples on Delegation (#2973)
Summary: Pull Request resolved: #2973 as title. Reviewed By: vmpuri, byjlw Differential Revision: D55988177 fbshipit-source-id: 8cdc953118ecd22e8e9a809f0dd716a30a7fc117
Commit: 17c64a3
Run LlamaDemo app on AWS Device Farm (#3004)
Summary: This uploads the built LlamaDemo app to S3 and uses it to run the test on Device Farm. Pull Request resolved: #3004 Reviewed By: kirklandsign Differential Revision: D56073767 Pulled By: huydhn fbshipit-source-id: 088a1af2463f035dcc8b06ec96d83162746f2df1
Commit: cd248b4
Remove RemoveRedundantViewCopyPass (#2464)
Summary: Pull Request resolved: #2464 The RemoveRedundantViewCopyPass is unnecessary and can be replaced by NormalizeViewCopyBasePass + dead code elimination. Reviewed By: larryliu0820 Differential Revision: D54866523 fbshipit-source-id: 106b8c4a15cf2e68014ccc6a85027e47517195ef
Commit: c075eea
Change tokenizer name to bpe_tokenizer and extract a base class (#3009)
Summary: Pull Request resolved: #3009 We want to be able to support more than 1 implementation of tokenizer. Currently `tokenizer.cpp` is adopted from `llama2.c` but we also wanted to support `Tiktoken` (will be added in next PR). This PR extract out a base class `Tokenizer` and make it extendable by different implementations. Reviewed By: mergennachin Differential Revision: D56052583 fbshipit-source-id: bd9143957165211b1f600f781233b9ceff440cc1
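A minimal sketch of the kind of base class this describes; the method shapes follow the encode/decode API quoted in the earlier tokenizer commit, but the `Result<>` error wrapper is omitted and all other details are illustrative:
```
#include <cstdint>
#include <string>
#include <vector>

class Tokenizer {
 public:
  virtual ~Tokenizer() = default;
  virtual std::vector<uint64_t> encode(
      const std::string& input, int8_t bos, int8_t eos) = 0;
  virtual std::string decode(uint64_t prev_token, uint64_t token) = 0;
};

// Concrete implementations (e.g. a BPE tokenizer, later Tiktoken) derive from
// the base, so runners can hold a Tokenizer* and swap implementations freely.
```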
Commit: 21fdc4e
Commits on Apr 13, 2024
Update README.md and add submodule update (#3029)
Summary: Without the submodule update, install_requirements would not work. Add this step in the documentation's README.md. Pull Request resolved: #3029 Reviewed By: lucylq Differential Revision: D56087389 Pulled By: iseeyuan fbshipit-source-id: fd96530b44f81b6dfcea07faccef06f6348fa373
Commit: cd32712
Throw in VK_GET_OP_FN if op is not found (#3028)
Summary: Pull Request resolved: #3028 Make yipjustin happy. Forgot this safeguard when I originally wrote the `OperatorRegistry` class. Reviewed By: SS-JIA Differential Revision: D56085588 fbshipit-source-id: ba116eab8054e3610011fd0c8ffc0aabe61ae8ea
Commit: 4d7dd03
update the pinned pytorch hash (#2824)
Summary: This PR is auto-generated nightly by [this action](https://github.com/pytorch/executorch/blob/main/.github/workflows/nightly.yml). Update the pinned pytorch hash. Pull Request resolved: #2824 Reviewed By: mergennachin Differential Revision: D55814757 Pulled By: guangy10 fbshipit-source-id: cea55d3468ae7155906a44d038e25e53c207dcef
Commit: c095046
Commits on Apr 14, 2024
Summary: Previously this code conformed to clang-format 12. Reviewed By: igorsugak Differential Revision: D56065247 fbshipit-source-id: f5a985dd8f8b84f2f9e1818b3719b43c5a1b05b3
Commit: c61ef44
oss: Upgrade `clap`, add `string` feature (#3035)
Summary: Pull Request resolved: #3035 ^ Reviewed By: stepancheg Differential Revision: D56115188 fbshipit-source-id: 67b1293d26adc77973a7c17808fb2d958da2d04f
Commit: 57dd7f1
Commits on Apr 15, 2024
Reviewed By: zertosh Differential Revision: D56139356 fbshipit-source-id: a740606db6e308ed133caa3f0756c2a53d7dce7b
Commit: 057e432
Fix handling constant inputs when delegating (#3031)
Summary: Pull Request resolved: #3031 Reviewed By: mcr229 Differential Revision: D56089279 fbshipit-source-id: 15f0b621b2efbc317c25f8b75907ff6c28ac2c6d
Commit: 7616d42
Fix lint in clang-format (#3041)
Summary: Pull Request resolved: #3041 We are updating to clang-format 18. The current clang-format config in the coreml code has a duplicate key. Deleting one of them. See context in D56139356 bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: cccclai Differential Revision: D56139927 fbshipit-source-id: 937f58092abd6f695304ee2a5dd38bc4b8412ec0
Commit: 7c81155
generation.py with kv cache (#3030)
Summary: python e2e generation, using tiktoken tokenizer. using text_completion, haven't tried chat_completion. Pull Request resolved: #3030 Test Plan: Imported from GitHub, without a `Test Plan:` line. Command, with prompt "Hello, I am" and seq_len = 10 ``` python -m examples.models.llama2.runner.generation --pte llama_4ckpts_x.pte --tokenizer tokenizer.model --prompt="Hello I am" --temperature=0 --params ../llama-models/llama3/params_less.json --max_gen_len=10 ``` fp32, xnn, kv fp32, xnn same results: ``` Result: [{'generation': ' a 25 year old woman. I am a'}] ``` fp32, xnn, int4 ``` Result: [{'generation': ' interested in the following products: - 1 x'}] ``` fp32, xnn, kv, sdpa (need investigation) ``` Result: [{'generation': 'ฉopteraenthalenthalenthalenthalenthalenthalenthalenthal'}] ``` Reviewed By: larryliu0820 Differential Revision: D56087430 Pulled By: lucylq fbshipit-source-id: 31c73fe87af8646bf2512e1a6aadc8804a101719
Commit: 645256d
Clean up shader library and introduce some new conventions (#3024)
Summary: Pull Request resolved: #3024 ## Context This changeset introduces some fairly mechanical improvements to the Vulkan compute graph shader library in order to introduce some new conventions. **Note that backwards compatibility with existing shader authoring methods is preserved**. ### Only List `VALUE` in the `.yaml` files Previously, to generate variants for a combination of values, the YAML file will contain
```
PACKING:
  - VALUE: CHANNELS_PACKED
    SUFFIX: C_packed
  - VALUE: WIDTH_PACKED
    SUFFIX: W_packed
  - VALUE: HEIGHT_PACKED
    SUFFIX: H_packed
```
however, the shader code generation script will use the `VALUE` as the `SUFFIX` if no `SUFFIX` is provided. Therefore, only the below is needed:
```
PACKING:
  - VALUE: C_packed
  - VALUE: W_packed
  - VALUE: H_packed
```
### Change indexing utility macros to lowercase Indexing utility macros have been changed to lowercase, and the packing identifiers have been changed due to the change in YAML files. The change to lowercase is to make calls to the macro read more like functions (and indeed they are typically used as functions) in order to help make the code more readable.
```
POS_TO_COORD_${PACKING} -> pos_to_coord_${PACKING}
```
### Use convention of defining macros in order to reduce Python code blocks usage Previously python code blocks were used in the GLSL code itself in order to vary the shader between different settings. However, usage of Python code blocks negatively impacts code readability. Therefore, this diff seeks to introduce a convention of defining macros near the top of the shader to reduce the usage of Python code blocks, i.e.
```
#define pos_to_coord pos_to_coord_${PACKING}
#define get_packed_dim get_packed_dim_${PACKING}
#define get_packed_stride get_packed_stride_${PACKING}
```
### Improve GLSL type definitions Previously, the following Python code blocks were used to determine appropriate vectorized and scalar types:
```
${VEC4_T[DTYPE]} texel = ...
${T[DTYPE]} scalar = ...
```
This changeset replaces that with:
```
#define BUF_T ${buffer_scalar_type(DTYPE)}
#define VEC4_T ${texel_type(DTYPE)}
#define SCALAR_T ${texel_component_type(DTYPE)}

layout(set = 0, binding = 1) buffer PRECISION restrict readonly Buffer {
  BUF_T data[];
} buffer_in;

VEC4_T texel = ...
SCALAR_T scalar = ...
```
The main differences are as such:
* `buffer_scalar_type()` produces the same result as `T[DTYPE]`
* `texel_type()` is not determined from a mapping with `DTYPE`, but is determined indirectly based on the image format that is associated with the `DTYPE`.
* `texel_component_type()` is based on the result of `texel_type(DTYPE)`
Essentially, the mapping is more in-line with what happens in code. The reason for this change is to enable FP16 support and is a bit complicated. Basically, we need a way to distinguish the scalar type used for buffer storage, vs the scalar type used to store a component of a vec4 type (hence `BUF_T` vs `SCALAR_T`). The reason this is required is that to support half-precision tensors, the buffer representation will use a 16-bit float type but textures will still extract to `vec4` (i.e. 4x 32-bit floats). ghstack-source-id: 222551445 Reviewed By: jorgep31415 Differential Revision: D56082461 fbshipit-source-id: 49fb8ff5fb0d8c48d0fadd8fd24184cc20db2147
Commit: 59023ed
Move compile spec to ArmTester interface (#2991)
Summary: * Create compile spec builder * Added default compile spec for unit tests * Cleaned up some redundant parameters Pull Request resolved: #2991 Reviewed By: mergennachin Differential Revision: D56143727 Pulled By: digantdesai fbshipit-source-id: c34a7f1f6f073b558cca056eeaa4c810df6e25c6
Commit: 64497b7
remove duplicate generate_lib_aten target under aten kernel (#2951)
Summary: Pull Request resolved: #2951 generate_lib and generate_lib_aten are exactly the same under executorch/kernels/aten. Remove the generate_lib_aten for better understanding. Reviewed By: larryliu0820 Differential Revision: D55937122 fbshipit-source-id: 5e7e7c06efbd4876874880627b67934d782473a2
Commit: 075fe40
native_layer_norm (for width dim) (#3001)
Summary: Pull Request resolved: #3001 We implement `native_layer_norm` which has 3 outputs - normalization of the input tensor according to the given `normalized_shape` - mean - 1/sqrt(var + eps) ``` func: native_layer_norm(Tensor input, SymInt[] normalized_shape, Tensor? weight, Tensor? bias, float eps) -> (Tensor, Tensor, Tensor) ``` According to SS-JIA's suggestion, a model specific implementation is more performant and preferred to a generic one. So we implemented the op in the following optimized way - our current use case has `normalized_shape` of len 1, namely we do the normalization through computing the mean and var at the last width dim - we do the computation in just one shader `native_layer_norm.glsl` without invoking the shaders to compute mean and var respectively - we use [Welford's online algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm) to compute mean and variance in one pass Reviewed By: SS-JIA, jorgep31415 Differential Revision: D56005629 fbshipit-source-id: 096c2e2f04b95f1f5c9205c4827091169771978c
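For reference, Welford's online algorithm mentioned above, as a standalone sketch: a single pass maintains the running mean and M2 (sum of squared deviations), from which the variance follows:
```
#include <cstddef>

struct MeanVar {
  double mean;
  double var;
};

MeanVar welford(const float* x, size_t n) {
  double mean = 0.0;
  double m2 = 0.0;
  for (size_t i = 0; i < n; ++i) {
    double delta = x[i] - mean;
    mean += delta / static_cast<double>(i + 1);
    m2 += delta * (x[i] - mean); // second factor uses the updated mean
  }
  // Population variance (divide by n), as layer norm uses.
  return {mean, n > 0 ? m2 / static_cast<double>(n) : 0.0};
}
```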
Commit: 74576e8
Summary: Pull Request resolved: #3013 We implement [`aten.full.default`](https://pytorch.org/docs/stable/generated/torch.full.html) which has the following signature. ``` func: full(SymInt[] size, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor ``` In order to bypass graph build error, we simply create null value for the following arg types: - torch.device - torch.dtype - torch.layout since they don't have any effect to our operator implementation on Vulkan. (Note that [`torch.layout`](https://pytorch.org/docs/stable/tensor_attributes.html#torch.layout) is a totally different concept from `GPUMemoryLayout` on Vulkan.) Reviewed By: jorgep31415 Differential Revision: D56049674 fbshipit-source-id: dc2a27b4e702829e077e874ccf697f6c4196756d
Commit: eb44e88
Summary: Pull Request resolved: #3015 C++ implementation of Tiktoken. Added unit tests. Reviewed By: lucylq Differential Revision: D56053255 fbshipit-source-id: 3d2f6e30a2a16d6311506fe17176d412fca7222e
Commit: 49d1f02
Summary: Pull Request resolved: #3044 Test Plan: Imported from GitHub, without a `Test Plan:` line. ``` python -m examples.models.llama2.eval_llama --pte llama3_4_ckpts_x.pte -p ../llama-models/llama3/params_less.json -t ../llama-models/llama3/tokenizer.model --max_seq_len=127 --limit 5 wikitext: {'word_perplexity,none': 22.00035213493939, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.8289244201951567, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.8709954573378033, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'} ``` Reviewed By: larryliu0820 Differential Revision: D56163999 Pulled By: lucylq fbshipit-source-id: db255a6e49a3e9b6db92c9f94fe9e7fcb475c924
Commit: 780ed25
Commits on Apr 16, 2024
Update pytorch commit pin to 04/15 (#3047)
Summary: Pull Request resolved: #3047 Reviewed By: lucylq Differential Revision: D56166332 fbshipit-source-id: d98f2c18e63e15a78bbd5c893ef9c5aa5e1ddd5f
Commit: 15f141b
Summary: Pull Request resolved: #2976 Conv1d uses static reshape operator, in order to convert 3d tensor to 4d tensor so xnnpack can operate using conv2d. For dynamism, reshape only accepts a single dynamic dimension, which is denoted as dynamic with a dim of 0. Reviewed By: digantdesai, kirklandsign Differential Revision: D55815092 fbshipit-source-id: a3c96bc5c86c130291c1d54f8174a6ff5d25a6b8
Commit: 7b375fe
Fix iOS build by excluding external CoreML SDK dependencies (#3043)
Summary: Pull Request resolved: #3043 CoreML delegate SDK integration broke the app build. Getting the SDK integration to work properly internally will require buckifying the third-party targets on which the CoreML delegate SDK itself depends (not to be confused with the third-party dependencies from ET itself). Running the `install_requirements.sh` script (CoreML's, not the generic ET one) clones a bunch of Git repos, XCode-specific tooling, and generates Protobuf headers on which their SDK integration relies. To avoid this, we simply add the `BUILD_SDK` flag and set it to false and disable building the SDK and exclude references to generated headers. Reviewed By: kirklandsign Differential Revision: D55456558 fbshipit-source-id: 6ab931b39298ee0a4a4b238699c64c84952e180e
Commit: d0208d0
Summary: Pull Request resolved: #3033 Port over the `select.int` shaders to ET. 1. Since in ET, tensor-shape reasoning happens in AOT, therefore we can simplify the c++ caller code by a lot. 2. In this diff, we also try to use the same buffer object for passing arguments to all shaders. No need to worry about perf cost, since the cost difference between passing int and ivec4 is very minor. Reviewed By: SS-JIA Differential Revision: D56082483 fbshipit-source-id: f3a28712714034375eb86f6f5c6b6a3e23d525e8
Commit: 458d743
4b quantized embedding table operator (#3050)
Summary: Pull Request resolved: #3050 4b quantized embedding table operator Reviewed By: mikekgfb Differential Revision: D56123408 fbshipit-source-id: 26293e2b09f93ccb8f14462de7ae0969efc7acc5
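An illustrative sketch of 4-bit weight dequantization: two values packed per byte, recovered with a nibble shift and a per-row scale. The exact layout and zero-point handling in the actual operator are assumptions here:
```
#include <cstdint>

// Unpack one signed 4-bit value from a packed byte and dequantize it.
inline float dequant_4bit(uint8_t packed, bool high_nibble, float scale) {
  int v = high_nibble ? (packed >> 4) : (packed & 0x0F);
  return scale * static_cast<float>(v - 8); // re-center [0, 15] to [-8, 7]
}
```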
Commit: 3b31eff
Fix test_llama_runner by hiding tiktoken (#3055)
Summary: Pull Request resolved: #3055 We don't always want to build the tiktoken dependencies (re2 and abseil), so this PR only builds them if the option is on. Reviewed By: iseeyuan Differential Revision: D56178928 fbshipit-source-id: 8021d1526ad6e89c929183f368c0fb25a4808b6f
Commit: 473c98c
Bump Vulkan API requirement to 1.1 and enable 16 bit and 8 bit types in buffer storage (#3058)
Summary: Pull Request resolved: #3058 ## Context Enable use of explicit fp16 and int8 types in GPU storage buffers via the following extensions:
* [VK_KHR_16bit_storage](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_16bit_storage.html)
* [VK_KHR_8bit_storage](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_8bit_storage.html)
* [VK_KHR_shader_float16_int8](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_shader_float16_int8.html)
The first two enable usage of 16-bit and 8-bit types in storage buffers, while the last one enables using those types in arithmetic operations. By enabling these extensions and checking that the device supports the required features, explicit fp16 and int8 types can be used in compute shaders, as demonstrated by the added test. Vulkan 1.1 is required in order to access `vkGetPhysicalDeviceFeatures2`, which is required to query whether the device supports 16bit and 8bit types. This should be a fairly straightforward version bump as Vulkan 1.1 is supported by the vast majority of Android devices. ghstack-source-id: 222727208 exported-using-ghexport Reviewed By: jorgep31415 Differential Revision: D56164239 fbshipit-source-id: 879804567ff08201933a220c9f168f435af80019
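A sketch of the feature query this bump makes possible: chaining the 16-bit-storage and float16/int8 feature structs through `vkGetPhysicalDeviceFeatures2` (which fields a backend actually requires is an assumption here):
```
#include <vulkan/vulkan.h>

bool supports_fp16_storage_buffers(VkPhysicalDevice dev) {
  VkPhysicalDevice16BitStorageFeatures storage16{};
  storage16.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_16BIT_STORAGE_FEATURES;

  VkPhysicalDeviceShaderFloat16Int8Features float16_int8{};
  float16_int8.sType =
      VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_FLOAT16_INT8_FEATURES;
  float16_int8.pNext = &storage16;

  VkPhysicalDeviceFeatures2 features2{};
  features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
  features2.pNext = &float16_int8;

  // Core in Vulkan 1.1; fills every struct in the pNext chain.
  vkGetPhysicalDeviceFeatures2(dev, &features2);
  return storage16.storageBuffer16BitAccess == VK_TRUE &&
         float16_int8.shaderFloat16 == VK_TRUE;
}
```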
-
Enable FP16 type in operators (#3059)
Summary: Pull Request resolved: #3059 ## Context Enable half-precision shader computation using the `GL_EXT_shader_16bit_storage` extension that was enabled in the change below this one in the stack. ghstack-source-id: 222727209 Reviewed By: jorgep31415 Differential Revision: D56189470 fbshipit-source-id: 0eb5990651ad34e5a2ada601a0d3944dfe2ae9ea
-
Fix formatting issues in executorch/test/size_test.cpp (#3065)
Summary: Pull Request resolved: #3065 Required for LLVM-17. Fixes a mismatch between what the format string expects and the type supplied. Reviewed By: tarun292 Differential Revision: D56206887 fbshipit-source-id: f52883cb43840b34b5d5b25711f73bc71979da30
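To illustrate the class of fix (a sketch, not the actual diff): the format specifier has to match the supplied type, e.g. `%zu` for `size_t` and the `PRId64` macro for `int64_t`, which newer compilers such as LLVM-17 enforce via `-Wformat`.
```
#include <cinttypes>
#include <cstdio>

void log_sizes(size_t nbytes, int64_t numel) {
  // Mismatched (flagged by -Wformat): "%lu" is wrong for size_t/int64_t on
  // some targets.
  // printf("bytes=%lu numel=%lu\n", nbytes, numel);

  // Matched: %zu for size_t, PRId64 for int64_t.
  printf("bytes=%zu numel=%" PRId64 "\n", nbytes, numel);
}
```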
-
ETRecord ser/de handling "None" outputs and more (#3039)
Summary: Pull Request resolved: #3039 For ease of communication, let me assign nicknames to the files related to this diff: * File A: *caffe2/torch/_export/serde/serialize.py* * File B: *executorch/exir/serde/serialize.py* * File C: *executorch/exir/serde/export_serialize.py* Recently, we noticed that the error `torch._export.serde.serialize.SerializeError: Unable to deserialize output node Argument(as_none=[])` (P1210590561) was thrown from File B when deserializing ETRecord. It's possible that the error has been there since the beginning, but we've just never tested that logic path. In this diff, I make a fix in File B to resolve this particular issue. We also add handling for the "None" output case in the SDK logic. ***Keep on reading if you don't think the code changes make sense:*** I explored the history of file changes. In chronological order: 1. In D48258552, `deserialize_graph_output()` was copied from File A to File B, with some modifications made. The `deserialize_graph_output()` in File B overrides that in File A due to polymorphism. 2. In D52446586, File C was created by ***copying*** File A. As a result of this diff, the `deserialize_graph_output()` in File B now overrides that in File C. 3. Also in D52446586, the `deserialize_graph_output()` in File A had some significant changes; File C got the new version of `deserialize_graph_output()`. But this diff didn't update the `deserialize_graph_output()` in File B. 4. D55391674 added the handling for "None" outputs to File A. This diff brings (parts of) File C up-to-date with File A, and makes `deserialize_graph_output()` in File B properly override that in File A. In the future, we should figure out how to keep File C and File A in sync. Recently, File C was broken because it didn't stay in sync with File A in D54855251 and had to be fixed by D55776877. There will be a design review session this Friday to discuss consolidating the serialization code for edge and export. Reviewed By: tarun292 Differential Revision: D56091104 fbshipit-source-id: 20c75ddc610c3be7ab2bb62943419d3b8b2be079
-
Summary: Pull Request resolved: #3045 Reviewed By: clee2000 Differential Revision: D56201946 Pulled By: svekars fbshipit-source-id: 4212c24b02a1229ff06137b0d437b4e8c5dd454e
-
Add int16 support to aten_bridge (#3069)
Summary: Pull Request resolved: #3069 Running an executorch program via pybindings requires the aten_bridge. This currently fails if the model uses the `int16` dtype. This diff adds support for the type by adding it to the conversion switch statements. Reviewed By: tarun292 Differential Revision: D56199304 fbshipit-source-id: 19a6815cf2885dda72febf247c3ca3bde91193a8
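A rough sketch of the shape of such a conversion switch; the enum names here are hypothetical stand-ins for the two type systems being bridged, not the real aten_bridge types:
```
#include <stdexcept>

enum class AtenDtype { Float, Int16, Int32 };  // hypothetical
enum class EtDtype { Float, Short, Int };      // hypothetical

// The fix amounts to adding the int16 case to switches like this one.
EtDtype to_et_dtype(AtenDtype d) {
  switch (d) {
    case AtenDtype::Float:
      return EtDtype::Float;
    case AtenDtype::Int16:
      return EtDtype::Short;  // newly supported case
    case AtenDtype::Int32:
      return EtDtype::Int;
  }
  throw std::runtime_error("unsupported dtype");
}
```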
-
fix linear recomposition (#3064)
Summary: Pull Request resolved: #3064 Fixes the torchchat CI, where we are failing with expand copy. Reviewed By: digantdesai, mikekgfb, kirklandsign Differential Revision: D56204667 fbshipit-source-id: 1d648460b59785884c33cdd479eb9c4c7d452a2a
Commits on Apr 17, 2024
-
Set kernel default visibility to hidden (#3060)
Summary: Pull Request resolved: #3060 When we compile a kernel into a shared library, we don't know whether the definition of the kernel implementation symbol can be dropped or not based on the op registry. The kernel itself is just a normal function and the user can find it. We set its visibility to hidden by default; these kernels are then gone when we do `objdump -TC`, which reduces binary size. --- This is not done in fbcode so far. When we compile in fbcode, it seems that all dependency libraries are compiled into shared libraries, not static libraries. For example, op tests depend on op implementations through shared libraries. In that case, the hidden symbols are not exposed and could cause link-time failures. In xplat, these dependencies are set to static libraries, so this has no impact. Only when we explicitly build a shared library (for Android) do we hide the symbols and rely on the op registry to store the impl. --- This applies to the internal build only for now. We will re-visit this for OSS later. It's a step needed to make use of selective build when building a shared library (mainly the Android use case). Reviewed By: dbort Differential Revision: D56167833 fbshipit-source-id: 98cd47836b616fc33dbc9af284d9e758b242b3a3
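A minimal sketch of the general mechanism (not the actual ExecuTorch registration code), assuming the translation unit is compiled with `-fvisibility=hidden`:
```
// With -fvisibility=hidden, every symbol is hidden unless explicitly
// re-exported, so kernel implementations disappear from `objdump -TC`.
int add_kernel_impl(int a, int b) {  // hidden by default
  return a + b;
}

// Only the registration hook is exported; the op registry keeps the
// function pointer, so callers never need the symbol itself.
__attribute__((visibility("default")))
void register_kernels() {
  // ... store &add_kernel_impl in the op registry ...
}
```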
-
Fix Android llama2 demo app after #2962 (#3032)
Summary: This fixes the issue where the demo Android app fails to load the llama2 model and returns exit code 20. As this failure can be captured by running the instrumentation test suite on Android devices, I also add the test spec that I'm using there for future reference. ### Testing https://github.com/pytorch/executorch/actions/runs/8682469360/job/23808274556?pr=3032#step:12:80 loads the model successfully and shows the observed TPS now Pull Request resolved: #3032 Reviewed By: kirklandsign Differential Revision: D56124177 Pulled By: huydhn fbshipit-source-id: 7cc3987d186e670143f2ca739d29f02649091ec2
-
Summary: Move noindex logic to the build job Pull Request resolved: #3071 Reviewed By: clee2000 Differential Revision: D56218857 Pulled By: svekars fbshipit-source-id: 69dff489d98eee046d69185a6c03d62fbae37a16
-
Handle empty (size=0) tensor in Inspector (#2998)
Summary: Pull Request resolved: #2998 Empty tensors are not handled, so they throw errors. Reviewed By: tarun292 Differential Revision: D56027102 fbshipit-source-id: a8dab52d9ba7eb0784a72493e9888cf63aefbb76
-
Add quantized op support to llama runner (#3062)
Summary: Pull Request resolved: #3062 Reviewed By: lucylq, mikekgfb Differential Revision: D56197863 fbshipit-source-id: c564a99d10be70fb69e554687bd506d8ff13268e
-
[executorch][llama] support mqa (#3080)
Summary: Pull Request resolved: #3080 This diff adds support for multi-query attention for sdpa with kv cache. bypass-github-export-checks Reviewed By: mikekgfb Differential Revision: D56228316 fbshipit-source-id: 29fdf78acf841b651476a39068940b616f076991
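For illustration (this is the general idea, not the code in the diff): in multi-query/grouped-query attention several query heads share one KV head, so the KV cache only needs `n_kv_heads` entries and each query head is mapped onto its KV head by integer division.
```
#include <cassert>

int kv_head_for_query_head(int q_head, int n_heads, int n_kv_heads) {
  assert(n_heads % n_kv_heads == 0);
  // 1 for standard MHA; n_heads for pure MQA (a single shared KV head).
  const int heads_per_kv = n_heads / n_kv_heads;
  return q_head / heads_per_kv;
}
```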
-
Load missing state dict in edge program serialization (#3076)
Summary: Pull Request resolved: #3076 The state dict wasn't being passed in when ExportedProgram was being created after deserialization. Reviewed By: pssrawat Differential Revision: D56224054 fbshipit-source-id: 7c3f74999994b23616e626d7b9d68d1a9eeab0ae
-
Remove noindex from upload to gh-pages job (#3077)
Summary: Pull Request resolved: #3077 For some reason this wasn't removed in the previous PR. Reviewed By: clee2000 Differential Revision: D56225136 fbshipit-source-id: bb18c5f36fd443dc01c2127d361911625be8352a
-
forward fix ConstantArgument initialization (#3074)
Summary: Pull Request resolved: #3074 Following up on https://www.internalfb.com/diff/D55506949, which broke an executorch call. Reviewed By: angelayi Differential Revision: D56220174 fbshipit-source-id: 041614c888ce2e55c08717d7da1430d4f787b816
-
Fix llama2 README.md cmake instructions (#3096)
Summary: Pull Request resolved: #3096 As titled. The current instructions run into issues due to the way we arrange `pthreadpool` and `cpuinfo` in CMake. A bigger effort will be needed to clean them up; for now, let's update the instructions so they can be run. Reviewed By: mergennachin Differential Revision: D56251563 fbshipit-source-id: daf0b1ecb75abb90612efbd64108edc99a129efd
-
Fix build time warning (#3097)
Summary: Pull Request resolved: #3097 tensor.data_ptr() is deprecated. To avoid the warning, change it to tensor.const_data_ptr(). Reviewed By: mergennachin Differential Revision: D56251975 fbshipit-source-id: c984ba33600c94da78a85060be5699042b12e83e
-
change call_delegate_autograd (#3073)
Summary: Pull Request resolved: #3073 Some changes angela told me to make 😂 Reviewed By: angelayi Differential Revision: D56222503 fbshipit-source-id: ab1e5194492df439effab550781f056d12eaba53
-
remove exir.capture from dynamic_shape_propogation test (#3070)
Summary: Pull Request resolved: #3070 title Reviewed By: mergennachin Differential Revision: D56216416 fbshipit-source-id: 3ae317e3c2a8765ca3c2c460178526b0af4fb6ba
-
Create __init__.py in example folder (#3093)
Summary: For my internal CentOS development env, `python -m examples/models/...` fails with an error message saying the module cannot be found. Adding this empty file fixes the issue. Pull Request resolved: #3093 Reviewed By: cccclai Differential Revision: D56242802 Pulled By: iseeyuan fbshipit-source-id: 33b98855682490ed1242b1cd2843e7963831915a
-
move mask as sdpa input instead of attribute (#3036)
Summary: Pull Request resolved: #3036 sdpa (https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) takes the attention mask as an input; refactor the sdpa module's inputs to be closer to sdpa's own inputs. ghstack-source-id: 222650466 exported-using-ghexport Reviewed By: mergennachin Differential Revision: D56119739 fbshipit-source-id: d9adda66e540abc518b7ffb6a5ebd2aab1626b3b
-
remove exir.capture from test_rpc.py (#3102)
Summary: Pull Request resolved: #3102 title Reviewed By: tarun292 Differential Revision: D56259168 fbshipit-source-id: be80eeb616d6634c563ff3f1746cc6dc4aad0b6a
-
Introduce `SpecVarList` to represent specialization constants (#3078)
Summary: Pull Request resolved: #3078 ## Context Specialization constants are a useful tool to compile compute shaders with constants defined at runtime. The primary application of specialization constants is to define variables which may have an impact on how the code is compiled, for example: * the number of elements of an array * the range of a loop Compared to the shader codegen system, which produces a complete copy of the shader and for which variants must be defined at build time, specialization constants can be defined at runtime when the compute pipeline is built. Specialization constants are currently used to define local work group sizes in Vulkan, but the Compute API hard-codes the number of specialization constants accepted by the shader to 3. This changeset introduces the `SpecVar` and `SpecVarList` classes to manage specialization constants and enable additional specialization constants to be specified. ghstack-source-id: 222903462 exported-using-ghexport Reviewed By: copyrightly, jorgep31415 Differential Revision: D56225041 fbshipit-source-id: 88c94c09e380793c75edcb0a92c2987fac882431
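Under the hood, Vulkan expresses specialization constants through `VkSpecializationInfo`; a sketch (illustrative only) of what a `SpecVarList`-style abstraction ultimately has to fill in when building the pipeline:
```
#include <vulkan/vulkan.h>
#include <array>
#include <cstddef>
#include <cstdint>

// Two specialization constants, matching declarations in the shader such as
// `layout(constant_id = 0) const int NUM_ELEMENTS = 1;`.
struct SpecData {
  uint32_t num_elements;
  uint32_t loop_range;
};

VkSpecializationInfo make_spec_info(
    const SpecData& data,
    std::array<VkSpecializationMapEntry, 2>& entries) {
  // Map each constant_id declared in the shader to its bytes in SpecData.
  entries[0] = {0u, static_cast<uint32_t>(offsetof(SpecData, num_elements)),
                sizeof(uint32_t)};
  entries[1] = {1u, static_cast<uint32_t>(offsetof(SpecData, loop_range)),
                sizeof(uint32_t)};

  VkSpecializationInfo info{};
  info.mapEntryCount = static_cast<uint32_t>(entries.size());
  info.pMapEntries = entries.data();
  info.dataSize = sizeof(SpecData);
  info.pData = &data;
  // Assigned to VkPipelineShaderStageCreateInfo::pSpecializationInfo when
  // the compute pipeline is created.
  return info;
}
```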
-
Enable additional specialization constants in compute shaders (#3079)
Summary: Pull Request resolved: #3079 ## Context Building on top of the previous changeset in the stack, this changeset modifies shader dispatch APIs to accept additional specialization constants for a shader. ghstack-source-id: 222903463 Reviewed By: copyrightly, jorgep31415 Differential Revision: D56225042 fbshipit-source-id: 154c51f927116e4a658f224794ec354151398a8a
-
Summary: Pull Request resolved: #3085 equivalent to `select.int` ghstack-source-id: 222407935 exported-using-ghexport Reviewed By: SS-JIA Differential Revision: D56092143 fbshipit-source-id: 2959069d87cef6f08aa0960e2f10a9416eb4109d
-
aten.permute_copy.default (#3086)
Summary: Pull Request resolved: #3086 Implementation adopted from LI, with clean-up. ghstack-source-id: 222906934 Reviewed By: copyrightly Differential Revision: D56093765 fbshipit-source-id: 0ed78ae06e5b106a92cf3c1fdc85179f1e829919
-
Improve codegen for aten.permute (#3087)
Summary: Pull Request resolved: #3087 In the generated code, we use CPU as the reference implementation. The tricky part happens when the CPU modifies the stride for some indexing operations like `permute`, leaving the returned tensor with a non-contiguous stride. When we create a `vk_out` tensor based on this non-contiguous tensor with `at::empty_like`, the `vk_out` tensor inherits the stride property, leading to wrong answers when moving data back from staging. As a solution, we add `.contiguous()` after `at::empty_like` to revert to the default stride. ghstack-source-id: 222417364 Reviewed By: SS-JIA Differential Revision: D56095204 fbshipit-source-id: d42777ec876e47465c892331b5f854203c9fb8ef
-
make_seq_tensor in codegen (#3088)
Summary: Pull Request resolved: #3088 An increasing sequence is very useful for development, particularly for "slicing" and "indexing" operations. ghstack-source-id: 222827546 Reviewed By: SS-JIA Differential Revision: D56095314 fbshipit-source-id: 1491bb2399581eb303472b572c74c070c833d654
-
remove exir.capture from quant fusion test (#3106)
Summary: Pull Request resolved: #3106 title Reviewed By: jerryzh168 Differential Revision: D56264730 fbshipit-source-id: c434d3f9891063319fe78e52dfbc1b60b0c7e195
-
Don't crash when execute_method fails (#3104)
Summary: Pull Request resolved: #3104 Currently, we hard-crash the process when execute_method fails, and it's not catchable. Instead, we should return null to Java so callers can handle it. Reviewed By: shoumikhin, cccclai Differential Revision: D56260831 fbshipit-source-id: 281aa53985e021e803444ea5ee1c89a1e4b66e6b
-
update readme to not use exir.capture (#3107)
Summary: Pull Request resolved: #3107 title Reviewed By: angelayi Differential Revision: D56265239 fbshipit-source-id: 3d2ed83bea645824819828a0a384970a736a688c
-
remove exir.capture from example delegate test (#3101)
Summary: Pull Request resolved: #3101 title Reviewed By: cccclai Differential Revision: D56258614 fbshipit-source-id: 1f5d3a57926be2c54eba7d4f9df6d50f31fdbc63
Commits on Apr 18, 2024
-
throw Java exception when execution fails (#3112)
Summary: Pull Request resolved: #3112 Instead of logging, we throw a Java exception and let the user catch it. Reviewed By: dbcakadbc Differential Revision: D56270287 fbshipit-source-id: 9c581fb384c671ca14d2a4a8946654569ae953a6
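The standard JNI pattern for surfacing a native failure as a catchable Java exception (a sketch, not the actual executorch JNI layer):
```
#include <jni.h>

// Raise a Java RuntimeException instead of aborting the process; the Java
// caller can then handle it with an ordinary try/catch.
void throw_execution_failure(JNIEnv* env, const char* message) {
  jclass exception_class = env->FindClass("java/lang/RuntimeException");
  if (exception_class != nullptr) {
    env->ThrowNew(exception_class, message);
  }
}
```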
-
Handle missing data types. (#2984)
Summary: **Changes** - The runtime was failing if it encountered a datatype not supported by the CoreML framework. The changes add support for all the datatypes that are supported by coremltools; basically, if `CoreMLBackend` can export a model, then the runtime can execute it. Complex types are not supported because `coremltools` doesn't support them. - Improves and cleans the multiarray copying code. - Adds portable ops to the CoreML executor so that it can run a partitioned model. **Testing** - Tested partitioned model `coreml_stories.pte` - Added multiarray copying tests. Pull Request resolved: #2984 Reviewed By: kirklandsign Differential Revision: D56003795 Pulled By: shoumikhin fbshipit-source-id: fa1c7846f9510d68c359aed6761aedb2d10c6f46
-
Documentation for Vulkan Delegate (#3113)
Summary: Pull Request resolved: #3113 imported-using-ghimport Test Plan: Imported from OSS Reviewed By: cccclai Differential Revision: D56279743 Pulled By: SS-JIA fbshipit-source-id: af55cdf2d8518c582b7d8deccb731c6bc442a1c9
-
fix embedding_4bit resize (#3118)
Summary: Pull Request resolved: #3118 Reviewed By: larryliu0820 Differential Revision: D56282683 fbshipit-source-id: fa1f255bcc82929efeeeb1de1f259682bc11d8e5
-
Delete llama_quantized lib (#3119)
Summary: Pull Request resolved: #3119 Delete llama_quantized lib, and move embedding_byte.dtype to exir pass Reviewed By: manuelcandales, mikekgfb Differential Revision: D56206703 fbshipit-source-id: 629a3c7c2d981a212dfb619ac9106ba9bf478b62
-
Add quantized cmake option back to fix build-apple-framework (#3115)
Summary: As titled. Got too excited in #3062 and removed `EXECUTORCH_BUILD_QUANTIZED`. Looking at the CI job failure of `build-apple-framework`, it's probably worth adding it back. Pull Request resolved: #3115 Test Plan: See that CI job pass Reviewed By: shoumikhin Differential Revision: D56281923 Pulled By: larryliu0820 fbshipit-source-id: e6ad411f763ff8e11d4fb1e0bc7037eb2cf69357
-
Fix typo in sub & clean up (#3100)
Summary: Pull Request resolved: #3100 Reviewed By: kirklandsign Differential Revision: D56255838 fbshipit-source-id: b6567320b557aeb287db66b43447db9caabebd13
-
Free Vulkan delegate segments after compileModel (#3116)
Summary: Pull Request resolved: #3116 It's been a while since I had an impactful one-liner. :) Nothing innovative here, just reusing the same solution as [other backends](https://github.com/pytorch/executorch/blob/b19d5860568187f2567d93dd5e7cd5af32378d9f/backends/xnnpack/runtime/XNNPACKBackend.cpp#L47-L48). Reviewed By: yipjustin, copyrightly, SS-JIA Differential Revision: D56281665 fbshipit-source-id: 6b4c9d25ef085a394bcd2904903fff680b4f1794
-
Summary: Pull Request resolved: #3121 Reviewed By: larryliu0820 Differential Revision: D56246346 fbshipit-source-id: ccf8c7ca0569a8c6381b54640dcf39adc2568773
-
cherry-pick: Add required deps to pyproject.toml (#3117)
Summary: Cherry-pick 28f1c8c from release/0.2 into main These pip dependencies need to be present to build the pip wheel. Also, change the version to a stub that looks less like a real version, until we can hook up the logic to get the version from the git repo state. Pull Request resolved: #3117 Test Plan: Ran `./install_requirements.sh` in a new conda environment on my mac M1, and it built/installed the pip package successfully. Reviewed By: tugsbayasgalan Differential Revision: D56282487 Pulled By: dbort fbshipit-source-id: 81e575957ca4d1262eecb4dd5b480a88942371f6
-
Summary: Pull Request resolved: #3122 Reviewed By: mikekgfb Differential Revision: D56212361 fbshipit-source-id: 877f2d3d8b2c078e21b0ababdfbc4e447cd86374
-
fix llama-runner-linux-android (#3127)
Summary: Pull Request resolved: #3127 Reviewed By: larryliu0820, kirklandsign Differential Revision: D56306284 fbshipit-source-id: cb092c358cb2db021a368027e4efd78593bec9b4
-
Buck build - fix use_tiktoken config
Summary: Make it work bypass-github-export-checks Reviewed By: larryliu0820 Differential Revision: D56287998 fbshipit-source-id: 02b92c8110f7ea72055edd4c194858cc71b49093
-
delete exir/experimental (#3109)
Summary: Pull Request resolved: #3109 unused so deleting Reviewed By: angelayi Differential Revision: D56271249 fbshipit-source-id: 79b624a0b45684ead4e89a410fc1e2267b5ad2a9
-
4b embedding quantizer (#3135)
Summary: Pull Request resolved: #3135 4b embedding quantizer Reviewed By: larryliu0820 Differential Revision: D56229021 fbshipit-source-id: 560911333b173b4d03c3c62769e6db4e2ab54c7b
-
Summary: Fix Android adb shell quotes. Tested prompt quote escapes locally. Pull Request resolved: #3094 Reviewed By: mergennachin Differential Revision: D56318301 Pulled By: digantdesai fbshipit-source-id: f9bf1b62a905006a8b440c57cf0bc29510a30637
-
Adding Gotchas in README.md (#3138)
Summary: Pull Request resolved: #3138 Populating based on feedback from George from Arm. Created from CodeHub with https://fburl.com/edit-in-codehub bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: digantdesai Differential Revision: D56319098 fbshipit-source-id: 6c15ef3c2cb3857b58c21d7b58a0cdf36077ee9d
-
Update README.md for llama3 (#3141)
Summary: Pull Request resolved: #3141 Reviewed By: mergennachin Differential Revision: D56324924 Pulled By: orionr fbshipit-source-id: 7d3f2a7abec560d9d5cbeb767ce3f701b7db7e73
-
Summary: Pull Request resolved: #3129 aten.view_copy, supporting all packing. Using SS-JIA's idea to do a direct lookup. ghstack-source-id: 223111187 Reviewed By: SS-JIA Differential Revision: D56281400 fbshipit-source-id: 355493fc18c015523672665e7c1c37a4c92debdd
Commits on Apr 19, 2024
-
Update README.md on the evaluation parameters (#3139)
Summary: It's not clear how we got the perplexity numbers. Add the parameters we used to get those numbers. Pull Request resolved: #3139 Reviewed By: lucylq Differential Revision: D56319905 Pulled By: iseeyuan fbshipit-source-id: dc387cc84c2fe7a21e44642ff591000fd6728abb
-
Add reference to the llama2 example for llama3 (#3142)
Summary: In conjunction with iseeyuan's changes, add an `examples/models/llama3/README.md` just in case people are looking for a Llama 3 folder in examples. Pull Request resolved: #3142 Reviewed By: mikekgfb Differential Revision: D56337484 Pulled By: orionr fbshipit-source-id: 0e122b2bbaa3bdcd95c83ed45a28b96cc0b24ba7
-
Update Llama3 perplexity numbers in README.md (#3145)
Summary: Update Llama3 perplexity numbers in README.md, with 4-bit quantization with different group sizes. Pull Request resolved: #3145 Reviewed By: orionr Differential Revision: D56338045 Pulled By: iseeyuan fbshipit-source-id: 74d06da50758c82cc0efb899d134b52423cc3ec6
-
add cpu device to run eval on cpu (#3133)
Summary: Pull Request resolved: #3133 `HFLM` from `lm_eval` can take cpu device. https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/huggingface.py#L95 Currently running `eval_llama` fails on cpu Reviewed By: lucylq Differential Revision: D56313161 fbshipit-source-id: ceb5e650c3d31b9f1a96583d0396264bbf16a102
-
Summary: Pull Request resolved: #3037 Add a simple sdpa so it's decomposed to simpler ops instead of the decompose F.scaled_dot_product_attention, which includes 29 ops including `torch.where` ``` def forward(self, q, k, v): aten_mul_scalar = executorch_exir_dialects_edge__ops_aten_mul_Scalar(q, 0.5946035575013605); q = None aten_full_default = executorch_exir_dialects_edge__ops_aten_full_default([8, 8], True, dtype = torch.bool, layout = torch.strided, device = device(type='cpu'), pin_memory = False) aten_arange_start_step = executorch_exir_dialects_edge__ops_aten_arange_start_step(0, 8, layout = torch.strided, device = device(type='cpu'), pin_memory = False) aten_unsqueeze_copy_default = executorch_exir_dialects_edge__ops_aten_unsqueeze_copy_default(aten_arange_start_step, -2); aten_arange_start_step = None aten_arange_start_step_1 = executorch_exir_dialects_edge__ops_aten_arange_start_step(0, 8, layout = torch.strided, device = device(type='cpu'), pin_memory = False) aten_unsqueeze_copy_default_1 = executorch_exir_dialects_edge__ops_aten_unsqueeze_copy_default(aten_arange_start_step_1, -1); aten_arange_start_step_1 = None aten_sub_tensor = executorch_exir_dialects_edge__ops_aten_sub_Tensor(aten_unsqueeze_copy_default, aten_unsqueeze_copy_default_1); aten_unsqueeze_copy_default = aten_unsqueeze_copy_default_1 = None aten_le_scalar = executorch_exir_dialects_edge__ops_aten_le_Scalar(aten_sub_tensor, 0); aten_sub_tensor = None aten_logical_and_default = executorch_exir_dialects_edge__ops_aten_logical_and_default(aten_le_scalar, aten_full_default); aten_le_scalar = aten_full_default = None aten_full_like_default = executorch_exir_dialects_edge__ops_aten_full_like_default(aten_logical_and_default, 0, dtype = torch.float32, pin_memory = False, memory_format = torch.preserve_format) aten_logical_not_default = executorch_exir_dialects_edge__ops_aten_logical_not_default(aten_logical_and_default); aten_logical_and_default = None aten_scalar_tensor_default = executorch_exir_dialects_edge__ops_aten_scalar_tensor_default(-inf, dtype = torch.float32, layout = torch.strided, device = device(type='cpu')) aten_where_self = executorch_exir_dialects_edge__ops_aten_where_self(aten_logical_not_default, aten_scalar_tensor_default, aten_full_like_default); aten_logical_not_default = aten_scalar_tensor_default = aten_full_like_default = None aten_permute_copy_default = executorch_exir_dialects_edge__ops_aten_permute_copy_default(k, [0, 1, 3, 2]); k = None aten_mul_scalar_1 = executorch_exir_dialects_edge__ops_aten_mul_Scalar(aten_permute_copy_default, 0.5946035575013605); aten_permute_copy_default = None aten_expand_copy_default = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten_mul_scalar, [1, 1, 8, 8]); aten_mul_scalar = None aten_view_copy_default = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default, [1, 8, 8]); aten_expand_copy_default = None aten_expand_copy_default_1 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten_mul_scalar_1, [1, 1, 8, 8]); aten_mul_scalar_1 = None aten_view_copy_default_1 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_1, [1, 8, 8]); aten_expand_copy_default_1 = None aten_bmm_default = executorch_exir_dialects_edge__ops_aten_bmm_default(aten_view_copy_default, aten_view_copy_default_1); aten_view_copy_default = aten_view_copy_default_1 = None aten_view_copy_default_2 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_bmm_default, [1, 1, 8, 8]); aten_bmm_default = None 
aten_add_tensor = executorch_exir_dialects_edge__ops_aten_add_Tensor(aten_view_copy_default_2, aten_where_self); aten_view_copy_default_2 = aten_where_self = None aten__softmax_default = executorch_exir_dialects_edge__ops_aten__softmax_default(aten_add_tensor, -1, False); aten_add_tensor = None aten_expand_copy_default_2 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten__softmax_default, [1, 1, 8, 8]); aten__softmax_default = None aten_view_copy_default_3 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_2, [1, 8, 8]); aten_expand_copy_default_2 = None aten_expand_copy_default_3 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(v, [1, 1, 8, 8]); v = None aten_view_copy_default_4 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_3, [1, 8, 8]); aten_expand_copy_default_3 = None aten_bmm_default_1 = executorch_exir_dialects_edge__ops_aten_bmm_default(aten_view_copy_default_3, aten_view_copy_default_4); aten_view_copy_default_3 = aten_view_copy_default_4 = None aten_view_copy_default_5 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_bmm_default_1, [1, 1, 8, 8]); aten_bmm_default_1 = None return (aten_view_copy_default_5,) ``` After applying the diff, we remove the following ops ``` %aten_full_like_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.full_like.default](args = (%aten_index_tensor_2, 0), kwargs = {dtype: torch.float32, pin_memory: False, memory_format: torch.preserve_format}) %aten_logical_not_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.logical_not.default](args = (%aten_index_tensor_2,), kwargs = {}) %aten_scalar_tensor_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.scalar_tensor.default](args = (-inf,), kwargs = {dtype: torch.float32, layout: torch.strided, device: cpu}) %aten_where_self : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.where.self](args = (%aten_logical_not_default, %aten_scalar_tensor_default, %aten_full_like_default), kwargs = {}) %aten_mul_scalar : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Scalar](args = (%aten_permute_copy_default_3, 0.5946035575013605), kwargs = {}) ... %aten_mul_scalar_1 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Scalar](args = (%aten_permute_copy_default_6, 0.5946035575013605), kwargs = {}) ``` but introduce an add %aten_add_tensor_3 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.add.Tensor](args = (%aten_mul_tensor_11, %aten_index_tensor_2), kwargs = {}) ``` ghstack-source-id: 223152096 exported-using-ghexport Reviewed By: mergennachin, kimishpatel Differential Revision: D56119737 fbshipit-source-id: ec8e875f0a4c4ec67b7493e4872c9a5b081e6de7
-
Fix quantized embedding export logic (#3095)
Summary: Add patches to make 4bit quantized embedding work for export. Fixed: * Schema mismatch between functional embedding_4bit and out variant * Set `packed=True` for 4bit quantization Pull Request resolved: #3095 Reviewed By: mikekgfb Differential Revision: D56340670 Pulled By: larryliu0820 fbshipit-source-id: c98623a9b7633fc5a6c390be1557213c719fa95a
-
Comply llama2 runner with gcc 11.4 (#3140)
Summary: Pull Request resolved: #3140 This seems like a simple change so that it can compile with gcc 11.4 bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: digantdesai Differential Revision: D56320381 fbshipit-source-id: 577a60bac78ed01ad450fcb58dbccc7f04fd5067
-
qnn end to end flow for stories model (#3038)
Summary: Pull Request resolved: #3038 Patch a few changes including: - support bool tensor type - support fp16 and fix the 8w8a quantization. - add two non-supported ops (slice_scatter and index_put) in common_defs.py stories model working end to end: AOT: fp16: ``` python -m examples.models.llama2.export_llama -kv --qnn -c stories110M.pt -p params.json ``` quantize: ``` python -m examples.models.llama2.export_llama -kv --qnn --pt2e_quantize qnn_8a8w -c stories110M.pt -p params.json ``` Runtime: ``` /llama_main --model_path=llama2_fp16_qnn_2.21.pte --tokenizer_path=tokenizer.bin --prompt="Once" ``` Output: ``` Once upon a time, there was a little girl named Lily. She loved to play outside and explore the world around her. One day, she went on a walk with her mommy and they found a beautiful landscape with lots of trees and flowers. Lily said, "Mommy, this place is so pretty! Can we take a picture?" Mommy replied, "Of course, Lily! Let's take a picture to remember the original place we found." After they took the picture, they continued their walk and saw a bird flying in the sky. Lily said, "MomPyTorchObserver {"prompt_tokens":2,"generated_tokens":125,"model_load_start_ms":1713226585936,"model_load_end_ms":1713226586909,"inference_start_ms":1713226586909,"inference_end_ms":1713226590363,"prompt_eval_end_ms":1713226586966,"first_token_ms":1713226586994,"aggregate_sampling_time_ms":23,"SCALING_FACTOR_UNITS_PER_SECOND":1000} I 00:00:04.436699 executorch:runner.cpp:414] Prompt Tokens: 2 Generated Tokens: 125 I 00:00:04.436703 executorch:runner.cpp:420] Model Load Time: 0.973000 (seconds) I 00:00:04.436732 executorch:runner.cpp:430] Total inference time: 3.454000 (seconds) Rate: 36.189925 (tokens/second) I 00:00:04.436735 executorch:runner.cpp:438] Prompt evaluation: 0.057000 (seconds) Rate: 35.087719 (tokens/second) I 00:00:04.436739 executorch:runner.cpp:449] Generated 125 tokens: 3.397000 (seconds) Rate: 36.797174 (tokens/second) I 00:00:04.436742 executorch:runner.cpp:457] Time to first generated token: 0.085000 (seconds) I 00:00:04.436744 executorch:runner.cpp:464] Sampling time over 127 tokens: 0.023000 (seconds) [INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters [INFO] [Qnn ExecuTorch]: Destroy Qnn context ``` Stories model is too small and sensitive to qunatization. ghstack-source-id: 223199545 exported-using-ghexport Reviewed By: mergennachin, kirklandsign Differential Revision: D56119738 fbshipit-source-id: daf5563fe51a677f302e09ae8a9fb80e6bda72c5
-
Instructions for Llama3 (#3154)
Summary: Pull Request resolved: #3154 All the steps until validating on desktop. Reviewed By: iseeyuan Differential Revision: D56358723 fbshipit-source-id: 32d246882d9609840932a7da22c2e3dbf015c0a8
-
Add link to llama3 README file (#3156)
Summary: Pull Request resolved: #3156 bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: kirklandsign Differential Revision: D56362041 fbshipit-source-id: 472dd9864a26f2b8744673163a8cd2cea58cc8e7
-
make op_split_with_sizes_copy support dynamic shape (#3152)
Summary: Pull Request resolved: #3152 as title Reviewed By: SS-JIA Differential Revision: D56333587 fbshipit-source-id: deecbb2a394257dc146dd1af50cc0e7158ac79ed
-
Call destructor explicitly when move constructing `Value` (#3148)
Summary: Pull Request resolved: #3148 ## Context Inspecting code for ATen and ExecuTorch's `Value` classes (i.e. `IValue` and `EValue` respectively), I noticed that the destructor is called [explicitly when move constructing with non-trivial types](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/core/ivalue.h#L409). In practice I don't think calling the destructor explicitly is necessary, because move constructing typically sets the moved-from object to an inactive state, but since we use `Value` to encapsulate STL types (i.e. types for which we do not implement the destructor) it's best to call the destructor explicitly to be safe. ghstack-source-id: 223225898 exported-using-ghexport Reviewed By: jorgep31415 Differential Revision: D56357187 fbshipit-source-id: 4797a627efcd2a61ee35d4c6963e524b4161ff3b
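A minimal sketch of the pattern under discussion, using a hypothetical tagged-union value type rather than the real `EValue`/`IValue`:
```
#include <new>
#include <string>
#include <utility>

class Value {
 public:
  explicit Value(std::string s) : tag_(Tag::Str) {
    new (&payload_.str) std::string(std::move(s));
  }

  Value(Value&& rhs) noexcept : tag_(rhs.tag_) {
    if (tag_ == Tag::Str) {
      new (&payload_.str) std::string(std::move(rhs.payload_.str));
      // Explicitly destroy the moved-from payload: a moved-from std::string
      // is still a live object, and the union will not run its destructor.
      rhs.payload_.str.~basic_string();
    }
    rhs.tag_ = Tag::None;
  }

  ~Value() {
    if (tag_ == Tag::Str) {
      payload_.str.~basic_string();
    }
  }

 private:
  enum class Tag { None, Str } tag_;
  union Payload {
    Payload() {}
    ~Payload() {}
    std::string str;
  } payload_;
};
```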
-
Clean up api::vTensor class (#3149)
Summary: Pull Request resolved: #3149 ## Context Now that we have forked the `api/` directory from PyTorch Vulkan, we can clean up the `vTensor` class and remove functionality that is not necessary for the ExecuTorch Vulkan delegate. The following changes are made: * Remove unused member variables and member functions from `vTensor` and `vTensorStorage` * Remove all quantization related member variables, member functions, and the `vTensor` constructor for quantized tensors. The Quantization API will be reworked from the ground up. * Rename `view_` (which is an instance of `vTensorStorage`) to `storage_` Finally, the critical change that is introduced is that we now store `storage_` as a direct `vTensorStorage` member variable in `vTensor` instead of storing it as a `std::shared_ptr<vTensorStorage>`. For context, the reason `storage_` was stored as a shared pointer is to be compliant with ATen Tensors, which need to support copy construction to enable the following:
```
at::Tensor b = at::rand(...);
// Oftentimes this will create a "view" of the tensor. a and b will point to
// the same underlying storage, but with different metadata.
at::Tensor a = b;
```
However, in the ExecuTorch delegate this is no longer necessary. Each Tensor is associated with its own independent storage and is responsible for managing its own memory. **By getting rid of `std::shared_ptr`, we can avoid a heap allocation and avoid chasing pointers whenever we need to access the resources of a `vTensor`.** ghstack-source-id: 223225901 exported-using-ghexport Reviewed By: jorgep31415 Differential Revision: D55811279 fbshipit-source-id: 95c0ecc9658ef9bc64ecee9e5c9e272da12786b8
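As a toy illustration of the trade-off (hypothetical types, not the real classes):
```
#include <memory>

struct Storage { /* owns the GPU resources */ };

// Before: each tensor pays a heap allocation plus a pointer chase.
class TensorBefore {
  std::shared_ptr<Storage> storage_;
};

// After: storage lives inline in the tensor; copy construction is disabled
// since each tensor owns its storage exclusively.
class TensorAfter {
 public:
  TensorAfter() = default;
  TensorAfter(const TensorAfter&) = delete;

 private:
  Storage storage_;
};
```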
-
Introduce `ParamsBindList` to prevent needing to pass `shared_ptr` to bind parameter UBOs (#3150)
Summary: Pull Request resolved: #3150 ## Context In keeping with the below changeset in this stack, this diff introduces the `ParamsBindList` structure to avoid storing shared pointers to `api::UniformParamsBuffer` objects in `ExecuteNode` and `PrepackNode`. The idea is to store the binding information of each UPB instead of taking ownership of the UPB itself. There isn't really a need for `ExecuteNode` and `PrepackNode` to take ownership since `ComputeGraph` provides a guarantee that the UPBs will be in scope at the time of binding. With this change, all `shared_ptr` members can be eliminated from `vTensor`, further reducing heap allocations and pointer chasing. In the future I will change `prepack_nodes_` and `execute_nodes_` to store `PrepackNode` and `ExecuteNode` instances directly instead of storing unique pointers to them. ghstack-source-id: 223225899 exported-using-ghexport Reviewed By: jorgep31415 Differential Revision: D56357188 fbshipit-source-id: 5f4d1be900711753aa2cc035c044fe71f93d555b
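A sketch of the idea with illustrative types (not the actual delegate code): record only the information descriptor binding needs, rather than owning the uniform buffer, relying on the graph to keep the buffers alive.
```
#include <vulkan/vulkan.h>
#include <vector>

struct ParamsBindEntry {
  VkBuffer buffer;      // non-owning handle; the graph owns the UBO
  VkDeviceSize offset;
  VkDeviceSize range;
};

struct ParamsBindList {
  std::vector<ParamsBindEntry> entries;

  void append(VkBuffer buffer, VkDeviceSize offset, VkDeviceSize range) {
    entries.push_back({buffer, offset, range});
  }
};
```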
-
Rename tokenizer file in Xcode. (#3160)
Summary: Pull Request resolved: #3160 . Reviewed By: kirklandsign Differential Revision: D56363030 fbshipit-source-id: 489a7d4a32ca3b3d020d2639d9c14b330ce01d86
-
Adding .model tokenizer to selection (#3163)
Summary: Pull Request resolved: #3163 We should allow both .bin and .model for tokenizer Reviewed By: shoumikhin Differential Revision: D56365079 fbshipit-source-id: 9b59d15b0b16ffd5a091d3deadacec0771547f77
-
Docs for lower smaller models to mps/coreml/qnn (#3146)
Summary: Pull Request resolved: #3146 ghstack-source-id: 223235858 Reviewed By: mcr229, kirklandsign Differential Revision: D56340028 fbshipit-source-id: ef06142546ac54105ae87007cd82369917a22b3e
-
Add missing ops for RNNT predictor (#3125)
Summary: Pull Request resolved: #3125 As titled. Permute and quantized_layer_norm were not registered properly. Reviewed By: tarun292 Differential Revision: D56305088 fbshipit-source-id: 0ceee7b3404ba95c1e758b6daf3a5b3a16f85662
-
Summary: Pull Request resolved: #3168 Reviewed By: digantdesai, shoumikhin, mikekgfb Differential Revision: D56367151 fbshipit-source-id: a502e55abf41419c0b1775d0b2ec6ab170fb6299
-
Slice, with lots of codegen improvements (#3171)
Summary: Pull Request resolved: #3171
1. Add the slice operation. Instead of using copy in LI, we implement a simple shader with offsets.
2. Improvements in codegen:
- add support for optional variables
- improve the indentation of the code, for better readability
- allow the user to specify tensor value generation; it is possible to generate sequential values for easier debugging of index operations
- sample code to improve test-case specification, particularly with long and optional values.
ghstack-source-id: 223254861 Reviewed By: SS-JIA, jorgep31415 Differential Revision: D56295985 fbshipit-source-id: f351dee25a72795d2ba768cb0bc33a467df64d8f
-
Summary: Pull Request resolved: #3162 Reviewed By: kirklandsign Differential Revision: D56365716 Pulled By: lucylq fbshipit-source-id: 707c5b869df128cc7e669fc0d78ca185f1c68f31
-
Update model arg name rope_theta to be consistent with those in llama…
Commits on Apr 20, 2024
-
Summary: We follow D50914117 to implement a specific case of conv1d for our needs. Specifically, we require:
- the input tensor to have a single batch
- groups == in_channels == out_channels
- weight_sizes.at(1) == 1
- stride == 1
- padding == 0
- dilation == 1
We assume `bias==True`. The `bias==False` case is handled in the next diff. General cases and optimizations will be enabled later. Reviewed By: jorgep31415 Differential Revision: D56220143 fbshipit-source-id: a18de3a463875b9617cb7930febf7622fe866536
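A reference loop for this constrained depthwise case, sketched under the assumptions listed above (batch 1, groups == in_channels == out_channels, stride 1, no padding, no dilation, bias present):
```
#include <cstddef>
#include <vector>

// input: [C][L], weight: [C][1][K], bias: [C] -> output: [C][L - K + 1]
std::vector<std::vector<float>> conv1d_depthwise(
    const std::vector<std::vector<float>>& input,
    const std::vector<std::vector<std::vector<float>>>& weight,
    const std::vector<float>& bias) {
  const std::size_t C = input.size();
  const std::size_t K = weight[0][0].size();
  const std::size_t out_len = input[0].size() - K + 1;

  std::vector<std::vector<float>> out(C, std::vector<float>(out_len, 0.f));
  for (std::size_t c = 0; c < C; ++c) {  // each channel is its own group
    for (std::size_t i = 0; i < out_len; ++i) {
      float acc = bias[c];
      for (std::size_t k = 0; k < K; ++k) {
        acc += input[c][i + k] * weight[c][0][k];
      }
      out[c][i] = acc;
    }
  }
  return out;
}
```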
-
Qualcomm AI Engine Direct - Enable SSD300_VGG16 (#3010)
Summary:
- Enable SSD300_VGG16
- Add new OPs: sqrt, sum_intList
- Add test cases for SSD300 and the new OPs
- Repository for SSD300_VGG16: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection
Pull Request resolved: #3010 Reviewed By: kirklandsign Differential Revision: D56280698 Pulled By: cccclai fbshipit-source-id: 3de0a3e0c705fd2765401d61577c6e10b4eddb39
-
Summary: Under the same setting as the last diff, we support `bias=false`. Reviewed By: jorgep31415 Differential Revision: D56285842 fbshipit-source-id: 41636d19d2cd7db07ba924606c9cd33999cffdab
-
Switch to a dedicated branch for prebuilt packages. (#3184)
Summary: Pull Request resolved: #3184 . Reviewed By: kirklandsign Differential Revision: D56383903 fbshipit-source-id: 88bce7f7b4987a5cc8a649480af09a0e1cac90ee
Commits on Apr 21, 2024
-
Use "latest" as the version for prebuilt frameworks. (#3161)
Summary: Pull Request resolved: #3161 . Reviewed By: kirklandsign Differential Revision: D56363475 fbshipit-source-id: f2a56e7baef600ac45793878520d2bf2cbe6bfe7
-
Deprecate `gpu_sizes_ubo()` and `extents()`; also toggle packing layout via specialization constants (#3181)
Summary: Pull Request resolved: #3181 ## Context This changeset cleans up how shaders consume tensor metadata in two ways: ### Pass in Packing Layout via Specialization Constants The packing layout of a tensor determines how to convert between tensor indices and physical texture coordinates. Currently, the packing layout is determined by generating a completely new variant of a shader. However, this is rather expensive for build size. Support for specialization constants was added a while back, which enables the packing layout to be communicated to the shader via a specialization constant. This is a much better and more natural way for shaders to determine the packing layout of their tensors and vary their behaviour. The primary benefit of this is that we can vastly reduce the number of variants that are generated. Generating shader variants for combinations of dtypes and memory layouts can lead to a combinatorial explosion of build size. Note that dtype cannot be passed as a specialization constant, since it impacts the types used in the layout portion of a shader. ### Deprecate GPU Sizes and Extents Currently there are 3 representations of the tensor's sizes: `cpu_sizes()`, `gpu_sizes()`, and `extents()`. The GPU sizes are a simple modification of the CPU sizes where the packed dim is aligned to the next multiple of 4. Extents represent the physical extents of the image texture used to store the tensor. However, shaders often need to reference the original sizes of the tensor, so we end up passing two different representations of the tensor's sizes. The CPU sizes and extents are used to determine out-of-bounds elements, and the GPU sizes are used to convert between logical tensor indices and physical texture coordinates. Since the GPU sizes and extents are easily determined from the CPU sizes given the packing layout, deprecate GPU sizes and use CPU sizes exclusively as the canonical tensor sizes. Hence `cpu_sizes()` is renamed to simply `sizes()`. The primary benefits of this change are: 1. Less confusion over how to reference the tensor sizes 2. Fewer descriptors to bind when constructing compute pipelines 3. Fewer uniform buffers to update when resizing tensors between inferences. ghstack-source-id: 223317313 Reviewed By: yipjustin Differential Revision: D56377775 fbshipit-source-id: 31235fbdf0b694e24b8c6fc0b40c56ddb818439d
-
Specify OSX deployment target for python package. (#3193)
Summary: Pull Request resolved: #3193 . Reviewed By: mikekgfb Differential Revision: D56403324 fbshipit-source-id: 07b29b0b12a8995bce4d45ea9308a5b3c566d7e6
Commits on Apr 22, 2024
-
Specify OSX deployment target for python package. (#3194)
Summary: Pull Request resolved: #3194 overriding_review_checks_triggers_an_audit_and_retroactive_review Oncall Short Name: executorch Differential Revision: D56405473 fbshipit-source-id: 785709e8acc1b07e57825b278c3e0a355641e13a
-
Summary: Pull Request resolved: #3195 overriding_review_checks_triggers_an_audit_and_retroactive_review Oncall Short Name: executorch Differential Revision: D56405764 fbshipit-source-id: 284f54c9aabdebb070edf7d6931b43260af8ad24
-
support emit sym value from delegate (#3103)
Summary: Pull Request resolved: #3103 For dynamic shapes, if the delegate output is dynamically shaped, the return might be something like `(s0, x, y)`, where `s0` is a sym type while the others are fake tensors. In this case, we will emit the sym value (including `SymFloat`, `SymBool`, `SymInt`) to a unique EValue. Since a sym type node will have an empty spec, we use `node.meta['val']` to find out that it's a sym type node. Reviewed By: mcr229 Differential Revision: D56176100 fbshipit-source-id: a4ddc7225ed014c59ceb9fa8ba4a9cb394af00e5
-
Update Xcode project to build tiktoken tokenizer for LLaMA 3. (#3197)
Summary: Pull Request resolved: #3197 . Reviewed By: mikekgfb Differential Revision: D56408302 fbshipit-source-id: 93b14fbbc70cde4ebaaab0084a78d7bd3b3e4b4a
-
Add quantized ops to pybindings (#3206)
Summary: Pull Request resolved: #3206 Test Plan: Imported from GitHub, without a `Test Plan:` line. Test with eval, run pte file through pybindings that uses quantized embeddings ``` python3 -m examples.models.llama2.eval_llama --pte ../pte_files/llama3/llama3_x_int4_128_kv_sdpa_qe4_32.pte -p ../llama-models/llama3/params_april18.json -t ../llama-models/llama3/tokenizer_april18.model --max_seq_len 127 --limit 5 ``` Reviewed By: larryliu0820 Differential Revision: D56426846 Pulled By: lucylq fbshipit-source-id: ced9feaf043cf7beec94a08a109e9709864f15a2
-
Add memory and vector include in managed_tensor.h (#3201)
Summary: Pull Request resolved: #3201 In order to get rid of this patch https://github.com/pytorch/torchchat/blob/main/scripts/install_et.sh#L35-L36, we upstream the changes into ExecuTorch. Reviewed By: lucylq Differential Revision: D56424633 fbshipit-source-id: 72e6675b467416753b0fd995d8e514396eef8331
Commit: 90d0c1a
-
Summary: Refactor the hell out of export_llama_lib.py. All quantizer logic goes into `lib/quant_lib.py`. All partitioner logic goes into `lib/partitioner_lib.py`. All source transformation logic goes into `source_transformation/`. Reviewed By: iseeyuan, cccclai Differential Revision: D56372411 fbshipit-source-id: bfdf842980c7271aebaadfc445272fa4ca96f0d8
Commit: 67123b6
-
Update setup.sh for tokenizer selection (#3207)
Summary: For Llama 3, users need to use tiktoken. Add an option to load the tokenizer from an env var. Pull Request resolved: #3207 Reviewed By: cccclai Differential Revision: D56430637 Pulled By: kirklandsign fbshipit-source-id: cc1cc50100d6142510a455ca29d56a810942f90b
Commit: 1a93dee
-
Qualcomm AI Engine Direct - Fixed uint16 tensor and linear op (#3196)
Summary: - Fixed uint16 data type of tensor Pull Request resolved: #3196 Reviewed By: kirklandsign Differential Revision: D56431363 Pulled By: cccclai fbshipit-source-id: 42d763a18f7288c3ec0f233fcc52dde1476895bd
Commit: 3bb591c
-
Add a pure python wrapper to pybindings.portable_lib (#3137)
Summary: Pull Request resolved: #3137 When installed as a pip wheel, we must import `torch` before trying to import the pybindings shared library extension. This will load libtorch.so and related libs, ensuring that the pybindings lib can resolve those runtime dependencies. So, add a pure python wrapper that lets us do this when users say `import executorch.extension.pybindings.portable_lib` We only need this for OSS, so don't bother doing this for other pybindings targets. Reviewed By: orionr, mikekgfb Differential Revision: D56317150 fbshipit-source-id: 920382636732aa276c25a76163afb7d28b1846d0
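A minimal sketch of what such a wrapper can look like (the internal extension module name `_portable_lib` is an assumption here, used only for illustration):

```python
# portable_lib.py -- pure-python wrapper around the C++ extension.
# Importing torch first loads libtorch.so and its dependencies, so the
# pybindings shared library can resolve those symbols when it is loaded.
import torch  # noqa: F401  (imported only for its shared-library side effect)

# Re-export the real extension's symbols for callers of this module.
from executorch.extension.pybindings._portable_lib import *  # noqa: F401,F403
```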
Commit: 969aa96
-
Remove unused extension/aot_util directory (#3216)
Summary: The AOT util extension was removed a while back, but the directory and README still exist. This PR cleans them up. Note that the aot_util sources were deleted previously, so this is not a functional change. Pull Request resolved: #3216 Test Plan: CI. This is not a functional change, as it changes only a README file. Reviewed By: metascroy Differential Revision: D56436216 Pulled By: GregoryComer fbshipit-source-id: 2f8b8cee20b7a3efb25a1ef1df3ebd69f3b512c9
Commit: 67f3376
-
Create dependabot rule to upgrade TorchFix version (#3208)
Summary: The parameters are from https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file ### Testing On my own fork https://github.com/huydhn/executorch/blob/main/.github/dependabot.yml, and the PR to upgrade TorchFix is created successfully huydhn#2 Pull Request resolved: #3208 Reviewed By: kit1980 Differential Revision: D56428297 Pulled By: huydhn fbshipit-source-id: 8b4f9d638d208fe6f476efdf7667058b2d2ae2fc
Commit: dbf90c2
-
Bring back `extents_ubo()` as `texture_limits_ubo()` (#3217)
Summary: Pull Request resolved: #3217 ## Context #3181 deprecated the `gpu_sizes_ubo()` and `extents_ubo()` functions of `vTensor` in order to standardize how shaders consume shape/size metadata of input tensors. However, this came at the cost of increasing the overhead required for bounds checking, which is needed to support dynamic shapes, as shaders now need to convert the input sizes to texture limits before checking if a given texel position is out of bounds. Benchmarking revealed that this overhead can be quite significant, especially on lower-power mobile GPUs. In the interest of preserving performance, `extents_ubo()` is re-introduced, since bounds checking is an operation that is common to every single shader. However, some improvements are made: * instead of `extents`, the nomenclature `texture_limits` is used in order to differentiate from the physical image extents of the texture. * `texture_limits` is represented via an `ivec3` (previously `uvec4`); this means that to use it for bounds checking, there does not need to be an implicit cast from `uvec` to `ivec`, and there is also no need for swizzling. Also introduced in this changeset is the convention of passing both the texture limits and tensor sizes instead of using `pos_out_of_bounds()`. Passing in the texture limits is probably cheaper than using `pos_out_of_bounds()`. There are some exceptions though where I choose not to migrate to this pattern to avoid passing in too many variants of tensor metadata. ### What about `gpu_sizes_ubo`? I will hold off on re-introducing `gpu_sizes_ubo` for now since converting `sizes` to `gpu_sizes` is much cheaper compared to `pos_out_of_bounds()`: ``` ivec4 sizes[packed_dim] = alignup4(sizes[packed_dim]) ``` Will perform some additional benchmarking on this to see if the overhead of the alignment warrants an explicit API for passing in GPU sizes to shaders. ghstack-source-id: 223453651 exported-using-ghexport Reviewed By: yipjustin, jorgep31415 Differential Revision: D56435574 fbshipit-source-id: 656f79eecbfc7c77cbe067df6c9ea54c51c50633
Commit: 9769386
-
backout the schema definition change (#3213)
Summary: Pull Request resolved: #3213 The schema was changed to avoid double registration, but that was hiding the symptoms by using a different schema. Restore the correct schema. Reviewed By: larryliu0820 Differential Revision: D56432559 fbshipit-source-id: d9d0a92a6c6fa04857ea01916647eb46ed658849
Commit: 9d2af4c
Commits on Apr 23, 2024
-
Update some SDK docs from MVP (#3212)
Summary: Pull Request resolved: #3212 Doc changes, including: 1. Remove the Buck instructions, because we're moving away from Buck and use CMake now and going forward; 2. Remove "Coming soon" for features that have since landed; 3. Formatting. Reviewed By: Jack-Khuu Differential Revision: D56433016 fbshipit-source-id: fffa283b4a04438866d84765a65377dcf8a88837
Commit: b41f763
-
Summary: torchchat requested a newer version of the torch nightly. Bump it from 4/15 to 4/21. Pull Request resolved: #3199 Reviewed By: malfet Differential Revision: D56420105 Pulled By: iseeyuan fbshipit-source-id: 3d2a9b0f8dbb48f0a81c7cdef8e419206b036faf
Commit: 03c7a99
-
Summary: Pull Request resolved: #3228 Fix a UI thread issue causing a crash. Reviewed By: cccclai Differential Revision: D56447006 fbshipit-source-id: 02eff27d4b4cd108c95b664d04679d4f92aaf5db
Commit: 4389442
-
Fix executor_runner_mps and mpsdelegate linking with pybind (#3222)
Summary: Summary of changes: - fixes mps_executor_runner build - previously it would fail to build due to incorrect linking with portable ops - fixes `mpsdelegate` linking with `pybind` lib - added tests to check correctness directly through pybind - added a helper file (`bench_utils.py`) to help measure the forward pass of models between PyTorch MPS and ExecuTorch MPS Testing (will run both AOT and runtime if MPS was built with pybind): - `./install_requirements.sh --pybind mps` - invoke a single unit test: `python3 -m unittest backends.apple.mps.test.test_mps_indexing_ops -v -k test_mps_indexing_get_1`. - invoke all tests from a file: `python3 -m unittest backends.apple.mps.test.test_mps_indexing_ops -v` cc cccclai , shoumikhin Pull Request resolved: #3222 Reviewed By: shoumikhin Differential Revision: D56447888 Pulled By: cccclai fbshipit-source-id: 5cbbcbf8df34f29e23a1854df72f764337a9df76
Commit: 6c30eea
-
Update to transformers 4.38 (#3227)
Summary: To fix CVE-2024-3568 Pull Request resolved: #3227 Reviewed By: mikekgfb Differential Revision: D56447728 Pulled By: malfet fbshipit-source-id: 3758d9def101d58cead7bcae00cc91237abf42dd
Commit: aec2549
-
Update TorchNightly to 2024.04.22 (#3225)
Summary: Pull Request resolved: #3225 Reviewed By: larryliu0820 Differential Revision: D56447049 Pulled By: malfet fbshipit-source-id: 0e92827f9dead7422334abd84d3bd540cb87fb50
Commit: 9783697
-
Summary: Pull Request resolved: #3232 Reviewed By: iseeyuan Differential Revision: D56450983 Pulled By: kirklandsign fbshipit-source-id: 94103040321df55d6fb53a2971512fd1bdfd5ec8
Commit: 4668b5d
-
strip symbol when linking (#3234)
Summary: Pull Request resolved: #3234 Refer to https://sourceware.org/binutils/docs/binutils/strip.html. Command to build for Android: ``` rm -rf cmake-android-out && mkdir cmake-android-out cmake -DBUCK2="$BUCK" \ -DCMAKE_INSTALL_PREFIX=cmake-android-out \ -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \ -DANDROID_ABI="arm64-v8a" \ -DANDROID_PLATFORM=android-29 \ -DCMAKE_BUILD_TYPE=Release \ -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \ -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \ -DEXECUTORCH_BUILD_CUSTOM=ON \ -DEXECUTORCH_BUILD_OPTIMIZED=ON \ -DEXECUTORCH_BUILD_QUANTIZED=ON \ -DEXECUTORCH_BUILD_XNNPACK=ON \ -DEXECUTORCH_ENABLE_LOGGING=ON \ -Bcmake-android-out . cmake --build cmake-android-out -j16 --target install --config Release cmake -DBUCK2="$BUCK" \ -DCMAKE_INSTALL_PREFIX=cmake-android-out \ -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \ -DANDROID_ABI="arm64-v8a" \ -DANDROID_PLATFORM=android-23 \ -DCMAKE_BUILD_TYPE=Release \ -DEXECUTORCH_BUILD_CUSTOM=ON \ -DEXECUTORCH_BUILD_OPTIMIZED=ON \ -DEXECUTORCH_BUILD_XNNPACK=ON \ -DEXECUTORCH_ENABLE_LOGGING=ON \ -DEXECUTORCH_USE_TIKTOKEN=ON \ -Bcmake-android-out/${dir} \ ${dir} cmake --build cmake-android-out/${dir} -j16 --config Release ``` ``` (executorch) chenlai@chenlai-mbp executorch % du -sh cmake-android-out/examples/models/llama2/* 44K cmake-android-out/examples/models/llama2/CMakeCache.txt 2.2M cmake-android-out/examples/models/llama2/CMakeFiles 76K cmake-android-out/examples/models/llama2/Makefile 4.0K cmake-android-out/examples/models/llama2/cmake_install.cmake 4.0K cmake-android-out/examples/models/llama2/compile_commands.json 4.9M cmake-android-out/examples/models/llama2/custom_ops 736K cmake-android-out/examples/models/llama2/lib 54M cmake-android-out/examples/models/llama2/llama_main 16K cmake-android-out/examples/models/llama2/options-pinned.h 11M cmake-android-out/examples/models/llama2/runner 151M cmake-android-out/examples/models/llama2/third-party ``` Reviewed By: lucylq, kirklandsign Differential Revision: D56450794 fbshipit-source-id: 79e77732713708f3ced3801d11e30a9141075a76
Commit: d8e94b0
-
Summary: Pull Request resolved: #3235 It's "Release" not "RELEASE".... Reviewed By: lucylq Differential Revision: D56451118 fbshipit-source-id: 63702f6fb906b3bc0e8d79061a7f7f6e849ea162
Commit: 4342cf2
-
Bump torchfix from 0.1.1 to 0.5.0 (#3220)
Summary: Bumps [torchfix](https://github.com/pytorch-labs/torchfix) from 0.1.1 to 0.5.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/pytorch-labs/torchfix/releases">torchfix's releases</a>.</em></p> <blockquote> <h2>TorchFix 0.5.0</h2> <ul> <li>Added rule TOR203 to replace 'import torchvision.models as models' with 'from torchvision import models'</li> <li>Added rules TOR104 and TOR105 for calling and importing non-public PyTorch functions that have known public aliases</li> <li>Added rules TOR004 and TOR103 for importing removed and deprecated functions (in addition to the existing rules for calling those functions)</li> <li>Fixed loading for deprecated symbols config in zipped deployments</li> <li>Done several smaller bug fixes and refactorings</li> </ul> <h2>TorchFix 0.4.0</h2> <ul> <li>Improvements for the standalone <code>torchfix</code> command: <ul> <li>Added <code>--version</code> flag</li> <li><code>--select</code> flag now accepts specific rules, not just <code>ALL</code></li> <li>Fixed excessive debug output on MacOS</li> </ul> </li> <li>Added PyTorch-internal rule TOR901</li> <li>TorchFix explicitly requires at least Python 3.9 now</li> <li>Small clean-ups and bugfixes</li> </ul> <h2>TorchFix 0.3.0</h2> <ul> <li>Added rule TOR003 about explicitly passing <code>use_reentrant</code> to <code>torch.utils.checkpoint</code></li> <li>Added <code>torch.nn.utils.weight_norm</code> to the list of deprecated functions flagged by TOR101</li> <li>Updated README with TOR0 rules description</li> </ul> <h2>TorchFix 0.2.1: first release for pytorch-labs/torchfix repo</h2> <p>This is the first release for pytorch-labs/torchfix repo, with the only differences from TorchFix 0.2.0 on PyPI are files related to repo maintenance and project metadata.</p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li>See full diff in <a href="https://github.com/pytorch-labs/torchfix/commits/v0.5.0">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=torchfix&package-manager=pip&previous-version=0.1.1&new-version=0.5.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `dependabot rebase` will rebase this PR - `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `dependabot merge` will merge this PR after your CI passes on it - `dependabot squash and merge` will squash and merge this PR after your CI passes on it - `dependabot cancel merge` will cancel a previously requested merge and block automerging - `dependabot reopen` will reopen this PR if it is closed - `dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Pull Request resolved: #3220 Reviewed By: kit1980 Differential Revision: D56449277 Pulled By: huydhn fbshipit-source-id: ad3c86d49f86427c91af28063d5347b37b893e87
Commit: 0afb73d
-
Summary: It is more stable to pin a release branch of CoreMLTools. We will periodically update it when necessary Pull Request resolved: #3170 Reviewed By: cccclai Differential Revision: D56373108 Pulled By: shoumikhin fbshipit-source-id: d6a96813f07df97abbf8f4ca75e2aae2666372b1
Commit: cb77763
-
Expand visibility of targets needed for executorch_llama2 kernel (#3174)
Summary: Pull Request resolved: #3174 See title Reviewed By: tarun292 Differential Revision: D56361946 fbshipit-source-id: 12d5d9cb3f265173696173073b6d2357dae0848a
Commit: 7b854b6
-
Support tensors in prim_getters (#3203)
Summary: Pull Request resolved: #3203 Adding support for tensors and tensor lists in prim getters Reviewed By: JacobSzwejbka Differential Revision: D56426044 fbshipit-source-id: 164e916bc7662d2864cee2a6d1cb06177311438d
Commit: 6c36f10
-
Enable doc upload for tags, disable for release branches (#3153)
Summary: - Disabled doc upload for branches like release/x.x - Enabled publishing for tags. Tested locally: ``` export GITHUB_REF=refs/tags/v3.1.4-rc5 bash test-version.sh ``` ``` # test-version.sh if [[ "${GITHUB_REF}" =~ ^refs/tags/v([0-9]+\.[0-9]+)\.* ]]; then TARGET_FOLDER="${BASH_REMATCH[1]}" else TARGET_FOLDER="main" fi echo "Target folder: ${TARGET_FOLDER}" ``` Output: ``` Target folder: 3.1 ``` One more: ``` export GITHUB_REF=refs/tags/v1.15.4 bash test-version.sh ``` Output: ``` Target folder: 1.15 ``` Pull Request resolved: #3153 Reviewed By: dbort Differential Revision: D56445037 Pulled By: svekars fbshipit-source-id: e7328523dfe308e8921c1e4f365d9a757d053191
Commit: ee8c3a6
-
Update Core ML Backend Doc (#3188)
Summary: Update Core ML backend doc on: 1. Partitioner 2. Quantizer Pull Request resolved: #3188 Reviewed By: shoumikhin Differential Revision: D56481126 Pulled By: cccclai fbshipit-source-id: 925a107a210094e035a816a15c70d9aedd5bd369
Commit: c004efe
-
bundled program alpha document (#3224)
Summary: Pull Request resolved: #3224 as title Reviewed By: tarun292, Jack-Khuu Differential Revision: D56446890 fbshipit-source-id: fc3dc6bb2349cd7ca4a8e998e528176dd9fb7679
Commit: 783e932
-
Fix a small inconsistency on the SDK debugging page (#3247)
Summary: Pull Request resolved: #3247 so that the code is consistent with the text description Reviewed By: dbort Differential Revision: D56481274 fbshipit-source-id: f303b966ebf3e07b510ef825c7bc09eaecd89554
Commit: ca8e589
-
Summary: Pull Request resolved: #3242 Removed the use of capture_pre_autograd_graph in places where we are not quantizing, since we want to minimize the usage of this API for easier deprecation in the future. Reviewed By: mergennachin Differential Revision: D56475332 fbshipit-source-id: bd5cd4969f953d6d8e98ef7f04ad3d4a96bdacf1
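For a case with no quantizer in the loop, a hedged sketch of what exporting directly can look like (the model and inputs below are placeholders, not code from this diff):

```python
import torch
from torch.export import export

class AddOne(torch.nn.Module):
    def forward(self, x):
        return x + 1

# No quantization involved, so call torch.export.export directly rather
# than going through capture_pre_autograd_graph first.
exported_program = export(AddOne(), (torch.randn(2, 2),))
print(exported_program.graph)
```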
Commit: ee28868
-
update sdk delegate integration (#3246)
Summary: Pull Request resolved: #3246 As title Reviewed By: tarun292 Differential Revision: D56479387 fbshipit-source-id: c324d2b46dc7f849dfb42b3452c6a82f24aa9319
Commit: cf487f1
-
Add iPad support to demo apps. (#3251)
Summary: Pull Request resolved: #3251 . Reviewed By: cccclai Differential Revision: D56488666 fbshipit-source-id: d63a08b4abdf055607948229be88f0c7762948ab
Commit: 1eaed2b
-
Add more prebuilt artifacts (#3245)
Summary: Build prebuilt artifacts for different ABIs. Pull Request resolved: #3245 Test Plan: CI Reviewed By: kirklandsign Differential Revision: D56480274 Pulled By: huydhn fbshipit-source-id: 451116a0f90745dd9f08ef32be3fe02940d6fbb1
Commit: 3b0f271
-
SDK tutorial doc update (#3238)
Summary: Pull Request resolved: #3238 fix some links, remove outdated commands Reviewed By: GregoryComer Differential Revision: D56453800 fbshipit-source-id: 8bd86a593f8c5b9342e61ab2d129473d315b57a8
Commit: f89c312
-
Summary: Pull Request resolved: #3223 We port jorgep31415's conv1d work for the lite interpreter into ET. The current implementation supports general batch_size, weight_size, stride, padding, dilation and groups. Reviewed By: jorgep31415 Differential Revision: D56380147 fbshipit-source-id: 62fdc2958d683590317aaec5be3d0366f6df42e4
Commit: 45fd796
-
move code under executorch/example (#3176)
Summary: Pull Request resolved: #3176 This diff moves the llm manual code from outside GitHub (Dave's and Georgey's) into the executorch codebase so there is a stable place to point to. After this diff, //executorch/examples/llm_manual will become the only source of truth for our llm manual code. Reviewed By: byjlw, dbort Differential Revision: D56365058 fbshipit-source-id: 97280fc0ca955caabb6056cddbb72102ed711f2c
Commit: b6e54d0
-
update XNNPACK/README.md (#3236)
Summary: Pull Request resolved: #3236 Fixing the XNNPACK/README - Updated the file layout overview - Added end-to-end tutorial flow for quick starts - Added See more section linking to static docs Reviewed By: metascroy Differential Revision: D56431923 fbshipit-source-id: 4f3e35d85c27330ed46fe189351b3aa570c5aa43
Commit: 8748d57
-
Update Profiling Section in XNNPACK Delegate Docs (#3237)
Summary: Pull Request resolved: #3237 Updating the Profiling section of the docs. The main point is pointing to the SDK Profiling Tutorial on how to get XNNPACK profiling information. Reviewed By: metascroy, cccclai Differential Revision: D56439491 fbshipit-source-id: 1d724ffae6d89e8769ea427cb37b4ec85fe3452f
Commit: 329184a
Commits on Apr 24, 2024
-
Add allocate_temp method to KernelRuntimeContext (#3209)
Summary: Pull Request resolved: #3209 This adds an `allocate_temp` method to KernelRuntimeContext, and passes the temporary memory allocator from `execute_instruction`. The method returns a result that errors if the temporary `MemoryAllocator` was not provided or the memory could not be allocated. Reviewed By: dbort Differential Revision: D56421957 fbshipit-source-id: 6da73bdb8e31638fc6d575e98cfc08c27b25f09c
Commit: 719b368
-
Summary: The old screenshot has an outdated event block name and event names. The new screenshot was taken from a recent real run. bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: tarun292, Jack-Khuu Differential Revision: D56447799 fbshipit-source-id: 040fe45311c9aa8e8a1a0f6756ebda5f0ebbdebf
Commit: 9c99fe1
-
Summary: Pull Request resolved: #3260 As title, the link was wrong... Reviewed By: kirklandsign Differential Revision: D56498322 fbshipit-source-id: 42708b5f7a634f1c01e05af4c897d0c6da54d724
Commit: e9d7868
-
Add index.Tensor and aten.logical_not (#3221)
Summary: Add missing llama ops for MPS delegate: - `index.Tensor` - `logical_not` `index.put` works correctly for generating 1 token, but gives incorrect results on the 2nd token. This remains disabled. Summary of changes: - Adds missing llama2 ops - Adds support for launching Metal kernels instead of MPSGraph ops (if MPSGraph doesn't have the support) cc cccclai , shoumikhin Pull Request resolved: #3221 Reviewed By: shoumikhin Differential Revision: D56447710 Pulled By: cccclai fbshipit-source-id: 778a485df5e67d1afd006b42f07b69c8a3961223
Commit: 02a6b66
-
Fix broken links on the coreml tutorial page (#3250)
Summary: Pull Request resolved: #3250 Reviewed By: dbort Differential Revision: D56487125 fbshipit-source-id: 502019365de043a7e07bb0d766134b334ee115ba
Commit: ba0caf8
-
Fix compilation with gcc-9+ (#3262)
Summary: Fixes the `cannot resolve overloaded function ‘isinf’ based on conversion to type ‘torch::executor::FunctionRef<bool(double)>’` error. Not sure how it ever worked before; see https://godbolt.org/z/939YKdjqW Pull Request resolved: #3262 Reviewed By: kimishpatel, manuelcandales Differential Revision: D56501235 Pulled By: malfet fbshipit-source-id: 6f89beef9fd56a80ecbb2df573821da95b2da746
Commit: d98dc01
-
Add delegate time scale converter to Inspector (#3240)
Summary: Pull Request resolved: #3240 The time scale of delegate events reported might be different from the timescale of CPU events. This diff adds support for providing a callable that can be invoked by Inspector to modify the timescale of delegated events to ensure consistency in timescales across delegated and non-delegated events. Reviewed By: Jack-Khuu Differential Revision: D55298701 fbshipit-source-id: e888e51b602c7e1ec8cb9e05ac052280daa12823
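A hedged sketch of the kind of callable this enables (the exact keyword names and converter signature expected by `Inspector` are assumptions here, not the verified API):

```python
from executorch.sdk import Inspector

# Suppose a delegate reports timestamps as raw cycle counts at an assumed
# 1 GHz clock; convert them to milliseconds to match the CPU-side events.
def delegate_time_scale_converter(event_name, raw_time):
    return raw_time / 1_000_000.0

inspector = Inspector(
    etdump_path="etdump.etdp",
    etrecord="etrecord.bin",
    delegate_time_scale_converter=delegate_time_scale_converter,
)
```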
Commit: b7b40ac
-
Tie quantization of add operands and result together (#3091)
Summary: Change-Id: Ie2662ebd6555821fa1d813163daf4b209a319b44 Pull Request resolved: #3091 Reviewed By: mergennachin Differential Revision: D56476825 Pulled By: digantdesai fbshipit-source-id: 7f1e7d8ab9051c30c69189244ea927ed49440d93
Commit: 8b1f49a
-
Add semihosting to cmake for executor_runner (#3008)
Summary: Add cmake option to enable semihosting for the executor runner application. Change-Id: I5db7271413b39e5122f86f321d15dd2a1086a547 Pull Request resolved: #3008 Reviewed By: mergennachin Differential Revision: D56476642 Pulled By: digantdesai fbshipit-source-id: 5cc60da33d1999bb3e3baff2d57e196c65e4b819
Commit: 6712185
-
Capture output of Vela and print on error (#3057)
Summary: Change-Id: I0443a6ab26766a51511d9e4ea532fc8e76836ede Pull Request resolved: #3057 Reviewed By: mergennachin Differential Revision: D56476746 Pulled By: digantdesai fbshipit-source-id: 4b6d9738a9202980fa06bb8f4232fb4a916a7633
Commit: 2f5cbd4
-
Fix for TOSA BI clamp ops (#3092)
Summary: Min/max range values need to be in quantized form. Pull Request resolved: #3092 Reviewed By: mergennachin Differential Revision: D56476931 Pulled By: digantdesai fbshipit-source-id: 80fe1e4981c048653f808ef1ad9339997eb853a6
Commit: b0a400c
-
Summary: Pull Request resolved: #3254 Create a new page for the new util functions Chen and I made to debug delegations. These functions were well-received within the team as well as by partner teams including modai, thus I think it's important to call them out in our documentation. The examples were copied from the llm manual, but reworded a little bit to flow naturally in this doc. bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: cccclai Differential Revision: D56491214 fbshipit-source-id: 162b4ae75e79730218b0d669d1ec2a7a914b933c
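For reference, the utilities being documented are along these lines (a sketch based on the llm manual; treat the exact import path and the in-scope `edge_manager` as assumptions):

```python
from executorch.exir.backend.utils import get_delegation_info
from tabulate import tabulate

# edge_manager is assumed to be an EdgeProgramManager produced by to_edge().
graph_module = edge_manager.exported_program().graph_module
delegation_info = get_delegation_info(graph_module)

print(delegation_info.get_summary())  # totals of delegated vs. non-delegated nodes
df = delegation_info.get_operator_delegation_dataframe()
print(tabulate(df, headers="keys", tablefmt="fancy_grid"))  # per-op occurrence counts
```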
Commit: bf9888f
-
update memory planning docs (#3270)
Summary: Pull Request resolved: #3270 Reviewed By: JacobSzwejbka Differential Revision: D56503511 Pulled By: lucylq fbshipit-source-id: d9e39f32adf39761652feaccdb73344b4550a094
Commit: de0c233
-
DynamicShim for dlsym user (#3136)
Summary: Add a shim layer so that users just need the header and can load the symbols with dlsym. We will have two libraries: - header, where the declarations and the shim class are compiled into their codebase statically. Want to keep this minimal. - implementation, which pulls in the ET libraries and the shim implementation. It's compiled separately as a .so file, and users can load it and find symbols with dlopen and dlsym. Note that users only need to compile the header dynamic_shim.h into their code at compile time. dynamic_shim.h contains minimal dependencies on ExecuTorch, so it won't impact static binary size or startup time. The actual implementation dynamic_shim_impl is compiled into a separate shared library, which has all the ExecuTorch libraries. The shared library can be loaded later with dlopen. Users can then load only the .so library, use dlsym to look up the exposed APIs `create_executorch_dynamic_shim` and `free_executorch_dynamic_shim`, and use the API through DynamicShim (as a pointer to an interface); DynamicShimImpl will invoke the actual ET Module code in its implementation details. Pull Request resolved: #3136 Reviewed By: kimishpatel Differential Revision: D55025594 Pulled By: kirklandsign fbshipit-source-id: a0b1fa90997dee920920e6f582dd51719c2958eb
Commit: b5bb921
-
Summary: Pull Request resolved: #3172 Exploit the fact that we can reduce the unsqueeze operation to a permute. ``` torch.all(torch.permute(x.unsqueeze(0), [1, 0, 2, 3]) == x.unsqueeze(1)) torch.all(torch.permute(x.unsqueeze(0), [1, 2, 0, 3]) == x.unsqueeze(2)) torch.all(torch.permute(x.unsqueeze(0), [1, 2, 3, 0]) == x.unsqueeze(3)) ``` This diff introduces a minor change to the Permute implementation: it no longer requires the input dimension length to match the length of the permute array. This allows the `unsqueeze` operation to be achieved as a no-op `unsqueeze(0)` followed by a permute. ghstack-source-id: 223698863 Reviewed By: kimishpatel, SS-JIA Differential Revision: D56347734 fbshipit-source-id: 7decc88aa74b4f355fb9497798d304cf5c0d6db1
Commit: d053611
-
Summary: Pull Request resolved: #3219 Introduce a clone node for the copy operation. Also register `aten.clone` to this node. It is important to note that during model export, it is possible to point the lvalue of `aten.clone` to the underlying shared object of the rvalue to achieve no-copy. ghstack-source-id: 223698862 Reviewed By: copyrightly, SS-JIA, jorgep31415 Differential Revision: D56441547 fbshipit-source-id: a6d05e37ca7a0a0f15e50355e4e2a90a1735a962
Commit: 2dac5f3
-
add dynamic export into llm manual (#3202)
Summary: Pull Request resolved: #3202 This diff adds dynamic export into llm manual, including code and related comments. Also update other documentations for better understanding. Reviewed By: dbort Differential Revision: D56365041 fbshipit-source-id: 5ce4c15206a2923c4d54811cefca03f72869c719
Commit: 66a350b
-
Summary: Pull Request resolved: #3301 overriding_review_checks_triggers_an_audit_and_retroactive_review Oncall Short Name: executorch Differential Revision: D56517032 fbshipit-source-id: ec2f7fbb1111daf8bd529e0917be698bac3435f4
Commit: 5b0030f
-
Fix portable is[inf|nan]_out compilation on older Linux (#3272)
Summary: By wrapping potentially non-compliant `isinf`/`isnan` implementations into a lambda with a defined return type. The compiler should be able to optimize it out into a direct function call; see https://godbolt.org/z/bqYGd47Mx Pull Request resolved: #3272 Reviewed By: GregoryComer Differential Revision: D56504717 Pulled By: malfet fbshipit-source-id: 72da456027dbc837c3cfac83b18a5f002fedc3a5
Commit: e25e5d2
-
Use relative links in llm/getting-started.md (#3244)
Summary: Use relative markdown links instead of full URLs. This way, the docs will always point to a consistent branch. Pull Request resolved: #3244 Test Plan: Clicked on all modified links in the rendered docs preview: https://docs-preview.pytorch.org/pytorch/executorch/3244/llm/getting-started.html Reviewed By: Gasoonjia Differential Revision: D56479234 Pulled By: dbort fbshipit-source-id: 45fb25f017c73df8606c3fb861acafbdd82fec8c
Commit: b560864
-
Update examples/README.md with Llama 3 and names (#3275)
Summary: - Added Llama 3 8B - Added llm_manual in the list - Changed name from Xtensa to Cadence Pull Request resolved: #3275 Reviewed By: Gasoonjia Differential Revision: D56524960 Pulled By: iseeyuan fbshipit-source-id: 2b4464028fe3cdf3c2b524d233fa3e87b2561dda
Commit: 98a7e66
-
Revert D56480274: Add more prebuilt artifacts
Differential Revision: D56480274 Original commit changeset: 451116a0f907 Original Phabricator Diff: D56480274 fbshipit-source-id: e9603e5076113560b1224a56432abf321f82e284
Commit: 727a68d
-
Summary: Pull Request resolved: #3300 This diff addresses part of Ali's comments in our tracer sheet (https://docs.google.com/spreadsheets/d/1PoJt7P9qMkFSaMmS9f9j8dVcTFhOmNHotQYpwBySydI/edit#gid=0). Specifically: "NanoGPT" -> "nanoGPT" "CoreML" -> "Core ML" "ExecuTorch Codebase" -> "ExecuTorch codebase" "Android Phone" -> "Android phone" "How to build Mobile Apps" -> "How to Build Mobile Apps" Also shorten the following two column names to avoid overlapping: "occurrences_in_delegated_graphs" -> "# in_delegated_graphs" "occurrences_in_non_delegated_graphs" -> "# in_non_delegated_graphs" Reviewed By: Jack-Khuu Differential Revision: D56513601 fbshipit-source-id: 7015c2c5b94b79bc6c57c533ee812c9e58ab9d56
Commit: b669056
-
Summary: . Reviewed By: cccclai Differential Revision: D56532283 fbshipit-source-id: 62d7c9e8583fdb5c9a1b2e781e80799c06682aae
Commit: ce1e9c1
-
Update custom kernel registration API
Summary: As titled Reviewed By: lucylq, Gasoonjia, guangy10 Differential Revision: D56532035 fbshipit-source-id: ddf4f3864db0f200b97e67673a7086dac790eb82
Commit: f6758fc
-
Summary: - Add a note for embedding quantize, for Llama 3 - Re-order export args to be the same as for Llama 2; `group_size` was missing its `--` prefix Pull Request resolved: #3315 Reviewed By: cccclai Differential Revision: D56528535 Pulled By: lucylq fbshipit-source-id: 4453070339ebdb3d782b45f96fe43d28c7006092
Commit: 34f59ed
-
Fix sdk_example_runner.sh (#3298)
Summary: Pull Request resolved: #3298 Reviewed By: Olivia-liu Differential Revision: D56509749 Pulled By: tarun292 fbshipit-source-id: 36b56e7cc039144105d64431697a16a793029af8
Commit: aa3e736
-
Summary: . Reviewed By: cccclai Differential Revision: D56535633 fbshipit-source-id: 070a3b0af9dea234f8ae4be01c37c03b4e0a56e6
Commit: 035aee4
-
Update MPS documentation; add helper script to build mps_executor_runner (#3324)
Summary: **Summary of changes**: - Update MPS documentation to reflect all changes since the previous release - Add helper script to build `mps_executor_runner` **Testing**: - Verified that mps_executor_runner builds correctly: ``` ./examples/apple/mps/scripts/build_mps_executor_runner.sh ./examples/apple/mps/scripts/build_mps_executor_runner.sh --Debug ``` Verified that the docs build correctly: ``` cd docs make html ``` cc shoumikhin, cccclai Pull Request resolved: #3324 Reviewed By: shoumikhin Differential Revision: D56535774 Pulled By: cccclai fbshipit-source-id: 5974795732dbe1089e3d63cd1b618cadf7a2573e
Commit: 453ebad
-
Remove the sorting of the nodes from partitioning (not needed for now, as custom Metal kernels are not yet enabled) (#3328)
Summary: Remove the sorting of the nodes from partitioning (not needed for now, as custom Metal kernels are not yet enabled). **Testing**: Verified that tracing works correctly with the release branch: `python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3"` cc shoumikhin , cccclai Pull Request resolved: #3328 Reviewed By: shoumikhin Differential Revision: D56540389 Pulled By: cccclai fbshipit-source-id: e8a53f624b58ac4d2348c87e08acd5f2fb3de5b2
Commit: 9811eea
-
copy node, aten.repeat (#3299)
Summary: Pull Request resolved: #3299 1. Introduce a `CopyNode` for generic copy-with-offset operations. 2. `aten.repeat` on all dimensions. 2.1 Use `CopyNode` where possible. 2.2 Specialized `repeat_channel` shader to handle packings. 3. Update codegen to support `Methods`-variant-only operations; a new route is needed to trigger the dispatch. ghstack-source-id: 223812048 Reviewed By: copyrightly Differential Revision: D56499329 fbshipit-source-id: 72936e621940588ce398dd62669ec9aa637e98ba
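For context, the plain-PyTorch `repeat` semantics this node implements (an illustrative snippet, not the Vulkan code itself):

```python
import torch

x = torch.arange(6).reshape(2, 3)
# repeat tiles the tensor along each dimension the given number of times:
y = x.repeat(2, 1)   # shape (4, 3): two stacked copies of x along dim 0
z = x.repeat(1, 2)   # shape (2, 6): two copies of x side by side along dim 1
```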
Commit: b2c794a
-
add buck2 installation into setup.md
Summary: Bring the buck2 installation instructions back, and scrub any "-DBUCK2=buck2" from our docs, to unblock users who want to use buck2. Reviewed By: guangy10 Differential Revision: D56540769 fbshipit-source-id: 363e592c17dd2747a693e59d8d6b6d20f43c8451
Commit: 590cbce
Commits on Apr 25, 2024
-
register `view`, `reshape` and `select`
Summary: - We register `select`, `unsqueeze` and `view` in `vulkan_partitioner.py` in order to run the vulkan_delegate test (Python e2e test). The latter two might be used to implement `bmm` and `addmm`, so I want to make sure they work. - We register `reshape` in `View.cpp` explicitly. `reshape` is implemented through `_reshape_alias` (see [this](https://www.internalfb.com/code/fbsource/[a3dd6401f00d73f09bbdea63887fef54ea2c6dd2]/fbcode/caffe2/aten/src/ATen/native/native_functions.yaml?lines=4872-4881)) which is [decomposed as `view`](https://www.internalfb.com/code/fbsource/[bbb783ae1cff98b3b549da3edd845dde946d3da8]/xplat/caffe2/torch/_decomp/decompositions.py?lines=3669-3672). For the codegen test, we still need to register the op, otherwise there is an error ``` C++ exception with description "Exception raised from get_op_fn at xplat/executorch/backends/vulkan/runtime/graph/ops/OperatorRegistry.cpp:20: (it != table_.end()) is false! Could not find operator with name aten.reshape.default" thrown in the test body. ``` Reviewed By: yipjustin, liuk22 Differential Revision: D56454941 fbshipit-source-id: c83e6fb97d9cf9019cc6e786508f353a22236931
Commit: b2a7243
-
Update llama2 readme file - main branch (#3340)
Summary: Pull Request resolved: #3340 Reviewed By: orionr, kimishpatel, cccclai Differential Revision: D56553088 Pulled By: mergennachin fbshipit-source-id: 2994dd3ab2692c5b972316af1617bd06d647af96
Commit: 79b79cb
-
Build custom ops in pybinding (#3263)
Summary: Right now we are not building the custom ops in the pybindings, and it is causing missing ops in torchchat. This PR adds them to the pybinding build. Pull Request resolved: #3263 Reviewed By: lucylq Differential Revision: D56500693 Pulled By: larryliu0820 fbshipit-source-id: 0ed0e28fcccb6002ef48e6a38b60e92d8af4def6
Commit: 30128f3
-
Enable doc job to run on -rc tags. (#3345)
Summary: Pull Request resolved: #3345 Reviewed By: dbort Differential Revision: D56557091 Pulled By: svekars fbshipit-source-id: 4300ca86d01ec110fc6934588cd691c12661a730
Commit: fd63d0c
-
Eliminate deprecated api usage (#2695)
Summary: Pull Request resolved: #2695 Reviewed By: mergennachin Differential Revision: D55091814 fbshipit-source-id: 04b2a888c6bbdaa195cb916c6564aa93daca2514
Commit: 8fcba36
-
Remove unneeded _to_copy in edge dialect.
Summary: In executorch we will dtype-specialize the kernels and also run on a single device with export. Therefore _to_copy is not needed in edge dialect. Reviewed By: tugsbayasgalan Differential Revision: D56579169 fbshipit-source-id: 5a2e3cd453a11bd2ad009b439587b0fc589f7fe4
Commit: 319a4f2
-
Extend setup cmake ability (#3349)
Summary: For executorch users, we see a common pattern that they have to: ```bash bash install_requirements.sh --pybind xnnpack cmake -S . -Bcmake-out ... cmake --build ... ``` This runs the cmake build twice; the first one is inside setup.py. Here I'm adding a way to allow setup.py to install the libraries separately, by passing `CMAKE_ARGS` and `CMAKE_BUILD_ARGS` into setup.py through `install_requirements.sh`. After this change, users can do: ```bash export CMAKE_ARGS="-DCMAKE_INSTALL_PREFIX=<install dir> \ -DEXECUTORCH_BUILD_OPTIMIZED=ON \ ..." export CMAKE_BUILD_ARGS="--target install" bash install_requirements.sh --pybind xnnpack ``` Then we should be able to find `libxnnpack.a` `liboptimized_ops_lib.a` etc. under the install dir. Pull Request resolved: #3349 Reviewed By: mikekgfb Differential Revision: D56560786 Pulled By: larryliu0820 fbshipit-source-id: fb6cd230df2317067f07ae0f1e72d0596b7b454b
Commit: 8ec0af9
-
Reviewed By: cccclai Differential Revision: D56543186 fbshipit-source-id: 4fed6b9b3ede3cdcb67a9a52150e3f22cc02b180
Commit: 7b3b485
-
Add EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT cmake option (#3356)
Summary: Currently, we always build two copies of the flatcc targets, just in case we happen to be cross-compiling. But because the flatcc project puts its binaries in the source directory, those two copies can interfere with each other. We don't need to build two copies when not cross-compiling, so add a new option to avoid the second "host" build. Eventually we should only enable this when cross-compiling, but for now disable it when building the pip package (which is never cross-compiled). Pull Request resolved: #3356 Test Plan: `rm -rf pip-out && ./install_requirements.sh` succeeded. Looking in the `pip-out/temp.*/cmake-out` directory, there is no `_host_build` directory, but the etdump headers were successfully generated under `pip-out/temp.*/cmake-out/sdk/include/executorch/sdk/etdump/`. Reviewed By: malfet, larryliu0820 Differential Revision: D56582507 Pulled By: dbort fbshipit-source-id: 4ce6c680657bc57cfcf016826364a3f46c4c953e
Commit: 80d72f2
-
Export the ET_VERSION_DOCS variable in doc build (#3358)
Summary: Pull Request resolved: #3358 Reviewed By: dbort Differential Revision: D56584847 Pulled By: svekars fbshipit-source-id: 77c4105edf15503bf1b29c1f120111a73b973c4c
Commit: c32b0a2
-
Fix extension/data_loader installation (#3355)
Summary: `libextension_data_loader.a` is not installed properly. This PR removes the prefix so that it can be properly installed Pull Request resolved: #3355 Test Plan: See `libextension_data_loader.a` showing up under executorch/cmake-out/lib. Reviewed By: lucylq, mikekgfb Differential Revision: D56580943 Pulled By: larryliu0820 fbshipit-source-id: b771192d03799fd576e8591ec7c45fae23f20762
Commit: c209e12
-
Reword "preview release" notice now that we are at alpha (#3364)
Summary: Pull Request resolved: #3364 Test Plan: https://docs-preview.pytorch.org/pytorch/executorch/3364/index.html Reviewed By: svekars Differential Revision: D56596949 Pulled By: dbort fbshipit-source-id: f6c71e072bcefbb7d04354d1ef78d780c14facb5
Commit: 7b3f5c6
Commits on Apr 26, 2024
-
Fix quantized_linear cpp op schema
Summary: The cpp op schema does not match the registered one. Fix that. Reviewed By: tarun292, cccclai Differential Revision: D56594373 fbshipit-source-id: cb4853030715245e7a0177c0f193c4558f19584d
Commit: 44d4bac
-
Commit: 3fe25df