
disclaimer #3376

Closed
wants to merge 270 commits into from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Apr 9, 2024

  1. update torch pin (#2944)

    Summary:
    Pull Request resolved: #2944
    
    Needs the change from D55354487 to get mutable buffer + pt2e working.
    
    Reviewed By: JacobSzwejbka
    
    Differential Revision: D55922254
    
    fbshipit-source-id: 5ea4471eb0e22149a0dbb4e921fe447cceb13bf1
    cccclai authored and facebook-github-bot committed Apr 9, 2024
    Commit: cb6ddae
  2. aten.convolution (Transpose) (#2883)

    Summary:
    Pull Request resolved: #2883
    
    ## Summary (cases handled)
    
    We introduce support for the convolution cases covered by ATen-VK's transpose implementation. This is achieved by
    - reusing the existing [`conv_transpose2d.glsl`](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/glsl/conv_transpose2d.glsl), and
    - [moving special weights prepacking from CPU](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/ops/Convolution.cpp#L134-L235) to the GPU in `conv_transpose2d_prepack_weights.glsl`.
    
    We also include resizing support for dynamic shapes. Note that only height and width of the input can vary.
    
    ## Cases not handled
    
    The implementation is on-par with ATen-VK's Transpose. This means the following cases are missing:
    1. **Groups G > 1.**
    2. **Batch (input) N > 1.**
    3. **Dilation > 1.**
    ghstack-source-id: 221721754
    exported-using-ghexport
    bypass-github-export-checks
    
    Reviewed By: copyrightly, SS-JIA
    
    Differential Revision: D55667336
    
    fbshipit-source-id: 3b7b7c912ef947610624e2e1c5b753de393234a0
    jorgep31415 authored and facebook-github-bot committed Apr 9, 2024
    Commit: 8a6427e
  3. aten.convolution (Depthwise) (#2884)

    Summary:
    Pull Request resolved: #2884
    
    ## Summary
    We introduce support for the convolution cases covered by [ATen-VK's default Depthwise implementation](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/ops/Convolution.cpp#L68). This is achieved by
    - reusing the [existing `conv2d_dw.glsl`](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/glsl/conv2d_dw.glsl), and
    - [moving special weights prepacking from CPU](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/ops/Convolution.cpp#L80-L132) to the GPU in `conv2d_dw_prepack_weights.glsl`.
    
    The implementation is on-par with ATen-VK's Depthwise. This means it only covers:
    - `in_channels == groups`, `out_channels == groups`
    
    A full implementation would cover, for any positive integer K:
    - `in_channels == groups`, `out_channels == groups * K`
    ghstack-source-id: 221721752
    exported-using-ghexport
    bypass-github-export-checks
    
    Reviewed By: SS-JIA
    
    Differential Revision: D55813511
    
    fbshipit-source-id: c0726798bd36cc5ff2326836c28a5f7d23494f5e
    jorgep31415 authored and facebook-github-bot committed Apr 9, 2024
    Commit: c4ac14c
  4. Fix Validation Layer warnings about wrong image layout (#2854)

    Summary:
    Pull Request resolved: #2854
    
    ## Context
    
    Currently, when executing a `ComputeGraph` with prepacked tensors with [Vulkan Validation Layers](https://github.com/KhronosGroup/Vulkan-ValidationLayers) turned on, the following Validation Errors can be observed. Note that Validation Layers can be turned on by running Vulkan binaries on Mac with the `vkconfig` app opened.
    
    ```
    UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x7fb76dbbf988, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x7fb76dbbf988[] expects VkImage 0xd79c8a0000000f09[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_GENERAL--instead, current layout is VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
        Objects: 1
            [0] 0x7fb76dbbf988, type: 6, name: NULL
    ```
    
    The reason for this is that prepacked textures are written to with `WRITE` memory access during packing, which means they will be in the `VK_IMAGE_LAYOUT_GENERAL` layout. However, they will subsequently be read from during `graph.execute()`, meaning the texture will have transitioned to `VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL`, but will be bound using the `VK_IMAGE_LAYOUT_GENERAL` layout. Subsequent calls to `execute()` will therefore see that the prepacked texture has been bound with the wrong layout, since after the first graph execution the texture will have the `VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL` layout.
    
    The solution is to submit a no-op shader dispatch during prepacking to trigger a transition to the `READ_ONLY_OPTIMAL` layout.
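    For illustration only (the commit itself uses a no-op shader dispatch during prepacking, not an explicit barrier): the same kind of layout transition from `GENERAL` to `SHADER_READ_ONLY_OPTIMAL` can also be recorded with a raw Vulkan image memory barrier. The handles `cmd_buf` and `image` are assumed to exist; this is a sketch, not ET-VK code.
    ```
    #include <vulkan/vulkan.h>

    void transition_to_read_only(VkCommandBuffer cmd_buf, VkImage image) {
      VkImageMemoryBarrier barrier{};
      barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
      barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;  // written during packing
      barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;   // read during execute()
      barrier.oldLayout = VK_IMAGE_LAYOUT_GENERAL;
      barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
      barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
      barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
      barrier.image = image;
      barrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};

      vkCmdPipelineBarrier(
          cmd_buf,
          VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,  // stage that performed the writes
          VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,  // stage that will perform the reads
          0, 0, nullptr, 0, nullptr, 1, &barrier);
    }
    ```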
    ghstack-source-id: 221871426
    
    bypass-github-pytorch-ci-checks
    
    Reviewed By: jorgep31415
    
    Differential Revision: D55772003
    
    fbshipit-source-id: f9c69e6e571ca0d0d28a6c25716766af98e82d41
    SS-JIA authored and facebook-github-bot committed Apr 9, 2024
    Commit: 4599650
  5. Introduce convenience constexpr for StorageTypes and `GPUMemoryLayout`s (#2948)
    
    Summary:
    Pull Request resolved: #2948
    
    ## Context
    
    Introduce the following convenience `constexpr`:
    
    * `api::kBuffer`, `api::kTexture3D`, and `api::kTexture2D`
    * `api::kWidthPacked`, `api::kHeightPacked`, and `api::kChannelsPacked`
    
    Also remove the `api::StorageType::UNKNOWN` enum entry as it doesn't really serve any purpose.
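    As a rough sketch of the pattern (enum entries simplified, not taken from the actual ET-VK headers), the convenience `constexpr` are just short aliases for enum values:
    ```
    namespace api {
    enum class StorageType { BUFFER, TEXTURE_3D, TEXTURE_2D };

    // Call sites can now write api::kBuffer instead of the full enum path.
    constexpr StorageType kBuffer = StorageType::BUFFER;
    constexpr StorageType kTexture3D = StorageType::TEXTURE_3D;
    constexpr StorageType kTexture2D = StorageType::TEXTURE_2D;
    }  // namespace api
    ```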
    ghstack-source-id: 221871428
    
    bypass-github-pytorch-ci-checks
    
    Reviewed By: copyrightly, jorgep31415
    
    Differential Revision: D55811278
    
    fbshipit-source-id: 26dc1706ac2605c13f247d08a21863ff3ef94488
    SS-JIA authored and facebook-github-bot committed Apr 9, 2024
    Commit: b26eee8
  6. Use __ET_UNLIKELY in assertion macros (#2949)

    Summary:
    Pull Request resolved: #2949
    
    It is supposed to be unlikely for assert/check conditions to fail; let's tell the compiler about that.
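    A minimal sketch of the idea, using illustrative names rather than the actual ExecuTorch macros: the failure branch of a check is wrapped in `__builtin_expect` so the compiler favors the success path.
    ```
    #include <cstdio>
    #include <cstdlib>

    // Hypothetical names; the real macros live in the ExecuTorch runtime.
    #if defined(__GNUC__) || defined(__clang__)
    #define MY_UNLIKELY(x) (__builtin_expect(static_cast<bool>(x), 0))
    #else
    #define MY_UNLIKELY(x) (x)
    #endif

    #define MY_CHECK(cond, msg)                              \
      do {                                                   \
        if (MY_UNLIKELY(!(cond))) {                          \
          std::fprintf(stderr, "Check failed: %s\n", (msg)); \
          std::abort();                                      \
        }                                                    \
      } while (0)
    ```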
    
    Reviewed By: mergennachin
    
    Differential Revision: D55929730
    
    fbshipit-source-id: 5677c19cd8342cbd77a9c0b973059ed3d5ee800b
    swolchok authored and facebook-github-bot committed Apr 9, 2024
    Commit: 6cb6051
  7. s/heirarchies/hierarchies/ (#2772)

    Summary:
    Pull Request resolved: #2772
    
    Just a spelling mistake.
    
    Reviewed By: JacobSzwejbka
    
    Differential Revision: D55542731
    
    fbshipit-source-id: c12bcab53661561bf0d8223d5cae9ed92b39e599
    swolchok authored and facebook-github-bot committed Apr 9, 2024
    Commit: 3661a11
  8. Fix indentation in selective build example code (#2773)

    Summary:
    Pull Request resolved: #2773
    
    Noticed this page didn't line up right. Now it does.
    
    Reviewed By: mergennachin, kirklandsign
    
    Differential Revision: D55542836
    
    fbshipit-source-id: a25a376ce9e77f3bc360e9ab6cf15c9ae9ecc7bf
    swolchok authored and facebook-github-bot committed Apr 9, 2024
    Commit: 02f565e
  9. aten.convolution (Depthwise Output-Tile) (#2885)

    Summary:
    Pull Request resolved: #2885
    
    We port an optimization from ATen-VK for specific weight sizes: [`conv2d_dw_output_tile.glsl`](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/glsl/conv2d_dw_output_tile.glsl)
    ghstack-source-id: 221887576
    exported-using-ghexport
    bypass-github-export-checks
    
    Reviewed By: SS-JIA
    
    Differential Revision: D55814588
    
    fbshipit-source-id: 86a85d122abbcebfed41466bc0a4907a6ddc80f9
    jorgep31415 authored and facebook-github-bot committed Apr 9, 2024
    Commit: f00afe7
  10. aten.convolution (Pointwise) (#2886)

    Summary:
    Pull Request resolved: #2886
    
    We port an optimization from ATen-VK for specific weight sizes: [`conv2d_pw.glsl`](https://github.com/pytorch/pytorch/blob/09c72eaa3f69f90402c86a30abf4fc621298578c/aten/src/ATen/native/vulkan/glsl/conv2d_pw.glsl)
    ghstack-source-id: 221887670
    exported-using-ghexport
    bypass-github-export-checks
    
    Reviewed By: SS-JIA
    
    Differential Revision: D55814587
    
    fbshipit-source-id: 419d82ddcf2dce59b2d1ec5abf313356fce074e6
    jorgep31415 authored and facebook-github-bot committed Apr 9, 2024
    Commit: 99c4f4e

Commits on Apr 10, 2024

  1. Make minor updates to LLM guide setup instructions (#2940)

    Summary:
    Minor updates to the prerequisite section of the LLM getting started guide. Passing -s to pyenv install prevents a prompt if python 3.10 is already installed (it will just silently continue in this case when the flag is passed). Additionally, under pyenv, we should be using python, not python3. I also added a little bit of wording on env management.
    
    Pull Request resolved: #2940
    
    Test Plan: Ran LLM guide prerequisite section on an m1 mac with pyenv-virtualenv.
    
    Reviewed By: byjlw
    
    Differential Revision: D55913382
    
    Pulled By: GregoryComer
    
    fbshipit-source-id: 7f04262b025db83b8621c972c90d3cdc3f029377
    GregoryComer authored and facebook-github-bot committed Apr 10, 2024
    Commit: 218f643
  2. resolve_buck.py: Add an entry for darwin-x86_64 (#2868)

    Summary:
    Version hash reported by
    https://github.com/facebook/buck2/releases/download/2024-02-15/buck2-x86_64-apple-darwin.zst
    
    Pull Request resolved: #2868
    
    Reviewed By: Olivia-liu
    
    Differential Revision: D55914146
    
    Pulled By: GregoryComer
    
    fbshipit-source-id: b9882900acfd4cb6f74eda90a7c99bdb119ec122
    dbort authored and facebook-github-bot committed Apr 10, 2024
    Commit: de7fdaa
  3. Compute graph print readable (#2825)

    Summary:
    Pull Request resolved: #2825
    
    Add capability to print the node list with arguments to allow better debugging.
    
    Reviewed By: SS-JIA
    
    Differential Revision: D55510335
    
    fbshipit-source-id: 151e3a6f249417dfe644172c1b5f0e83a3b110dd
    yipjustin authored and facebook-github-bot committed Apr 10, 2024
    Commit: 564c276
  4. aten.convolution (Bias=False) (#2887)

    Summary:
    Pull Request resolved: #2887
    
    The final touches to get ET-VK convolution on-par with ATen-VK's convolution.
    
    ## Idea
    In our shaders, we add the bias to our sum.
    ```
    ${VEC4_T[DTYPE]} sum = texelFetch(bias_in, ivec2(pos.z, 0), 0);
    ```
    To keep our shaders as is, we implement having no bias by allocating a buffer of zeros. Then, our shader adds zero to our sum.
    
    ## Issue
    If `Bias=False`, the dummy buffer of zeros is not serialized with the graph. The bias ValueRef is deserialized at runtime as `TypeTag::NONE`, not `TypeTag::TENSORREF`.
    
    ## Solution
    If `TypeTag::NONE` is given, (1) create the `vTensor` using the `out_channels` value from the weights, (2) allocate a StagingBuffer of that size, and (3) `memset` its data to zero. Failure to do (3) will result in undefined behavior.
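    A hedged sketch of step (3); the real `StagingBuffer` is GPU-backed, so this only illustrates the zero-fill on a host allocation:
    ```
    #include <cstring>
    #include <memory>

    // Allocate a buffer for the missing bias and memset it to zero so the
    // shader ends up adding zero to its sum. Names are illustrative.
    std::unique_ptr<float[]> make_zero_bias(size_t out_channels) {
      std::unique_ptr<float[]> staging(new float[out_channels]);  // contents indeterminate
      std::memset(staging.get(), 0, out_channels * sizeof(float));
      return staging;
    }
    ```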
    
    ghstack-source-id: 221926167
    exported-using-ghexport
    bypass-github-export-checks
    
    Reviewed By: SS-JIA
    
    Differential Revision: D55814589
    
    fbshipit-source-id: ce7b82c31bb11540ed2d98ab14131841fcee93e4
    jorgep31415 authored and facebook-github-bot committed Apr 10, 2024
    Commit: 8aaf2c5
  5. Add convolution cases to codegen (#2920)

    Summary:
    Pull Request resolved: #2920
    
    TSIA
    ghstack-source-id: 221926168
    exported-using-ghexport
    bypass-github-export-checks
    
    Reviewed By: SS-JIA
    
    Differential Revision: D55829466
    
    fbshipit-source-id: 48b4f15c41141093dd061c43e6b769eb4c25c81b
    jorgep31415 authored and facebook-github-bot committed Apr 10, 2024
    Commit: f0bfc3c
  6. add aten.sum.default (#2807)

    Summary:
    Pull Request resolved: #2807
    
    The operator `aten.sum.dim_IntList` could take an empty list as the parameter for `dims`. We modify `vulkan_graph_builder.py` to accommodate the empty list.
    
    Moreover, the op `aten.sum.default` is implemented as a [decomposition](https://www.internalfb.com/code/fbsource/[96e496f9db8f92967b4394bd4f60e39ab916740b]/xplat/caffe2/torch/_decomp/decompositions.py?lines=4676) into `aten.sum.dim_IntList` with empty `dims`. So we will support `aten.sum.default` with the changes.
    
    Context: `torch.sum(x, ())` and `torch.sum(x)` are two ways to compute the sum of all elements in tensor `x`.
    
    Reviewed By: SS-JIA, jorgep31415
    
    Differential Revision: D55630993
    
    fbshipit-source-id: 923d276118e893ff6885b92eb7b4c7cb7a95b374
    copyrightly authored and facebook-github-bot committed Apr 10, 2024
    Commit: b145701
  7. Fix failing CI jobs caused by #2934 (#2961)

    Summary:
    Pull Request resolved: #2961
    
    Fix these 3 CI job failures caused by #2934 (D55907752):
    
    * Apple / build-frameworks-ios / macos-job
    * trunk / test-arm-backend-delegation / linux-job
    * trunk / test-coreml-delegate / macos-job
    
    Reviewed By: kirklandsign
    
    Differential Revision: D55950023
    
    fbshipit-source-id: 6166d9112e6d971d042df1400442395d8044c3b3
    larryliu0820 authored and facebook-github-bot committed Apr 10, 2024
    Commit: d993797
  8. Replace std::stringstream with std::string for Shader names (#2964)

    Summary:
    Pull Request resolved: #2964
    
    ## Context
    
    Some research into efficient string concatenation suggests that C++ streams are not particularly efficient. The best approach seems to be creating a `std::string` and reserving sufficient capacity up front. This diff deprecates the use of `std::stringstream` when constructing kernel names in favor of using `std::string` directly.
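    A small sketch of the approach described above (names are illustrative, not the actual ET-VK helpers): reserve capacity on a `std::string` once, then append.
    ```
    #include <string>

    std::string make_kernel_name(
        const std::string& base, const std::string& dtype_suffix) {
      std::string name;
      // One allocation up front instead of stream-internal buffering.
      name.reserve(base.size() + 1 + dtype_suffix.size());
      name += base;
      name += '_';
      name += dtype_suffix;
      return name;
    }
    ```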
    
    Reviewed By: copyrightly
    
    Differential Revision: D55951475
    
    fbshipit-source-id: a1a584669e80984b85d11b7d6d4f7593290e562b
    SS-JIA authored and facebook-github-bot committed Apr 10, 2024
    Commit: a983ebc
  9. Refine the LLM manual (focus on the debugging and profiling part) (#2952)
    
    Summary:
    Pull Request resolved: #2952
    
    * Some auto-formatting by my VSCode (remove extra spaces)
    * Remove imports that were already imported in a previous part of the doc
    * Other minor changes to keep consistency across the doc
    * Link a screenshot instead of using the raw table because the original table is illegible:
     {F1482781056}
    
    Reviewed By: GregoryComer
    
    Differential Revision: D55938344
    
    fbshipit-source-id: 699abb9ebe1196ab73d90a3d08d60be7aa0d8688
    Olivia-liu authored and facebook-github-bot committed Apr 10, 2024
    Commit: e733f2d
  10. Android demo app tutorial fix for XNNPACK and QNN (#2962)

    Summary:
    * Update tutorial due to recent changes.
    * Clean up setup.sh for app helper lib build.
    
    Pull Request resolved: #2962
    
    Reviewed By: cccclai
    
    Differential Revision: D55951189
    
    Pulled By: kirklandsign
    
    fbshipit-source-id: 2c95e8580145b039f503e7cd99a4003867f8dbb0
    kirklandsign authored and facebook-github-bot committed Apr 10, 2024
    Commit: 26365f1
  11. Qualcomm AI Engine Direct - Enable per channel linear op (#2822)

    Summary:
    - Add per channel weight quantization for linear op
    - Bias quantization for the per channel weight Linear op is not supported yet
    
    Pull Request resolved: #2822
    
    Reviewed By: kirklandsign
    
    Differential Revision: D55731629
    
    Pulled By: cccclai
    
    fbshipit-source-id: 831a47c897b34e1a749325df56a8bbd0acda80e1
    chunit-quic authored and facebook-github-bot committed Apr 10, 2024
    Commit: 554cd27
  12. Custom ops API small fixes (#2936)

    Summary:
    Pull Request resolved: #2936
    
    Fix the way we use `at::from_blob()` and add the proper namespace to `CompileTimeFunctionPointer` so it is not confused with `at::CompileTimeFunctionPointer`.
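    For reference, a minimal example of `at::from_blob()` usage (a real ATen API; the surrounding code is illustrative): it wraps existing memory in a tensor view without copying, so the caller keeps ownership of the buffer.
    ```
    #include <torch/torch.h>

    void from_blob_example() {
      float data[6] = {1, 2, 3, 4, 5, 6};
      at::Tensor t = at::from_blob(
          data, {2, 3}, at::TensorOptions().dtype(at::kFloat));
      // `t` aliases `data`; it must not outlive the buffer.
    }
    ```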
    
    bypass-github-pytorch-ci-checks
    bypass-export-ci-checks
    
    Reviewed By: lucylq
    
    Differential Revision: D55907751
    
    fbshipit-source-id: ad793e30ec72f48e7300d75820209035d42cae6c
    larryliu0820 authored and facebook-github-bot committed Apr 10, 2024
    Commit: 8f8d969
  13. Consolidate EXECUTORCH_BUILD_CUSTOM option (#2935)

    Summary:
    Pull Request resolved: #2935
    
    Currently `EXECUTORCH_BUILD_CUSTOM` is not being respected properly.
    
    If this option is false, we should not build `llama2/custom_ops` anywhere.
    
    If this option is true, we should build `llama2/custom_ops` in both the llama runner binary and pybind.
    
    This PR consolidates it.
    
    bypass-github-pytorch-ci-checks
    bypass-export-ci-checks
    
    Reviewed By: lucylq
    
    Differential Revision: D55907750
    
    fbshipit-source-id: 03a7a8cbd499c734060de385d6edb193cf35470d
    larryliu0820 authored and facebook-github-bot committed Apr 10, 2024
    Commit: d209e41
  14. Consolidate tokenizer interface (#2954)

    Summary:
    Pull Request resolved: #2954
    
    Change the tokenizer APIs to:
    
    ```
    Result<std::vector<uint64_t>> encode(const std::string& input, int8_t bos, int8_t eos);
    Result<std::string> decode(uint64_t prev_token, uint64_t token);
    ```
    
    Note that we use `uint64_t` for token IDs just to be safe, and the `encode()` API returns a `std::vector` of tokens.
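    A minimal sketch of what such a base class could look like, with a simplified stand-in for the `Result<T>` error type (the real interface lives in the ExecuTorch runtime and may differ in details such as const-ness):
    ```
    #include <cstdint>
    #include <string>
    #include <vector>

    template <typename T>
    struct Result {  // stand-in: the real Result also carries an error code
      T value;
    };

    class Tokenizer {
     public:
      virtual ~Tokenizer() = default;
      virtual Result<std::vector<uint64_t>> encode(
          const std::string& input, int8_t bos, int8_t eos) = 0;
      virtual Result<std::string> decode(uint64_t prev_token, uint64_t token) = 0;
    };
    ```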
    
    Reviewed By: lucylq
    
    Differential Revision: D55944780
    
    fbshipit-source-id: 9b44437e7061424526f4e0b049a3449129f0ba53
    larryliu0820 authored and facebook-github-bot committed Apr 10, 2024
    Commit: 948760a
  15. Update OSS repo (#2033)

    Summary:
    Pull Request resolved: #2033
    
    Update the OSS Xtensa repo with more up-to-date compiler and quantizer components. Introduce a test folder and a conv1d test.
    
    Reviewed By: tarun292, cccclai
    
    Differential Revision: D54034581
    
    fbshipit-source-id: c2bf0c43897a2ef7dff291698370d2583433a6ba
    mcremon-meta authored and facebook-github-bot committed Apr 10, 2024
    Commit: 859e924
  16. Add the missing import generate_etrecord to doc Getting Started with LLM (#2977)
    
    Summary:
    Pull Request resolved: #2977
    
    As titled
    
    Reviewed By: Gasoonjia
    
    Differential Revision: D55992093
    
    fbshipit-source-id: 7864c330bd86af5d4127cacfd47e96f1e6666bfb
    Olivia-liu authored and facebook-github-bot committed Apr 10, 2024
    Commit: cb9caa3

Commits on Apr 11, 2024

  1. Fix llama runner test (#2981)

    Summary:
    Pull Request resolved: #2981
    
    As titled, a quick follow up of D55907750
    
    Reviewed By: lucylq
    
    Differential Revision: D55996735
    
    fbshipit-source-id: f535b013b7b900c5a2c2ed79f6b6738dcf1f91ec
    larryliu0820 authored and facebook-github-bot committed Apr 11, 2024
    Commit: 75c27c3
  2. Forward fix macOS job after test-infra #5086 (#2980)

    Summary:
    After pytorch/test-infra#5086, the working directory is now set correctly, so `pushd` isn't needed anymore.  More importantly, trying to change the directory ends up failing all macOS CI jobs because that subdirectory doesn't exist.
    
    Pull Request resolved: #2980
    
    Reviewed By: larryliu0820
    
    Differential Revision: D55996299
    
    Pulled By: huydhn
    
    fbshipit-source-id: 05758603d7628cc0a01fd577a49202d45c84e6c5
    huydhn authored and facebook-github-bot committed Apr 11, 2024
    Commit: 2fc99b0
  3. Add a mock perf test for llama2 on Android (#2963)

    Summary:
    I'm trying to set up a simple perf test when running llama2 on Android. It naively sends a prompt and records the TPS. Open for comments about the test here before setting this up on CI.
    
    ### Testing
    
    Copy the exported model and the tokenizer as usual, then cd to the app and run `./gradlew :app:connectAndroidTest`. The test will fail if the model fails to load or if the TPS is lower than 7 as measured by https://github.com/pytorch/executorch/tree/main/examples/models/llama2
    
    Pull Request resolved: #2963
    
    Reviewed By: kirklandsign
    
    Differential Revision: D55951637
    
    Pulled By: huydhn
    
    fbshipit-source-id: 34c189aefd7e31514fcf49103352ef3cf8e5b2c9
    huydhn authored and facebook-github-bot committed Apr 11, 2024
    Commit: d761f99
  4. Core ML Has Added Index_Put Support, No Need to Skip Anymore (#2975)

    Summary:
    It was a workaround to skip the `aten.index_put` op in Core ML delegation, at the cost of partitioning the Llama model into 13 pieces.
    
    For better performance, we prefer to delegate the whole model to Core ML. Since Core ML has added the [necessary support](apple/coremltools#2190), it is time to revert this workaround.
    
    Pull Request resolved: #2975
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56002979
    
    Pulled By: cccclai
    
    fbshipit-source-id: e7a7c8c43706cb57eba3e6f720b3d713bec5065b
    yifan_shen3 authored and facebook-github-bot committed Apr 11, 2024
    Commit: 7d4bafc
  5. Minor fix in README.md page

    Summary: It's not obvious that there are two different versions of the documentation.
    
    Reviewed By: iseeyuan
    
    Differential Revision: D56018543
    
    fbshipit-source-id: 09e5facf3c2f2faaf216ebc76cd5c21697dbcb37
    mergennachin authored and facebook-github-bot committed Apr 11, 2024
    Commit: 7c71970
  6. Add llama2 readme in examples/README (#2992)

    Summary:
    Pull Request resolved: #2992
    
    We should promote the llama2 page more in https://github.com/pytorch/executorch/tree/main/examples/
    
    bypass-github-export-checks
    bypass-github-pytorch-ci-checks
    bypass-github-executorch-ci-checks
    
    Reviewed By: iseeyuan
    
    Differential Revision: D56018978
    
    fbshipit-source-id: cbbc7bd2ea4ce55e564bd6b4a2900f623599dde6
    mergennachin authored and facebook-github-bot committed Apr 11, 2024
    Commit: e641ffc
  7. Use new API to register custom ops for llama model (#2916)

    Summary:
    Pull Request resolved: #2916
    
    Retry of D55713944
    
    Use `EXECUTORCH_LIBRARY` to register the custom kernel with the ExecuTorch runtime.
    
    Reviewed By: lucylq
    
    Differential Revision: D55856491
    
    fbshipit-source-id: 0e17ea18a7cd0b0b45a8e56e9d09ab5e2f8eb95e
    larryliu0820 authored and facebook-github-bot committed Apr 11, 2024
    Commit: 6e43135
  8. Fix tutorial for Qualcomm AI Engine Direct Backend (#2956)

    Summary:
    We have had some refactors recently and need to update the tutorial and cmake.
    
    See #2955 for issues.
    
    Pull Request resolved: #2956
    
    Reviewed By: mcr229, cccclai
    
    Differential Revision: D55947725
    
    Pulled By: kirklandsign
    
    fbshipit-source-id: f23af28b9a8fe071223d8ffa922a6cd4e49efe61
    kirklandsign authored and facebook-github-bot committed Apr 11, 2024
    Commit: c7fd394
  9. Update name from xtensa to cadence (#2982)

    Summary:
    Pull Request resolved: #2982
    
    As titled.
    
    Reviewed By: cccclai
    
    Differential Revision: D55998135
    
    fbshipit-source-id: a57bd233afe170290c7def4406d6d6e769d467ed
    mcremon-meta authored and facebook-github-bot committed Apr 11, 2024
    Commit: 7b8343b
  10. Use new API to register custom ExecuTorch kernels into ATen (#2937)

    Summary:
    Pull Request resolved: #2937
    
    Retry of D55713944
    Use `WRAP_TO_ATEN` to register a custom ExecuTorch kernel with PyTorch.
    
    This PR adds installation logic for `libcustom_ops_aot_lib.so` to `setup.py`. This is to make sure we can build `libcustom_ops_aot_lib.so`, install it to the correct location (`<site-packages>/executorch/examples/models/llama2/custom_ops/libcustom_ops_aot_lib.so`), and then load it via `torch.ops.load_library`.
    
    Reviewed By: lucylq
    
    Differential Revision: D55907749
    
    fbshipit-source-id: 6b7f9af3c68b31f6df780a041291684eb6ddd90f
    larryliu0820 authored and facebook-github-bot committed Apr 11, 2024
    Commit: c322685
  11. fix et-view (#2843)

    Summary:
    Pull Request resolved: #2843
    
    et-view should always copy the data pointer.
    
    Reviewed By: JacobSzwejbka
    
    Differential Revision: D55715318
    
    fbshipit-source-id: 9745cfc3a84e40cfc29fe6c6a4cbe4151d14d68c
    metascroy authored and facebook-github-bot committed Apr 11, 2024
    Commit: 1f6f711
  12. Replace view copy with view (3/3) (#2463)

    Summary:
    Pull Request resolved: #2463
    
    Design: https://docs.google.com/document/d/1l9x925EOrE8mHFJdRCC59nBJXyqBdnoeK-EgNQScXD0/edit#heading=h.kocb2mvchnib
    
    This stack replaces view_copy nodes with memory.view nodes.
    
    In the first diff (D54816555), I write a pass to normalize view_copy nodes by making their base point to the upstream non-view node.  This means if we have something like op -> view_copy1 -> view_copy2, then after normalization, both view copies will point to op in their base (assuming op is not a view node).  Note that this pass combined with dead-code elimination removes redundant view copies.  This is because a redundant view copy will have no users after this pass.
    
    In the second diff (D54827305), I write a pass to convert view_copy nodes to memory.view nodes.  A memory.view is similar to torch.ops.aten.view.default, but it is its own function so that we can handle it specially during memory planning and emission.  A memory.view node has a special TensorSpec of type _MemoryViewSpec.  This spec is immutable and dynamically looks up non-size related fields from its base's TensorSpec.  Because it is immutable, fields on a _MemoryViewSpec cannot be set, but if a field is updated on the base spec, this update is reflected in the memory.view node's _MemoryViewSpec.
    
    Not all view_copy nodes are converted to memory.view nodes.  Only static nodes that are memory planned are converted.  Not all static nodes are memory planned in ExecuTorch.  For example, there is an option to turn off memory planning for input nodes, and outputs from some higher order ops like cond are not memory planned.  Which nodes are memory planned is not easily available, and I did not try to cover all cases of nodes that can be converted.  We can expand this list over time.
    
    In the third diff (D54827438), I implement the actual view_copy elimination.  In the ExecutorchBackendConfig, there is a new option remove_static_view_copy.  If remove_static_view_copy = True, the memory planning passes are [NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass]; if remove_static_view_copy = False, the memory planning passes are [config.to_out_var_pass, config.memory_planning_pass] (state today).
    
    Let's look at the flow when remove_static_view_copy = True: NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass.
    
    The first two steps are the just the first and second diff described above.
    
    In config.to_out_var_pass, the memory.view nodes are skipped.
    
    In config.memory_planning_pass, when a spec is requested for a memory.view node (e.g., to update the lifetime), we return the spec of its base.  Returning the spec for the base means that whenever we see a memory.view node, we actually update the lifetime of the base to cover it.  Moreover, the memory.view node's special _MemoryViewSpec sees this update reflected.  (Note that an exception would be thrown if we kept the usual flow and returned the spec for the memory.view node.  This is because the special _MemoryViewSpec is immutable and would not allow the memory_planning_pass to update its lifetime.)
    
    Finally, during emission the memory.view is emitted as an evalue.
    
    There are two more diffs on the stack D54866523 and D54866539.  The first of these replaces the old RemoveRedundantViewCopy pass with a NormalizeViewCopyBasePass + dead code elimination.  The second converts view-like ops (squeeze, unsqueeze, slice) to view ops when safe to do so to take advantage of the view_copy elimination.
    
    Reviewed By: larryliu0820
    
    Differential Revision: D54827438
    
    fbshipit-source-id: ed29b9b2653f512ef3b4006e159d225f835ebbf6
    metascroy authored and facebook-github-bot committed Apr 11, 2024
    Commit: 62a4dd3
  13. Skip annotate boolean input (#2957)

    Summary:
    Pull Request resolved: #2957
    
    ghstack-source-id: 222200589
    exported-using-ghexport
    
    It only makes sense to quantize fp tensors, not booleans. Add a check to make sure only fp tensors are annotated in the quantizer.
    
    Reviewed By: jerryzh168
    
    Differential Revision: D55946526
    
    fbshipit-source-id: d94bfee38ab2d29fc9672ab631b4d5d0c5239d25
    cccclai authored and facebook-github-bot committed Apr 11, 2024
    Commit: ce344bc
  14. Fix build-framework-ios CI job (#2996)

    Summary:
    As titled. `build_apple_frameworks.sh` copies out all the exported headers; in #2934 `//executorch/schema:program` was moved to `exported_deps`, causing `build_apple_frameworks.sh` to no longer be able to copy the generated headers `program_generated.h` and `scalar_type_generated.h`.
    
    This PR fixes it by moving it back to `deps`.
    
    Pull Request resolved: #2996
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56028952
    
    Pulled By: larryliu0820
    
    fbshipit-source-id: 2cd4999154877b0ac7b49cd1f54d518cba34b2f2
    larryliu0820 authored and facebook-github-bot committed Apr 11, 2024
    Commit: 3b727a7
  15. Extend constant prop pass to work with int/float/etc scalars and fix input specs. (#2950)
    
    Summary:
    Pull Request resolved: #2950
    
    1. Cleanup / Refactor constant prop pass.
    
    2. Enable constant propagation for ops with constant scalar arguments -- int/float/dtype/bool/str.
    Nodes of type `Op(constant_tensor, some_int, some_float, some_dtype, ...)` can now be constant propagated.
    
    3. Fix order of input spec to match the expected spec in `ExportGraphSignature` class.
    parameters->buffers->constants->user_inputs.
    Before this diff, input_specs for the newly added constant tensors were appended to graph_signature, which would cause failures.
    
    Reviewed By: dulinriley
    
    Differential Revision: D55891278
    
    fbshipit-source-id: fe1867cb6a99d0140d6a2e076027688cb1ddc0cd
    hsharma35 authored and facebook-github-bot committed Apr 11, 2024
    Commit: 5ef8427

Commits on Apr 12, 2024

  1. Introduce vTensorPtr to prevent reference invalidation and remove `get_val()` API (#2978)
    
    Summary:
    Pull Request resolved: #2978
    
    ## Context
    
    Currently when writing operators developers will save a reference to a `vTensor` retrieved from a `ComputeGraph`'s list of `values_` like so:
    
    ```
    vTensor& vten = graph.get_val(vref).toTensor();
    ```
    
    However, this is dangerous: if any values are added after the reference has been stored, `values_`, which is a `std::vector`, may be resized and its contents moved, invalidating the reference.
    
    To protect against this, this changeset introduces the `vTensorPtr` class which is a wrapper around a `vTensor*`.  When constructed, it will increment a counter in the `ComputeGraph` instance, and when destroyed it will decrement the counter. `ComputeGraph` cannot add any values while the counter is not zero.
    
    Since `Value` can be converted to other non-trivial types, this changeset also removes the `get_val` function entirely to guard against unsafe behaviour.
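    A hedged sketch of the guard mechanism with simplified names (not the actual ET-VK classes): an RAII wrapper bumps a counter on the graph while it is alive, and the graph refuses to add values while the counter is non-zero.
    ```
    #include <cassert>
    #include <cstddef>

    struct vTensor {};

    class ComputeGraphSketch {
     public:
      class vTensorPtr {
       public:
        vTensorPtr(ComputeGraphSketch* graph, vTensor* tensor)
            : graph_(graph), tensor_(tensor) {
          ++graph_->outstanding_refs_;  // constructed: block value additions
        }
        ~vTensorPtr() { --graph_->outstanding_refs_; }  // destroyed: unblock
        vTensor* operator->() const { return tensor_; }

       private:
        ComputeGraphSketch* graph_;
        vTensor* tensor_;
      };

      void add_value() {
        // Growing the values_ vector may reallocate and move its contents,
        // so it is only allowed while no vTensorPtr is outstanding.
        assert(outstanding_refs_ == 0 && "cannot add values while tensors are referenced");
        // ... push_back into values_ ...
      }

     private:
      std::size_t outstanding_refs_ = 0;
    };
    ```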
    ghstack-source-id: 222224052
    exported-using-ghexport
    
    Reviewed By: jorgep31415
    
    Differential Revision: D55984187
    
    fbshipit-source-id: 22c619f651b5b3783c7626263694ca46b9f84723
    SS-JIA authored and facebook-github-bot committed Apr 12, 2024
    Commit: 76d8513
  2. Add Tiktoken in python (#2986)

    Summary:
    Tiktoken by OpenAI is a popular tokenizer.
    
    Pull Request resolved: #2986
    
    Reviewed By: lucylq
    
    Differential Revision: D56004355
    
    Pulled By: larryliu0820
    
    fbshipit-source-id: 5656eba6fc6e550fc1d7356162da1d1897e43e78
    larryliu0820 authored and facebook-github-bot committed Apr 12, 2024
    Commit: 46cf1c7
  3. Dynamic Shapes (#2442)

    Summary:
    Pull Request resolved: #2442
    
    Only need to look at tester.py file for the tester changes.
    
    Change is from `.run_method().compare_outputs() ` to `.run_method_and_compare_outputs()`
    
    Now, if the Tester is initialized with dynamic inputs, we will generate random dynamic inputs (according to the specification of the dynamic shapes) to run on the model. This allows us to test that the inputs fed into the model can be dynamic.
    
    We add a num_runs argument to run_method_and_compare_outputs so that we can choose to run a number of different dynamic inputs with dynamic shapes.
    
    Reviewed By: digantdesai, kirklandsign
    
    Differential Revision: D54650121
    
    fbshipit-source-id: a813816cf19850219ec0962aaf6592f1047e85c8
    mcr229 authored and facebook-github-bot committed Apr 12, 2024
    Commit: 65be9b4
  4. dynamic qd8-fc test with 2 batch dims (#2441)

    Summary:
    Pull Request resolved: #2441
    
    Adding the first dynamic input test, in which we test DQ Linear where its inputs have rank = 3.
    
    Reviewed By: digantdesai, kirklandsign
    
    Differential Revision: D54665767
    
    fbshipit-source-id: 3c6c7eb0a10b32f390effeb9ae88b74df21e823f
    mcr229 authored and facebook-github-bot committed Apr 12, 2024
    Commit: bf59da6
  5. dynamic mobilenetv2 (#2440)

    Summary:
    Pull Request resolved: #2440
    
    adding dynamism to mobilenetv2 and testing
    
    Reviewed By: kirklandsign
    
    Differential Revision: D54666427
    
    fbshipit-source-id: 5699636bbd18598ab26adb5054824c5a38534396
    mcr229 authored and facebook-github-bot committed Apr 12, 2024
    Commit: 1f5a833
  6. dynamic mv3 (#2475)

    Summary:
    Pull Request resolved: #2475
    
    Test to verify dynamic mv3
    
    Reviewed By: digantdesai, kirklandsign
    
    Differential Revision: D54972684
    
    fbshipit-source-id: c3573f17bd26dc391d249b7c15217b7e500e9adf
    mcr229 authored and facebook-github-bot committed Apr 12, 2024
    Commit: fec9c2f
  7. Dynamic ResNet (#2474)

    Summary:
    Pull Request resolved: #2474
    
    Test for dynamic resnet.
    
    ResNet has some restrictions on the input shape, so we create a dynamic version by bilinear resizing the input to ResNet's fixed shape. Thus we test that dynamic bilinear resize correctly resizes to the fixed shape.
    
    Reviewed By: digantdesai, kirklandsign
    
    Differential Revision: D54972682
    
    fbshipit-source-id: f8a1128437ca9c562ccc3eb5ff03545455b548fa
    mcr229 authored and facebook-github-bot committed Apr 12, 2024
    Commit: 33f41bd
  8. Dynamic ViT (#2476)

    Summary:
    Pull Request resolved: #2476
    
    Tests for Dynamic ViT
    
    We make ViT dynamic by bilinear resizing the input before feeding it to ViT.
    
    Reviewed By: digantdesai, kirklandsign
    
    Differential Revision: D54972681
    
    fbshipit-source-id: 626195d07d45c05112dfd251005c407a6444a87b
    mcr229 authored and facebook-github-bot committed Apr 12, 2024
    Commit: d1bc794
  9. add export configs (#2965)

    Summary: Pull Request resolved: #2965
    
    Reviewed By: larryliu0820
    
    Differential Revision: D55953027
    
    fbshipit-source-id: 1e5f60e46daf3591167b8c703e5452b3125b7904
    lucylq authored and facebook-github-bot committed Apr 12, 2024
    Commit: ab323a5
  10. Add exir.save and exir.load with export_serialize (#3000)

    Summary:
    Pull Request resolved: #3000
    
    Adding exir.save and exir.load, similar to torch.export.save and torch.export.load, for saving and loading edge exported programs.
    
    Reviewed By: cccclai
    
    Differential Revision: D56037593
    
    fbshipit-source-id: dc2a11b836baf479fcf6e23f33b345cb239f3ac5
    tarun292 authored and facebook-github-bot committed Apr 12, 2024
    Commit: 6acc86f
  11. Fix 3 CI jobs (#3006)

    Summary:
    * Apple / build-frameworks-ios / macos-job
    
    We removed libcustom_ops_lib.a in #2916, so we need to remove it from `build_apple_frameworks.sh`.
    
    * Lint / lintrunner / linux-job
    
    Remove extra line in backends/qualcomm/quantizer/utils.py
    
    * pull / unittest / macos (buck2) / macos-job
    
    Fix it by using `executorch_no_prim_ops` instead of `executorch` in MPS and CoreML.
    
    Pull Request resolved: #3006
    
    Reviewed By: lucylq
    
    Differential Revision: D56048430
    
    Pulled By: larryliu0820
    
    fbshipit-source-id: 9dcb476eea446ea3aba566d595167c691fb00eec
    larryliu0820 authored and facebook-github-bot committed Apr 12, 2024
    Commit: 5b7c4ba
  12. Add util to print out ops and frequency (#2983)

    Summary:
    Pull Request resolved: #2983
    
    As titled.
    
    Reviewed By: cccclai
    
    Differential Revision: D56001227
    
    fbshipit-source-id: cefef12662e03171136f03138fb814d61a28a0f3
    mcremon-meta authored and facebook-github-bot committed Apr 12, 2024
    Commit: b1edc3d
  13. Decouple custom ops in llama_transformer.py Part 1/N (#3005)

    Summary:
    This is a no-op
    
    Pull Request resolved: #3005
    
    Test Plan:
    CI
    
    Run with
    
    `python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -kv --use_sdpa_with_kv_cache -X`
    
    and with
    
    `python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -kv -X`
    
    Make sure both work
    
    Reviewed By: cccclai
    
    Differential Revision: D56048177
    
    Pulled By: mergennachin
    
    fbshipit-source-id: 3ac9ac5c34f6fe215de1cfe8b5ddc7aae3635359
    mergennachin authored and facebook-github-bot committed Apr 12, 2024
    Commit: 488afc5
  14. Decouple custom ops in llama_transformer.py Part 2/N (#3007)

    Summary:
    Pull Request resolved: #3007
    
    Keep llama_transformer.py looking like the stock implementation, so that it can be reused everywhere.
    
    Do a module swap.
    
    Reviewed By: cccclai
    
    Differential Revision: D56048640
    
    fbshipit-source-id: 76de1b09b7f5d79422bb3b32bc830a9a7ecd935c
    mergennachin authored and facebook-github-bot committed Apr 12, 2024
    Commit: 74eb8b3
  15. Update README.md (#3012)

    Summary: Pull Request resolved: #3012
    
    Reviewed By: mergennachin
    
    Differential Revision: D56074130
    
    Pulled By: jerryzh168
    
    fbshipit-source-id: 53e8a1db6ef802789469f1e5ba6c79c03a16e5e1
    jerryzh168 authored and facebook-github-bot committed Apr 12, 2024
    Commit: 0f379ba
  16. add more instructions and examples on Delegation (#2973)

    Summary:
    Pull Request resolved: #2973
    
    as title.
    
    Reviewed By: vmpuri, byjlw
    
    Differential Revision: D55988177
    
    fbshipit-source-id: 8cdc953118ecd22e8e9a809f0dd716a30a7fc117
    Gasoonjia authored and facebook-github-bot committed Apr 12, 2024
    Commit: 17c64a3
  17. Run LlamaDemo app on AWS Device Farm (#3004)

    Summary:
    This uploads the built LlamaDemo app to S3 and uses it to run the test on Device Farm.
    
    Pull Request resolved: #3004
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56073767
    
    Pulled By: huydhn
    
    fbshipit-source-id: 088a1af2463f035dcc8b06ec96d83162746f2df1
    huydhn authored and facebook-github-bot committed Apr 12, 2024
    Commit: cd248b4
  18. Remove RemoveRedundantViewCopyPass (#2464)

    Summary:
    Pull Request resolved: #2464
    
    The RemoveRedundantViewCopyPass is unnecessary and can be replaced by NormalizeViewCopyBasePass + dead code elimination.
    
    Reviewed By: larryliu0820
    
    Differential Revision: D54866523
    
    fbshipit-source-id: 106b8c4a15cf2e68014ccc6a85027e47517195ef
    metascroy authored and facebook-github-bot committed Apr 12, 2024
    Commit: c075eea
  19. Change tokenizer name to bpe_tokenizer and extract a base class (#3009)

    Summary:
    Pull Request resolved: #3009
    
    We want to be able to support more than one tokenizer implementation. Currently `tokenizer.cpp` is adapted from `llama2.c`, but we also want to support `Tiktoken` (to be added in the next PR).
    
    This PR extracts a base class `Tokenizer` and makes it extendable by different implementations.
    
    Reviewed By: mergennachin
    
    Differential Revision: D56052583
    
    fbshipit-source-id: bd9143957165211b1f600f781233b9ceff440cc1
    larryliu0820 authored and facebook-github-bot committed Apr 12, 2024
    Commit: 21fdc4e

Commits on Apr 13, 2024

  1. Update README.md and add submodule update (#3029)

    Summary:
    Without the submodule update, install_requirements would not work. Add this step to the documentation's README.md.
    
    Pull Request resolved: #3029
    
    Reviewed By: lucylq
    
    Differential Revision: D56087389
    
    Pulled By: iseeyuan
    
    fbshipit-source-id: fd96530b44f81b6dfcea07faccef06f6348fa373
    iseeyuan authored and facebook-github-bot committed Apr 13, 2024
    Commit: cd32712
  2. Throw in VK_GET_OP_FN if op is not found (#3028)

    Summary:
    Pull Request resolved: #3028
    
    Make yipjustin happy. Forgot this safeguard when I originally wrote the `OperatorRegistry` class.
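    A rough sketch of the safeguard, with illustrative names (`OperatorRegistry` and `OpFunction` here are stand-ins, not the actual ET-VK types): the lookup throws with a descriptive message instead of silently dereferencing a missing entry.
    ```
    #include <functional>
    #include <stdexcept>
    #include <string>
    #include <unordered_map>

    using OpFunction = std::function<void()>;

    class OperatorRegistry {
     public:
      const OpFunction& get_op_fn(const std::string& name) const {
        auto it = table_.find(name);
        if (it == table_.end()) {
          // Fail loudly when an op was never registered.
          throw std::runtime_error("operator not registered: " + name);
        }
        return it->second;
      }

     private:
      std::unordered_map<std::string, OpFunction> table_;
    };
    ```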
    
    Reviewed By: SS-JIA
    
    Differential Revision: D56085588
    
    fbshipit-source-id: ba116eab8054e3610011fd0c8ffc0aabe61ae8ea
    jorgep31415 authored and facebook-github-bot committed Apr 13, 2024
    Commit: 4d7dd03
  3. update the pinned pytorch hash (#2824)

    Summary:
    This PR is auto-generated nightly by [this action](https://github.com/pytorch/executorch/blob/main/.github/workflows/nightly.yml).
    Update the pinned pytorch hash.
    
    Pull Request resolved: #2824
    
    Reviewed By: mergennachin
    
    Differential Revision: D55814757
    
    Pulled By: guangy10
    
    fbshipit-source-id: cea55d3468ae7155906a44d038e25e53c207dcef
    pytorchupdatebot authored and facebook-github-bot committed Apr 13, 2024
    Commit: c095046

Commits on Apr 14, 2024

  1. Apply clang-format 18

    Summary: Previously this code conformed to clang-format 12.
    
    Reviewed By: igorsugak
    
    Differential Revision: D56065247
    
    fbshipit-source-id: f5a985dd8f8b84f2f9e1818b3719b43c5a1b05b3
    zertosh authored and facebook-github-bot committed Apr 14, 2024
    Commit: c61ef44
  2. oss: Upgrade clap, add string feature (#3035)

    Summary:
    Pull Request resolved: #3035
    
    ^
    
    Reviewed By: stepancheg
    
    Differential Revision: D56115188
    
    fbshipit-source-id: 67b1293d26adc77973a7c17808fb2d958da2d04f
    JakobDegen authored and facebook-github-bot committed Apr 14, 2024
    Commit: 57dd7f1

Commits on Apr 15, 2024

  1. Update to clang 18.1.3

    Reviewed By: zertosh
    
    Differential Revision: D56139356
    
    fbshipit-source-id: a740606db6e308ed133caa3f0756c2a53d7dce7b
    mergennachin authored and facebook-github-bot committed Apr 15, 2024
    Commit: 057e432
  2. Fix handling constant inputs when delegating (#3031)

    Summary: Pull Request resolved: #3031
    
    Reviewed By: mcr229
    
    Differential Revision: D56089279
    
    fbshipit-source-id: 15f0b621b2efbc317c25f8b75907ff6c28ac2c6d
    angelayi authored and facebook-github-bot committed Apr 15, 2024
    Commit: 7616d42
  3. Fix lint in clang-format (#3041)

    Summary:
    Pull Request resolved: #3041
    
    We are updating to clang-format 18.
    
    The current clang-format config in the coreml code has a duplicate key. Deleting one of them.
    
    See context D56139356
    
    bypass-github-export-checks
    bypass-github-pytorch-ci-checks
    bypass-github-executorch-ci-checks
    
    Reviewed By: cccclai
    
    Differential Revision: D56139927
    
    fbshipit-source-id: 937f58092abd6f695304ee2a5dd38bc4b8412ec0
    mergennachin authored and facebook-github-bot committed Apr 15, 2024
    Commit: 7c81155
  4. generation.py with kv cache (#3030)

    Summary:
    Python e2e generation, using the tiktoken tokenizer.
    
    Using text_completion; haven't tried chat_completion yet.
    
    Pull Request resolved: #3030
    
    Test Plan:
    Imported from GitHub, without a `Test Plan:` line.
    
    Command, with prompt "Hello, I am" and seq_len = 10
    ```
    python -m examples.models.llama2.runner.generation --pte llama_4ckpts_x.pte --tokenizer tokenizer.model --prompt="Hello I am"  --temperature=0 --params ../llama-models/llama3/params_less.json --max_gen_len=10
    ```
    
    fp32, xnn, kv
    fp32, xnn
    same results:
    ```
    Result: [{'generation': ' a 25 year old woman. I am a'}]
    ```
    
    fp32, xnn, int4
    ```
    Result: [{'generation': ' interested in the following products: - 1 x'}]
    ```
    
    fp32, xnn, kv, sdpa (need investigation)
    ```
    Result: [{'generation': 'ฉopteraenthalenthalenthalenthalenthalenthalenthalenthal'}]
    ```
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56087430
    
    Pulled By: lucylq
    
    fbshipit-source-id: 31c73fe87af8646bf2512e1a6aadc8804a101719
    lucylq authored and facebook-github-bot committed Apr 15, 2024
    Commit: 645256d
  5. Clean up shader library and introduce some new conventions (#3024)

    Summary:
    Pull Request resolved: #3024
    
    ## Context
    
    This changeset introduces some fairly mechanical improvements to the Vulkan compute graph shader library in order to establish some new conventions.
    
    **Note that backwards compatibility with existing shader authoring methods is preserved**.
    
    ### Only List `VALUE` in the `.yaml` files
    
    Previously, to generate variants for a combination of values, the YAML file would contain
    
    ```
        PACKING:
          - VALUE: CHANNELS_PACKED
            SUFFIX: C_packed
          - VALUE: WIDTH_PACKED
            SUFFIX: W_packed
          - VALUE: HEIGHT_PACKED
            SUFFIX: H_packed
    ```
    
    however, the shader code generation script will use the `VALUE` as the `SUFFIX` if no `SUFFIX` is provided.
    
    Therefore, only the below is needed:
    
    ```
        PACKING:
          - VALUE: C_packed
          - VALUE: W_packed
          - VALUE: H_packed
    ```
    
    ### Change indexing utility macros to lowercase
    
    Indexing utility macros have been changed to lowercase, and the packing identifiers have been changed due to the change in YAML files.
    
    The change to lowercase is to make calls to the macro read more like functions (and indeed they are typically used as functions) in order to help make the code more readable.
    
    ```
    POS_TO_COORD_${PACKING} -> pos_to_coord_${PACKING}
    ```
    
    ### Use convention of defining macros in order to reduce Python code blocks usage
    
    Previously, Python code blocks were used in the GLSL code itself in order to vary the shader between different settings. However, usage of Python code blocks negatively impacts code readability. Therefore, this diff seeks to introduce a convention of defining macros near the top of the shader to reduce the usage of Python code blocks, i.e.
    
    ```
    #define pos_to_coord pos_to_coord_${PACKING}
    #define get_packed_dim get_packed_dim_${PACKING}
    #define get_packed_stride get_packed_stride_${PACKING}
    ```
    
    ### Improve GLSL type definitions
    
    Previously, the following Python code blocks were used to determine appropriate vectorized and scalar types:
    
    ```
    ${VEC4_T[DTYPE]} texel = ...
    ${T[DTYPE]} scalar = ...
    ```
    
    This changeset replaces that with:
    
    ```
    
    #define BUF_T ${buffer_scalar_type(DTYPE)}
    #define VEC4_T ${texel_type(DTYPE)}
    #define SCALAR_T ${texel_component_type(DTYPE)}
    
    layout(set = 0, binding = 1) buffer  PRECISION restrict readonly Buffer {
      BUF_T data[];
    }
    
    buffer_in;
    VEC4_T texel = ...
    SCALAR_T scalar = ...
    ```
    
    The main differences are as such:
    
    * `buffer_scalar_type()` produces the same result as `T[DTYPE]`
    * `texel_type()` is not determined from a mapping with `DTYPE`, but is determined indirectly based on the image format that is associated with the `DTYPE`.
    * `texel_component_type()` is based on the result of `texel_type(DTYPE)`
    
    Essentially, the mapping is more in-line with what happens in code.
    
    The reason for this change is to enable FP16 support and is a bit complicated. Basically, we need a way to distinguish the scalar type used for buffer storage from the scalar type used to store a component of a vec4 type (hence `BUF_T` vs `SCALAR_T`). The reason this is required is that to support half-precision tensors, the buffer representation will use a 16-bit float type, but textures will still extract to `vec4` (i.e. 4x 32-bit floats).
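
    As a rough illustration of the kind of mapping these helpers imply (hypothetical Python, not the real codegen implementation; the actual type names and formats may differ):

    ```
    def buffer_scalar_type(dtype):
        # scalar type used for buffer storage (plays the role of T[DTYPE])
        return {"float": "float", "half": "float16_t", "int8": "int8_t"}[dtype]

    def texel_type(dtype):
        # vec4 type derived from the image format associated with the dtype;
        # fp16 textures still extract to a full-precision vec4
        return {"float": "vec4", "half": "vec4", "int8": "ivec4"}[dtype]

    def texel_component_type(dtype):
        # component type of the texel type
        return {"vec4": "float", "ivec4": "int"}[texel_type(dtype)]

    print(buffer_scalar_type("half"), texel_type("half"), texel_component_type("half"))
    # float16_t vec4 float
    ```
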
    ghstack-source-id: 222551445
    
    Reviewed By: jorgep31415
    
    Differential Revision: D56082461
    
    fbshipit-source-id: 49fb8ff5fb0d8c48d0fadd8fd24184cc20db2147
    SS-JIA authored and facebook-github-bot committed Apr 15, 2024
    Configuration menu
    Copy the full SHA
    59023ed View commit details
    Browse the repository at this point in the history
  6. Move compile spec to ArmTester interface (#2991)

    Summary:
    * Create compile spec builder
    * Added default compile spec for unit tests
    * Cleaned up some redundant parameters
    
    Pull Request resolved: #2991
    
    Reviewed By: mergennachin
    
    Differential Revision: D56143727
    
    Pulled By: digantdesai
    
    fbshipit-source-id: c34a7f1f6f073b558cca056eeaa4c810df6e25c6
    freddan80 authored and facebook-github-bot committed Apr 15, 2024
    Configuration menu
    Copy the full SHA
    64497b7 View commit details
    Browse the repository at this point in the history
  7. remove duplicate generate_lib_aten target under aten kernel (#2951)

    Summary:
    Pull Request resolved: #2951
    
    generate_lib and generate_lib_aten are exactly the same under executorch/kernels/aten. Remove the generate_lib_aten for better understanding.
    
    Reviewed By: larryliu0820
    
    Differential Revision: D55937122
    
    fbshipit-source-id: 5e7e7c06efbd4876874880627b67934d782473a2
    Gasoonjia authored and facebook-github-bot committed Apr 15, 2024
    Configuration menu
    Copy the full SHA
    075fe40 View commit details
    Browse the repository at this point in the history
  8. native_layer_norm (for width dim) (#3001)

    Summary:
    Pull Request resolved: #3001
    
    We implement `native_layer_norm` which has 3 outputs
    - normalization of the input tensor according to the given `normalized_shape`
    - mean
    - 1/sqrt(var + eps)
    
    ```
    func: native_layer_norm(Tensor input, SymInt[] normalized_shape, Tensor? weight, Tensor? bias, float eps) -> (Tensor, Tensor, Tensor)
    ```
    
    According to SS-JIA's suggestion, a model-specific implementation is more performant and preferred over a generic one, so we implemented the op in the following optimized way:
    - our current use case has `normalized_shape` of len 1, namely we do the normalization through computing the mean and var at the last width dim
    - we do the computation in just one shader `native_layer_norm.glsl` without invoking the shaders to compute mean and var respectively
    - we use [Welford's online algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm) to compute mean and variance in one pass (a plain-Python sketch follows below)
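
    A plain-Python sketch of the one-pass Welford update (illustrative only; the actual implementation lives in the shader):

    ```
    def welford(values, eps=1e-5):
        mean, m2, count = 0.0, 0.0, 0
        for x in values:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
        var = m2 / count             # population variance over the normalized dim
        rstd = (var + eps) ** -0.5   # the third output: 1/sqrt(var + eps)
        return mean, rstd

    print(welford([1.0, 2.0, 3.0, 4.0]))
    ```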
    
    Reviewed By: SS-JIA, jorgep31415
    
    Differential Revision: D56005629
    
    fbshipit-source-id: 096c2e2f04b95f1f5c9205c4827091169771978c
    copyrightly authored and facebook-github-bot committed Apr 15, 2024
    Configuration menu
    Copy the full SHA
    74576e8 View commit details
    Browse the repository at this point in the history
  9. aten.full.default (#3013)

    Summary:
    Pull Request resolved: #3013
    
    We implement [`aten.full.default`](https://pytorch.org/docs/stable/generated/torch.full.html) which has the following signature.
    ```
    func: full(SymInt[] size, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
    ```
    
    In order to bypass a graph build error, we simply create null values for the following arg types:
    - torch.device
    - torch.dtype
    - torch.layout
    
    since they don't have any effect on our operator implementation on Vulkan. (Note that [`torch.layout`](https://pytorch.org/docs/stable/tensor_attributes.html#torch.layout) is a totally different concept from `GPUMemoryLayout` on Vulkan.)
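
    For reference, the eager-mode semantics of the op (standard `torch.full`; the optional device/layout/pin_memory arguments are the ones passed as null above):

    ```
    import torch

    # size, fill_value, plus an optional dtype
    x = torch.full((2, 3), 7.0, dtype=torch.float32)
    print(x)
    # tensor([[7., 7., 7.],
    #         [7., 7., 7.]])
    ```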
    
    Reviewed By: jorgep31415
    
    Differential Revision: D56049674
    
    fbshipit-source-id: dc2a27b4e702829e077e874ccf697f6c4196756d
    copyrightly authored and facebook-github-bot committed Apr 15, 2024
    Configuration menu
    Copy the full SHA
    eb44e88 View commit details
    Browse the repository at this point in the history
  10. Add tiktoken (#3015)

    Summary:
    Pull Request resolved: #3015
    
    C++ implementation of Tiktoken. Added unit tests.
    
    Reviewed By: lucylq
    
    Differential Revision: D56053255
    
    fbshipit-source-id: 3d2f6e30a2a16d6311506fe17176d412fca7222e
    larryliu0820 authored and facebook-github-bot committed Apr 15, 2024
    Configuration menu
    Copy the full SHA
    49d1f02 View commit details
    Browse the repository at this point in the history
  11. Add tiktoken to eval (#3044)

    Summary: Pull Request resolved: #3044
    
    Test Plan:
    Imported from GitHub, without a `Test Plan:` line.
    
    ```
    python -m examples.models.llama2.eval_llama --pte llama3_4_ckpts_x.pte -p ../llama-models/llama3/params_less.json -t ../llama-models/llama3/tokenizer.model --max_seq_len=127 --limit 5
    wikitext: {'word_perplexity,none': 22.00035213493939, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.8289244201951567, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.8709954573378033, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
    ```
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56163999
    
    Pulled By: lucylq
    
    fbshipit-source-id: db255a6e49a3e9b6db92c9f94fe9e7fcb475c924
    lucylq authored and facebook-github-bot committed Apr 15, 2024
    Configuration menu
    Copy the full SHA
    780ed25 View commit details
    Browse the repository at this point in the history

Commits on Apr 16, 2024

  1. Update pytorch commit pin to 04/15 (#3047)

    Summary: Pull Request resolved: #3047
    
    Reviewed By: lucylq
    
    Differential Revision: D56166332
    
    fbshipit-source-id: d98f2c18e63e15a78bbd5c893ef9c5aa5e1ddd5f
    larryliu0820 authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    15f141b View commit details
    Browse the repository at this point in the history
  2. Dynamic Conv1d + W2L (#2976)

    Summary:
    Pull Request resolved: #2976
    
    Conv1d uses a static reshape operator to convert the 3d tensor to a 4d tensor so xnnpack can operate on it using conv2d.
    
    For dynamism, reshape only accepts a single dynamic dimension, which is denoted as dynamic with a dim of 0.
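
    As a rough eager-mode illustration of the 3d-to-4d trick (this is not the XNNPACK lowering itself, just the equivalence it relies on):

    ```
    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 8, 32)    # (N, C_in, L)
    w = torch.randn(16, 8, 3)    # (C_out, C_in, K)

    out1d = F.conv1d(x, w, padding=1)
    # insert a singleton height dim, run conv2d, then drop it again
    out2d = F.conv2d(x.unsqueeze(2), w.unsqueeze(2), padding=(0, 1)).squeeze(2)

    print(torch.allclose(out1d, out2d, atol=1e-5))  # True
    ```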
    
    Reviewed By: digantdesai, kirklandsign
    
    Differential Revision: D55815092
    
    fbshipit-source-id: a3c96bc5c86c130291c1d54f8174a6ff5d25a6b8
    mcr229 authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    7b375fe View commit details
    Browse the repository at this point in the history
  3. Fix iOS build by excluding external CoreML SDK dependencies (#3043)

    Summary:
    Pull Request resolved: #3043
    
    CoreML delegate SDK integration broke the app build. Getting the SDK integration to work properly internally will require buckifying the third-party targets on which the CoreML delegate SDK itself depends (not to be confused with the third-party dependencies from ET itself). Running the `install_requirements.sh` script (CoreML's, not the generic ET one) clones a bunch of Git repos, XCode-specific tooling, and generates Protobuf headers on which their SDK integration relies.
    
    To avoid this, we simply add a `BUILD_SDK` flag, set it to false to disable building the SDK, and exclude references to the generated headers.
    
    Reviewed By: kirklandsign
    
    Differential Revision: D55456558
    
    fbshipit-source-id: 6ab931b39298ee0a4a4b238699c64c84952e180e
    Varun Puri authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    d0208d0 View commit details
    Browse the repository at this point in the history
  4. aten.select.int (#3033)

    Summary:
    Pull Request resolved: #3033
    
    Port over the `select.int` shaders to ET.
    
    1. Since in ET tensor-shape reasoning happens AOT, we can simplify the C++ caller code by a lot.
    2. In this diff, we also try to use the same buffer object for passing arguments to all shaders. We don't worry about the perf cost, since the cost difference between passing an int and an ivec4 is very minor.
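
    For reference, a small eager-mode example of the `select.int` semantics (plain PyTorch, independent of the Vulkan port):

    ```
    import torch

    x = torch.arange(24).reshape(2, 3, 4)
    # select.int picks a single index along a dim and drops that dim
    print(torch.select(x, 1, 2).shape)    # torch.Size([2, 4])
    print(x.select(1, 2).equal(x[:, 2]))  # True
    ```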
    
    Reviewed By: SS-JIA
    
    Differential Revision: D56082483
    
    fbshipit-source-id: f3a28712714034375eb86f6f5c6b6a3e23d525e8
    yipjustin authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    458d743 View commit details
    Browse the repository at this point in the history
  5. 4b quantized embedding table operator (#3050)

    Summary:
    Pull Request resolved: #3050
    
    4b quantized embedding table operator
    
    Reviewed By: mikekgfb
    
    Differential Revision: D56123408
    
    fbshipit-source-id: 26293e2b09f93ccb8f14462de7ae0969efc7acc5
    manuelcandales authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    3b31eff View commit details
    Browse the repository at this point in the history
  6. Fix test_llama_runner by hiding tiktoken (#3055)

    Summary:
    Pull Request resolved: #3055
    
    We don't always want to build the tiktoken dependencies (re2 and
    abseil), so this PR only builds them if the option is on.
    
    Reviewed By: iseeyuan
    
    Differential Revision: D56178928
    
    fbshipit-source-id: 8021d1526ad6e89c929183f368c0fb25a4808b6f
    larryliu0820 authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    473c98c View commit details
    Browse the repository at this point in the history
  7. Bump Vulkan API requirement to 1.1 and enable 16 bit and 8 bit types …

    …in buffer storage (#3058)
    
    Summary:
    Pull Request resolved: #3058
    
    ## Context
    
    Enable use of explicit fp16 and int8 types in GPU storage buffers via the following extensions:
    
    * [VK_KHR_16bit_storage](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_16bit_storage.html)
    * [VK_KHR_8bit_storage](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_8bit_storage.html)
    * [VK_KHR_shader_float16_int8](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_shader_float16_int8.html)
    
    The first two enable usage of 16-bit and 8-bit types in storage buffers, while the last one enables using those types in arithmetic operations.
    
    By enabling these extensions and checking that the device supports the required features, explicit fp16 and int8 types can be used in compute shaders, as demonstrated by the added test.
    
    Vulkan 1.1 is required in order to access `vkGetPhysicalDeviceFeatures2`, which is needed to query whether the device supports 16-bit and 8-bit types. This should be a fairly straightforward version bump as Vulkan 1.1 is supported by the vast majority of Android devices.
    ghstack-source-id: 222727208
    exported-using-ghexport
    
    Reviewed By: jorgep31415
    
    Differential Revision: D56164239
    
    fbshipit-source-id: 879804567ff08201933a220c9f168f435af80019
    SS-JIA authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    d481c11 View commit details
    Browse the repository at this point in the history
  8. Enable FP16 type in operators (#3059)

    Summary:
    Pull Request resolved: #3059
    
    ## Context
    
    Enable half precision shader computation using the `GL_EXT_shader_16bit_storage` extension that was enabled in the change just below this one in the stack.
    ghstack-source-id: 222727209
    
    Reviewed By: jorgep31415
    
    Differential Revision: D56189470
    
    fbshipit-source-id: 0eb5990651ad34e5a2ada601a0d3944dfe2ae9ea
    SS-JIA authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    ab62707 View commit details
    Browse the repository at this point in the history
  9. Fix formatting issues in executorch/test/size_test.cpp (#3065)

    Summary:
    Pull Request resolved: #3065
    
    Required for LLVM-17. Fixes a mismatch between what the format string expects and the type supplied.
    
    Reviewed By: tarun292
    
    Differential Revision: D56206887
    
    fbshipit-source-id: f52883cb43840b34b5d5b25711f73bc71979da30
    r-barnes authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    9931301 View commit details
    Browse the repository at this point in the history
  10. ETRecord ser/de handling "None" outputs and more (#3039)

    Summary:
    Pull Request resolved: #3039
    
    For the ease of communication, let me assign nicknames to the files related to this diff:
    * File A: *caffe2/torch/_export/serde/serialize.py*
    * File B: *executorch/exir/serde/serialize.py*
    * File C: *executorch/exir/serde/export_serialize.py*
    
    Recently, we noticed that error `torch._export.serde.serialize.SerializeError: Unable to deserialize output node Argument(as_none=[])` (P1210590561) was thrown from File B when deserializing ETRecord. It's possible that the error has been there since the beginning, but we've just never tested that logic path.
    
    In this diff, I made a fix on File B to resolve this particular issue, and also added handling for the "None" output case in the SDK logic. ***Keep on reading if you don't think the code changes make sense:***
    
    I explored the history of file changes. In chronological order:
    1. D48258552, `deserialize_graph_output()` was copied from File A to File B, with some modifications made. The `deserialize_graph_output()` in File B overrides that in File A due to polymorphism.
    2. D52446586, File C was created by ***copying*** File A. As a result of this diff, the `deserialize_graph_output()` in File B now overrides that in File C.
    3. Also in D52446586, the `deserialize_graph_output()` in File A had some significant changes; File C got the new version of `deserialize_graph_output()`. But this diff didn't update the `deserialize_graph_output()` in File B.
    4. D55391674 added the handling for "None" outputs to File A.
    
    This diff brings (parts of) File C up-to-date with File A, and makes `deserialize_graph_output()` in File B properly override that in File A.
    
    In the future, we should figure out how to keep File C and File A in sync. Recently, File C was broken because it didn't stay in sync with File A in D54855251 and had to be fixed by D55776877. There will be a design review session this Friday to discuss consolidating the serialization code for edge and export.
    
    Reviewed By: tarun292
    
    Differential Revision: D56091104
    
    fbshipit-source-id: 20c75ddc610c3be7ab2bb62943419d3b8b2be079
    Olivia-liu authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    89cfa73 View commit details
    Browse the repository at this point in the history
  11. Update doc-build.yml (#3045)

    Summary: Pull Request resolved: #3045
    
    Reviewed By: clee2000
    
    Differential Revision: D56201946
    
    Pulled By: svekars
    
    fbshipit-source-id: 4212c24b02a1229ff06137b0d437b4e8c5dd454e
    svekars authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    c73bfc0 View commit details
    Browse the repository at this point in the history
  12. Add int16 support to aten_bridge (#3069)

    Summary:
    Pull Request resolved: #3069
    
    Running an executorch program via pybindings requires the aten_bridge. This currently fails if the model uses the `int16` dtype. This diff adds support for the type by adding it to the conversion switch statements.
    
    Reviewed By: tarun292
    
    Differential Revision: D56199304
    
    fbshipit-source-id: 19a6815cf2885dda72febf247c3ca3bde91193a8
    Vysarat authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    eb664a0 View commit details
    Browse the repository at this point in the history
  13. fix linear recomposition (#3064)

    Summary:
    Pull Request resolved: #3064
    
    Fixes the torchat ci where we are failing with expand copy.
    
    Reviewed By: digantdesai, mikekgfb, kirklandsign
    
    Differential Revision: D56204667
    
    fbshipit-source-id: 1d648460b59785884c33cdd479eb9c4c7d452a2a
    mcr229 authored and facebook-github-bot committed Apr 16, 2024
    Configuration menu
    Copy the full SHA
    4b6d2c3 View commit details
    Browse the repository at this point in the history

Commits on Apr 17, 2024

  1. Set kernel default visibility to hidden (#3060)

    Summary:
    Pull Request resolved: #3060
    
    When we compile the kernels into a shared library, we don't know, based on the op registry, whether the kernel implementation symbols can be dropped or not. Each kernel is just a normal function and the user can find it. We set their visibility to hidden by default; then these kernels are gone when we do `objdump -TC`.
    
    This reduces binary size.
    
     ---
    
    This is not done in fbcode so far. When we compile in fbcode, it seems that all dependency libraries are compiled into shared libraries, not static libraries. For example, op tests depend on op implementations through shared libraries. In that case, the hidden symbols are not exposed and could cause link-time failures.
    
    In xplat, these dependencies are set to static libraries, so this has no impact. Only when we explicitly build a shared library (for Android) do we hide the symbols and rely on the op registry to store the impl.
    
     ---
    
    This applies to the internal build only for now; we will revisit this for OSS later. It's a step needed to make use of selective build for building a shared library (mainly the Android use case).
    
    Reviewed By: dbort
    
    Differential Revision: D56167833
    
    fbshipit-source-id: 98cd47836b616fc33dbc9af284d9e758b242b3a3
    kirklandsign authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    54f9f3e View commit details
    Browse the repository at this point in the history
  2. Fix Android llama2 demo app after #2962 (#3032)

    Summary:
    This fixes the issue where the demo Android app fails to load the llama2 model and returns exit code 20.
    
    As this failure can be captured by running the instrumentation test suite on Android devices, I also add the test spec that I'm using there for future reference.
    
    ### Testing
    
    https://github.com/pytorch/executorch/actions/runs/8682469360/job/23808274556?pr=3032#step:12:80 loads the model successfully and shows the observed TSP now
    
    Pull Request resolved: #3032
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56124177
    
    Pulled By: huydhn
    
    fbshipit-source-id: 7cc3987d186e670143f2ca739d29f02649091ec2
    huydhn authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    9b55f48 View commit details
    Browse the repository at this point in the history
  3. Update doc-build.yml (#3071)

    Summary:
    Move noindex logic to the build job
    
    Pull Request resolved: #3071
    
    Reviewed By: clee2000
    
    Differential Revision: D56218857
    
    Pulled By: svekars
    
    fbshipit-source-id: 69dff489d98eee046d69185a6c03d62fbae37a16
    svekars authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    5d7949d View commit details
    Browse the repository at this point in the history
  4. Handle empty (size=0) tensor in Inspector (#2998)

    Summary:
    Pull Request resolved: #2998
    
    Empty tensors are not handled so they throw errors.
    
    Reviewed By: tarun292
    
    Differential Revision: D56027102
    
    fbshipit-source-id: a8dab52d9ba7eb0784a72493e9888cf63aefbb76
    Olivia-liu authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    f14dc83 View commit details
    Browse the repository at this point in the history
  5. Add quantized op support to llama runner (#3062)

    Summary: Pull Request resolved: #3062
    
    Reviewed By: lucylq, mikekgfb
    
    Differential Revision: D56197863
    
    fbshipit-source-id: c564a99d10be70fb69e554687bd506d8ff13268e
    larryliu0820 authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    1f4b631 View commit details
    Browse the repository at this point in the history
  6. {executorch][llama] support mqa (#3080)

    Summary:
    Pull Request resolved: #3080
    
    This diff adds support for multi query attention for sdpa with kv cache
    
    bypass-github-export-checks
    
    Reviewed By: mikekgfb
    
    Differential Revision: D56228316
    
    fbshipit-source-id: 29fdf78acf841b651476a39068940b616f076991
    kimishpatel authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    bae0387 View commit details
    Browse the repository at this point in the history
  7. Load missing state dict in edge program serialization (#3076)

    Summary:
    Pull Request resolved: #3076
    
    The state dict wasn't being passed in when ExportedProgram was being created after deserialization.
    
    Reviewed By: pssrawat
    
    Differential Revision: D56224054
    
    fbshipit-source-id: 7c3f74999994b23616e626d7b9d68d1a9eeab0ae
    tarun292 authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    22dfc6a View commit details
    Browse the repository at this point in the history
  8. Remove noindex from upload to gh-pages job (#3077)

    Summary:
    Pull Request resolved: #3077
    
    For some reason this wasn't removed in the previous PR.
    
    Reviewed By: clee2000
    
    Differential Revision: D56225136
    
    fbshipit-source-id: bb18c5f36fd443dc01c2127d361911625be8352a
    svekars authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    65f2693 View commit details
    Browse the repository at this point in the history
  9. forward fix ConstantArgument initialization (#3074)

    Summary:
    Pull Request resolved: #3074
    
    Following up on https://www.internalfb.com/diff/D55506949, which broke an executorch call.
    
    Reviewed By: angelayi
    
    Differential Revision: D56220174
    
    fbshipit-source-id: 041614c888ce2e55c08717d7da1430d4f787b816
    pianpwk authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    ebde8e1 View commit details
    Browse the repository at this point in the history
  10. Fix llama2 README.md cmake instructions (#3096)

    Summary:
    Pull Request resolved: #3096
    
    As titled. The current instructions run into issues due to our way of arranging `pthreadpool` and `cpuinfo` in CMake. Cleaning them up will need a bigger effort; for now, let's update the instructions so they can be run.
    
    Reviewed By: mergennachin
    
    Differential Revision: D56251563
    
    fbshipit-source-id: daf0b1ecb75abb90612efbd64108edc99a129efd
    larryliu0820 authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    980aaca View commit details
    Browse the repository at this point in the history
  11. Fix build time warning (#3097)

    Summary:
    Pull Request resolved: #3097
    
    tensor.data_ptr() is deprecated. To avoid the warning, change it to tensor.const_data_ptr().
    
    Reviewed By: mergennachin
    
    Differential Revision: D56251975
    
    fbshipit-source-id: c984ba33600c94da78a85060be5699042b12e83e
    larryliu0820 authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    5f9478d View commit details
    Browse the repository at this point in the history
  12. change call_delegate_autograd (#3073)

    Summary:
    Pull Request resolved: #3073
    
    Some changes angela told me to make 😂
    
    Reviewed By: angelayi
    
    Differential Revision: D56222503
    
    fbshipit-source-id: ab1e5194492df439effab550781f056d12eaba53
    mcr229 authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    20bf0db View commit details
    Browse the repository at this point in the history
  13. remove exir.capture from dynamic_shape_propogation test (#3070)

    Summary:
    Pull Request resolved: #3070
    
    title
    
    Reviewed By: mergennachin
    
    Differential Revision: D56216416
    
    fbshipit-source-id: 3ae317e3c2a8765ca3c2c460178526b0af4fb6ba
    JacobSzwejbka authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    73438a5 View commit details
    Browse the repository at this point in the history
  14. Create __init__.py in example folder (#3093)

    Summary:
    For my internal CentOS development env, `python -m examples/models/...` does not work, with an error message saying the module cannot be found. Adding this empty file fixes the issue.
    
    Pull Request resolved: #3093
    
    Reviewed By: cccclai
    
    Differential Revision: D56242802
    
    Pulled By: iseeyuan
    
    fbshipit-source-id: 33b98855682490ed1242b1cd2843e7963831915a
    iseeyuan authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    f729b2d View commit details
    Browse the repository at this point in the history
  15. move mask as sdpa input instead of attribute (#3036)

    Summary:
    Pull Request resolved: #3036
    
    sdpa (https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) takes the attention mask as an input, so refactor the sdpa module to take the mask as an input as well, bringing it closer to the sdpa signature.
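
    A small eager-mode sketch of what passing the mask as an input looks like at the sdpa call site (illustrative only; tensor shapes here are made up):

    ```
    import torch
    import torch.nn.functional as F

    q = torch.randn(1, 4, 8, 16)   # (batch, heads, seq_len, head_dim)
    k = torch.randn(1, 4, 8, 16)
    v = torch.randn(1, 4, 8, 16)
    causal_mask = torch.tril(torch.ones(8, 8, dtype=torch.bool))

    # the mask is an explicit argument rather than an attribute of the module
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)
    print(out.shape)  # torch.Size([1, 4, 8, 16])
    ```
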
    ghstack-source-id: 222650466
    exported-using-ghexport
    
    Reviewed By: mergennachin
    
    Differential Revision: D56119739
    
    fbshipit-source-id: d9adda66e540abc518b7ffb6a5ebd2aab1626b3b
    cccclai authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    b341223 View commit details
    Browse the repository at this point in the history
  16. remove exir.capture from test_rpc.py (#3102)

    Summary:
    Pull Request resolved: #3102
    
    title
    
    Reviewed By: tarun292
    
    Differential Revision: D56259168
    
    fbshipit-source-id: be80eeb616d6634c563ff3f1746cc6dc4aad0b6a
    JacobSzwejbka authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    7e14c0e View commit details
    Browse the repository at this point in the history
  17. Introduce SpecVarList to represent specialization constants (#3078)

    Summary:
    Pull Request resolved: #3078
    
    ## Context
    
    Specialization constants are a useful tool to compile compute shaders with constants defined at runtime. The primary application of specialization constants is to define variables which may have an impact on how the code is compiled, for example:
    
    * the number of elements of an array
    * the range of a loop
    
    Compared to the shader codegen system, which produces a complete copy of the shader and for which variants must be defined at build time, specialization constants can be defined at runtime when the compute pipeline is built.
    
    Specialization constants are currently used to define local work group sizes in Vulkan, but the Compute API hard-codes the number of specialization constants accepted by the shader to 3.
    
    This changeset introduces the `SpecVar` and `SpecVarList` classes to manage specialization constants and enable additional specialization constants to be specified.
    ghstack-source-id: 222903462
    exported-using-ghexport
    
    Reviewed By: copyrightly, jorgep31415
    
    Differential Revision: D56225041
    
    fbshipit-source-id: 88c94c09e380793c75edcb0a92c2987fac882431
    SS-JIA authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    0815c2b View commit details
    Browse the repository at this point in the history
  18. Enable additional specialization constants in compute shaders (#3079)

    Summary:
    Pull Request resolved: #3079
    
    ## Context
    
    Building on top of the previous changeset in the stack, this changeset modifies shader dispatch APIs to accept additional specialization constants for a shader.
    ghstack-source-id: 222903463
    
    Reviewed By: copyrightly, jorgep31415
    
    Differential Revision: D56225042
    
    fbshipit-source-id: 154c51f927116e4a658f224794ec354151398a8a
    SS-JIA authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    78cb141 View commit details
    Browse the repository at this point in the history
  19. select_copy.int (#3085)

    Summary:
    Pull Request resolved: #3085
    
    equivalent to `select.int`
    ghstack-source-id: 222407935
    exported-using-ghexport
    
    Reviewed By: SS-JIA
    
    Differential Revision: D56092143
    
    fbshipit-source-id: 2959069d87cef6f08aa0960e2f10a9416eb4109d
    yipjustin authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    49928bc View commit details
    Browse the repository at this point in the history
  20. aten.permute_copy.default (#3086)

    Summary:
    Pull Request resolved: #3086
    
    Implementation adopted from LI, with clean-up.
    ghstack-source-id: 222906934
    
    Reviewed By: copyrightly
    
    Differential Revision: D56093765
    
    fbshipit-source-id: 0ed78ae06e5b106a92cf3c1fdc85179f1e829919
    yipjustin authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    de00717 View commit details
    Browse the repository at this point in the history
  21. Improve codegen for aten.permute (#3087)

    Summary:
    Pull Request resolved: #3087
    
    In the generated code, it uses CPU as reference implementation.
    
    The tricky part happens when the CPU reference modifies the stride for some indexing operations like `permute`, leaving the returned Tensor with a non-contiguous stride.
    
    When we create a `vk_out` tensor based on this non-contiguous tensor with `at::empty_like`, the `vk_out` tensor inherits the stride property, leading to wrong answers when moving data back from staging.
    
    As a solution, we add `.contiguous()` after `at::empty_like` to revert back to the default stride.
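
    A Python analogue of the issue and the fix (using `torch.empty_like`, the Python counterpart of `at::empty_like`):

    ```
    import torch

    x = torch.randn(2, 3, 4)
    ref = x.permute(0, 2, 1)                     # CPU reference: non-contiguous view
    print(ref.is_contiguous())                   # False

    vk_out = torch.empty_like(ref)               # inherits the non-contiguous strides
    print(vk_out.stride() == ref.stride())       # True

    vk_out = torch.empty_like(ref).contiguous()  # revert to default (contiguous) strides
    print(vk_out.is_contiguous())                # True
    ```
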
    ghstack-source-id: 222417364
    
    Reviewed By: SS-JIA
    
    Differential Revision: D56095204
    
    fbshipit-source-id: d42777ec876e47465c892331b5f854203c9fb8ef
    yipjustin authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    28be9d6 View commit details
    Browse the repository at this point in the history
  22. make_seq_tensor in codegen (#3088)

    Summary:
    Pull Request resolved: #3088
    
    An increasing sequence is very useful for development, particularly for "slicing" and "indexing" operations.
    ghstack-source-id: 222827546
    
    Reviewed By: SS-JIA
    
    Differential Revision: D56095314
    
    fbshipit-source-id: 1491bb2399581eb303472b572c74c070c833d654
    yipjustin authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    5fbd1f4 View commit details
    Browse the repository at this point in the history
  23. remove exir.capture from quant fusion test (#3106)

    Summary:
    Pull Request resolved: #3106
    
    title
    
    Reviewed By: jerryzh168
    
    Differential Revision: D56264730
    
    fbshipit-source-id: c434d3f9891063319fe78e52dfbc1b60b0c7e195
    JacobSzwejbka authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    cca9f65 View commit details
    Browse the repository at this point in the history
  24. Don't crash when execute_method fails (#3104)

    Summary:
    Pull Request resolved: #3104
    
    Currently, we hard-crash the process when execute_method fails, and it's not catchable. Instead, we should return null to Java, so the caller can handle it.
    
    Reviewed By: shoumikhin, cccclai
    
    Differential Revision: D56260831
    
    fbshipit-source-id: 281aa53985e021e803444ea5ee1c89a1e4b66e6b
    kirklandsign authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    9c2b41b View commit details
    Browse the repository at this point in the history
  25. update readme to not use exir.capture (#3107)

    Summary:
    Pull Request resolved: #3107
    
    title
    
    Reviewed By: angelayi
    
    Differential Revision: D56265239
    
    fbshipit-source-id: 3d2ed83bea645824819828a0a384970a736a688c
    JacobSzwejbka authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    b3ac533 View commit details
    Browse the repository at this point in the history
  26. remove exir.capture from example delegate test (#3101)

    Summary:
    Pull Request resolved: #3101
    
    title
    
    Reviewed By: cccclai
    
    Differential Revision: D56258614
    
    fbshipit-source-id: 1f5d3a57926be2c54eba7d4f9df6d50f31fdbc63
    JacobSzwejbka authored and facebook-github-bot committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    203ae40 View commit details
    Browse the repository at this point in the history

Commits on Apr 18, 2024

  1. throw Java exception when execution fails (#3112)

    Summary:
    Pull Request resolved: #3112
    
    Instead of logging, we throw a Java exception and let the user catch it.
    
    Reviewed By: dbcakadbc
    
    Differential Revision: D56270287
    
    fbshipit-source-id: 9c581fb384c671ca14d2a4a8946654569ae953a6
    kirklandsign authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    b19d586 View commit details
    Browse the repository at this point in the history
  2. Handle missing data types. (#2984)

    Summary:
    **Changes**
    - The runtime was failing if it encountered a datatype not supported by the CoreML framework. The changes add support for all the datatypes that are supported by coremltools; basically, if `CoreMLBackend` can export a model, then the runtime can execute it. Complex types are not supported because `coremltools` doesn't support them.
    
    - Improves and cleans the multiarray copying code.
    
    - Adds portable ops to CoreML executor so that it can run a partitioned model.
    
    **Testing**
    - Tested partitioned model `coreml_stories.pte`
    - Added multiarray copying tests.
    
    Pull Request resolved: #2984
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56003795
    
    Pulled By: shoumikhin
    
    fbshipit-source-id: fa1c7846f9510d68c359aed6761aedb2d10c6f46
    cymbalrush authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    d731866 View commit details
    Browse the repository at this point in the history
  3. Documentation for Vulkan Delegate (#3113)

    Summary:
    Pull Request resolved: #3113
    
    imported-using-ghimport
    
    Test Plan: Imported from OSS
    
    Reviewed By: cccclai
    
    Differential Revision: D56279743
    
    Pulled By: SS-JIA
    
    fbshipit-source-id: af55cdf2d8518c582b7d8deccb731c6bc442a1c9
    SS-JIA authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    414cd05 View commit details
    Browse the repository at this point in the history
  4. fix embedding_4bit resize (#3118)

    Summary: Pull Request resolved: #3118
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56282683
    
    fbshipit-source-id: fa1f255bcc82929efeeeb1de1f259682bc11d8e5
    manuelcandales authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    910f851 View commit details
    Browse the repository at this point in the history
  5. Delete llama_quantized lib (#3119)

    Summary:
    Pull Request resolved: #3119
    
    Delete llama_quantized lib, and move embedding_byte.dtype to exir pass
    
    Reviewed By: manuelcandales, mikekgfb
    
    Differential Revision: D56206703
    
    fbshipit-source-id: 629a3c7c2d981a212dfb619ac9106ba9bf478b62
    larryliu0820 authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    6510625 View commit details
    Browse the repository at this point in the history
  6. Add quantized cmake option back to fix build-apple-framework (#3115)

    Summary:
    As titled. Got too excited in #3062 and removed `EXECUTORCH_BUILD_QUANTIZED`. Looking at the CI job failure of `build-apple-framework`, it's probably worth adding it back.
    
    Pull Request resolved: #3115
    
    Test Plan: See that CI job pass
    
    Reviewed By: shoumikhin
    
    Differential Revision: D56281923
    
    Pulled By: larryliu0820
    
    fbshipit-source-id: e6ad411f763ff8e11d4fb1e0bc7037eb2cf69357
    larryliu0820 authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    eb47c4e View commit details
    Browse the repository at this point in the history
  7. Fix typo in sub & clean up (#3100)

    Summary: Pull Request resolved: #3100
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56255838
    
    fbshipit-source-id: b6567320b557aeb287db66b43447db9caabebd13
    manuelcandales authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    e69a662 View commit details
    Browse the repository at this point in the history
  8. Free Vulkan delegate segments after compileModel (#3116)

    Summary:
    Pull Request resolved: #3116
    
    It's been a while since I had an impactful one-liner. :)
    
    Nothing innovative here, just reusing the same solution as [other backends](https://github.com/pytorch/executorch/blob/b19d5860568187f2567d93dd5e7cd5af32378d9f/backends/xnnpack/runtime/XNNPACKBackend.cpp#L47-L48).
    
    Reviewed By: yipjustin, copyrightly, SS-JIA
    
    Differential Revision: D56281665
    
    fbshipit-source-id: 6b4c9d25ef085a394bcd2904903fff680b4f1794
    jorgep31415 authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    e0b0647 View commit details
    Browse the repository at this point in the history
  9. Define embedding_4bit (#3121)

    Summary: Pull Request resolved: #3121
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56246346
    
    fbshipit-source-id: ccf8c7ca0569a8c6381b54640dcf39adc2568773
    manuelcandales authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    4c552d4 View commit details
    Browse the repository at this point in the history
  10. cherry-pick: Add required deps to pyproject.toml (#3117)

    Summary:
    Cherry-pick 28f1c8c from release/0.2 into main
    
    These pip dependencies need to be present to build the pip wheel.
    
    Also, change the version to a stub that looks less like a real version,
    until we can hook up the logic to get the version from the git repo
    state.
    
    Pull Request resolved: #3117
    
    Test Plan: Ran `./install_requirements.sh` in a new conda environment on my mac M1, and it built/installed the pip package successfully.
    
    Reviewed By: tugsbayasgalan
    
    Differential Revision: D56282487
    
    Pulled By: dbort
    
    fbshipit-source-id: 81e575957ca4d1262eecb4dd5b480a88942371f6
    dbort authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    f2e660b View commit details
    Browse the repository at this point in the history
  11. Preserve modelname (#3122)

    Summary: Pull Request resolved: #3122
    
    Reviewed By: mikekgfb
    
    Differential Revision: D56212361
    
    fbshipit-source-id: 877f2d3d8b2c078e21b0ababdfbc4e447cd86374
    manuelcandales authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    29faa2e View commit details
    Browse the repository at this point in the history
  12. fix llama-runner-linux-android (#3127)

    Summary: Pull Request resolved: #3127
    
    Reviewed By: larryliu0820, kirklandsign
    
    Differential Revision: D56306284
    
    fbshipit-source-id: cb092c358cb2db021a368027e4efd78593bec9b4
    manuelcandales authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    ab02a9c View commit details
    Browse the repository at this point in the history
  13. Buck build - fix use_tiktoken config

    Summary:
    Make it work
    
    bypass-github-export-checks
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56287998
    
    fbshipit-source-id: 02b92c8110f7ea72055edd4c194858cc71b49093
    digantdesai authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    8d25288 View commit details
    Browse the repository at this point in the history
  14. delete exir/experimental (#3109)

    Summary:
    Pull Request resolved: #3109
    
    unused so deleting
    
    Reviewed By: angelayi
    
    Differential Revision: D56271249
    
    fbshipit-source-id: 79b624a0b45684ead4e89a410fc1e2267b5ad2a9
    JacobSzwejbka authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    944dd4c View commit details
    Browse the repository at this point in the history
  15. 4b embedding quantizer (#3135)

    Summary:
    Pull Request resolved: #3135
    
    4b embedding quantizer
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56229021
    
    fbshipit-source-id: 560911333b173b4d03c3c62769e6db4e2ab54c7b
    manuelcandales authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    8fd92bc View commit details
    Browse the repository at this point in the history
  16. Update README.md (#3094)

    Summary:
    Fix Android adb shell quotes. Tested prompt quote escapes locally.
    
    Pull Request resolved: #3094
    
    Reviewed By: mergennachin
    
    Differential Revision: D56318301
    
    Pulled By: digantdesai
    
    fbshipit-source-id: f9bf1b62a905006a8b440c57cf0bc29510a30637
    digantdesai authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    74204f4 View commit details
    Browse the repository at this point in the history
  17. Adding Gotchas in README.md (#3138)

    Summary:
    Pull Request resolved: #3138
    
    Populating based on feedback from George from Arm
    
    Created from CodeHub with https://fburl.com/edit-in-codehub
    
    bypass-github-export-checks
    bypass-github-pytorch-ci-checks
    bypass-github-executorch-ci-checks
    
    Reviewed By: digantdesai
    
    Differential Revision: D56319098
    
    fbshipit-source-id: 6c15ef3c2cb3857b58c21d7b58a0cdf36077ee9d
    mergennachin authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    02ec589 View commit details
    Browse the repository at this point in the history
  18. Update README.md for llama3 (#3141)

    Summary: Pull Request resolved: #3141
    
    Reviewed By: mergennachin
    
    Differential Revision: D56324924
    
    Pulled By: orionr
    
    fbshipit-source-id: 7d3f2a7abec560d9d5cbeb767ce3f701b7db7e73
    iseeyuan authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    523c2cb View commit details
    Browse the repository at this point in the history
  19. aten.view_copy (#3129)

    Summary:
    Pull Request resolved: #3129
    
    aten.view_copy, supporting all packing.
    
    Using SS-JIA's idea to do a direct lookup.
    ghstack-source-id: 223111187
    
    Reviewed By: SS-JIA
    
    Differential Revision: D56281400
    
    fbshipit-source-id: 355493fc18c015523672665e7c1c37a4c92debdd
    yipjustin authored and facebook-github-bot committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    1eed125 View commit details
    Browse the repository at this point in the history

Commits on Apr 19, 2024

  1. Update README.md on the evaluation parameters (#3139)

    Summary:
    It's not clear how we got the perplexity numbers. Add the parameters we used to get those numbers.
    
    Pull Request resolved: #3139
    
    Reviewed By: lucylq
    
    Differential Revision: D56319905
    
    Pulled By: iseeyuan
    
    fbshipit-source-id: dc387cc84c2fe7a21e44642ff591000fd6728abb
    iseeyuan authored and facebook-github-bot committed Apr 19, 2024
    Configuration menu
    Copy the full SHA
    06beace View commit details
    Browse the repository at this point in the history
  2. Add reference to the llama2 example for llama3 (#3142)

    Summary:
    In conjunction with iseeyuan's changes, add an `examples/models/llama3/README.md` just in case people are looking for a Llama 3 folder in examples.
    
    Pull Request resolved: #3142
    
    Reviewed By: mikekgfb
    
    Differential Revision: D56337484
    
    Pulled By: orionr
    
    fbshipit-source-id: 0e122b2bbaa3bdcd95c83ed45a28b96cc0b24ba7
    orionr authored and facebook-github-bot committed Apr 19, 2024
    Configuration menu
    Copy the full SHA
    3db0362 View commit details
    Browse the repository at this point in the history
  3. Update Llama3 perplexity numbers in README.md (#3145)

    Summary:
    Update Llama3 perplexity numbers in README.md, with 4-bit quantization with different group sizes.
    
    Pull Request resolved: #3145
    
    Reviewed By: orionr
    
    Differential Revision: D56338045
    
    Pulled By: iseeyuan
    
    fbshipit-source-id: 74d06da50758c82cc0efb899d134b52423cc3ec6
    iseeyuan authored and facebook-github-bot committed Apr 19, 2024
    Configuration menu
    Copy the full SHA
    060d151 View commit details
    Browse the repository at this point in the history
  4. add cpu device to run eval on cpu (#3133)

    Summary:
    Pull Request resolved: #3133
    
    `HFLM` from `lm_eval` can take cpu device. https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/huggingface.py#L95
    
    Currently, running `eval_llama` fails on CPU.
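
    A hedged sketch of what this enables (the exact constructor arguments besides `device` may differ by `lm_eval` version; `pretrained` here is just a placeholder model):

    ```
    from lm_eval.models.huggingface import HFLM

    # HFLM accepts a device string, so the eval harness can run on CPU
    lm = HFLM(pretrained="gpt2", device="cpu", batch_size=1)
    ```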
    
    Reviewed By: lucylq
    
    Differential Revision: D56313161
    
    fbshipit-source-id: ceb5e650c3d31b9f1a96583d0396264bbf16a102
    cccclai authored and facebook-github-bot committed Apr 19, 2024
    Configuration menu
    Copy the full SHA
    b5085aa View commit details
    Browse the repository at this point in the history
  5. Add a simple sdpa (#3037)

    Summary:
    Pull Request resolved: #3037
    
    Add a simple sdpa so it's decomposed into simpler ops, instead of decomposing F.scaled_dot_product_attention, which expands into 29 ops including `torch.where`:
    ```
    def forward(self, q, k, v):
        aten_mul_scalar = executorch_exir_dialects_edge__ops_aten_mul_Scalar(q, 0.5946035575013605);  q = None
        aten_full_default = executorch_exir_dialects_edge__ops_aten_full_default([8, 8], True, dtype = torch.bool, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
        aten_arange_start_step = executorch_exir_dialects_edge__ops_aten_arange_start_step(0, 8, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
        aten_unsqueeze_copy_default = executorch_exir_dialects_edge__ops_aten_unsqueeze_copy_default(aten_arange_start_step, -2);  aten_arange_start_step = None
        aten_arange_start_step_1 = executorch_exir_dialects_edge__ops_aten_arange_start_step(0, 8, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
        aten_unsqueeze_copy_default_1 = executorch_exir_dialects_edge__ops_aten_unsqueeze_copy_default(aten_arange_start_step_1, -1);  aten_arange_start_step_1 = None
        aten_sub_tensor = executorch_exir_dialects_edge__ops_aten_sub_Tensor(aten_unsqueeze_copy_default, aten_unsqueeze_copy_default_1);  aten_unsqueeze_copy_default = aten_unsqueeze_copy_default_1 = None
        aten_le_scalar = executorch_exir_dialects_edge__ops_aten_le_Scalar(aten_sub_tensor, 0);  aten_sub_tensor = None
        aten_logical_and_default = executorch_exir_dialects_edge__ops_aten_logical_and_default(aten_le_scalar, aten_full_default);  aten_le_scalar = aten_full_default = None
        aten_full_like_default = executorch_exir_dialects_edge__ops_aten_full_like_default(aten_logical_and_default, 0, dtype = torch.float32, pin_memory = False, memory_format = torch.preserve_format)
        aten_logical_not_default = executorch_exir_dialects_edge__ops_aten_logical_not_default(aten_logical_and_default);  aten_logical_and_default = None
        aten_scalar_tensor_default = executorch_exir_dialects_edge__ops_aten_scalar_tensor_default(-inf, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'))
        aten_where_self = executorch_exir_dialects_edge__ops_aten_where_self(aten_logical_not_default, aten_scalar_tensor_default, aten_full_like_default);  aten_logical_not_default = aten_scalar_tensor_default = aten_full_like_default = None
        aten_permute_copy_default = executorch_exir_dialects_edge__ops_aten_permute_copy_default(k, [0, 1, 3, 2]);  k = None
        aten_mul_scalar_1 = executorch_exir_dialects_edge__ops_aten_mul_Scalar(aten_permute_copy_default, 0.5946035575013605);  aten_permute_copy_default = None
        aten_expand_copy_default = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten_mul_scalar, [1, 1, 8, 8]);  aten_mul_scalar = None
        aten_view_copy_default = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default, [1, 8, 8]);  aten_expand_copy_default = None
        aten_expand_copy_default_1 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten_mul_scalar_1, [1, 1, 8, 8]);  aten_mul_scalar_1 = None
        aten_view_copy_default_1 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_1, [1, 8, 8]);  aten_expand_copy_default_1 = None
        aten_bmm_default = executorch_exir_dialects_edge__ops_aten_bmm_default(aten_view_copy_default, aten_view_copy_default_1);  aten_view_copy_default = aten_view_copy_default_1 = None
        aten_view_copy_default_2 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_bmm_default, [1, 1, 8, 8]);  aten_bmm_default = None
        aten_add_tensor = executorch_exir_dialects_edge__ops_aten_add_Tensor(aten_view_copy_default_2, aten_where_self);  aten_view_copy_default_2 = aten_where_self = None
        aten__softmax_default = executorch_exir_dialects_edge__ops_aten__softmax_default(aten_add_tensor, -1, False);  aten_add_tensor = None
        aten_expand_copy_default_2 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(aten__softmax_default, [1, 1, 8, 8]);  aten__softmax_default = None
        aten_view_copy_default_3 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_2, [1, 8, 8]);  aten_expand_copy_default_2 = None
        aten_expand_copy_default_3 = executorch_exir_dialects_edge__ops_aten_expand_copy_default(v, [1, 1, 8, 8]);  v = None
        aten_view_copy_default_4 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_expand_copy_default_3, [1, 8, 8]);  aten_expand_copy_default_3 = None
        aten_bmm_default_1 = executorch_exir_dialects_edge__ops_aten_bmm_default(aten_view_copy_default_3, aten_view_copy_default_4);  aten_view_copy_default_3 = aten_view_copy_default_4 = None
        aten_view_copy_default_5 = executorch_exir_dialects_edge__ops_aten_view_copy_default(aten_bmm_default_1, [1, 1, 8, 8]);  aten_bmm_default_1 = None
        return (aten_view_copy_default_5,)
    ```
    After applying the diff, we remove the following ops
    ```
        %aten_full_like_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.full_like.default](args = (%aten_index_tensor_2, 0), kwargs = {dtype: torch.float32, pin_memory: False, memory_format: torch.preserve_format})
    
        %aten_logical_not_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.logical_not.default](args = (%aten_index_tensor_2,), kwargs = {})
    
        %aten_scalar_tensor_default : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.scalar_tensor.default](args = (-inf,), kwargs = {dtype: torch.float32, layout: torch.strided, device: cpu})
    
        %aten_where_self : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.where.self](args = (%aten_logical_not_default, %aten_scalar_tensor_default, %aten_full_like_default), kwargs = {})
    
        %aten_mul_scalar : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Scalar](args = (%aten_permute_copy_default_3, 0.5946035575013605), kwargs = {})
        ...
        %aten_mul_scalar_1 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Scalar](args = (%aten_permute_copy_default_6, 0.5946035575013605), kwargs = {})
    ```
    but introduce an add:
    ```
        %aten_add_tensor_3 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.add.Tensor](args = (%aten_mul_tensor_11, %aten_index_tensor_2), kwargs = {})
    ```
    ghstack-source-id: 223152096
    exported-using-ghexport
    
    Reviewed By: mergennachin, kimishpatel
    
    Differential Revision: D56119737
    
    fbshipit-source-id: ec8e875f0a4c4ec67b7493e4872c9a5b081e6de7
    cccclai authored and facebook-github-bot committed Apr 19, 2024
    Configuration menu
    Copy the full SHA
    cf78107 View commit details
    Browse the repository at this point in the history
  6. Fix quantized embedding export logic (#3095)

    Summary:
    Add patches to make 4bit quantized embedding work for export. Fixed:
    * Schema mismatch between functional embedding_4bit and out variant
    * Set `packed=True` for 4bit quantization (see the rough packing sketch below)
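
    A rough sketch of what "packed" means for 4-bit weights, i.e. two 4-bit values stored per byte (illustrative Python, not the ExecuTorch kernel):

    ```
    import torch

    def pack_int4(q: torch.Tensor) -> torch.Tensor:
        # q holds values in [-8, 7]; shift to [0, 15] and pack pairs into one uint8
        q = (q + 8).to(torch.uint8)
        return (q[..., ::2] << 4) | q[..., 1::2]

    q = torch.randint(-8, 8, (4, 8), dtype=torch.int8)
    packed = pack_int4(q)
    print(q.shape, packed.shape)  # torch.Size([4, 8]) torch.Size([4, 4])
    ```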
    
    Pull Request resolved: #3095
    
    Reviewed By: mikekgfb
    
    Differential Revision: D56340670
    
    Pulled By: larryliu0820
    
    fbshipit-source-id: c98623a9b7633fc5a6c390be1557213c719fa95a
    larryliu0820 authored and facebook-github-bot committed Apr 19, 2024
    Configuration menu
    Copy the full SHA
    2c467dd View commit details
    Browse the repository at this point in the history
  7. Comply llama2 runner with gcc 11.4 (#3140)

    Summary:
    Pull Request resolved: #3140
    
    This seems like a simple change so that it can compile with gcc 11.4
    
    bypass-github-export-checks
    bypass-github-pytorch-ci-checks
    bypass-github-executorch-ci-checks
    
    Reviewed By: digantdesai
    
    Differential Revision: D56320381
    
    fbshipit-source-id: 577a60bac78ed01ad450fcb58dbccc7f04fd5067
    mergennachin authored and facebook-github-bot committed Apr 19, 2024
    Configuration menu
    Copy the full SHA
    74dba6e View commit details
    Browse the repository at this point in the history
  8. qnn end to end flow for stories model (#3038)

    Summary:
    Pull Request resolved: #3038
    
    Patch a few changes including:
    - support bool tensor type
    - support fp16 and fix the 8w8a quantization.
    - add two non-supported ops (slice_scatter and index_put) in common_defs.py
    
    stories model working end to end:
    AOT:
    fp16:
    ```
    python -m examples.models.llama2.export_llama -kv --qnn -c stories110M.pt -p params.json
    ```
    quantize:
    ```
    python -m examples.models.llama2.export_llama -kv --qnn --pt2e_quantize qnn_8a8w -c stories110M.pt -p params.json
    ```
    
    Runtime:
    ```
    /llama_main --model_path=llama2_fp16_qnn_2.21.pte  --tokenizer_path=tokenizer.bin --prompt="Once"
    ```
    Output:
    ```
    Once upon a time, there was a little girl named Lily. She loved to play outside and explore the world around her. One day, she went on a walk with her mommy and they found a beautiful landscape with lots of trees and flowers.
    Lily said, "Mommy, this place is so pretty! Can we take a picture?"
    Mommy replied, "Of course, Lily! Let's take a picture to remember the original place we found."
    After they took the picture, they continued their walk and saw a bird flying in the sky. Lily said, "MomPyTorchObserver {"prompt_tokens":2,"generated_tokens":125,"model_load_start_ms":1713226585936,"model_load_end_ms":1713226586909,"inference_start_ms":1713226586909,"inference_end_ms":1713226590363,"prompt_eval_end_ms":1713226586966,"first_token_ms":1713226586994,"aggregate_sampling_time_ms":23,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
    I 00:00:04.436699 executorch:runner.cpp:414] 	Prompt Tokens: 2    Generated Tokens: 125
    I 00:00:04.436703 executorch:runner.cpp:420] 	Model Load Time:		0.973000 (seconds)
    I 00:00:04.436732 executorch:runner.cpp:430] 	Total inference time:		3.454000 (seconds)		 Rate: 	36.189925 (tokens/second)
    I 00:00:04.436735 executorch:runner.cpp:438] 		Prompt evaluation:	0.057000 (seconds)		 Rate: 	35.087719 (tokens/second)
    I 00:00:04.436739 executorch:runner.cpp:449] 		Generated 125 tokens:	3.397000 (seconds)		 Rate: 	36.797174 (tokens/second)
    I 00:00:04.436742 executorch:runner.cpp:457] 	Time to first generated token:	0.085000 (seconds)
    I 00:00:04.436744 executorch:runner.cpp:464] 	Sampling time over 127 tokens:	0.023000 (seconds)
    [INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
    [INFO] [Qnn ExecuTorch]: Destroy Qnn context
    ```
    
    The stories model is very small and is sensitive to quantization.
    ghstack-source-id: 223199545
    exported-using-ghexport
    
    Reviewed By: mergennachin, kirklandsign
    
    Differential Revision: D56119738
    
    fbshipit-source-id: daf5563fe51a677f302e09ae8a9fb80e6bda72c5
    cccclai authored and facebook-github-bot committed Apr 19, 2024
  9. Instructions for Llama3 (#3154)

    Summary:
    Pull Request resolved: #3154
    
    All the steps until validating on desktop.
    
    Reviewed By: iseeyuan
    
    Differential Revision: D56358723
    
    fbshipit-source-id: 32d246882d9609840932a7da22c2e3dbf015c0a8
    mergennachin authored and facebook-github-bot committed Apr 19, 2024
  10. Fix embedding_4bit out variant (#3151)

    Summary:
    Pull Request resolved: #3151
    
    In  #3095 there's an issue with the embedding_4bit schema which causes mismatch between functional and out variant. P1217884556
    
    Reviewed By: mergennachin, digantdesai
    
    Differential Revision: D56357762
    
    fbshipit-source-id: e8a1c249a02bfb4db295a1a933a8b3054e11099a
    larryliu0820 authored and facebook-github-bot committed Apr 19, 2024
  11. Add link to llama3 README file (#3156)

    Summary:
    Pull Request resolved: #3156
    
    bypass-github-export-checks
    bypass-github-pytorch-ci-checks
    bypass-github-executorch-ci-checks
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56362041
    
    fbshipit-source-id: 472dd9864a26f2b8744673163a8cd2cea58cc8e7
    mergennachin authored and facebook-github-bot committed Apr 19, 2024
  12. make op_split_with_sizes_copy support dynamic shape (#3152)

    Summary:
    Pull Request resolved: #3152
    
    as title
    
    Reviewed By: SS-JIA
    
    Differential Revision: D56333587
    
    fbshipit-source-id: deecbb2a394257dc146dd1af50cc0e7158ac79ed
    Gasoonjia authored and facebook-github-bot committed Apr 19, 2024
  13. Call destructor explicitly when move constructing Value (#3148)

    Summary:
    Pull Request resolved: #3148
    
    ## Context
    
    Inspecting the code for ATen's and ExecuTorch's `Value` classes (`IValue` and `EValue` respectively), I noticed that the destructor is called [explicitly when move constructing with non-trivial types](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/core/ivalue.h#L409). In practice I don't think calling the destructor explicitly is necessary, because move construction typically sets the moved-from object to an inactive state, but since we use `Value` to encapsulate STL types (i.e. types for which we do not implement the destructor), it's best to call the destructor explicitly to be safe.
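    For illustration, here is a minimal, self-contained sketch of the pattern. The class, tags, and payload below are illustrative only, not the actual `EValue`/`IValue` layout:
    ```
    #include <cstdint>
    #include <new>
    #include <string>
    #include <utility>

    // Illustrative tagged-union value type; names and layout are hypothetical.
    class Value {
     public:
      explicit Value(int64_t v) : tag_(Tag::Int) { payload_.as_int = v; }
      explicit Value(std::string s) : tag_(Tag::Str) {
        new (&payload_.str) std::string(std::move(s));
      }

      Value(Value&& rhs) noexcept : tag_(rhs.tag_) {
        if (tag_ == Tag::Str) {
          // Move-construct our payload from the source, then explicitly destroy
          // the source's payload. A moved-from std::string is only left in a
          // "valid but unspecified" state, so the explicit destructor call is
          // the safe way to guarantee the moved-from Value owns no resources.
          new (&payload_.str) std::string(std::move(rhs.payload_.str));
          rhs.payload_.str.~basic_string();
          rhs.tag_ = Tag::Int;
          rhs.payload_.as_int = 0;
        } else {
          payload_.as_int = rhs.payload_.as_int;
        }
      }

      ~Value() {
        if (tag_ == Tag::Str) {
          payload_.str.~basic_string();
        }
      }

     private:
      enum class Tag { Int, Str } tag_;
      union Payload {
        Payload() : as_int(0) {}
        ~Payload() {}
        int64_t as_int;
        std::string str;
      } payload_;
    };
    ```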
    
    ghstack-source-id: 223225898
    exported-using-ghexport
    
    Reviewed By: jorgep31415
    
    Differential Revision: D56357187
    
    fbshipit-source-id: 4797a627efcd2a61ee35d4c6963e524b4161ff3b
    SS-JIA authored and facebook-github-bot committed Apr 19, 2024
  14. Clean up api::vTensor class (#3149)

    Summary:
    Pull Request resolved: #3149
    
    ## Context
    
    Now that we have forked the `api/` directory from PyTorch Vulkan, we can clean up the `vTensor` class and remove functionality that is not necessary for the ExecuTorch Vulkan delegate.
    
    The following changes are made:
    
    * Remove unused member variables and member functions from `vTensor` and `vTensorStorage`
    * Remove all quantization related member variables, member functions, and the `vTensor` constructor for quantized tensors. The Quantization API will be reworked from the ground up.
    * Rename `view_` (which is an instance of `vTensorStorage`) to `storage_`
    
    Finally, the critical change that is introduced is that we now store `storage_` as a direct `vTensorStorage` member variable in `vTensor` instead of storing it as a `std::shared_ptr<vTensorStorage>`.
    
    For context, the reason `storage_` was stored as a shared pointer is to be compliant with ATen Tensors, which needs to enable copy construction to enable the following:
    
    ```
    at::Tensor b = at::rand(...);
    // Oftentimes this will create a "view" of the tensor. a and b will point to the same underlying storage, but with different metadata.
    at::Tensor a = b;
    ```
    
    However, in the ExecuTorch delegate this is no longer necessary. Each Tensor is associated with its own independent storage and is responsible for managing its own memory. **By getting rid of `std::shared_ptr`, we can avoid a heap allocation and avoid chasing pointers whenever we need to access the resources of a `vTensor`.**
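    A minimal sketch of the before/after difference, using hypothetical stand-in types rather than the real `vTensor`/`vTensorStorage` definitions:
    ```
    #include <memory>
    #include <vector>

    // Hypothetical stand-in for the storage object that owns the GPU resources.
    struct Storage {
      std::vector<int64_t> sizes;
      // ... image/buffer handles would live here ...
    };

    // Before: ATen-style, copy-shareable tensor. Copies share storage, but every
    // tensor pays for a heap allocation and a pointer indirection on each access.
    class TensorShared {
     public:
      explicit TensorShared(std::vector<int64_t> sizes)
          : storage_(std::make_shared<Storage>(Storage{std::move(sizes)})) {}
      const std::vector<int64_t>& sizes() const { return storage_->sizes; }
     private:
      std::shared_ptr<Storage> storage_;
    };

    // After: the delegate's tensor owns its storage directly. No extra heap
    // allocation and no pointer chasing; the tensor simply isn't copy-shareable.
    class TensorOwned {
     public:
      explicit TensorOwned(std::vector<int64_t> sizes)
          : storage_{std::move(sizes)} {}
      const std::vector<int64_t>& sizes() const { return storage_.sizes; }
     private:
      Storage storage_;
    };
    ```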
    
    ghstack-source-id: 223225901
    exported-using-ghexport
    
    Reviewed By: jorgep31415
    
    Differential Revision: D55811279
    
    fbshipit-source-id: 95c0ecc9658ef9bc64ecee9e5c9e272da12786b8
    SS-JIA authored and facebook-github-bot committed Apr 19, 2024
  15. Introduce ParamsBindList to prevent needing to pass shared_ptr to…

    … bind parameter UBOs (#3150)
    
    Summary:
    Pull Request resolved: #3150
    
    ## Context
    
    In keeping with the below changeset in this stack, this diff introduces the `ParamsBindList` structure to avoid storing shared pointers to `api::UniformParamsBuffer` objects in `ExecuteNode` and `PrepackNode`.
    
    The idea is to store the binding information of each UPB instead of taking ownership of the UPB itself. There isn't really a need for `ExecuteNode` and `PrepackNode` to take ownership since `ComputeGraph` provides a guarantee that the UPBs will be in scope at the time of binding.
    
    With this change, all `shared_ptr` members can be eliminated from `vTensor`, further reducing heap allocations and pointer chasing.
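    A rough sketch of the idea, with illustrative names standing in for the real Vulkan handle and uniform-buffer types (this is not the actual `ParamsBindList` definition):
    ```
    #include <cstdint>
    #include <vector>

    // Hypothetical stand-in for the raw GPU buffer handle (VkBuffer in real code).
    using BufferHandle = void*;

    // Only what the descriptor set needs at bind time: no ownership involved.
    struct BufferBindInfo {
      BufferHandle handle;
      uint32_t offset;
      uint32_t range;
    };

    struct ParamsBindList {
      std::vector<BufferBindInfo> entries;
      void append(const BufferBindInfo& info) { entries.push_back(info); }
    };

    // ExecuteNode/PrepackNode can hold a ParamsBindList by value; ComputeGraph
    // guarantees the UniformParamsBuffers outlive the nodes, so shared_ptr
    // ownership (and the associated heap allocation) is unnecessary.
    ```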
    
    In the future I will change `prepack_nodes_` and `execute_nodes_` to store `PrepackNode` and `ExecuteNode` instances directly instead of storing unique pointers to them.
    
    ghstack-source-id: 223225899
    exported-using-ghexport
    
    Reviewed By: jorgep31415
    
    Differential Revision: D56357188
    
    fbshipit-source-id: 5f4d1be900711753aa2cc035c044fe71f93d555b
    SS-JIA authored and facebook-github-bot committed Apr 19, 2024
  16. Rename tokenizer file in Xcode. (#3160)

    Summary:
    Pull Request resolved: #3160
    
    .
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56363030
    
    fbshipit-source-id: 489a7d4a32ca3b3d020d2639d9c14b330ce01d86
    shoumikhin authored and facebook-github-bot committed Apr 19, 2024
  17. Adding .model tokenizer to selection (#3163)

    Summary:
    Pull Request resolved: #3163
    
    We should allow both .bin and .model for tokenizer
    
    Reviewed By: shoumikhin
    
    Differential Revision: D56365079
    
    fbshipit-source-id: 9b59d15b0b16ffd5a091d3deadacec0771547f77
    kirklandsign authored and facebook-github-bot committed Apr 19, 2024
  18. Docs for lower smaller models to mps/coreml/qnn (#3146)

    Summary:
    Pull Request resolved: #3146
    
    ghstack-source-id: 223235858
    
    Reviewed By: mcr229, kirklandsign
    
    Differential Revision: D56340028
    
    fbshipit-source-id: ef06142546ac54105ae87007cd82369917a22b3e
    cccclai authored and facebook-github-bot committed Apr 19, 2024
  19. Add missing ops for RNNT predictor (#3125)

    Summary:
    Pull Request resolved: #3125
    
    As titled. Permute and quantized_layer_norm were not registered properly.
    
    Reviewed By: tarun292
    
    Differential Revision: D56305088
    
    fbshipit-source-id: 0ceee7b3404ba95c1e758b6daf3a5b3a16f85662
    mcremon-meta authored and facebook-github-bot committed Apr 19, 2024
  20. fix test-demo-android (#3168)

    Summary: Pull Request resolved: #3168
    
    Reviewed By: digantdesai, shoumikhin, mikekgfb
    
    Differential Revision: D56367151
    
    fbshipit-source-id: a502e55abf41419c0b1775d0b2ec6ab170fb6299
    manuelcandales authored and facebook-github-bot committed Apr 19, 2024
  21. Slice, with lots of codegen improvements (#3171)

    Summary:
    Pull Request resolved: #3171
    
    1. Add the slice operation. Instead of using copy (as in the Lite Interpreter), we implement a simple shader with offsets. A sketch of the offset-based copy follows the list below.

    2. Improvements in codegen:
    - add support for optional variables
    - improve the indentation of generated code, for better readability
    - allow the user to specify how tensor values are generated, making it possible to produce sequential values for easier debugging of index operations
    - improve test-case specification in the sample code, particularly for long and optional values.
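    As mentioned in the first item above, here is a hedged, CPU-side sketch of what the offset-based slice copy computes (the real implementation is a GLSL compute shader; the function below is illustrative only):
    ```
    #include <cstdint>
    #include <vector>

    // Copy out[i] = in[start + i * step] along one dimension of a flattened
    // buffer. A GPU shader does the same thing, with each invocation handling
    // one output element instead of iterating in a loop.
    std::vector<float> slice_1d(
        const std::vector<float>& in,
        int64_t start,
        int64_t end,
        int64_t step) {
      std::vector<float> out;
      for (int64_t i = start; i < end; i += step) {
        out.push_back(in[static_cast<size_t>(i)]);
      }
      return out;
    }
    ```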
    ghstack-source-id: 223254861
    
    Reviewed By: SS-JIA, jorgep31415
    
    Differential Revision: D56295985
    
    fbshipit-source-id: f351dee25a72795d2ba768cb0bc33a467df64d8f
    yipjustin authored and facebook-github-bot committed Apr 19, 2024
  22. add kv cache to eval (#3162)

    Summary: Pull Request resolved: #3162
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56365716
    
    Pulled By: lucylq
    
    fbshipit-source-id: 707c5b869df128cc7e669fc0d78ca185f1c68f31
    lucylq authored and facebook-github-bot committed Apr 19, 2024
  23. Update model arg name rope_theta to be consistent with those in llama…

    …'s website (#3147)
    
    Summary:
    As title
    
    Pull Request resolved: #3147
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56357117
    
    Pulled By: iseeyuan
    
    fbshipit-source-id: 85544712794681c8006a8f3713b8e0fba712650f
    iseeyuan authored and facebook-github-bot committed Apr 19, 2024

Commits on Apr 20, 2024

  1. conv1d, special case

    Summary:
    We follow D50914117 to implement a specific case of conv1d for our needs. Specifically, we require
    - the input tensor to have a single batch
    - groups == in_channels == out_channels
    - weight_sizes.at(1) == 1
    - stride == 1
    - padding == 0
    - dilation == 1
    
    We assume `bias==True`. The `bias==False` case is handled in the next diff.
    
    General cases and optimizations will be enabled later.
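    A sketch of the kind of gating check these constraints imply (the function name and signature are illustrative, not the delegate's actual API):
    ```
    #include <cstdint>
    #include <vector>

    // Returns true only for the restricted conv1d case described above:
    // single batch, depthwise-style groups, unit weight width, unit stride,
    // no padding, unit dilation.
    bool is_supported_special_case_conv1d(
        const std::vector<int64_t>& input_sizes,   // [N, C_in, L]
        const std::vector<int64_t>& weight_sizes,  // [C_out, C_in / groups, K]
        int64_t stride,
        int64_t padding,
        int64_t dilation,
        int64_t groups) {
      const int64_t batch = input_sizes.at(0);
      const int64_t in_channels = input_sizes.at(1);
      const int64_t out_channels = weight_sizes.at(0);
      return batch == 1 && groups == in_channels && groups == out_channels &&
          weight_sizes.at(1) == 1 && stride == 1 && padding == 0 && dilation == 1;
    }
    ```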
    
    Reviewed By: jorgep31415
    
    Differential Revision: D56220143
    
    fbshipit-source-id: a18de3a463875b9617cb7930febf7622fe866536
    copyrightly authored and facebook-github-bot committed Apr 20, 2024
  2. Qualcomm AI Engine Direct - Enable SSD300_VGG16 (#3010)

    Summary:
    - Enable SSD300_VGG16
    - Adding new OPs: sqrt, sum_intList
    - Add test cases for SSD300 and new OPs
    - Repository for SSD300_VGG16: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection
    
    Pull Request resolved: #3010
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56280698
    
    Pulled By: cccclai
    
    fbshipit-source-id: 3de0a3e0c705fd2765401d61577c6e10b4eddb39
    winskuo-quic authored and facebook-github-bot committed Apr 20, 2024
  3. conv1d with bias=False

    Summary: Under the same setting as the last diff, we support `bias=false`.
    
    Reviewed By: jorgep31415
    
    Differential Revision: D56285842
    
    fbshipit-source-id: 41636d19d2cd7db07ba924606c9cd33999cffdab
    copyrightly authored and facebook-github-bot committed Apr 20, 2024
  4. Switch to a dedicated branch for prebuilt packages. (#3184)

    Summary:
    Pull Request resolved: #3184
    
    .
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56383903
    
    fbshipit-source-id: 88bce7f7b4987a5cc8a649480af09a0e1cac90ee
    shoumikhin authored and facebook-github-bot committed Apr 20, 2024

Commits on Apr 21, 2024

  1. Use "latest" as the version for prebuilt frameworks. (#3161)

    Summary:
    Pull Request resolved: #3161
    
    .
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56363475
    
    fbshipit-source-id: f2a56e7baef600ac45793878520d2bf2cbe6bfe7
    shoumikhin authored and facebook-github-bot committed Apr 21, 2024
  2. Deprecate gpu_sizes_ubo() and extents(); also toggle packing layo…

    …ut via specialization constants (#3181)
    
    Summary:
    Pull Request resolved: #3181
    
    ## Context
    
    This changeset cleans up how shaders consume tensor metadata in two ways:
    
    ### Pass in Packing Layout via Specialization Shader
    
    The packing layout of a tensor determines how to convert between tensor indices and physical texture coordinates. Currently, the packing layout is determined by generating a completely new variant of a shader. However, this is rather expensive for build size.
    
    Specialization constants support was added a while back, which enables packing layout to be communicated to the shader via a specialization constant. This is a much better and natural way for shaders to determine the packing layout of its tensors and vary its behaviour.
    
    The primary benefit of this is that we can vastly reduce the number of variants that are generated. Generating shader variants for combinations of dtypes and memory layouts can lead to combinatorial explosion of build size.
    
    Note that dtype cannot be passed as a specialization constant since it impacts the types used in the layout portion of a shader.
    
    ### Deprecate GPU sizes and Extents
    
    Currently there are 3 representations of the tensor's sizes: `cpu_sizes()`, `gpu_sizes()`, and `extents()`. The GPU sizes are a simple modification of the CPU sizes where the packed dim is aligned to the next multiple of 4. Extents represents the physical extents of the image texture used to store the tensor.

    However, oftentimes shaders need to reference the original sizes of the tensor, so we end up passing two different representations of the tensor sizes. The CPU sizes and extents are used to determine out-of-bounds elements, and the GPU sizes are used to convert between logical tensor indices and physical texture coordinates.

    Since the GPU sizes and extents are easily determined from the CPU sizes given the packing layout, deprecate GPU sizes and use CPU sizes exclusively as the canonical tensor sizes. Hence `cpu_sizes()` is renamed to simply `sizes()`.
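    The derivation referred to above is just an align-up of the packed dimension; a minimal sketch (function names are illustrative):
    ```
    #include <cstdint>
    #include <vector>

    // Round n up to the next multiple of 4.
    inline int64_t align_up_4(int64_t n) {
      return (n + 3) & ~int64_t(3);
    }

    // Given the canonical sizes() and the packed dimension (now communicated to
    // shaders via a specialization constant), the old "GPU sizes" can be
    // recomputed on the fly instead of being stored and bound separately.
    std::vector<int64_t> gpu_sizes_from_sizes(
        std::vector<int64_t> sizes, size_t packed_dim) {
      sizes.at(packed_dim) = align_up_4(sizes.at(packed_dim));
      return sizes;
    }
    ```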
    
    The primary benefit of this change is such:
    
    1. Less confusion over how to reference the tensor sizes
    2. Fewer descriptors to bind when constructing compute pipelines
    3. Fewer uniform buffers to update when resizing tensors between inferences.
    ghstack-source-id: 223317313
    
    Reviewed By: yipjustin
    
    Differential Revision: D56377775
    
    fbshipit-source-id: 31235fbdf0b694e24b8c6fc0b40c56ddb818439d
    SS-JIA authored and facebook-github-bot committed Apr 21, 2024
  3. Specify OSX deployment target for python package. (#3193)

    Summary:
    Pull Request resolved: #3193
    
    .
    
    Reviewed By: mikekgfb
    
    Differential Revision: D56403324
    
    fbshipit-source-id: 07b29b0b12a8995bce4d45ea9308a5b3c566d7e6
    shoumikhin authored and facebook-github-bot committed Apr 21, 2024

Commits on Apr 22, 2024

  1. Specify OSX deployment target for python package. (#3194)

    Summary:
    Pull Request resolved: #3194
    overriding_review_checks_triggers_an_audit_and_retroactive_review
    Oncall Short Name: executorch
    
    Differential Revision: D56405473
    
    fbshipit-source-id: 785709e8acc1b07e57825b278c3e0a355641e13a
    shoumikhin authored and facebook-github-bot committed Apr 22, 2024
  2. Fix linter. (#3195)

    Summary:
    Pull Request resolved: #3195
    overriding_review_checks_triggers_an_audit_and_retroactive_review
    Oncall Short Name: executorch
    
    Differential Revision: D56405764
    
    fbshipit-source-id: 284f54c9aabdebb070edf7d6931b43260af8ad24
    shoumikhin authored and facebook-github-bot committed Apr 22, 2024
  3. support emit sym value from delegate (#3103)

    Summary:
    Pull Request resolved: #3103
    
    For dynamic shapes, if the delegate output has a dynamic shape, the return might be something like `(s0, x, y)`, where `s0` is a sym type while the others are fake tensors. In this case, we emit the sym value (`SymFloat`, `SymBool`, or `SymInt`) to a unique EValue.

    Since a sym type node has an empty spec, we use `node.meta['val']` to find out that it is a sym type node.
    
    Reviewed By: mcr229
    
    Differential Revision: D56176100
    
    fbshipit-source-id: a4ddc7225ed014c59ceb9fa8ba4a9cb394af00e5
    cccclai authored and facebook-github-bot committed Apr 22, 2024
  4. Update Xcode project to build tiktoken tokenizer for LLaMA 3. (#3197)

    Summary:
    Pull Request resolved: #3197
    
    .
    
    Reviewed By: mikekgfb
    
    Differential Revision: D56408302
    
    fbshipit-source-id: 93b14fbbc70cde4ebaaab0084a78d7bd3b3e4b4a
    shoumikhin authored and facebook-github-bot committed Apr 22, 2024
  5. Add quantized ops to pybindings (#3206)

    Summary: Pull Request resolved: #3206
    
    Test Plan:
    Imported from GitHub, without a `Test Plan:` line.
    
    Test with eval, run pte file through pybindings that uses quantized embeddings
    ```
    python3 -m examples.models.llama2.eval_llama --pte ../pte_files/llama3/llama3_x_int4_128_kv_sdpa_qe4_32.pte  -p ../llama-models/llama3/params_april18.json -t ../llama-models/llama3/tokenizer_april18.model --max_seq_len 127 --limit 5
    ```
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56426846
    
    Pulled By: lucylq
    
    fbshipit-source-id: ced9feaf043cf7beec94a08a109e9709864f15a2
    lucylq authored and facebook-github-bot committed Apr 22, 2024
  6. Add memory and vector include in managed_tensor.h (#3201)

    Summary:
    Pull Request resolved: #3201
    
    In order to get rid of this patch https://github.com/pytorch/torchchat/blob/main/scripts/install_et.sh#L35-L36
    
    We upstream the changes into ExecuTorch.
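    For reference, the upstreamed change amounts to making the header self-sufficient by including the standard headers it relies on (an illustrative excerpt, not the exact diff):
    ```
    // managed_tensor.h (illustrative excerpt)
    #include <memory>  // smart pointers used by the managed tensor
    #include <vector>  // std::vector used for sizes/strides
    ```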
    
    Reviewed By: lucylq
    
    Differential Revision: D56424633
    
    fbshipit-source-id: 72e6675b467416753b0fd995d8e514396eef8331
    larryliu0820 authored and facebook-github-bot committed Apr 22, 2024
  7. Refactor export_llama_lib.py

    Summary:
    Refactor the hell out of export_llama_lib.py.
    
    All quantizer logic goes into `lib/quant_lib.py`.
    
    All partitioner logic goes into `lib/partitioner_lib.py`.
    
    All source transformation logic goes into `source_transformation/`.
    
    Reviewed By: iseeyuan, cccclai
    
    Differential Revision: D56372411
    
    fbshipit-source-id: bfdf842980c7271aebaadfc445272fa4ca96f0d8
    larryliu0820 authored and facebook-github-bot committed Apr 22, 2024
  8. Update setup.sh for tokenizer selection (#3207)

    Summary:
    For LLAMA3, users need to use tiktoken. Add an option to load the tokenizer from an env var.
    
    Pull Request resolved: #3207
    
    Reviewed By: cccclai
    
    Differential Revision: D56430637
    
    Pulled By: kirklandsign
    
    fbshipit-source-id: cc1cc50100d6142510a455ca29d56a810942f90b
    kirklandsign authored and facebook-github-bot committed Apr 22, 2024
  9. Qualcomm AI Engine Direct - Fixed uint16 tensor and linear op (#3196)

    Summary:
    - Fixed uint16 data type of tensor
    
    Pull Request resolved: #3196
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56431363
    
    Pulled By: cccclai
    
    fbshipit-source-id: 42d763a18f7288c3ec0f233fcc52dde1476895bd
    shewu-quic authored and facebook-github-bot committed Apr 22, 2024
  10. Add a pure python wrapper to pybindings.portable_lib (#3137)

    Summary:
    Pull Request resolved: #3137
    
    When installed as a pip wheel, we must import `torch` before trying to import the pybindings shared library extension. This will load libtorch.so and related libs, ensuring that the pybindings lib can resolve those runtime dependencies.
    
    So, add a pure python wrapper that lets us do this when users say `import executorch.extension.pybindings.portable_lib`
    
    We only need this for OSS, so don't bother doing this for other pybindings targets.
    
    Reviewed By: orionr, mikekgfb
    
    Differential Revision: D56317150
    
    fbshipit-source-id: 920382636732aa276c25a76163afb7d28b1846d0
    dbort authored and facebook-github-bot committed Apr 22, 2024
  11. Remove unused extension/aot_util directory (#3216)

    Summary:
    The AOT util extension was removed a while back, but the directory and README still exist. This PR cleans them up. Note that the aot_util sources were deleted previously, so this is not a functional change.
    
    Pull Request resolved: #3216
    
    Test Plan: CI. This is not a functional change, as it changes only a README file.
    
    Reviewed By: metascroy
    
    Differential Revision: D56436216
    
    Pulled By: GregoryComer
    
    fbshipit-source-id: 2f8b8cee20b7a3efb25a1ef1df3ebd69f3b512c9
    GregoryComer authored and facebook-github-bot committed Apr 22, 2024
  12. Create dependabot rule to upgrade TorchFix version (#3208)

    Summary:
    The parameters are from https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
    
    ### Testing
    On my own fork https://github.com/huydhn/executorch/blob/main/.github/dependabot.yml, and the PR to upgrade TorchFix is created successfully huydhn#2
    
    Pull Request resolved: #3208
    
    Reviewed By: kit1980
    
    Differential Revision: D56428297
    
    Pulled By: huydhn
    
    fbshipit-source-id: 8b4f9d638d208fe6f476efdf7667058b2d2ae2fc
    huydhn authored and facebook-github-bot committed Apr 22, 2024
  13. Bring back extents_ubo() as texture_limits_ubo() (#3217)

    Summary:
    Pull Request resolved: #3217
    
    ## Context
    
    #3181 deprecated the `gpu_sizes_ubo()` and `extents_ubo()` functions of `vTensor` in order to standardize how shaders consume shape/size metadata of input tensors. However, this came at the cost of increasing the overhead required for bounds checking, which is needed to support dynamic shapes, since shaders now need to convert the input sizes to texture limits before checking whether a given texel position is out of bounds.
    
    Benchmarking revealed that this overhead can be quite significant especially on lower power mobile GPUs. In the interest of preserving performance, `extents_ubo()` is re-introduced since bounds checking is an operation that is common to every single shader. However, some improvements are made:
    
    * instead of `extents`, the nomenclature `texture_limits` is used in order to differentiate from physical image extents of the texture.
    * `texture_limits` is represented via an `ivec3` (previously `uvec4`); this means that to use it for bounds checking, there does not need to be an implicit cast from `uvec` to `ivec`, and there is also no need for swizzling.
    
    Also introduced in this changeset is the convention of passing both the texture limits and tensor sizes instead of using `pos_out_of_bounds()`. Passing in the texture limits is probably cheaper than using `pos_out_of_bounds()`. There are some exceptions though where I choose not to migrate to this pattern to avoid passing in too many variants of tensor metadata.
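    A sketch of the bounds check this enables, written in C++ for readability (the real code is a GLSL shader, where `all(lessThan(pos, limits))` plays the same role; names here are illustrative):
    ```
    #include <array>

    using ivec3 = std::array<int, 3>;

    // With the limits passed as a signed ivec3, the out-of-bounds test is three
    // comparisons, with no implicit uvec -> ivec casts and no swizzling.
    inline bool pos_in_bounds(const ivec3& pos, const ivec3& texture_limits) {
      return pos[0] < texture_limits[0] && pos[1] < texture_limits[1] &&
             pos[2] < texture_limits[2];
    }
    ```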
    
    ### What about `gpu_sizes_ubo`?
    
    I will hold off on re-introducing `gpu_sizes_ubo` for now since converting `sizes` to `gpu_sizes` is much cheaper compared to `pos_out_of_bounds()`:
    
    ```
    ivec4 sizes[packed_dim] = alignup4(sizes[packed_dim])
    ```
    
    Will perform some additional benchmarking on this to see if the overhead of the alignment warrants an explicit API for passing in GPU sizes to shaders.
    ghstack-source-id: 223453651
    exported-using-ghexport
    
    Reviewed By: yipjustin, jorgep31415
    
    Differential Revision: D56435574
    
    fbshipit-source-id: 656f79eecbfc7c77cbe067df6c9ea54c51c50633
    SS-JIA authored and facebook-github-bot committed Apr 22, 2024
  14. backout the schema definition change (#3213)

    Summary:
    Pull Request resolved: #3213
    
    The schema was changed to avoid double registration, but that was hiding the symptom by using a different schema. Restore the correct schema.
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56432559
    
    fbshipit-source-id: d9d0a92a6c6fa04857ea01916647eb46ed658849
    cccclai authored and facebook-github-bot committed Apr 22, 2024

Commits on Apr 23, 2024

  1. Update some SDK docs from MVP (#3212)

    Summary:
    Pull Request resolved: #3212
    
    Doc changes, including:
    1. Remove instructions for Buck because we're moving away from it and just use CMake now and going forward;
    2. Remove "Coming soon" for features that have since landed;
    3. Formatting.
    
    Reviewed By: Jack-Khuu
    
    Differential Revision: D56433016
    
    fbshipit-source-id: fffa283b4a04438866d84765a65377dcf8a88837
    Olivia-liu authored and facebook-github-bot committed Apr 23, 2024
  2. Bump the torch pin (#3199)

    Summary:
    It's requested by torchchat to have a newer version of torch nightly. Bump it from 4/15 to 4/21.
    
    Pull Request resolved: #3199
    
    Reviewed By: malfet
    
    Differential Revision: D56420105
    
    Pulled By: iseeyuan
    
    fbshipit-source-id: 3d2a9b0f8dbb48f0a81c7cdef8e419206b036faf
    iseeyuan authored and facebook-github-bot committed Apr 23, 2024
  3. Fix LLAMA app (#3228)

    Summary:
    Pull Request resolved: #3228
    
    Fix a UI thread issue causing crash.
    
    Reviewed By: cccclai
    
    Differential Revision: D56447006
    
    fbshipit-source-id: 02eff27d4b4cd108c95b664d04679d4f92aaf5db
    kirklandsign authored and facebook-github-bot committed Apr 23, 2024
  4. Fix executor_runner_mps and mpsdelegate linking with pybind (#3222)

    Summary:
    Summary of changes:
    - fixes mps_executor_runner build - previously it would fail to build due to incorrect linking with portable ops
    - fixes `mpsdelegate` linking with `pybind` lib
    - added tests to check correctness directly through pybind
    - added a helper file (`bench_utils.py`) to help measure models forward pass between PyTorch MPS and ExecuTorch MPS
    
    Testing (will run both AOT and runtime if MPS was built with pybind):
    - `./install_requirements.sh --pybind mps`
    - invoke a single unit test: `python3 -m unittest backends.apple.mps.test.test_mps_indexing_ops -v -k test_mps_indexing_get_1`.
    - invoke all tests from a file: `python3 -m unittest backends.apple.mps.test.test_mps_indexing_ops -v`
    
    cc cccclai , shoumikhin
    
    Pull Request resolved: #3222
    
    Reviewed By: shoumikhin
    
    Differential Revision: D56447888
    
    Pulled By: cccclai
    
    fbshipit-source-id: 5cbbcbf8df34f29e23a1854df72f764337a9df76
    DenisVieriu97 authored and facebook-github-bot committed Apr 23, 2024
  5. Update to transformers 4.38 (#3227)

    Summary:
    To fix CVE-2024-3568
    
    Pull Request resolved: #3227
    
    Reviewed By: mikekgfb
    
    Differential Revision: D56447728
    
    Pulled By: malfet
    
    fbshipit-source-id: 3758d9def101d58cead7bcae00cc91237abf42dd
    malfet authored and facebook-github-bot committed Apr 23, 2024
  6. Update TorchNightly to 2024.04.22 (#3225)

    Summary: Pull Request resolved: #3225
    
    Reviewed By: larryliu0820
    
    Differential Revision: D56447049
    
    Pulled By: malfet
    
    fbshipit-source-id: 0e92827f9dead7422334abd84d3bd540cb87fb50
    malfet authored and facebook-github-bot committed Apr 23, 2024
  7. Support llama3 (#3232)

    Summary: Pull Request resolved: #3232
    
    Reviewed By: iseeyuan
    
    Differential Revision: D56450983
    
    Pulled By: kirklandsign
    
    fbshipit-source-id: 94103040321df55d6fb53a2971512fd1bdfd5ec8
    kirklandsign authored and facebook-github-bot committed Apr 23, 2024
  8. strip symbol when linking (#3234)

    Summary:
    Pull Request resolved: #3234
    
    Refer to https://sourceware.org/binutils/docs/binutils/strip.html
    command to build for android
    ```
    rm -rf cmake-android-out && mkdir cmake-android-out
    
    cmake -DBUCK2="$BUCK" \
        -DCMAKE_INSTALL_PREFIX=cmake-android-out \
        -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \
        -DANDROID_ABI="arm64-v8a" \
        -DANDROID_PLATFORM=android-29 \
        -DCMAKE_BUILD_TYPE=Release \
        -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
        -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
        -DEXECUTORCH_BUILD_CUSTOM=ON \
        -DEXECUTORCH_BUILD_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_QUANTIZED=ON \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_ENABLE_LOGGING=ON \
        -Bcmake-android-out .
    
    cmake --build cmake-android-out -j16 --target install --config Release
    
    cmake -DBUCK2="$BUCK" \
        -DCMAKE_INSTALL_PREFIX=cmake-android-out \
        -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \
        -DANDROID_ABI="arm64-v8a" \
        -DANDROID_PLATFORM=android-23 \
        -DCMAKE_BUILD_TYPE=Release \
        -DEXECUTORCH_BUILD_CUSTOM=ON \
        -DEXECUTORCH_BUILD_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_ENABLE_LOGGING=ON \
        -DEXECUTORCH_USE_TIKTOKEN=ON \
        -Bcmake-android-out/${dir} \
        ${dir}
    
    cmake --build cmake-android-out/${dir} -j16 --config Release
    
    ```
    
    ```
    (executorch) chenlai@chenlai-mbp executorch % du -sh cmake-android-out/examples/models/llama2/*
     44K	cmake-android-out/examples/models/llama2/CMakeCache.txt
    2.2M	cmake-android-out/examples/models/llama2/CMakeFiles
     76K	cmake-android-out/examples/models/llama2/Makefile
    4.0K	cmake-android-out/examples/models/llama2/cmake_install.cmake
    4.0K	cmake-android-out/examples/models/llama2/compile_commands.json
    4.9M	cmake-android-out/examples/models/llama2/custom_ops
    736K	cmake-android-out/examples/models/llama2/lib
     54M	cmake-android-out/examples/models/llama2/llama_main
     16K	cmake-android-out/examples/models/llama2/options-pinned.h
     11M	cmake-android-out/examples/models/llama2/runner
    151M	cmake-android-out/examples/models/llama2/third-party
    ```
    
    Reviewed By: lucylq, kirklandsign
    
    Differential Revision: D56450794
    
    fbshipit-source-id: 79e77732713708f3ced3801d11e30a9141075a76
    cccclai authored and facebook-github-bot committed Apr 23, 2024
  9. fix typo (#3235)

    Summary:
    Pull Request resolved: #3235
    
    It's "Release" not "RELEASE"....
    
    Reviewed By: lucylq
    
    Differential Revision: D56451118
    
    fbshipit-source-id: 63702f6fb906b3bc0e8d79061a7f7f6e849ea162
    cccclai authored and facebook-github-bot committed Apr 23, 2024
  10. Bump torchfix from 0.1.1 to 0.5.0 (#3220)

    Summary:
    Bumps [torchfix](https://github.com/pytorch-labs/torchfix) from 0.1.1 to 0.5.0.
    <details>
    <summary>Release notes</summary>
    <p><em>Sourced from <a href="https://github.com/pytorch-labs/torchfix/releases">torchfix's releases</a>.</em></p>
    <blockquote>
    <h2>TorchFix 0.5.0</h2>
    <ul>
    <li>Added rule TOR203 to replace 'import torchvision.models as models' with 'from torchvision import models'</li>
    <li>Added rules TOR104 and TOR105 for calling and importing non-public PyTorch functions that have known public aliases</li>
    <li>Added rules TOR004 and TOR103 for importing removed and deprecated functions (in addition to the existing rules for calling those functions)</li>
    <li>Fixed loading for deprecated symbols config in zipped deployments</li>
    <li>Done several smaller bug fixes and refactorings</li>
    </ul>
    <h2>TorchFix 0.4.0</h2>
    <ul>
    <li>Improvements for the standalone <code>torchfix</code> command:
    <ul>
    <li>Added  <code>--version</code> flag</li>
    <li><code>--select</code> flag now accepts specific rules, not just <code>ALL</code></li>
    <li>Fixed excessive debug output on MacOS</li>
    </ul>
    </li>
    <li>Added PyTorch-internal rule TOR901</li>
    <li>TorchFix explicitly requires at least Python 3.9 now</li>
    <li>Small clean-ups and bugfixes</li>
    </ul>
    <h2>TorchFix 0.3.0</h2>
    <ul>
    <li>Added rule TOR003 about explicitly passing <code>use_reentrant</code> to <code>torch.utils.checkpoint</code></li>
    <li>Added <code>torch.nn.utils.weight_norm</code> to the list of deprecated functions flagged by TOR101</li>
    <li>Updated README with TOR0 rules description</li>
    </ul>
    <h2>TorchFix 0.2.1: first release for pytorch-labs/torchfix repo</h2>
    <p>This is the first release for pytorch-labs/torchfix repo, with the only differences from TorchFix 0.2.0 on PyPI are files related to repo maintenance and project metadata.</p>
    </blockquote>
    </details>
    <details>
    <summary>Commits</summary>
    <ul>
    <li>See full diff in <a href="https://github.com/pytorch-labs/torchfix/commits/v0.5.0">compare view</a></li>
    </ul>
    </details>
    <br />
    
    [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=torchfix&package-manager=pip&previous-version=0.1.1&new-version=0.5.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
    
    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.
    
    [//]: # (dependabot-automerge-start)
    [//]: # (dependabot-automerge-end)
    
     ---
    
    <details>
    <summary>Dependabot commands and options</summary>
    <br />
    
    You can trigger Dependabot actions by commenting on this PR:
    - `dependabot rebase` will rebase this PR
    - `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
    - `dependabot merge` will merge this PR after your CI passes on it
    - `dependabot squash and merge` will squash and merge this PR after your CI passes on it
    - `dependabot cancel merge` will cancel a previously requested merge and block automerging
    - `dependabot reopen` will reopen this PR if it is closed
    - `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    - `dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
    - `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    - `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    - `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    
    </details>
    
    Pull Request resolved: #3220
    
    Reviewed By: kit1980
    
    Differential Revision: D56449277
    
    Pulled By: huydhn
    
    fbshipit-source-id: ad3c86d49f86427c91af28063d5347b37b893e87
    dependabot[bot] authored and facebook-github-bot committed Apr 23, 2024
  11. Pin CoreMLTools 7.2 (#3170)

    Summary:
    It is more stable to pin a release branch of CoreMLTools. We will periodically update it when necessary
    
    Pull Request resolved: #3170
    
    Reviewed By: cccclai
    
    Differential Revision: D56373108
    
    Pulled By: shoumikhin
    
    fbshipit-source-id: d6a96813f07df97abbf8f4ca75e2aae2666372b1
    yifan_shen3 authored and facebook-github-bot committed Apr 23, 2024
  12. Expand visibility of targets needed for executorch_llama2 kernel (#3174)

    Summary:
    Pull Request resolved: #3174
    
    See title
    
    Reviewed By: tarun292
    
    Differential Revision: D56361946
    
    fbshipit-source-id: 12d5d9cb3f265173696173073b6d2357dae0848a
    stephenbo-meta authored and facebook-github-bot committed Apr 23, 2024
  13. Support tensors in prim_getters (#3203)

    Summary:
    Pull Request resolved: #3203
    
    Adding support for tensors and tensor lists in prim getters
    
    Reviewed By: JacobSzwejbka
    
    Differential Revision: D56426044
    
    fbshipit-source-id: 164e916bc7662d2864cee2a6d1cb06177311438d
    tarun292 authored and facebook-github-bot committed Apr 23, 2024
  14. Enable doc upload for tags, disable for release branches (#3153)

    Summary:
    - Disabled doc upload for branches like release/x.x
    - Enabled publishing for tags.
    
    Tested locally:
    ```
    export GITHUB_REF=refs/tags/v3.1.4-rc5
    bash test-version.sh
    ```
    ```
    # test-version.sh
    if [[ "${GITHUB_REF}" =~ ^refs/tags/v([0-9]+\.[0-9]+)\.* ]]; then
      TARGET_FOLDER="${BASH_REMATCH[1]}"
    else
      TARGET_FOLDER="main"
    fi
    echo "Target folder: ${TARGET_FOLDER}"
    ```
    Output:
    ```
    Target folder: 3.1
    ```
    One more:
    ```
    export GITHUB_REF=refs/tags/v1.15.4
    bash test-version.sh
    ```
    Output:
    ```
    Target folder: 1.15
    ```
    
    Pull Request resolved: #3153
    
    Reviewed By: dbort
    
    Differential Revision: D56445037
    
    Pulled By: svekars
    
    fbshipit-source-id: e7328523dfe308e8921c1e4f365d9a757d053191
    svekars authored and facebook-github-bot committed Apr 23, 2024
  15. Update Core ML Backend Doc (#3188)

    Summary:
    Update Core ML backend doc on:
    1. Partitioner
    2. Quantizer
    
    Pull Request resolved: #3188
    
    Reviewed By: shoumikhin
    
    Differential Revision: D56481126
    
    Pulled By: cccclai
    
    fbshipit-source-id: 925a107a210094e035a816a15c70d9aedd5bd369
    yifan_shen3 authored and facebook-github-bot committed Apr 23, 2024
  16. bundled program alpha document (#3224)

    Summary:
    Pull Request resolved: #3224
    
    as title
    
    Reviewed By: tarun292, Jack-Khuu
    
    Differential Revision: D56446890
    
    fbshipit-source-id: fc3dc6bb2349cd7ca4a8e998e528176dd9fb7679
    Gasoonjia authored and facebook-github-bot committed Apr 23, 2024
  17. Fix a small inconsistency on the SDK debugging page (#3247)

    Summary:
    Pull Request resolved: #3247
    
    so that the code is consistent with the text description
    
    Reviewed By: dbort
    
    Differential Revision: D56481274
    
    fbshipit-source-id: f303b966ebf3e07b510ef825c7bc09eaecd89554
    Olivia-liu authored and facebook-github-bot committed Apr 23, 2024
  18. Update tutorial (#3242)

    Summary:
    Pull Request resolved: #3242
    
    Removed the use of capture_pre_autograd_graph in places where we are not quantizing, since we want to minimize the usage of this API for easier deprecation in the future.
    
    Reviewed By: mergennachin
    
    Differential Revision: D56475332
    
    fbshipit-source-id: bd5cd4969f953d6d8e98ef7f04ad3d4a96bdacf1
    angelayi authored and facebook-github-bot committed Apr 23, 2024
  19. update sdk delegate integration (#3246)

    Summary:
    Pull Request resolved: #3246
    
    As title
    
    Reviewed By: tarun292
    
    Differential Revision: D56479387
    
    fbshipit-source-id: c324d2b46dc7f849dfb42b3452c6a82f24aa9319
    cccclai authored and facebook-github-bot committed Apr 23, 2024
  20. Add iPad support to demo apps. (#3251)

    Summary:
    Pull Request resolved: #3251
    
    .
    
    Reviewed By: cccclai
    
    Differential Revision: D56488666
    
    fbshipit-source-id: d63a08b4abdf055607948229be88f0c7762948ab
    shoumikhin authored and facebook-github-bot committed Apr 23, 2024
  21. Add more prebuilt artifacts (#3245)

    Summary:
    Build for different ABI in prebuild.
    
    Pull Request resolved: #3245
    
    Test Plan: CI
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56480274
    
    Pulled By: huydhn
    
    fbshipit-source-id: 451116a0f90745dd9f08ef32be3fe02940d6fbb1
    kirklandsign authored and facebook-github-bot committed Apr 23, 2024
  22. SDK tutorial doc update (#3238)

    Summary:
    Pull Request resolved: #3238
    
    fix some links, remove outdated commands
    
    Reviewed By: GregoryComer
    
    Differential Revision: D56453800
    
    fbshipit-source-id: 8bd86a593f8c5b9342e61ab2d129473d315b57a8
    Olivia-liu authored and facebook-github-bot committed Apr 23, 2024
  23. conv1d general case (#3223)

    Summary:
    Pull Request resolved: #3223
    
    We port jorgep31415's conv1d work for the Lite Interpreter into ET. The current implementation supports general batch_size, weight_size, stride, padding, dilation, and groups.
    
    Reviewed By: jorgep31415
    
    Differential Revision: D56380147
    
    fbshipit-source-id: 62fdc2958d683590317aaec5be3d0366f6df42e4
    copyrightly authored and facebook-github-bot committed Apr 23, 2024
  24. move code under executorch/example (#3176)

    Summary:
    Pull Request resolved: #3176
    This diff moves the LLM manual code from outside GitHub (Dave's and Georgey's) into the executorch codebase so there is a stable place to point to.
    After this diff, //executorch/examples/llm_manual will become the only source of truth for our LLM manual code.
    
    Reviewed By: byjlw, dbort
    
    Differential Revision: D56365058
    
    fbshipit-source-id: 97280fc0ca955caabb6056cddbb72102ed711f2c
    Gasoonjia authored and facebook-github-bot committed Apr 23, 2024
  25. update XNNPACK/README.md (#3236)

    Summary:
    Pull Request resolved: #3236
    
    Fixing the XNNPACK/README
    - Updated the file layout overview
    - Added end-to-end tutorial flow for quick starts
    - Added See more section linking to static docs
    
    Reviewed By: metascroy
    
    Differential Revision: D56431923
    
    fbshipit-source-id: 4f3e35d85c27330ed46fe189351b3aa570c5aa43
    mcr229 authored and facebook-github-bot committed Apr 23, 2024
  26. Update Profiling Section in XNNPACK Delegate Docs (#3237)

    Summary:
    Pull Request resolved: #3237
    
    Updating Profiling Section of the docs
    
    The main point is pointing to the SDK Profiling Tutorial on how to get XNNPACK profiling information
    
    Reviewed By: metascroy, cccclai
    
    Differential Revision: D56439491
    
    fbshipit-source-id: 1d724ffae6d89e8769ea427cb37b4ec85fe3452f
    mcr229 authored and facebook-github-bot committed Apr 23, 2024

Commits on Apr 24, 2024

  1. Add allocate_temp method to KernelRuntimeContext (#3209)

    Summary:
    Pull Request resolved: #3209
    
    This adds an `allocate_temp` method to KernelRuntimeContext, and passes the temporary memory allocator from `execute_instruction`. The method returns a result that errors if the temporary `MemoryAllocator` was not provided or the memory could not be allocated.
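    A hedged sketch of the shape of such an API, using simplified stand-ins for the runtime's `Error` and `MemoryAllocator` types (the real signatures may differ):
    ```
    #include <cstddef>

    // Simplified stand-ins; the actual runtime types differ.
    enum class Error { Ok, MemoryAllocationFailed, NotFound };

    struct MemoryAllocator {
      virtual void* allocate(size_t size, size_t alignment) = 0;
      virtual ~MemoryAllocator() = default;
    };

    class KernelRuntimeContext {
     public:
      explicit KernelRuntimeContext(MemoryAllocator* temp_allocator = nullptr)
          : temp_allocator_(temp_allocator) {}

      // Illustrative shape of allocate_temp: fail if no temp allocator was wired
      // in by execute_instruction, or if the allocation itself fails.
      Error allocate_temp(size_t size, size_t alignment, void** out) {
        if (temp_allocator_ == nullptr) {
          return Error::NotFound;
        }
        *out = temp_allocator_->allocate(size, alignment);
        return *out != nullptr ? Error::Ok : Error::MemoryAllocationFailed;
      }

     private:
      MemoryAllocator* temp_allocator_;
    };
    ```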
    
    Reviewed By: dbort
    
    Differential Revision: D56421957
    
    fbshipit-source-id: 6da73bdb8e31638fc6d575e98cfc08c27b25f09c
    David Lin authored and facebook-github-bot committed Apr 24, 2024
  2. Inspector APIs page

    Summary:
    The old screenshot has an outdated event block name and event names. The new screenshot was taken from a recent real run.
    
    bypass-github-export-checks
    bypass-github-pytorch-ci-checks
    bypass-github-executorch-ci-checks
    
    Reviewed By: tarun292, Jack-Khuu
    
    Differential Revision: D56447799
    
    fbshipit-source-id: 040fe45311c9aa8e8a1a0f6756ebda5f0ebbdebf
    Olivia-liu authored and facebook-github-bot committed Apr 24, 2024
  3. fix qnn install link (#3260)

    Summary:
    Pull Request resolved: #3260
    
    As title, the link was wrong...
    
    Reviewed By: kirklandsign
    
    Differential Revision: D56498322
    
    fbshipit-source-id: 42708b5f7a634f1c01e05af4c897d0c6da54d724
    cccclai authored and facebook-github-bot committed Apr 24, 2024
  4. Add index.Tensor and aten.logical_not (#3221)

    Summary:
    Add missing llama ops for MPS delegate:
    - `index.Tensor`
    - `logical_not`
    
    `index.put` works correctly for generating 1 token, but gives incorrect results on 2nd token. This remains disabled.
    
    Summary of changes:
    - Adds missing llama2 ops
    - Adds support for launching Metal kernels instead of MPSGraph ops (if MPSGraph doesn't have the support)
    
    cc cccclai , shoumikhin
    
    Pull Request resolved: #3221
    
    Reviewed By: shoumikhin
    
    Differential Revision: D56447710
    
    Pulled By: cccclai
    
    fbshipit-source-id: 778a485df5e67d1afd006b42f07b69c8a3961223
    DenisVieriu97 authored and facebook-github-bot committed Apr 24, 2024
  5. Fix broken links on the coreml tutorial page (#3250)

    Summary: Pull Request resolved: #3250
    
    Reviewed By: dbort
    
    Differential Revision: D56487125
    
    fbshipit-source-id: 502019365de043a7e07bb0d766134b334ee115ba
    Olivia-liu authored and facebook-github-bot committed Apr 24, 2024
  6. Fix compilation with gcc-9+ (#3262)

    Summary:
    To fix `cannot resolve overloaded function ‘isinf’ based on conversion to type ‘torch::executor::FunctionRef<bool(double)>’` error
    
    Not sure how it ever worked before see https://godbolt.org/z/939YKdjqW
    
    Pull Request resolved: #3262
    
    Reviewed By: kimishpatel, manuelcandales
    
    Differential Revision: D56501235
    
    Pulled By: malfet
    
    fbshipit-source-id: 6f89beef9fd56a80ecbb2df573821da95b2da746
    malfet authored and facebook-github-bot committed Apr 24, 2024
  7. Add delegate time scale converter to Inspector (#3240)

    Summary:
    Pull Request resolved: #3240
    
    The time scale of reported delegate events might differ from that of CPU events. This diff adds support for providing a callable that the Inspector can invoke to convert the time scale of delegated events, ensuring consistent time scales across delegated and non-delegated events.
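    A minimal sketch of how such a converter might be passed in. The keyword argument name and the converter signature below are assumptions for illustration, not the exact API:
    ```python
    from executorch.sdk import Inspector  # import path assumed

    # Hypothetical: the delegate reports timestamps in raw cycles; convert them
    # to the microsecond time scale used by CPU-side events.
    def delegate_cycles_to_us(timestamp_cycles: float) -> float:
        CYCLES_PER_US = 1_000.0  # placeholder conversion factor
        return timestamp_cycles / CYCLES_PER_US

    inspector = Inspector(
        etdump_path="etdump.etdp",
        etrecord="etrecord.bin",
        delegate_time_scale_converter=delegate_cycles_to_us,  # argument name assumed
    )
    ```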
    
    Reviewed By: Jack-Khuu
    
    Differential Revision: D55298701
    
    fbshipit-source-id: e888e51b602c7e1ec8cb9e05ac052280daa12823
    tarun292 authored and facebook-github-bot committed Apr 24, 2024
    Commit: b7b40ac
  8. Tie quantization of add operands and result together (#3091)

    Summary:
    Change-Id: Ie2662ebd6555821fa1d813163daf4b209a319b44
    
    Pull Request resolved: #3091
    
    Reviewed By: mergennachin
    
    Differential Revision: D56476825
    
    Pulled By: digantdesai
    
    fbshipit-source-id: 7f1e7d8ab9051c30c69189244ea927ed49440d93
    per authored and facebook-github-bot committed Apr 24, 2024
    Commit: 8b1f49a
  9. Add semihosting to cmake for executor_runner (#3008)

    Summary:
    Add cmake option to enable semihosting for the executor runner application.
    
    Change-Id: I5db7271413b39e5122f86f321d15dd2a1086a547
    
    Pull Request resolved: #3008
    
    Reviewed By: mergennachin
    
    Differential Revision: D56476642
    
    Pulled By: digantdesai
    
    fbshipit-source-id: 5cc60da33d1999bb3e3baff2d57e196c65e4b819
    per authored and facebook-github-bot committed Apr 24, 2024
    Commit: 6712185
  10. Capture output of Vela and print on error (#3057)

    Summary:
    Change-Id: I0443a6ab26766a51511d9e4ea532fc8e76836ede
    
    Pull Request resolved: #3057
    
    Reviewed By: mergennachin
    
    Differential Revision: D56476746
    
    Pulled By: digantdesai
    
    fbshipit-source-id: 4b6d9738a9202980fa06bb8f4232fb4a916a7633
    Erik-Lundell authored and facebook-github-bot committed Apr 24, 2024
    Commit: 2f5cbd4
  11. Fix for TOSA BI clamp ops (#3092)

    Summary:
    Min/max range values need to be in quantized form.
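    For illustration, a small sketch of mapping a float clamp range onto the quantized scale of the input; the scale and zero-point values here are made up:
    ```python
    # Illustrative only: the clamp min/max must be expressed on the same
    # (quantized) scale as the int8 input that the TOSA CLAMP op sees.
    scale, zero_point = 0.05, 0
    qmin, qmax = -128, 127

    def quantize(v: float) -> int:
        return int(max(qmin, min(qmax, round(v / scale) + zero_point)))

    clamp_min_q = quantize(-1.0)   # -20
    clamp_max_q = quantize(1.0)    # 20
    ```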
    
    Pull Request resolved: #3092
    
    Reviewed By: mergennachin
    
    Differential Revision: D56476931
    
    Pulled By: digantdesai
    
    fbshipit-source-id: 80fe1e4981c048653f808ef1ad9339997eb853a6
    freddan80 authored and facebook-github-bot committed Apr 24, 2024
    Commit: b0a400c
  12. delegation debug page (#3254)

    Summary:
    Pull Request resolved: #3254
    
    Create a new page for the new util functions Chen and I made to debug delegations. These functions were well received within the team as well as by partner teams, including modai, so I think it's important to call them out in our documentation. The examples were copied from the llm manual, but reworded a little to flow naturally in this doc.
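    A sketch of the kind of usage the page documents, based on the delegation-debug utilities referenced in the llm manual; the exact entry points shown on the new page may differ:
    ```python
    from executorch.exir.backend.utils import get_delegation_info
    from tabulate import tabulate

    def print_delegation_summary(edge_program) -> None:
        # edge_program is assumed to be an EdgeProgramManager produced by to_edge()
        graph_module = edge_program.exported_program().graph_module
        info = get_delegation_info(graph_module)
        # High-level summary: how many ops were delegated vs. left undelegated
        print(info.get_summary())
        # Per-operator breakdown as a table
        df = info.get_operator_delegation_dataframe()
        print(tabulate(df, headers="keys", tablefmt="fancy_grid"))
    ```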
    
    bypass-github-export-checks
    bypass-github-pytorch-ci-checks
    bypass-github-executorch-ci-checks
    
    Reviewed By: cccclai
    
    Differential Revision: D56491214
    
    fbshipit-source-id: 162b4ae75e79730218b0d669d1ec2a7a914b933c
    Olivia-liu authored and facebook-github-bot committed Apr 24, 2024
    Commit: bf9888f
  13. update memory planning docs (#3270)

    Summary: Pull Request resolved: #3270
    
    Reviewed By: JacobSzwejbka
    
    Differential Revision: D56503511
    
    Pulled By: lucylq
    
    fbshipit-source-id: d9e39f32adf39761652feaccdb73344b4550a094
    lucylq authored and facebook-github-bot committed Apr 24, 2024
    Commit: de0c233
  14. DynamicShim for dlsym user (#3136)

    Summary:
    Add a shim layer so that users only need the header and can load the symbols with dlsym.
    
    We will have two libraries:
    - a header, where the declarations and the shim class are compiled statically with the user's codebase. We want to keep this minimal.
    - an implementation, which pulls in the ET libraries and the shim implementation. It's compiled separately as a .so file, and users can load it and find symbols with dlopen and dlsym.
    
    Note that users only need to compile the header dynamic_shim.h into their code at build time. dynamic_shim.h has minimal dependencies on ExecuTorch, so it won't impact static binary size or startup time. The actual implementation, dynamic_shim_impl, is compiled into a separate shared library that contains all the ExecuTorch libraries; that shared library can be loaded later with dlopen.
    
    Users can then load just the .so library, use dlsym to look up the exposed APIs `create_executorch_dynamic_shim` and `free_executorch_dynamic_shim`, and call into the API through DynamicShim (a pointer to an interface); DynamicShimImpl invokes the actual ET Module code in its implementation.
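    As a rough usage sketch of the dlopen/dlsym flow (shown here via Python's ctypes, which wraps dlopen/dlsym; the library name and function signatures are assumptions):
    ```python
    import ctypes

    # dlopen the implementation library (name assumed for illustration)
    lib = ctypes.CDLL("libdynamic_shim_impl.so")

    # dlsym the two exposed entry points described above
    create_shim = lib.create_executorch_dynamic_shim
    free_shim = lib.free_executorch_dynamic_shim
    create_shim.restype = ctypes.c_void_p   # opaque DynamicShim* (signature assumed)
    free_shim.argtypes = [ctypes.c_void_p]

    shim = create_shim()   # callers then use the interface pointer from C/C++
    # ... run inference through the DynamicShim interface ...
    free_shim(shim)
    ```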
    
    Pull Request resolved: #3136
    
    Reviewed By: kimishpatel
    
    Differential Revision: D55025594
    
    Pulled By: kirklandsign
    
    fbshipit-source-id: a0b1fa90997dee920920e6f582dd51719c2958eb
    kirklandsign authored and facebook-github-bot committed Apr 24, 2024
    Commit: b5bb921
  15. Unsqueeze (#3172)

    Summary:
    Pull Request resolved: #3172
    
    Exploit the fact that the unsqueeze operation can be reduced to a permute.
    
    ```
    torch.all(torch.permute(x.unsqueeze(0), [1, 0, 2, 3]) == x.unsqueeze(1))
    torch.all(torch.permute(x.unsqueeze(0), [1, 2, 0, 3]) == x.unsqueeze(2))
    torch.all(torch.permute(x.unsqueeze(0), [1, 2, 3, 0]) == x.unsqueeze(3))
    ```
    
    This diff introduces a minor change to the Permute implementation: it no longer requires the input dimension length to match the length of the permute array. This allows the `unsqueeze` operation to be implemented as a no-op `unsqueeze(0)` followed by a permute.
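    A small sketch of the general decomposition, assuming the equivalences above (illustrative only, not the Vulkan implementation):
    ```python
    import torch

    def unsqueeze_as_permute(x: torch.Tensor, dim: int) -> torch.Tensor:
        # unsqueeze(dim) == a no-op unsqueeze(0) followed by a permute that
        # moves the new leading axis into position `dim`.
        y = x.unsqueeze(0)
        order = list(range(1, y.dim()))
        order.insert(dim, 0)
        return y.permute(order)

    x = torch.randn(2, 3, 4)
    assert torch.equal(unsqueeze_as_permute(x, 2), x.unsqueeze(2))
    ```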
    ghstack-source-id: 223698863
    
    Reviewed By: kimishpatel, SS-JIA
    
    Differential Revision: D56347734
    
    fbshipit-source-id: 7decc88aa74b4f355fb9497798d304cf5c0d6db1
    yipjustin authored and facebook-github-bot committed Apr 24, 2024
    Commit: d053611
  16. clone node (#3219)

    Summary:
    Pull Request resolved: #3219
    
    Introduce a clone node for copy operation.
    
    Also register `aten.clone` to this node. Note that during model export, it is possible to point the lvalue of `aten.clone` to the underlying shared object of the rvalue to achieve a no-copy clone.
    ghstack-source-id: 223698862
    
    Reviewed By: copyrightly, SS-JIA, jorgep31415
    
    Differential Revision: D56441547
    
    fbshipit-source-id: a6d05e37ca7a0a0f15e50355e4e2a90a1735a962
    yipjustin authored and facebook-github-bot committed Apr 24, 2024
    Commit: 2dac5f3
  17. add dynamic export into llm manual (#3202)

    Summary:
    Pull Request resolved: #3202
    
    This diff adds dynamic export to the llm manual, including code and related comments.
    It also updates other documentation for better understanding.
    
    Reviewed By: dbort
    
    Differential Revision: D56365041
    
    fbshipit-source-id: 5ce4c15206a2923c4d54811cefca03f72869c719
    Gasoonjia authored and facebook-github-bot committed Apr 24, 2024
    Commit: 66a350b
  18. Update readme. (#3301)

    Summary:
    Pull Request resolved: #3301
    overriding_review_checks_triggers_an_audit_and_retroactive_review
    Oncall Short Name: executorch
    
    Differential Revision: D56517032
    
    fbshipit-source-id: ec2f7fbb1111daf8bd529e0917be698bac3435f4
    shoumikhin authored and facebook-github-bot committed Apr 24, 2024
    Commit: 5b0030f
  19. Fix portable is[inf|nan|_out compilation on older Linux (#3272)

    Summary:
    By wrapping potentially non-compliant `isinf`/`isnan` implementations in a lambda with a defined return type.
    
    The compiler should be able to optimize it into a direct function call; see https://godbolt.org/z/bqYGd47Mx
    
    Pull Request resolved: #3272
    
    Reviewed By: GregoryComer
    
    Differential Revision: D56504717
    
    Pulled By: malfet
    
    fbshipit-source-id: 72da456027dbc837c3cfac83b18a5f002fedc3a5
    malfet authored and facebook-github-bot committed Apr 24, 2024
    Commit: e25e5d2
  20. Use relative links in llm/getting-started.md (#3244)

    Summary:
    Use relative markdown links instead of full URLs. This way, the docs will always point to a consistent branch.
    
    Pull Request resolved: #3244
    
    Test Plan: Clicked on all modified links in the rendered docs preview: https://docs-preview.pytorch.org/pytorch/executorch/3244/llm/getting-started.html
    
    Reviewed By: Gasoonjia
    
    Differential Revision: D56479234
    
    Pulled By: dbort
    
    fbshipit-source-id: 45fb25f017c73df8606c3fb861acafbdd82fec8c
    dbort authored and facebook-github-bot committed Apr 24, 2024
    Commit: b560864
  21. Update examples/README.md with Llama 3 and names (#3275)

    Summary:
    - Added Llama 3 8B
    - Added llm_manual to the list
    - Changed the name from Extensa to Cadence
    
    Pull Request resolved: #3275
    
    Reviewed By: Gasoonjia
    
    Differential Revision: D56524960
    
    Pulled By: iseeyuan
    
    fbshipit-source-id: 2b4464028fe3cdf3c2b524d233fa3e87b2561dda
    iseeyuan authored and facebook-github-bot committed Apr 24, 2024
    Commit: 98a7e66
  22. Revert D56480274: Add more prebuilt artifacts

    Differential Revision:
    D56480274
    
    Original commit changeset: 451116a0f907
    
    Original Phabricator Diff: D56480274
    
    fbshipit-source-id: e9603e5076113560b1224a56432abf321f82e284
    malfet authored and facebook-github-bot committed Apr 24, 2024
    Commit: 727a68d
  23. update typos (#3300)

    Summary:
    Pull Request resolved: #3300
    
    This diff addresses part of Ali's comments in our tracer sheet (https://docs.google.com/spreadsheets/d/1PoJt7P9qMkFSaMmS9f9j8dVcTFhOmNHotQYpwBySydI/edit#gid=0). Specifically:
    
    "NanoGPT" -> "nanoGPT"
    "CoreML" -> "Core ML"
    "ExecuTorch Codebase" -> "ExecuTorch codebase"
    "Android Phone" -> "Android phone"
    "How to build Mobile Apps" -> "How to Build Mobile Apps"
    
    Also shorten the following two column names to avoid overlapping:
    "occurrences_in_delegated_graphs" -> "# in_delegated_graphs"
    "occurrences_in_non_delegated_graphs" -> "# in_non_delegated_graphs"
    
    Reviewed By: Jack-Khuu
    
    Differential Revision: D56513601
    
    fbshipit-source-id: 7015c2c5b94b79bc6c57c533ee812c9e58ab9d56
    Gasoonjia authored and facebook-github-bot committed Apr 24, 2024
    Commit: b669056
  24. Update readme.

    Summary: .
    
    Reviewed By: cccclai
    
    Differential Revision: D56532283
    
    fbshipit-source-id: 62d7c9e8583fdb5c9a1b2e781e80799c06682aae
    shoumikhin authored and facebook-github-bot committed Apr 24, 2024
    Commit: ce1e9c1
  25. Update custom kernel registration API

    Summary: As titled
    
    Reviewed By: lucylq, Gasoonjia, guangy10
    
    Differential Revision: D56532035
    
    fbshipit-source-id: ddf4f3864db0f200b97e67673a7086dac790eb82
    larryliu0820 authored and facebook-github-bot committed Apr 24, 2024
    Commit: f6758fc
  26. llama2 readme (#3315)

    Summary:
    - Add a note about embedding quantization for llama3.
    - Re-order export args to match llama2; group_size was missing `--`.
    
    Pull Request resolved: #3315
    
    Reviewed By: cccclai
    
    Differential Revision: D56528535
    
    Pulled By: lucylq
    
    fbshipit-source-id: 4453070339ebdb3d782b45f96fe43d28c7006092
    lucylq authored and facebook-github-bot committed Apr 24, 2024
    Commit: 34f59ed
  27. Fix sdk_example_runner.sh (#3298)

    Summary: Pull Request resolved: #3298
    
    Reviewed By: Olivia-liu
    
    Differential Revision: D56509749
    
    Pulled By: tarun292
    
    fbshipit-source-id: 36b56e7cc039144105d64431697a16a793029af8
    tarun292 authored and facebook-github-bot committed Apr 24, 2024
    Commit: aa3e736
  28. Update readme.

    Summary: .
    
    Reviewed By: cccclai
    
    Differential Revision: D56535633
    
    fbshipit-source-id: 070a3b0af9dea234f8ae4be01c37c03b4e0a56e6
    shoumikhin authored and facebook-github-bot committed Apr 24, 2024
    Commit: 035aee4
  29. Update MPS documentation; add helper script to build mps_executor_runner (#3324)
    
    Summary:
    **Summary of changes**:
    - Update MPS documentation to reflect all changes since previous release
    - Add helper script to build `mps_executor_runner`
    
    **Testing**:
    - Verified that mps_executor_runner builds correctly:
    ```
    ./examples/apple/mps/scripts/build_mps_executor_runner.sh
    ./examples/apple/mps/scripts/build_mps_executor_runner.sh --Debug
    ```
    Verified that the docs are building correctly:
    ```
    cd docs
    make html
    ```
    
    cc shoumikhin, cccclai
    
    Pull Request resolved: #3324
    
    Reviewed By: shoumikhin
    
    Differential Revision: D56535774
    
    Pulled By: cccclai
    
    fbshipit-source-id: 5974795732dbe1089e3d63cd1b618cadf7a2573e
    DenisVieriu97 authored and facebook-github-bot committed Apr 24, 2024
    Commit: 453ebad
  30. Remove the sorting of the nodes from partitioning (not needed for now as Custom Metal kernels are not yet enabled) (#3328)
    
    Summary:
    Remove the sorting of the nodes from partitioning (not needed for now, as Custom Metal kernels are not yet enabled).
    
    **Testing**:
    Verified that tracing works correctly with release branch:  `python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3"`
    
    cc shoumikhin , cccclai
    
    Pull Request resolved: #3328
    
    Reviewed By: shoumikhin
    
    Differential Revision: D56540389
    
    Pulled By: cccclai
    
    fbshipit-source-id: e8a53f624b58ac4d2348c87e08acd5f2fb3de5b2
    DenisVieriu97 authored and facebook-github-bot committed Apr 24, 2024
    Commit: 9811eea
  31. copy node, aten.repeat (#3299)

    Summary:
    Pull Request resolved: #3299
    
    1. Introduce a `CopyNode` for generic copy-with-offset operations.
    2. `aten.repeat` on all dimensions.
    2.1 Use `CopyNode` where possible (see the sketch after this list).
    2.2 Specialized `repeat_channel` shader to handle packings.
    3. Update codegen to support `Methods`-variant-only operations. A new route is needed to trigger the dispatch.
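    For reference on item 2.1, a tiny PyTorch sketch showing that a repeat along a dimension amounts to offset copies of the input (illustrative only, not the Vulkan shader):
    ```python
    import torch

    x = torch.randn(2, 3)
    repeated = x.repeat(2, 1)              # repeat twice along dim 0
    copies = torch.cat([x, x], dim=0)      # two copy-with-offset writes of x
    assert torch.equal(repeated, copies)
    ```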
    ghstack-source-id: 223812048
    
    Reviewed By: copyrightly
    
    Differential Revision: D56499329
    
    fbshipit-source-id: 72936e621940588ce398dd62669ec9aa637e98ba
    yipjustin authored and facebook-github-bot committed Apr 24, 2024
    Commit: b2c794a
  32. add buck2 installation into setup.md

    Summary: Bring the buck2 installation instructions back, and scrub any "-DBUCK2=buck2" from our docs, to unblock users who rely on buck2.
    
    Reviewed By: guangy10
    
    Differential Revision: D56540769
    
    fbshipit-source-id: 363e592c17dd2747a693e59d8d6b6d20f43c8451
    Gasoonjia authored and facebook-github-bot committed Apr 24, 2024
    Commit: 590cbce

Commits on Apr 25, 2024

  1. register view, reshape and select

    Summary:
    - We register `select`, `unsqueeze` and `view` in `vulkan_partitioner.py` in order to run vulkan_delegate test (Python e2e test). The latter two might be used to implement `bmm` and `addmm`, so I want to make sure they work.
    - We register `reshape` in `View.cpp` explicitly. `reshape` is implemented through `_reshape_alias` (see [this](https://www.internalfb.com/code/fbsource/[a3dd6401f00d73f09bbdea63887fef54ea2c6dd2]/fbcode/caffe2/aten/src/ATen/native/native_functions.yaml?lines=4872-4881)) which is [decomposed as `view`](https://www.internalfb.com/code/fbsource/[bbb783ae1cff98b3b549da3edd845dde946d3da8]/xplat/caffe2/torch/_decomp/decompositions.py?lines=3669-3672). For the codegen test, we still need to register the op; otherwise there is an error:
    ```
    C++ exception with description "Exception raised from get_op_fn at xplat/executorch/backends/vulkan/runtime/graph/ops/OperatorRegistry.cpp:20: (it != table_.end()) is false! Could not find operator with name aten.reshape.default" thrown in the test body.
    ```
    
    Reviewed By: yipjustin, liuk22
    
    Differential Revision: D56454941
    
    fbshipit-source-id: c83e6fb97d9cf9019cc6e786508f353a22236931
    copyrightly authored and facebook-github-bot committed Apr 25, 2024
    Commit: b2a7243
  2. Update llama2 readme file - main branch (#3340)

    Summary: Pull Request resolved: #3340
    
    Reviewed By: orionr, kimishpatel, cccclai
    
    Differential Revision: D56553088
    
    Pulled By: mergennachin
    
    fbshipit-source-id: 2994dd3ab2692c5b972316af1617bd06d647af96
    mergennachin authored and facebook-github-bot committed Apr 25, 2024
    Commit: 79b79cb
  3. Build custom ops in pybinding (#3263)

    Summary:
    Right now we are not building the custom ops, which causes missing ops in torchchat.
    
    This PR adds them to the pybinding build.
    
    Pull Request resolved: #3263
    
    Reviewed By: lucylq
    
    Differential Revision: D56500693
    
    Pulled By: larryliu0820
    
    fbshipit-source-id: 0ed0e28fcccb6002ef48e6a38b60e92d8af4def6
    larryliu0820 authored and facebook-github-bot committed Apr 25, 2024
    Commit: 30128f3
  4. Enable doc job to run on -rc tags. (#3345)

    Summary: Pull Request resolved: #3345
    
    Reviewed By: dbort
    
    Differential Revision: D56557091
    
    Pulled By: svekars
    
    fbshipit-source-id: 4300ca86d01ec110fc6934588cd691c12661a730
    svekars authored and facebook-github-bot committed Apr 25, 2024
    Commit: fd63d0c
  5. Eliminate deprecated api usage (#2695)

    Summary: Pull Request resolved: #2695
    
    Reviewed By: mergennachin
    
    Differential Revision: D55091814
    
    fbshipit-source-id: 04b2a888c6bbdaa195cb916c6564aa93daca2514
    kirklandsign authored and facebook-github-bot committed Apr 25, 2024
    Commit: 8fcba36
  6. Remove unneeded _to_copy in edge dialect.

    Summary: In ExecuTorch we dtype-specialize the kernels and, with export, run on a single device. Therefore _to_copy is not needed in the edge dialect.
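    A minimal sketch of the kind of cleanup this enables; the pass name and the exact matching conditions are assumptions, not the actual implementation:
    ```python
    import torch

    def remove_redundant_to_copy(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
        # Drop _to_copy nodes that don't request a dtype/device/layout change,
        # rerouting their users to the original tensor.
        for node in list(gm.graph.nodes):
            if (
                node.op == "call_function"
                and node.target == torch.ops.aten._to_copy.default
                and not node.kwargs
            ):
                node.replace_all_uses_with(node.args[0])
                gm.graph.erase_node(node)
        gm.graph.lint()
        gm.recompile()
        return gm
    ```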
    
    Reviewed By: tugsbayasgalan
    
    Differential Revision: D56579169
    
    fbshipit-source-id: 5a2e3cd453a11bd2ad009b439587b0fc589f7fe4
    zhxchen17 authored and facebook-github-bot committed Apr 25, 2024
    Commit: 319a4f2
  7. Extend setup cmake ability (#3349)

    Summary:
    For ExecuTorch users, we see a common pattern where they have to:
    
    ```bash
    bash install_requirements.sh --pybind xnnpack
    
    cmake -S . -Bcmake-out ...
    
    cmake --build ...
    ```
    
    This runs the CMake build twice; the first run happens inside setup.py.
    
    Here I'm adding a way to allow setup.py to install the libraries separately, by passing `CMAKE_ARGS` and `CMAKE_BUILD_ARGS` into setup.py through `install_requirements.sh`.
    
    After this change, user can do:
    
    ```bash
    export CMAKE_ARGS="-DCMAKE_INSTALL_PREFIX=<install dir> \
      -DEXECUTORCH_BUILD_OPTIMIZED=ON \
      ..."
    
    export CMAKE_BUILD_ARGS="--target install"
    
    bash install_requirements.sh --pybind xnnpack
    ```
    
    Then we should be able to find `libxnnpack.a`, `liboptimized_ops_lib.a`, etc. under the install dir.
    
    Pull Request resolved: #3349
    
    Reviewed By: mikekgfb
    
    Differential Revision: D56560786
    
    Pulled By: larryliu0820
    
    fbshipit-source-id: fb6cd230df2317067f07ae0f1e72d0596b7b454b
    larryliu0820 authored and facebook-github-bot committed Apr 25, 2024
    Commit: 8ec0af9
  8. Half support for index op

    Reviewed By: cccclai
    
    Differential Revision: D56543186
    
    fbshipit-source-id: 4fed6b9b3ede3cdcb67a9a52150e3f22cc02b180
    manuelcandales authored and facebook-github-bot committed Apr 25, 2024
    Commit: 7b3b485
  9. Add EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT cmake option (#3356)

    Summary:
    Currently, we always build two copies of the flatcc targets, just in case we happen to be cross-compiling. But because the flatcc project puts its binaries in the source directory, those two copies can interfere with each other.
    
    We don't need to build two copies when not cross-compiling, so add a new option to avoid the second "host" build.
    
    Eventually we should only enable this when cross-compiling, but for now disable it when building the pip package (which is never cross-compiled).
    
    Pull Request resolved: #3356
    
    Test Plan: `rm -rf pip-out && ./install_requirements.sh` succeeded. Looking in the `pip-out/temp.*/cmake-out` directory, there is no `_host_build` directory, but the etdump headers were successfully generated under `pip-out/temp.*/cmake-out/sdk/include/executorch/sdk/etdump/`.
    
    Reviewed By: malfet, larryliu0820
    
    Differential Revision: D56582507
    
    Pulled By: dbort
    
    fbshipit-source-id: 4ce6c680657bc57cfcf016826364a3f46c4c953e
    dbort authored and facebook-github-bot committed Apr 25, 2024
    Commit: 80d72f2
  10. Export the ET_VERSION_DOCS variable in doc build (#3358)

    Summary: Pull Request resolved: #3358
    
    Reviewed By: dbort
    
    Differential Revision: D56584847
    
    Pulled By: svekars
    
    fbshipit-source-id: 77c4105edf15503bf1b29c1f120111a73b973c4c
    svekars authored and facebook-github-bot committed Apr 25, 2024
    Commit: c32b0a2
  11. Fix extension/data_loader installation (#3355)

    Summary:
    `libextension_data_loader.a` was not being installed properly. This PR removes the prefix so that it can be properly installed.
    
    Pull Request resolved: #3355
    
    Test Plan: See `libextension_data_loader.a` showing up under executorch/cmake-out/lib.
    
    Reviewed By: lucylq, mikekgfb
    
    Differential Revision: D56580943
    
    Pulled By: larryliu0820
    
    fbshipit-source-id: b771192d03799fd576e8591ec7c45fae23f20762
    larryliu0820 authored and facebook-github-bot committed Apr 25, 2024
    Commit: c209e12
  12. Reword "preview release" notice now that we are at alpha (#3364)

    Summary: Pull Request resolved: #3364
    
    Test Plan: https://docs-preview.pytorch.org/pytorch/executorch/3364/index.html
    
    Reviewed By: svekars
    
    Differential Revision: D56596949
    
    Pulled By: dbort
    
    fbshipit-source-id: f6c71e072bcefbb7d04354d1ef78d780c14facb5
    dbort authored and facebook-github-bot committed Apr 25, 2024
    Commit: 7b3f5c6

Commits on Apr 26, 2024

  1. Fix quantized_linear cpp op schema

    Summary: The cpp op schema does not match the registered one. Fix that.
    
    Reviewed By: tarun292, cccclai
    
    Differential Revision: D56594373
    
    fbshipit-source-id: cb4853030715245e7a0177c0f193c4558f19584d
    mcremon-meta authored and facebook-github-bot committed Apr 26, 2024
    Commit: 44d4bac
  2. Add Disclaimer

    mergennachin committed Apr 26, 2024
    Commit: 3fe25df