CI: Spot fixes related to nightly and stable PyTorch builds #2190

ashay · 2023-06-02T04:13:56Z

This PR contains two patches:

CI: Skip (redundant) libtorch build when using stable PyTorch version

When we use PyTorch stable builds, there is no need to build libtorch
from source, making the stable-pytorch-with-torch-binary-OFF
configuration redundant with stable-pytorch-with-torch-binary-ON. This
patch drops the redundant configuration from CI.

CI: Simplify guard conditions for creating and using libtorch cache

Whether libtorch is enabled or not is predicated on a host of conditions
such as the platform, in-tree versus out-of-tree build, and stable
versus nightly PyTorch builds. Instead of repeating these conditions to
guard whether to create or use the libtorch cache artifacts (and almost
getting them wrong), this patch predicates the relevant pipeline steps
on whether libtorch is enabled, thus making the conditions far
simpler.
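
To illustrate the idea in the second patch, here is a minimal Python sketch (not the actual workflow YAML from this PR). It computes "is libtorch enabled" once and lets every cache-related step reuse that single predicate. All names (`BuildConfig`, `libtorch_enabled`, `cache_steps`, the step labels) and the exact rule inside `libtorch_enabled` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BuildConfig:
    """One CI matrix entry. All field names are hypothetical."""
    platform: str        # e.g. "linux" or "macos"
    in_tree: bool        # in-tree versus out-of-tree build
    torch_version: str   # "stable" or "nightly"
    torch_binary: bool   # True: use prebuilt PyTorch, False: build from source


def libtorch_enabled(cfg: BuildConfig) -> bool:
    """Single place that decides whether libtorch is built from source.

    The rule below is an assumption for illustration, not the PR's real rule.
    """
    return (
        cfg.platform == "linux"
        and not cfg.in_tree
        and cfg.torch_version == "nightly"
        and not cfg.torch_binary
    )


def cache_steps(cfg: BuildConfig) -> List[str]:
    """Cache-related steps, guarded by the one predicate instead of
    repeating the platform/in-tree/stable-vs-nightly conditions."""
    if not libtorch_enabled(cfg):
        return []
    return ["restore-libtorch-cache", "save-libtorch-cache"]
```

With this shape, a change to the conditions under which libtorch is enabled only has to be made in one place, and the create-cache and use-cache guards cannot drift apart.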

ashay · 2023-06-02T04:16:07Z

@maxbartel Somehow I am not able to add you as a reviewer. Could you still take a look at these changes please?

powderluv · 2023-06-02T05:58:38Z

landing to catch the next builds

maxbartel · 2023-06-02T08:12:11Z

LGTM. Sorry I wasn't fast enough because of time zone differences

powderluv · 2023-06-02T16:12:04Z

all good we are all one team :)

* update PyTorch version to 2.1.0.dev20230523 (llvm#2148)
  - torch version: 2.1.0.dev20230523
  - torch commit hash: 981d4c2578d10d8a96d173471802fc2812541fb1
  - torchvision version: 0.16.0.dev20230523
  Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
* [Torch Dialect] Add split.tensor support + recompose rules (llvm#2102)
  * add split.tensor support + recompose rules
  * add e2e test
  * address comments
  * address comments
  * erase op in recomposeOp
  ---------
  Co-authored-by: zhekun.zhang <zhekun.zhang@bytedance.com>
* [Stablehlo] Add `AtenIndexTensor` StableHlo support (llvm#2107)
  * Add AtenIndexTensor StableHlo support
  * clean up
  * Empty commit, trigger test
  * try to debug hanging test
  * fix segfulat
  * fix bad include
  ---------
  Co-authored-by: zhekun.zhang <zhekun.zhang@bytedance.com>
* [arm64] Fix release builds for ARM64 (llvm#2157)
  Tested on Ubuntu 23.04 on Ampere Altra instance.
* [Stablehlo] Add aten.uniform lowering (llvm#2101)
  * add uniform stablehlo lowering
  * add unit test
  * new line
  * rm redundant file
  * Empty commit, trigger test
  * fix include
  * address comments
  ---------
  Co-authored-by: zhekun.zhang <zhekun.zhang@bytedance.com>
* update PyTorch version to 2.1.0.dev20230525 (llvm#2167)
  - torch version: 2.1.0.dev20230525
  - torch commit hash: eb2ef134b4e834a9b8a8b6de86ddd7d2780ce0ac
  - torchvision version: 0.16.0.dev20230525
  Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
* CI: disable caching for release builds (llvm#2168)
  This patch adds a (default-true) input called `cache-enabled` to the setup-build action, so that when the input is false, ccache is not setup on the host machine. This patch also sets the input to be false for the release builds.
* Add alias analysis for cast-like ops to maximize-value-semantics (llvm#2160)
  When `use_tracing=True` is used to import a model into Torch-MLIR, several casts get inserted in the IR to bridge the untyped inputs and outputs with the typed body of the computation. These casts create extra aliases of tensors that cause the current analysis in `maximize-value-semantics` to fail. In particular, the `maximize-value-semantics` analysis assumes that the only valid alias right after an overwrite is the overwritten alias. So, if there is a use of a casted version of the overwritten alias after the overwrite, the analysis fails.
  This commit improves the analysis by identifying all cast-like aliases of the overwritten alias and allowing such aliases to be used after an overwrite. Because this issue only arises when using tracing, it cannot be currently tested e2e, so only lit test is added.
* only setup python for non-docker platforms (llvm#2171)
  Original PR was accidentally merged to a branch. Re-landing same PR to main now
* Remove spurious pip in Release builds (llvm#2172)
  (left over from a previous commit that was approved and landed in a branch on accident)
* [Torch Op] Add AtenChunkOp support (llvm#2152)
  * add chunkOp support
  * update LTC xfail list
  * address comments
  * address comments
  ---------
  Co-authored-by: zhekun.zhang <zhekun.zhang@bytedance.com>
* Add ARM64 release builds (llvm#2159)
  Creates a build_linux_arm64 job that builds the release on an arm64 self-hosted runner.
  Drop Python 3.10 support
  Pass TM_TORCH_VERSION to choose the Stable PyTorch version (since arm64 doesn't have nightly builds)
  Borrows nightly / stable Pytorch switch from the WIP llvm#2038
* Delete another spurious pip (llvm#2173)
* update PyTorch version to 2.1.0.dev20230526 (llvm#2175)
  - torch version: 2.1.0.dev20230526
  - torch commit hash: 10b46f7c7f69f9bf705d2b6ea53efb9c59145685
  - torchvision version: 0.16.0.dev20230526
  Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
* [Stablehlo] Enable Stablehlo backend with arith dialect (llvm#2139)
* Add correct type checking for tm_tensor.attention
* [TM_TENSOR] Add `aten.scatter.[src|value]` op
  This commit adds support of `aten.scatter.src` and `aten.scatter.value` ops.
  Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
* [MLIR][TORCH] Add support for the total_weight for aten.nll_loss_forward op
  Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
* Add Stable PyTorch CI Pipeline (llvm#2038)
  * feat: split pytorch requirements into stable and nightly
  * fix: add true to tests to see full output
  * refactor: add comments to explain true statement
  * feat: move some tests to experimental mode
  * refactor: refactor pipeline into more fine grained difference
  * feat: add version differentiation for some tests
  * feat: activate more configs
  * refactor: change implementation to use less requirement files
  * refactor: remove contraints used for testing
  * fix: revert some requirement file names
  * refactor: remove unnecessary ninja install
  * fix: fix version parsing
  * refactor: remove dependency on torchvision in main requirements file
  * refactor: remove index url
  * style: remove unnecesary line switch
  * fix: readd index url
* Add `ReadOnly` trait to `copy.to_vtensor` (llvm#2179)
  Before inlining a global slot, the users of the global slot are checked to see if they are `ReadOnly` or `MemoryEffectFree` to make sure that the global slot is not being mutated. Because the op `copy.to_vtensor` currently does not have the `ReadOnly` trait, if a global slot is passed to `copy.to_vtensor`, the pass `InlineGlobalSlots` will fail.
  The op `copy.to_vtensor` is `ReadOnly`, since it does not modify the contents of the input tensor; it simply makes a new copy. This commit adds the trait as well as an e2e test that generates the case of a global slot being passed to a `copy.to_vtensor`.
* [Importer] import constant tuple (llvm#2132)
  * [Importer] import constant tuple
  * update
  * update
  * update
* update PyTorch version to 2.1.0.dev20230531 (llvm#2188)
  - torch version: 2.1.0.dev20230531
  - torch commit hash: 48552338649ccc467060f5f93dbe19e2acbc4d1a
  - torchvision version: 0.16.0.dev20230531
  Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
* [Torch Dialect] Add support for AtenScalarTensorOp (llvm#2085)
  * add scalar_tensor op
  * add dynamo pass test; needs PR2062
  * try to fix
  * Empty commit, trigger test
  * Empty commit, trigger test
  * address comments
  * use dtype function
  * fix decompose rule
  * remove unused include
  * Empty commit, trigger test
  * fix test
  * disable ltc
  * fix dtype
  ---------
  Co-authored-by: zhekun.zhang <zhekun.zhang@bytedance.com>
* update PyTorch version to 2.1.0.dev20230601 (llvm#2189)
* [LINALG] Add dynamic support for `PrimMinIntOp`
* Fix types + off-by-1 error, clamp `end` in slice+copy_ recomposition
  The `copy_` op being replaced by `RecomposeSliceCopy_` operates on a subset of the tensor being mutated, while the `index_put` op being used to replace the `copy_` op operates on the entire tensor being mutated. This means that the result type of the `index_put` should be the type of the input to `index_put` and we need to make sure that `copy_` does not have users before replacing to avoid type conflicts.
  This commit also fixes the result type used for the `AtenArangeStartStepOp`, and an off-by-1 error when creating the indices vector.
  Lastly, this commit also clamps the `end` value from the slice to the size of the dimension.
* CI: Spot fixes related to nightly and stable PyTorch builds (llvm#2190)
  * CI: Skip (redundant) libtorch build when using stable PyTorch version
    When we use PyTorch stable builds, there is no need to build libtorch from source, making the stable-pytorch-with-torch-binary-OFF configuration redundant with stable-pytorch-with-torch-binary-ON. This patch drops the redundant configuration from CI.
  * CI: Simplify guard conditions for creating and using libtorch cache
    Whether libtorch is enabled or not is predicated on a host of conditions such as the platform, in-tree versus out-of-tree build, and stable versus nightly PyTorch builds. Instead of repeating these conditions to guard whether to create or use the libtorch cache artifacts (and getting them almost incorrect), this patch predicates the relevant pipeline steps to whether libtorch is enabled, thus making the conditions far simpler.
* update PyTorch version to 2.1.0.dev20230602 (llvm#2191)
  - torch version: 2.1.0.dev20230602
  - torch commit hash: 52c7a761c5cb6ae94acf2298827309fba3dbc0f4
  - torchvision version: 0.16.0.dev20230602
  Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
* update PyTorch version to 2.1.0.dev20230603 (llvm#2193)
  - torch version: 2.1.0.dev20230603
  - torch commit hash: 7726721661ea114acb81a860519d0a1501d88fca
  - torchvision version: 0.16.0.dev20230603
  Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
* update PyTorch version to 2.1.0.dev20230604 (llvm#2195)
  - torch version: 2.1.0.dev20230604
  - torch commit hash: 810edae5137bdc0cd25ac2f133d6633d6146b1e9
  - torchvision version: 0.16.0.dev20230604
  Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
---------
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
Co-authored-by: Sean Silva <silvasean@google.com>
Co-authored-by: Roll PyTorch Action <torch-mlir@users.noreply.github.com>
Co-authored-by: Zhekun Zhang <32320144+zhekunz2@users.noreply.github.com>
Co-authored-by: zhekun.zhang <zhekun.zhang@bytedance.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
Co-authored-by: Ashay Rane <ashay@users.noreply.github.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
Co-authored-by: Yuanqiang Liu <liuyuanqiang.yqliu@bytedance.com>
Co-authored-by: George Petterson <gpetters@protonmail.com>
Co-authored-by: Gaurav Shukla <gaurav@nod-labs.com>
Co-authored-by: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
Co-authored-by: maxbartel <maximilian.bartel@amd.com>

ashay added 2 commits June 1, 2023 22:59

ashay requested a review from powderluv June 2, 2023 04:16

powderluv approved these changes Jun 2, 2023

powderluv merged commit 755d0c4 into llvm:main Jun 2, 2023

ashay deleted the ashay/pytorch-cache-artifact branch June 2, 2023 13:17
