Fix as_strided symint #1401

Merged

Conversation

@antoniojkim (Collaborator) commented Sep 22, 2022

Upstream PyTorch pushed out a change to as_strided_copy and as_strided_scatter as part of their symbolic integer (SymInt) effort. This broke the LTC build.

Adding those ops to the symint list should fix things.

Also adding support for hardtanh so that the tests pass.

#1396
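
For illustration, a minimal sketch of the kind of change involved (the file path and key name below are assumptions for illustration, not copied from this PR's diff):

```yaml
# build_tools/autogen_ltc_backend.yaml (illustrative excerpt, assumed layout)
# Ops listed under `symint` are generated against the SymInt overloads
# (c10::SymIntArrayRef sizes/strides) instead of the plain int64_t variants.
symint:
  - as_strided_copy
  - as_strided_scatter
```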

@antoniojkim antoniojkim self-assigned this Sep 22, 2022
@antoniojkim antoniojkim mentioned this pull request Sep 22, 2022
@silvasean (Contributor) left a comment

Thanks for the quick fix!

@silvasean (Contributor)

I'm curious: how did you figure out that updating the yaml file was the correct fix?

This is the error I see and it's not obvious to me how this PR fixes it:

2022-09-21T15:38:32.0663610Z /main_checkout/torch-mlir/libtorch/include/c10/util/Optional.h:786:29: error: no matching function for call to 'convert'
2022-09-21T15:38:32.0664092Z     return *this ? **this : detail_::convert<T>(constexpr_forward<V>(v));
2022-09-21T15:38:32.0664473Z                             ^~~~~~~~~~~~~~~~~~~
2022-09-21T15:38:32.0665534Z /main_checkout/torch-mlir/build_oot/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp:726:89: note: in instantiation of function template specialization 'c10::optional<long>::value_or<at::Tensor>' requested here
2022-09-21T15:38:32.0666192Z         LazyTensorPtr lazy_storage_offset = torch::lazy::TryGetLtcTensor(storage_offset.value_or(at::Tensor()));
2022-09-21T15:38:32.0666576Z                                                                                         ^
2022-09-21T15:38:32.0667309Z /main_checkout/torch-mlir/libtorch/include/c10/util/Optional.h:147:13: note: candidate function template not viable: no known conversion from 'at::Tensor' to 'long' for 1st argument
2022-09-21T15:38:32.0668030Z constexpr U convert(U v) {
2022-09-21T15:38:32.0668244Z             ^
2022-09-21T15:38:32.0668998Z /main_checkout/torch-mlir/build_oot/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp:726:74: error: reference to type 'const at::Tensor' could not bind to an rvalue of type 'long'
2022-09-21T15:38:32.0669594Z         LazyTensorPtr lazy_storage_offset = torch::lazy::TryGetLtcTensor(storage_offset.value_or(at::Tensor()));
2022-09-21T15:38:32.0669986Z                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-09-21T15:38:32.0670531Z /main_checkout/torch-mlir/libtorch/include/torch/csrc/lazy/core/tensor.h:219:59: note: passing argument to parameter 'tensor' here
2022-09-21T15:38:32.0671116Z TORCH_API LazyTensorPtr TryGetLtcTensor(const at::Tensor& tensor);
2022-09-21T15:38:32.0671455Z                                                           ^
2022-09-21T15:38:32.0672114Z /main_checkout/torch-mlir/build_oot/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp:730:103: error: no viable conversion from 'ArrayRef<int64_t>' to 'ArrayRef<SymInt>'
2022-09-21T15:38:32.0672744Z         auto out_meta = at::compositeexplicitautogradnonfunctional::as_strided_copy_symint(self_meta, size, stride, storage_offset);
2022-09-21T15:38:32.0673181Z                                                                                                       ^~~~

@antoniojkim (Collaborator, Author) commented Sep 22, 2022

I built it locally and looked to see which op the error was happening for:

/main_checkout/torch-mlir/build_oot/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp:726

It happened to be as_strided_copy, which lined up with this commit added to upstream PyTorch yesterday:
pytorch/pytorch@3eb2722

Reading through that commit, they changed as_strided_copy/as_strided_scatter to have SymInt support, so all we needed to do was make sure we generate the SymInt versions of those ops too.
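
To make the mismatch concrete, here is a rough sketch of the two signatures involved (inferred from the error log above, not copied from the upstream headers). Listing the op in the symint list makes the LTC codegen target the second form, which is what resolves the ArrayRef<int64_t> → ArrayRef<SymInt> and storage_offset conversion errors:

```cpp
// Illustrative sketch only -- argument types inferred from the error log,
// not the exact upstream declarations.

#include <ATen/ATen.h>

// Non-SymInt form: sizes, strides, and offset are concrete 64-bit integers.
at::Tensor as_strided_copy(const at::Tensor& self,
                           at::IntArrayRef size,
                           at::IntArrayRef stride,
                           c10::optional<int64_t> storage_offset);

// SymInt form referenced in the error ("no viable conversion from
// 'ArrayRef<int64_t>' to 'ArrayRef<SymInt>'"): shapes may be symbolic.
at::Tensor as_strided_copy_symint(const at::Tensor& self,
                                  c10::SymIntArrayRef size,
                                  c10::SymIntArrayRef stride,
                                  c10::optional<c10::SymInt> storage_offset);
```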

@silvasean (Contributor)

"e2e_testing.torchscript.main" should be "e2e_testing.main" to fix the python issue.

@ashay (Collaborator) commented Sep 22, 2022

@antoniojkim Is this likely to take longer? If so, we should disable LTC and try this patch offline, so that we don't block folks from merging PRs.

@dellis23 (Collaborator)

Can we come up with a generally applicable flow for these situations? I lean toward what Sean proposed: get the "rollback" done (though it's not exactly a rollback in this case) and then fix it separately.

Is the worry that, while the LTC tests are disabled, we might end up committing additional code that breaks LTC further and makes the fix even harder? Or would we also potentially generate releases that are broken for LTC?

@powderluv (Collaborator)

IMHO: revert first, disable second, fix third to get the tree green.

@antoniojkim force-pushed the antoniojkim/fix_as_strided_copy_symint branch from 30c3816 to f35a956 on September 22, 2022 at 21:16
@ashay (Collaborator) commented Sep 22, 2022

+1 to getting the build green ASAP and fixing/reenabling it right after. There's perhaps just an implicit agreement among folks here that broken builds are a big deal because they prevent everyone from making progress, but maybe we've never formalized it.

There have been some prior discussions about working around API-breaking changes in PyTorch, but I fail to recollect the conclusions. Perhaps we can revisit how to handle PyTorch updates in the Monday meeting? I'm happy to help with changes to our builds/tests, if those are necessary.

@powderluv (Collaborator)

@ashay yeah, I think it is time to pin PyTorch and roll only when tests pass. There is a tracker for it: #1328. I had a WIP idea that I will post there for discussion; if you are willing to take over and land it, please do.

@silvasean (Contributor)

> @ashay yeah, I think it is time to pin PyTorch and roll only when tests pass. There is a tracker for it: #1328. I had a WIP idea that I will post there for discussion; if you are willing to take over and land it, please do.

Yeah, I think that is a good idea. I've added a note to discuss this in the community meeting two weeks from now. We can discuss it at the developer hour too.

@antoniojkim (Collaborator, Author)

> I think it is time to pin PyTorch and roll only when tests pass.

I'm in favour of this idea as well.

@antoniojkim force-pushed the antoniojkim/fix_as_strided_copy_symint branch 4 times, most recently from 3223bf4 to e6aa311 on September 23, 2022 at 22:22
@antoniojkim force-pushed the antoniojkim/fix_as_strided_copy_symint branch from e6aa311 to a525993 on September 26, 2022 at 13:23
@antoniojkim merged commit 3e27aa2 into llvm:main on Sep 26, 2022
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022