
support aten._trilinear and improve einsum decomposition #3784

Open · wants to merge 17 commits into main

Conversation

@stbaione (Contributor) commented Oct 11, 2024

Tracking

Issue: TorchToLinalg Op Support

Description

Aten_TrilinearOp is an implementation of a "trilinear Einstein sum": essentially, an einsum across three tensors.

There are a few inputs:

Tensor Inputs

  • i1, i2, i3 - The three input tensors for the _trilinear op.

Expands

These inputs allow you to unsqueeze an input tensor at the specified dims as a pre-processing step to make the shapes compatible for the rest of the op:

  • expand1: List[int], expand2: List[int], expand3: List[int]
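For example (illustrative shapes, not from the patch), expand1 = [1] unsqueezes i1 at dim 1 so it can broadcast against the other inputs:

>>> import torch
>>> i1 = torch.rand(2, 3)
>>> i1.unsqueeze(1).shape  # expand1 = [1]
torch.Size([2, 1, 3])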

sumdim

  • sumdim: List[int] - After the element-wise multiplication, each value in sumdim denotes a dimension to collapse by summing over it
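Concretely (again with illustrative shapes), the product broadcasts to a common shape, and sumdim picks which dims of that product to collapse:

>>> a = torch.rand(2, 1, 3); b = torch.rand(2, 3, 1); c = torch.rand(1, 3, 3)
>>> (a * b * c).shape            # broadcasted element-wise product
torch.Size([2, 3, 3])
>>> (a * b * c).sum([2]).shape   # sumdim = [2] collapses dim 2
torch.Size([2, 3])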

unroll_dim

  • unroll_dim: int - In the PyTorch implementation, this specifies a dimension along which you could slice the input tensors, multiply and sum the slices, then concatenate the results into an output tensor. This complicates the implementation significantly without changing the result, so I opted against it. Along with that, a previously accepted path for solving this involved reusing AtenEinsumOp, which would also ignore this input. A sketch of the batched evaluation follows.
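Here is a hypothetical sketch of that batched evaluation (my own helper, not the PyTorch source; it assumes the three tensors already share a shape after the expands, that dims are non-negative, and that unroll_dim is not in sumdim):

import torch

def trilinear_unrolled(i1, i2, i3, sumdim, unroll_dim, block=1):
    # Slice all inputs along unroll_dim, multiply and sum each slice,
    # then concatenate the partial results. Produces the same output as
    # (i1 * i2 * i3).sum(sumdim), just with smaller intermediate tensors.
    outs = []
    for start in range(0, i1.size(unroll_dim), block):
        idx = [slice(None)] * i1.dim()
        idx[unroll_dim] = slice(start, start + block)
        idx = tuple(idx)
        outs.append((i1[idx] * i2[idx] * i3[idx]).sum(sumdim))
    # After summing, unroll_dim shifts left past any summed dims before it.
    out_dim = unroll_dim - sum(d < unroll_dim for d in sumdim)
    return torch.cat(outs, dim=out_dim)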

Solution

After trying a bunch of more complicated approaches, this op actually ended up being quite simple: see _trilinear

_trilinear = (i1.unsqueeze(expand1) * i2.unsqueeze(expand2) * i3.unsqueeze(expand3)).sum(sumdim)

Wish I'd seen this earlier, but whatcha gonna do 🙃
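Spelled out as a runnable sketch (the helper name is mine; following the one-liner above, each expand list is applied via repeated unsqueeze and the summed dims are dropped):

import torch

def trilinear_ref(i1, i2, i3, expand1, expand2, expand3, sumdim):
    # Unsqueeze each input at its expand dims (ascending order keeps
    # later indices valid), broadcast-multiply, then sum over sumdim.
    for d in sorted(expand1):
        i1 = i1.unsqueeze(d)
    for d in sorted(expand2):
        i2 = i2.unsqueeze(d)
    for d in sorted(expand3):
        i3 = i3.unsqueeze(d)
    return (i1 * i2 * i3).sum(sumdim)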

Not Reusing AtenEinsumOp

Frankly, I found multiple cases where valid inputs produced numerical mismatches with EinsumOp, even when running tests against EinsumOp directly. I think it has something to do with the singleton dimensions. This needs further investigation, but once I realized the simplified approach existed, it proved more reliable and much simpler.

Either way (credit to @zjgarvey), there are improvements to the einsum op here. When I was originally trying to use the op, intermediate tensors were being flattened properly, but their 0th dimension was then cast from a static dim to a dynamic dim because integers were not folding correctly in the MLIR. These improvements are worth keeping for future reusers of EinsumOp.

The zero'd out dim "bug"

For some reason, if you specify the same dimension in all three expand lists, e.g.

[expand1=[0], expand2=[0], expand3=[0]]
[expand1=[1], expand2=[1], expand3=[1]]

the _trilinear op would put 0 for that dimension in the output shape, unless the dimension was also included in sumdim. This goes against the behavior of torch.einsum:

>>> a, b, c = [torch.rand(1, 3, 3, 3) for i in range(3)] # Simulate expand at dim=0 for all input tensors
>>> torch.einsum('abcd,abcd,abcd->abcd', a, b, c).shape
torch.Size([1, 3, 3, 3])

It is also just mathematically incorrect. I considered "replacing" singleton dims with zeroed-out dims, but that seemed like carrying over a bug. Instead, I included a test for the case, verified that the singleton dimensions were handled the way torch.einsum handles them rather than the way torch._trilinear does, and xfailed it with a note as to why.
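A hedged repro of the mismatch (shapes mirror the einsum example above; the _trilinear output shape is as reported in this description, not re-verified here):

>>> a, b, c = [torch.rand(3, 3, 3) for i in range(3)]
>>> # einsum semantics after simulating expand at dim=0:
>>> torch.einsum('abcd,abcd,abcd->abcd',
...              a.unsqueeze(0), b.unsqueeze(0), c.unsqueeze(0)).shape
torch.Size([1, 3, 3, 3])
>>> # same dim in all three expand lists, not in sumdim:
>>> torch._trilinear(a, b, c, [0], [0], [0], []).shape
torch.Size([0, 3, 3, 3])  # the reported zero'd-out dim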

…ts a trilinear einstein sum.

WIP: it currently builds, but fails at lowering to linalg.
Lowers to torch backend, but unable to lower to linalg. There's a discrepancy between the way that _trilinear and the einsum op handle the second test case (in torch python); troubleshooting this discrepancy to figure out why/where the two ops differ.
Add more test cases.
Add PyTorch _trilinear "bug" to xfail set.
@stbaione stbaione changed the title Implementation of torch.ops.aten._trilinear [TorchToLinalg] Implementation of torch.ops.aten._trilinear Oct 17, 2024
@stbaione stbaione marked this pull request as ready for review October 18, 2024 00:34
@zjgarvey (Collaborator) left a comment

Glad you got something working Stephen! The major review points here:

  1. We need to fail the conversion in the cases we don't support. It's not sufficient to xfail the tests for unsupported cases, because a downstream user of the tool isn't going to run a big model and say "Oh, this random shape is messed up, it must be that one esoteric e2e test for this one op that I saw one day". We need to report a match failure so that the op actually doesn't get converted and model-support people don't spend days debugging a silently failing conversion.
  2. Related to 1: what does unrollDim do? It needs to be included, or, if unrollDim != 0, we also need to report an "unimplemented" match failure.
  3. Not really major, but glad we don't have to use einsum. I think the einsum changes are generally good, but it might be better to move them into a different patch. I'm fine leaving them in here, but the commit messaging will seem odd if anyone wants to trace back the history.

@stbaione
Copy link
Contributor Author

stbaione commented Oct 18, 2024

> Glad you got something working Stephen! The major review points here:
>
> 1. We need to fail the conversion in the cases we don't support. It's not sufficient to xfail the tests for unsupported cases, because a downstream user of the tool isn't going to run a big model and say "Oh, this random shape is messed up, it must be that one esoteric e2e test for this one op that I saw one day". We need to report a match failure so that the op actually doesn't get converted and model-support people don't spend days debugging a silently failing conversion.
> 2. Related to 1: what does unrollDim do? It needs to be included, or, if unrollDim != 0, we also need to report an "unimplemented" match failure.
> 3. Not really major, but glad we don't have to use einsum. I think the einsum changes are generally good, but it might be better to move them into a different patch. I'm fine leaving them in here, but the commit messaging will seem odd if anyone wants to trace back the history.

@zjgarvey

  1. Is this referring to: "What about the case where something lies in the triple intersection of the expand sets? I thought we were going to handle that case."? If so, I can go ahead and add a match failure. I explained the reasoning for leaving it as-is above, but it makes sense to be more explicit about that case so as not to cause downstream confusion.
  2. Copying my reply from above to make it easier to track responses:

     "The unrollDim allows slicing along a dimension across all tensors. Then you can do (slice1 * slice2 * slice3).sum(sumdim), and concat the result to the output tensor. It doesn't change the output of the function, and wasn't used in the EinsumOp approach, but its intent is to save space by processing the tensors in batches instead of the entire tensors at once."

     I can look into extending the solution to use this.
  3. I agree, it's way more straightforward doing it this way. Maybe I should edit my PR title to include the changes for the einsum op? After merging main, the changes actually seem to have fixed 5 tests that were xfailed in the `fx_importer_stablehlo` pipeline.

@zjgarvey (Collaborator) commented

> 1. Is this referring to: "What about the case where something lies in the triple intersection of the expand sets? I thought we were going to handle that case."? If so, I can go ahead and add a match failure. I explained the reasoning for leaving it as-is above, but it makes sense to be more explicit about that case so as not to cause downstream confusion.

If it is a genuine bug, let's at least file an issue in pytorch and emit a warning.
If it is not a genuine bug, then we need to mimic the pytorch behavior.

> 2. Copying my reply from above to make it easier to track responses:
>
>    "The unrollDim allows slicing along a dimension across all tensors. Then you can do (slice1 * slice2 * slice3).sum(sumdim), and concat the result to the output tensor. It doesn't change the output of the function, and wasn't used in the EinsumOp approach, but its intent is to save space by processing the tensors in batches instead of the entire tensors at once."
>
>    I can look into extending the solution to use this.

I don't think we will need to implement this, but there's no point in reporting a match failure if the unrollDim is non-constant. Just leave a comment somewhere in the conversion that the unrollDim does not change the result of the operation, so we do not use it.

> 3. I agree, it's way more straightforward doing it this way. Maybe I should edit my PR title to include the changes for the einsum op? After merging main, the changes actually seem to have fixed 5 tests that were xfailed in the `fx_importer_stablehlo` pipeline.

Ah, good that it resolves some failing tests. We should rename the title to something like "support aten._trilinear and improve einsum decomposition". No TorchToLinalg flag, since this is a decomposition that affects other backends too.

@stbaione stbaione changed the title [TorchToLinalg] Implementation of torch.ops.aten._trilinear support aten._trilinear and improve einsum decomposition Oct 18, 2024
…s not included in sumDim, add note in func description that `unrollDim` is unused
@stbaione (Contributor, Author) commented

> > 1. Is this referring to: "What about the case where something lies in the triple intersection of the expand sets? I thought we were going to handle that case."? If so, I can go ahead and add a match failure. I explained the reasoning for leaving it as-is above, but it makes sense to be more explicit about that case so as not to cause downstream confusion.
>
> If it is a genuine bug, let's at least file an issue in pytorch and emit a warning. If it is not a genuine bug, then we need to mimic the pytorch behavior.

PyTorch bug filed here and emitWarning added.

> > 2. Copying my reply from above to make it easier to track responses:
> >
> >    "The unrollDim allows slicing along a dimension across all tensors. Then you can do (slice1 * slice2 * slice3).sum(sumdim), and concat the result to the output tensor. It doesn't change the output of the function, and wasn't used in the EinsumOp approach, but its intent is to save space by processing the tensors in batches instead of the entire tensors at once."
> >
> >    I can look into extending the solution to use this.
>
> I don't think we will need to implement this, but there's no point in reporting a match failure if the unrollDim is non-constant. Just leave a comment somewhere in the conversion that the unrollDim does not change the result of the operation, so we do not use it.

A comment that unrollDim does not impact the output and is unused has been added to the function description.

> > 3. I agree, it's way more straightforward doing it this way. Maybe I should edit my PR title to include the changes for the einsum op? After merging main, the changes actually seem to have fixed 5 tests that were xfailed in the `fx_importer_stablehlo` pipeline.
>
> Ah, good that it resolves some failing tests. We should rename the title to something like "support aten._trilinear and improve einsum decomposition". No TorchToLinalg flag, since this is a decomposition that affects other backends too.

Updated title of PR to: "support aten._trilinear and improve einsum decomposition"
