
optimize convolution of transpose #103


Merged
merged 2 commits into EnzymeAD:main on Jul 27, 2024

Conversation

Pangoraw
Collaborator

I wrote the optimization we discussed, but for stablehlo.convolution. It rewrites

    %0 = stablehlo.transpose %arg0, dims = [3, 2, 1, 0] : (tensor<5x3x224x224xf32>) -> tensor<224x224x3x5xf32>
    %1 = stablehlo.transpose %arg1, dims = [3, 2, 1, 0] : (tensor<2x3x10x10xf32>) -> tensor<10x10x3x2xf32>
    %2 = stablehlo.convolution(%0, %1) dim_numbers = [0, 1, f, b]x[0, 1, i, o]->[0, 1, f, b], window = {stride = [1, 1], lhs_dilate = [1, 1], rhs_dilate = [1, 1]} {batch_group_count = 1 : i64, feature_group_count = 1 : i64} : (tensor<224x224x3x5xf32>, tensor<10x10x3x2xf32>) -> tensor<215x215x2x5xf32>

to

    %0 = stablehlo.convolution(%arg0, %arg1) dim_numbers = [b, f, 1, 0]x[o, i, 1, 0]->[0, 1, f, b], window = {stride = [1, 1], lhs_dilate = [1, 1], rhs_dilate = [1, 1]} {batch_group_count = 1 : i64, feature_group_count = 1 : i64} : (tensor<5x3x224x224xf32>, tensor<2x3x10x10xf32>) -> tensor<215x215x2x5xf32>

There is still the transpose(conv) case, which could be optimized when the convolution has only one user; that is left to be implemented next.
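
For reference, that remaining pattern would look roughly like the following (a hypothetical sketch reusing the shapes from the example above, illustrative only):

    %0 = stablehlo.convolution(%arg0, %arg1) dim_numbers = [b, f, 1, 0]x[o, i, 1, 0]->[0, 1, f, b], window = {stride = [1, 1], lhs_dilate = [1, 1], rhs_dilate = [1, 1]} {batch_group_count = 1 : i64, feature_group_count = 1 : i64} : (tensor<5x3x224x224xf32>, tensor<2x3x10x10xf32>) -> tensor<215x215x2x5xf32>
    %1 = stablehlo.transpose %0, dims = [3, 2, 1, 0] : (tensor<215x215x2x5xf32>) -> tensor<5x2x215x215xf32>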

@mofeing We could do a similar optimization for Einsum; it seems it is not present in the spec, so you may be able to help me here.

@wsmoses
Member

wsmoses commented Jul 17, 2024

Are there any special cases by chance where we don't need to add a transpose to the end? This is still beneficial, but moving the transpose isn't always guaranteed to improve perf, since where the transpose happens can change how a buffer is materialized [e.g. choosing to create the explicit new transpose in memory is expensive, but fusing it with a previous op may be cheap, so its value as a perf boost becomes context dependent].

At minimum, I suppose that if there's also a transpose after the convolution it would be free, since the two trailing transposes would be fused.

@wsmoses
Member

wsmoses commented Jul 17, 2024

By a similar token, it might be interesting to also have a transpose(conv(x, y)) -> conv(transpose(x), transpose(y)) option, but probably not with both turned on at the same time.

I think einsum may end up being an interesting special case: if the transpose can always be fused into the einsum, the rewrite would be strictly beneficial.

Other cool opts this makes me think of could include:
einsum -> dotgeneral/convolve/sum/etc
convolve(convolve(x)) -> convolve(x)

@mofeing
Collaborator

mofeing commented Jul 18, 2024

> @mofeing We could do a similar optimization for Einsum; it seems it is not present in the spec, so you may be able to help me here.

einsum and unary_einsum (among others) are deprecated in StableHLO, mainly because they can be rewritten with other operations (i.e. dot_general and transpose).

I need to rewrite my code so that it uses dot_general instead, but in the meantime, a pass for unary_einsum and einsum should be almost identical to the DotGeneralTranspose pass.
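
For illustration, a sketch of what such an einsum rewrite could look like (hypothetical shapes and einsum_config, written in the generic op form; the transpose is absorbed by permuting the letters of the config rather than any dimension_numbers):

    %0 = stablehlo.transpose %arg0, dims = [1, 0] : (tensor<3x4xf32>) -> tensor<4x3xf32>
    %1 = "stablehlo.einsum"(%0, %arg1) {einsum_config = "ab,bc->ac"} : (tensor<4x3xf32>, tensor<3x5xf32>) -> tensor<4x5xf32>

to

    %0 = "stablehlo.einsum"(%arg0, %arg1) {einsum_config = "ba,bc->ac"} : (tensor<3x4xf32>, tensor<3x5xf32>) -> tensor<4x5xf32>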

@Pangoraw
Collaborator Author

> Are there any special cases by chance where we don't need to add a transpose to the end?

There isn't actually a need to add a transpose after the convolution. The test input has a trailing transpose because it was generated with Reactant (I removed it for clarity).

The optimization applies the transpose permutation to the dimension_numbers attribute of the convolution operation, which specifies which dimensions are feature/batch/spatial in the inputs/output. This effectively changes the strides of the accesses during the convolution computation instead of materializing a transposed buffer.
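
Concretely, in the example above the lhs transpose has dims = [3, 2, 1, 0], so dimension d of the transposed value corresponds to dimension 3 - d of the original operand. Reading the lhs layout [0, 1, f, b] through that permutation gives [b, f, 1, 0] on the untransposed operand, which is exactly the dim_numbers emitted in the rewritten convolution.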

@wsmoses
Member

wsmoses commented Jul 27, 2024

Unfortunately this needs a rebase since the einsum one was just merged, but otherwise LGTM [and not sure why CI is borked, but we can ignore that for now].

@wsmoses wsmoses merged commit 1886bb4 into EnzymeAD:main Jul 27, 2024
2 of 4 checks passed
@Pangoraw Pangoraw deleted the conv-transpose branch July 27, 2024 20:25