
optimize convolution of transpose #103


Merged
merged 2 commits into EnzymeAD:main on Jul 27, 2024

Conversation

Pangoraw
Collaborator

I wrote the optimization we discussed, but for stablehlo.convolution. It rewrites

    %0 = stablehlo.transpose %arg0, dims = [3, 2, 1, 0] : (tensor<5x3x224x224xf32>) -> tensor<224x224x3x5xf32>
    %1 = stablehlo.transpose %arg1, dims = [3, 2, 1, 0] : (tensor<2x3x10x10xf32>) -> tensor<10x10x3x2xf32>
    %2 = stablehlo.convolution(%0, %1) dim_numbers = [0, 1, f, b]x[0, 1, i, o]->[0, 1, f, b], window = {stride = [1, 1], lhs_dilate = [1, 1], rhs_dilate = [1, 1]} {batch_group_count = 1 : i64, feature_group_count = 1 : i64} : (tensor<224x224x3x5xf32>, tensor<10x10x3x2xf32>) -> tensor<215x215x2x5xf32>

to

    %0 = stablehlo.convolution(%arg0, %arg1) dim_numbers = [b, f, 1, 0]x[o, i, 1, 0]->[0, 1, f, b], window = {stride = [1, 1], lhs_dilate = [1, 1], rhs_dilate = [1, 1]} {batch_group_count = 1 : i64, feature_group_count = 1 : i64} : (tensor<5x3x224x224xf32>, tensor<2x3x10x10xf32>) -> tensor<215x215x2x5xf32>

There is still the transpose(conv) case, which could be optimized when the convolution has only one user; that is left to be implemented next.
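
For reference, that remaining pattern would look roughly like the following (a hypothetical sketch reusing the shapes from the example above, illustrative only):

    %0 = stablehlo.convolution(%arg0, %arg1) dim_numbers = [b, f, 1, 0]x[o, i, 1, 0]->[0, 1, f, b], window = {stride = [1, 1], lhs_dilate = [1, 1], rhs_dilate = [1, 1]} {batch_group_count = 1 : i64, feature_group_count = 1 : i64} : (tensor<5x3x224x224xf32>, tensor<2x3x10x10xf32>) -> tensor<215x215x2x5xf32>
    %1 = stablehlo.transpose %0, dims = [3, 2, 1, 0] : (tensor<215x215x2x5xf32>) -> tensor<5x2x215x215xf32>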

@mofeing We could do a similar optimization for Einsum; it seems it is not present in the spec, so you may be able to help me here.

@wsmoses
Member

wsmoses commented Jul 17, 2024

Are there any special cases by chance where we don't need to add a transpose to the end? This is still beneficial, but moving the transpose isn't always guaranteed to improve perf, since where the transpose happens can change how a buffer is materialized [e.g. choosing to create the explicit new transpose in memory is expensive, but fusing it with a previous op may be cheap, so its value as a perf boost becomes context dependent].

At minimum, I suppose that if there's also a transpose after the convolution it would be free, since the two trailing transposes would be fused.

@wsmoses
Member

wsmoses commented Jul 17, 2024

By a similar token, it might be interesting to also have a transpose(conv(x, y)) -> conv(transpose(x), transpose(y)) option, but probably not with both turned on at the same time.

I think einsum may end up being an interesting special case: if the transpose can always be fused into the einsum, the rewrite would be strictly beneficial.

Other cool opts this makes me think of could include:
einsum -> dotgeneral/convolve/sum/etc
convolve(convolve(x)) -> convolve(x)

@mofeing
Collaborator

mofeing commented Jul 18, 2024

> @mofeing We could do a similar optimization for Einsum; it seems it is not present in the spec, so you may be able to help me here.

einsum and unary_einsum (among others) are deprecated in StableHLO, mainly because they can be rewritten with other operations (i.e. dot_general and transpose).

I need to rewrite my code so that it uses dot_general instead, but in the meantime, a pass for unary_einsum and einsum should be almost identical to the DotGeneralTranspose pass.
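
For illustration, a sketch of what such an einsum rewrite could look like (hypothetical shapes and einsum_config, written in the generic op form; the transpose is absorbed by permuting the letters of the config rather than any dimension_numbers):

    %0 = stablehlo.transpose %arg0, dims = [1, 0] : (tensor<3x4xf32>) -> tensor<4x3xf32>
    %1 = "stablehlo.einsum"(%0, %arg1) {einsum_config = "ab,bc->ac"} : (tensor<4x3xf32>, tensor<3x5xf32>) -> tensor<4x5xf32>

to

    %0 = "stablehlo.einsum"(%arg0, %arg1) {einsum_config = "ba,bc->ac"} : (tensor<3x4xf32>, tensor<3x5xf32>) -> tensor<4x5xf32>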

@Pangoraw
Collaborator Author

> Are there any special cases by chance where we don't need to add a transpose to the end?

There isn't actually a need to add a transpose after the convolution. The test input has a trailing transpose because it was generated with Reactant (I removed it for clarity).

The optimization applies the transpose permutation to the dimension_numbers attribute of the convolution operation, which specifies which dimensions are feature/batch/spatial in the inputs/output. This effectively changes the strides of the accesses during the convolution computation instead of materializing a transposed buffer.
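
Concretely, in the example above the lhs transpose has dims = [3, 2, 1, 0], so dimension d of the transposed value corresponds to dimension 3 - d of the original operand. Reading the lhs layout [0, 1, f, b] through that permutation gives [b, f, 1, 0] on the untransposed operand, which is exactly the dim_numbers emitted in the rewritten convolution.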

@wsmoses
Member

wsmoses commented Jul 27, 2024

Unfortunately this needs a rebase since the einsum one was just merged, but otherwise LGTM [and not sure why CI is borked, but we can ignore that for now].

@wsmoses wsmoses merged commit 1886bb4 into EnzymeAD:main Jul 27, 2024
2 of 4 checks passed
@Pangoraw Pangoraw deleted the conv-transpose branch July 27, 2024 20:25