
[WIP][AMDAIEConvertToDma] Support memref shape collapse/expand #800

Draft
newling wants to merge 3 commits into main from support_collapse_and_expand_in_dma_conversion

Conversation

newling (Contributor) commented Sep 25, 2024

This PR extends iree-amdaie-convert-to-dma to handle more situations.

The goal of the pass iree-amdaie-convert-to-dma is to convert iree_linalg_ext.pack, iree_linalg_ext.unpack, and linalg.copy operations into amdaie.dma_cpy_nd operations. The linalg.copy ops are of no concern in this PR, as they are converted to pack and unpack ops very early in the pass. The logic for iree_linalg_ext.unpack is essentially the same as for iree_linalg_ext.pack, so I will only discuss iree_linalg_ext.pack in the paragraphs below.
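For intuition, here is a minimal sketch of the shape of the conversion (op syntax abbreviated and slightly hypothetical; the lit tests in this PR are authoritative):

```mlir
// A pack copying a 32x64 buffer into an 8x16x4x4 tiled buffer ...
%src = memref.alloc() : memref<32x64xi32>
%dst = memref.alloc() : memref<8x16x4x4xi32, 1>
iree_linalg_ext.pack %src inner_dims_pos = [0, 1] inner_tiles = [4, 4]
    into %dst : (memref<32x64xi32> memref<8x16x4x4xi32, 1>)

// ... becomes (schematically) a single strided copy between the
// logicalobjectfifos wrapping the two allocations, where each side
// carries its own offsets, sizes, and strides:
//   amdaie.dma_cpy_nd(%dst_lof[0,0,0,0] [8,16,4,4] [256,16,4,1],
//                     %src_lof[0,0,0,0] [8,16,4,4] [256,4,64,1])
```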

There are two main differences between iree_linalg_ext.pack and amdaie.dma_cpy_nd that the pass needs to handle.

The first is that the operands of amdaie.dma_cpy_nd are amdaie.logicalobjectfifos, which are essentially just memref.alloc ops (tied to a set of tiles). The operation iree_linalg_ext.pack, on the other hand, has operands which are not as 'directly' connected to memref.alloc ops, as they can be memref.subviews of allocations, or indeed any arbitrary chain of memref.subview, memref.expand_shape, memref.collapse_shape, etc. The pass therefore needs to find the memref.alloc at the start of the chain which ultimately defines the operand of the iree_linalg_ext.pack, and build the amdaie.dma_cpy_nd based on that memref.alloc. Before this PR, it was assumed that the chain connecting a memref.alloc to an iree_linalg_ext.pack was at most a single memref.subview op. This PR extends this to a chain of any length. The implementation avoids recursion. Please see the lit tests for examples.
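For example (a hypothetical chain, not taken from the lit tests), after this PR the pass can trace through views like these to find the root allocation:

```mlir
// The pack operand %s is two hops away from the alloc that the
// amdaie.dma_cpy_nd must ultimately reference.
%alloc = memref.alloc() : memref<4x8x16xi32>
%c = memref.collapse_shape %alloc [[0, 1], [2]]
    : memref<4x8x16xi32> into memref<32x16xi32>
%s = memref.subview %c[8, 0] [16, 16] [1, 1]
    : memref<32x16xi32> to memref<16x16xi32, strided<[16, 1], offset: 128>>
```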

The second is that amdaie.dma_cpy_nd has offsets, sizes, and strides. These need to be derived from the iree_linalg_ext.pack and all of the operations in the chain from the memref.alloc to the iree_linalg_ext.pack. With this PR, each of the operations memref.subview, memref.collapse_shape, and memref.expand_shape in a chain from memref.alloc to iree_linalg_ext.pack has specific logic for modifying offsets, sizes, and strides. Vectors of offsets, sizes, and strides are initialized at the memref.alloc, and then each op in the chain to the iree_linalg_ext.pack mutates them. They are then mutated one last time based on the iree_linalg_ext.pack op's permutation and tile sizes.
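As a worked illustration, using the hypothetical chain from the previous example (my own reading of the bookkeeping; the exact rules in the code may differ in detail):

- At the alloc: offsets = [0, 0, 0], sizes = [4, 8, 16], strides = [128, 16, 1], the contiguous strides of memref<4x8x16xi32>.
- At the collapse_shape, which merges dims {0, 1}: sizes become [32, 16], and the merged dim takes the stride of the innermost dim of its group, so strides become [16, 1] (element i of the merged dim lives at linear offset (i / 8) * 128 + (i % 8) * 16 = 16 * i).
- At the subview: its per-dimension offsets [8, 0] are added to the accumulated offsets, sizes become [16, 16], and strides are unchanged since the subview strides are [1, 1].
- Finally, the pack op's permutation and inner tile sizes split and reorder these dimensions to give the access pattern of the amdaie.dma_cpy_nd.

An expand_shape runs similar bookkeeping in the other direction: each dimension in a reassociation group gets its size from the expanded type, and its stride is the original stride multiplied by the sizes of the faster-varying dimensions in its group.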

Please see the lit tests for examples of these modifications.

newling force-pushed the support_collapse_and_expand_in_dma_conversion branch from 762fa08 to 764d8e6 on September 25, 2024.
newling changed the title from [WIP][AMDAIEConvertToDma] Support memref shape collapse/expand to [AMDAIEConvertToDma] Support memref shape collapse/expand on Sep 25, 2024.
yzhang93 (Contributor) commented Sep 26, 2024

Could you add some motivation/comments/logic so it is easy to follow?

newling (Author) commented Sep 26, 2024

> Could you add some motivation/comments/logic so it is easy to follow?

I've added a description to the PR, but I'm not sure if that's what you're requesting?

Motivation: memref.expand_shape and memref.collapse_shape ops enter the IR when I add linalg::populateFoldUnitExtentDimsPatterns(ps, options), which is needed to get vectorization working for convolution.
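For concreteness, a sketch of the kind of rewrite that pattern set introduces (shapes hypothetical): folding a unit dimension rank-reduces an operand by inserting a collapse_shape (and an expand_shape to restore the shape where needed), which is exactly what this pass must now trace through:

```mlir
// After unit-dim folding, consumers see the rank-reduced view %folded
// instead of the original rank-3 buffer %buf.
%folded = memref.collapse_shape %buf [[0, 1], [2]]
    : memref<1x14x32xi32> into memref<14x32xi32>
```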

Would you like more comments in the code? I am happy to add these if so.

// CHECK-SAME: [0, 0, 0, 0] [20, 5, 10, 10] [500, 100, 10, 1]
// src of dma cpy:
// CHECK-SAME: [0, 0, 0, 0] [20, 5, 10, 10] [500, 10, 50, 1]
func.func @multidim_without_expand() {
Review comment (Contributor):

I would change the test names to include the keyword pack/unpack. Same for the tests below.

if (size.value() % innerTiles[i] != 0) {
std::optional<int64_t> maybeSize =
getConstantIntValue(sizes[innerDimsPos[i]]);
assert(maybeSize.has_value() && "size expected to be constant here.");
Review comment (Contributor):

I would expect this to emit an error too, to be consistent with the strides below.

Comment on lines 70 to 73
std::optional<int64_t> stride =
getConstantIntValue(strides[innerDimsPos[i]]);
Review comment (Contributor):

I would expect the same name format here (maybeStride) as in the previous case, and then take stride = maybeStride.value().


OpBuilder builder(subviewOp.getContext());
builder.setInsertionPoint(subviewOp);
offsets = getIndexOpFoldResultSum(builder, subviewOp.getLoc(), offsets,
Review comment (Contributor):

It's better to add some comments here, or above the utility function, explaining how the offsets are generated.

}

// Starting from the allocation, update the offsets, sizes, and strides.
for (auto iter = chain.rbegin(); iter != chain.rend(); ++iter) {
Review comment (Contributor):

I'm okay with your method of first going up the chain to find the definingOp and then going down the chain to update the address. But I think recursion could make the code cleaner.

rewriter.eraseOp(op);
if (failed(mlir::verify(src)) || failed(mlir::verify(dst))) {
return failure();
}
Review comment (Contributor):

Why is this needed?

newling (Author):

My bad, left over from debugging.

Comment on lines 601 to 603
if (failed(mlir::verify(packOrUnackOp))) {
return failure();
}
Review comment (Contributor):

Same question, why is this needed?

SmallVector<OpFoldResult> &sizes,
SmallVector<OpFoldResult> &strides) {
MLIRContext *ctx = expandShapeOp.getContext();
auto reassociationIndices = expandShapeOp.getReassociationIndices();
Review comment (Contributor):

Don't use auto here.

newling (Author):

I intentionally used auto here, because IMO SmallVector<SmallVector<long, 2>, 4> is uninformative. Why pollute the code (and brain) with the detail that someone years ago thought that 4 groups of size 2 seemed like a good choice for static sizes?

SmallVector<OpFoldResult> &offsets,
SmallVector<OpFoldResult> &sizes,
SmallVector<OpFoldResult> &strides) {
auto reassociationIndices = collapseOp.getReassociationIndices();
Review comment (Contributor):

Don't use auto here.

sizes.push_back(getAsIndexOpFoldResult(ctx, dim));
}

// Offsets - merge reassocation groups.
Review comment (Contributor):

I would put some explanation of how the offsets are decided here, and in other places too.

yzhang93 (Contributor) commented:

> Would you like more comments in the code? I am happy to add these if so.

Yes, I think more comments and explanations are needed, especially for the utility functions (e.g. getLinearCombination), and for how the offsets/strides are determined for the expand/collapse ops.


auto add = [&](Value v, IntegerAttr attr) {
if (attr.getInt() == 0) return v;
return builder.create<arith::AddIOp>(loc, v, getConstant(attr.getInt()))
Review comment (Contributor):

I'm wondering if there's an easier way / existing utility in upstream that can do this, instead of creating new ops.

newling (Author):

Yeah, I wondered the same thing.

newling force-pushed the support_collapse_and_expand_in_dma_conversion branch from 764d8e6 to 86ed927 on September 27, 2024.
newling (Author) commented Sep 27, 2024

@yzhang93 thanks for your review. Please don't review this again until I remove the [WIP]. This is the "large PR" which I said could be eliminated if I could get "linalg-fold-unit-extent-dims" to not create collapse_shape and expand_shape ops. I'm experimenting with useRankReducingSlices = true in that pass now, as suggested by Mahesh, to see what happens; if that works, I might abandon this PR.

newling changed the title from [AMDAIEConvertToDma] Support memref shape collapse/expand to [WIP][AMDAIEConvertToDma] Support memref shape collapse/expand on Sep 27, 2024.
newling marked this pull request as a draft on October 7, 2024.