[MLIR] Add continuous tiling to transform dialect #82792
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions. If you have received no comments on your PR for a week, you can request a review. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-mlir-scf @llvm/pr-subscribers-mlir Author: None (muneebkhan85)
Changes: This patch adds continuous tiling options to TileUsingForOp. In cases where the tensor dimensions are not exactly divisible by the tile size, we are left with leftover tensor chunks that are irregularly tiled. This approach tiles the leftover chunk with a smaller tile size and repeats the process recursively using exponentially diminishing tile sizes. The transform eventually generates a chain of loops that apply tiling using diminishing tile sizes (a conceptual sketch follows the patch excerpt below). The transform lowers from the linalg dialect to the scf dialect.
Patch is 62.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82792.diff
6 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
index 309573a562872f..4976c8e82db4df 100644
--- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
@@ -1833,7 +1833,9 @@ def TileUsingForOp : Op<Transform_Dialect, "structured.tile_using_for",
be as many handles as `ShapedType::kDynamic` values in the
`static_sizes` attribute. A static size of `0` indicates that the dimension
should not be tiled. No loop will be generated for such dimensions. If all
- tile sizes are `0`, this transform is effectively a no-op.
+ tile sizes are `0`, this transform is effectively a no-op. To apply
+ continuous tiling `continuous_tiles` needs to be supplied with as many
+ boolean values as there are nested loops.
This op returns handles to the tiled op (in the generated loop nest) and the
generated loops. The number of loops is the number of tile sizes that are
@@ -1859,6 +1861,7 @@ def TileUsingForOp : Op<Transform_Dialect, "structured.tile_using_for",
let arguments = (ins TransformHandleTypeInterface:$target,
Variadic<TransformAnyParamTypeOrAnyHandle>:$dynamic_sizes,
DefaultValuedOptionalAttr<DenseI64ArrayAttr, "{}">:$static_sizes,
+ DefaultValuedOptionalAttr<DenseBoolArrayAttr, "{}">:$continuous_tiles,
DefaultValuedOptionalAttr<DenseI64ArrayAttr, "{}">:$interchange,
DefaultValuedOptionalAttr<DenseBoolArrayAttr, "{}">:$scalable_sizes);
let results = (outs TransformHandleTypeInterface:$tiled_linalg_op,
@@ -1867,22 +1870,26 @@ def TileUsingForOp : Op<Transform_Dialect, "structured.tile_using_for",
OpBuilder<(ins "TypeRange":$loopTypes,
"Value":$target,
"ArrayRef<int64_t>":$staticTileSizes,
+ CArg<"ArrayRef<bool>", "{}">:$continuousTiles,
CArg<"ArrayRef<int64_t>", "{}">:$interchange,
CArg<"std::optional<ArrayRef<bool>>", "std::nullopt">:
$scalableSizes)>,
OpBuilder<(ins "TypeRange":$loopTypes,
"Value":$target,
"ArrayRef<OpFoldResult>":$mixedTileSizes,
+ CArg<"ArrayRef<bool>", "{}">:$continuousTiles,
CArg<"ArrayRef<int64_t>", "{}">:$interchange,
CArg<"std::optional<ArrayRef<bool>>", "std::nullopt">:
$scalableSizes)>,
OpBuilder<(ins "Value":$target,
"ArrayRef<int64_t>":$staticTileSizes,
+ CArg<"ArrayRef<bool>", "{}">:$continuousTiles,
CArg<"ArrayRef<int64_t>", "{}">:$interchange,
CArg<"std::optional<ArrayRef<bool>>", "std::nullopt">:
$scalableSizes)>,
OpBuilder<(ins "Value":$target,
"ArrayRef<OpFoldResult>":$mixedTileSizes,
+ CArg<"ArrayRef<bool>", "{}">:$continuousTiles,
CArg<"ArrayRef<int64_t>", "{}">:$interchange,
CArg<"std::optional<ArrayRef<bool>>", "std::nullopt">:
$scalableSizes)>,
diff --git a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
index 965ef9e203be28..b40291b5a80da5 100644
--- a/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
+++ b/mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
@@ -71,6 +71,14 @@ struct SCFTilingOptions {
mapping, [](auto attr) -> Attribute { return attr; });
return *this;
}
+
+ /// Specify which loops in the loop nest are to be continuously tiled.
+ SmallVector<bool> continuousTileMappingVector = {};
+ SCFTilingOptions &setCTileMapping(ArrayRef<bool> ctile) {
+ continuousTileMappingVector =
+ llvm::map_to_vector(ctile, [](auto attr) -> bool { return attr; });
+ return *this;
+ }
};
/// Transformation information returned after tiling.
@@ -92,6 +100,12 @@ FailureOr<SCFTilingResult> tileUsingSCF(RewriterBase &rewriter,
TilingInterface op,
const SCFTilingOptions &options);
+/// Method to continuously tile an op that implements the `TilingInterface`
+/// using `scf.for` for iterating over the tiles.
+FailureOr<SCFTilingResult>
+continuousTileUsingSCF(RewriterBase &rewriter, TilingInterface op,
+ const SCFTilingOptions &options);
+
/// Options used to control tile + fuse.
struct SCFTileAndFuseOptions {
/// The tiling options used to control the tiling of the consumer.
diff --git a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
index 299965bcfc3ab3..75080ea7721beb 100644
--- a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
+++ b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
@@ -2476,39 +2476,41 @@ DiagnosedSilenceableFailure transform::TileReductionUsingForallOp::applyToOne(
void transform::TileUsingForOp::build(
OpBuilder &builder, OperationState &result, TypeRange loopTypes,
Value target, ArrayRef<int64_t> staticTileSizes,
- ArrayRef<int64_t> interchange,
+ ArrayRef<bool> continuousTiles, ArrayRef<int64_t> interchange,
std::optional<ArrayRef<bool>> scalableSizes) {
return build(builder, result, loopTypes,
/*target=*/target,
/*mixedTileSizes=*/
getAsOpFoldResult(builder.getI64ArrayAttr(staticTileSizes)),
- interchange, scalableSizes);
+ continuousTiles, interchange, scalableSizes);
}
void transform::TileUsingForOp::build(
OpBuilder &builder, OperationState &result, Value target,
- ArrayRef<int64_t> staticTileSizes, ArrayRef<int64_t> interchange,
+ ArrayRef<int64_t> staticTileSizes, ArrayRef<bool> continuousTiles,
+ ArrayRef<int64_t> interchange,
std::optional<ArrayRef<bool>> scalableSizes) {
build(builder, result, target,
getAsOpFoldResult(builder.getI64ArrayAttr(staticTileSizes)),
- interchange, scalableSizes);
+ builder.getDenseBoolArrayAttr(continuousTiles), interchange, scalableSizes);
}
void transform::TileUsingForOp::build(
OpBuilder &builder, OperationState &result, Value target,
- ArrayRef<OpFoldResult> mixedTileSizes, ArrayRef<int64_t> interchange,
+ ArrayRef<OpFoldResult> mixedTileSizes, ArrayRef<bool> continuousTiles,
+ ArrayRef<int64_t> interchange,
std::optional<ArrayRef<bool>> scalableSizes) {
// Loop types are automaticaly splat by the callee, setting up one is
// enough.
SmallVector<Type> loopTypes(1, builder.getType<transform::AnyOpType>());
- build(builder, result, loopTypes, target, mixedTileSizes, interchange,
+ build(builder, result, loopTypes, target, mixedTileSizes, continuousTiles, interchange,
scalableSizes);
}
void transform::TileUsingForOp::build(
OpBuilder &builder, OperationState &result, TypeRange loopTypes,
Value target, ArrayRef<OpFoldResult> mixedTileSizes,
- ArrayRef<int64_t> interchange,
+ ArrayRef<bool> continuousTiles, ArrayRef<int64_t> interchange,
std::optional<ArrayRef<bool>> scalableSizes) {
SmallVector<int64_t> staticTileSizes;
SmallVector<Value> dynamicTileSizes;
@@ -2517,6 +2519,7 @@ void transform::TileUsingForOp::build(
// attributes for multiple variadic operands. In the absence of this,
// horrible bugs ensue.
auto staticTileSizesAttr = builder.getDenseI64ArrayAttr(staticTileSizes);
+ auto continuousTilesAttr = builder.getDenseBoolArrayAttr(continuousTiles);
unsigned numExpectedLoops =
staticTileSizes.size() - llvm::count(staticTileSizes, 0);
SmallVector<Type> resultTypes;
@@ -2535,6 +2538,7 @@ void transform::TileUsingForOp::build(
/*target=*/target,
/*dynamic_sizes=*/dynamicTileSizes,
/*static_sizes=*/staticTileSizesAttr,
+ /*continuous_tiles=*/continuousTilesAttr,
/*interchange=*/builder.getDenseI64ArrayAttr(interchange),
/*scalable_sizes=*/expandedScalableSizes);
}
@@ -2675,8 +2679,15 @@ transform::TileUsingForOp::apply(transform::TransformRewriter &rewriter,
}
tilingOptions.setInterchange(getInterchange());
- FailureOr<scf::SCFTilingResult> maybeTilingResult =
+ tilingOptions.setCTileMapping(getContinuousTiles());
+
+ FailureOr<scf::SCFTilingResult> maybeTilingResult;
+ if (tilingOptions.continuousTileMappingVector.empty())
+ maybeTilingResult =
tileUsingSCF(rewriter, tilingInterface, tilingOptions);
+ else
+ maybeTilingResult =
+ continuousTileUsingSCF(rewriter, tilingInterface, tilingOptions);
if (failed(maybeTilingResult))
return DiagnosedSilenceableFailure::definiteFailure();
@@ -2725,6 +2736,18 @@ ParseResult parseOptionalInterchange(OpAsmParser &parser,
return success();
}
+ParseResult parseOptionalContinuousTiles(OpAsmParser &parser,
+ OperationState &result) {
+ if (failed(parser.parseOptionalKeyword("continuous_tiles")))
+ return success();
+ if (failed(parser.parseEqual()))
+ return failure();
+ result.addAttribute(
+ transform::TileUsingForOp::getContinuousTilesAttrName(result.name),
+ DenseBoolArrayAttr::parse(parser, Type{}));
+ return success();
+}
+
void printOptionalInterchange(OpAsmPrinter &p,
ArrayRef<int64_t> interchangeVals) {
if (!interchangeVals.empty()) {
@@ -2747,6 +2770,7 @@ ParseResult transform::TileUsingForOp::parse(OpAsmParser &parser,
if (parser.parseOperand(target) || parser.getCurrentLocation(&operandLoc) ||
parseDynamicIndexList(parser, dynamicSizes, staticSizes, scalableVals) ||
parseOptionalInterchange(parser, result) ||
+ parseOptionalContinuousTiles(parser, result) ||
parser.parseOptionalAttrDict(result.attributes) ||
parser.parseColonType(functionalType))
return ParseResult::failure();
diff --git a/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp b/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
index 1a84a59ddb69df..f81a901c82db99 100644
--- a/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
@@ -31,6 +31,8 @@
using namespace mlir;
+static constexpr char kLoopIndexLabel[] = "__loop_index__";
+
scf::SCFTilingOptions &
scf::SCFTilingOptions::setTileSizes(ArrayRef<OpFoldResult> ts) {
assert(!tileSizeComputationFunction && "tile sizes already set");
@@ -309,6 +311,189 @@ static LogicalResult generateLoopNest(RewriterBase &rewriter, Location loc,
return rewriter.notifyMatchFailure(loc, "unhandled loop type");
}
+static void continuousLoopNestHelper(
+ OpBuilder &builder, Location loc, ArrayRef<Range> loopRanges,
+ SmallVector<LoopLikeOpInterface> &loops, uint64_t loopLevelIdx, uint64_t &loopIdx,
+ ArrayRef<OpFoldResult> tileSizes, SmallVector<bool> &CTileVector,
+ std::map<int, OpFoldResult> &sizesMap,
+ SmallVector<scf::ForOp> &innermostLoops, ValueRange destinationTensors = {},
+ bool isHeadOrInsideHeadLoop = false) {
+
+ Value offset = getValueOrCreateConstantIndexOp(
+ builder, loc, loopRanges[loopLevelIdx].offset);
+ Value size = getValueOrCreateConstantIndexOp(builder, loc,
+ loopRanges[loopLevelIdx].size);
+ Value tileSize =
+ getValueOrCreateConstantIndexOp(builder, loc, tileSizes[loopLevelIdx]);
+
+ AffineExpr sym0, sym1, sym2;
+ bindSymbols(builder.getContext(), sym0, sym1, sym2);
+ AffineMap defaultSplitMap =
+ AffineMap::get(0, 3, {sym1 - ((sym1 - sym0) % sym2)});
+ // Simplified map for use when step is power of 2 and lower bound
+ // is exactly divisble by step.
+ AffineMap powerSplitMap = AffineMap::get(0, 3, {sym1 - (sym1 % sym2)});
+
+ uint64_t tileSizeInt = *getConstantIntValue(tileSize);
+
+ // Enforce no tiling when tile size is zero.
+ // No need to create a loop here.
+ if (tileSizeInt == 0) {
+ continuousLoopNestHelper(builder, loc, loopRanges, loops, loopLevelIdx + 1,
+ loopIdx, tileSizes, CTileVector, sizesMap,
+ innermostLoops, destinationTensors,
+ isHeadOrInsideHeadLoop);
+ return;
+ }
+
+ // The head loop is always tiled using the tile size specified
+ // in the size parameters to tile_using_for transform.
+ auto loop = builder.create<scf::ForOp>(
+ loc, offset, size, tileSize, destinationTensors,
+ [&](OpBuilder &bodyBuilder, Location bodyLoc, Value iv,
+ ValueRange /*iterArgs*/) {
+ sizesMap[loopIdx] =
+ getBoundedTileSize(bodyBuilder, bodyLoc, loopRanges[loopLevelIdx],
+ iv, getAsOpFoldResult(tileSize));
+ });
+
+ loop->setAttr(kLoopIndexLabel, builder.getIndexAttr(loopIdx));
+ ++loopIdx;
+
+ scf::ForOp currentLoop = loop;
+ auto lbInt = getConstantIntValue(currentLoop.getLowerBound());
+ // Use simplified powerSplitMap instead of the default when possible.
+ bool usePowerSplit = (lbInt.has_value()) &&
+ (*lbInt % tileSizeInt == static_cast<int64_t>(0)) &&
+ (tileSizeInt == llvm::bit_floor(tileSizeInt));
+
+ AffineMap splitMap = usePowerSplit ? powerSplitMap : defaultSplitMap;
+
+ bool isInnermostLoop = loopLevelIdx == loopRanges.size() - 1;
+ if (isInnermostLoop)
+ innermostLoops.push_back(currentLoop);
+
+ if (isHeadOrInsideHeadLoop)
+ loops.push_back(loop);
+
+ builder.setInsertionPointToEnd(loop.getBody());
+
+ // Create the nested loop inside current loop.
+ if (!isInnermostLoop)
+ continuousLoopNestHelper(builder, loop->getLoc(), loopRanges, loops,
+ loopLevelIdx + 1, loopIdx, tileSizes, CTileVector,
+ sizesMap, innermostLoops, loop.getRegionIterArgs(),
+ isHeadOrInsideHeadLoop);
+
+ // Apply continuous tiling to current loop if continuous_tiles
+ // specifies so.
+ while (CTileVector[loopLevelIdx] && tileSizeInt > 1) {
+
+ uint64_t maxPower = llvm::bit_floor(tileSizeInt);
+ tileSizeInt = maxPower == tileSizeInt ? maxPower >> 1 : maxPower;
+
+ builder.setInsertionPoint(currentLoop);
+
+ auto constStepOp = builder.create<arith::ConstantIndexOp>(loc, tileSizeInt);
+
+ Value splitBound = builder.createOrFold<affine::AffineApplyOp>(
+ loc, splitMap,
+ ValueRange{currentLoop.getLowerBound(), currentLoop.getUpperBound(),
+ currentLoop.getStep()});
+
+ builder.setInsertionPointAfter(currentLoop);
+ auto additionalLoop =
+ builder.create<scf::ForOp>(currentLoop->getLoc(), splitBound, size,
+ constStepOp, destinationTensors);
+
+ additionalLoop.getInitArgsMutable().assign(currentLoop->getResults());
+ currentLoop.getUpperBoundMutable().assign(splitBound);
+
+ builder.setInsertionPointToStart(additionalLoop.getBody());
+ AffineExpr s0, s1, d0;
+ bindDims(builder.getContext(), d0);
+ bindSymbols(builder.getContext(), s0, s1);
+ AffineMap minMap = AffineMap::get(1, 1, {s0}, builder.getContext());
+ auto additionalLoopAffineMin = affine::makeComposedAffineMin(
+ builder, loc, minMap,
+ SmallVector<OpFoldResult>{splitBound, getAsOpFoldResult(constStepOp),
+ size});
+
+ currentLoop = additionalLoop;
+
+ sizesMap[loopIdx] = getAsOpFoldResult(additionalLoopAffineMin);
+
+ // Add custom loop-indexing attribute to each loop op.
+ // Continuous tiling ends up generating many loop nestings and
+ // each loop can be identified with its loop-index attribute.
+ // This is needed later to retrieve the sizes from sizesMap.
+ currentLoop->setAttr(kLoopIndexLabel, builder.getIndexAttr(loopIdx));
+
+ ++loopIdx;
+
+ if (isInnermostLoop)
+ innermostLoops.push_back(currentLoop);
+
+ builder.setInsertionPointToEnd(currentLoop.getBody());
+
+ // Create the nested loop inside current loop.
+ if (!isInnermostLoop)
+ continuousLoopNestHelper(builder, currentLoop->getLoc(), loopRanges,
+ loops, loopLevelIdx + 1, loopIdx, tileSizes,
+ CTileVector, sizesMap, innermostLoops,
+ currentLoop.getRegionIterArgs());
+ }
+
+ // Always yield the result of the tail-end loop as this
+ // will have all the processed tiles.
+ if (!isa<func::ReturnOp>(currentLoop->getBlock()->back())) {
+ builder.setInsertionPointToEnd(currentLoop->getBlock());
+ builder.create<scf::YieldOp>(currentLoop.getLoc(),
+ currentLoop.getResults());
+ }
+ /// For the outermost loop insert the tail-end loop in front of loops
+ /// structure so that it's results can be used for replacements in the
+ /// function return. This is removed from the head of loops later.
+ else
+ loops.insert(loops.begin(), currentLoop);
+
+ destinationTensors = loop.getRegionIterArgs();
+}
+
+/// Generate an empty loop nest that represents the continuous-tiled loop nest
+/// shell.
+/// - `loopRanges` specifies the lb, ub and step of the untiled iteration space.
+/// - `tileSizes` is the tile sizes to use in the first tiling attempt. Zero
+/// represent untiled loops.
+/// - In ``sizesMap` return the multi-dimensional size of
+/// the tile processed within the inner most loop.
+/// - `CTileVector` specifies which loop nest should be continuously tiled.
+/// Note that this methods adds `scf.yield` operation for all but the innermost
+/// loop. These yield the value returned by the immediately inner tail-end loop.
+/// The caller is expected to add the scf.yield operation for all innermost
+/// loops.
+static SmallVector<LoopLikeOpInterface> generateContinuousTileLoopNest(
+ OpBuilder &builder, Location loc, ArrayRef<Range> loopRanges,
+ ArrayRef<OpFoldResult> tileSizes, SmallVector<bool> CTileVector,
+ std::map<int, OpFoldResult> &sizesMap,
+ SmallVector<scf::ForOp> &innermostLoops,
+ ValueRange destinationTensors = {}) {
+ if (loopRanges.empty())
+ return {};
+ assert(loopRanges.size() == tileSizes.size() &&
+ "expected as many tile sizes as loop ranges");
+ OpBuilder::InsertionGuard guard(builder);
+ SmallVector<LoopLikeOpInterface> loops;
+
+ uint64_t loopIdx = 0;
+ continuousLoopNestHelper(builder, loc, loopRanges, loops, 0, loopIdx,
+ tileSizes, CTileVector, sizesMap, innermostLoops,
+ destinationTensors, true);
+
+ return loops;
+}
+
+
/// Append the specified additional `newInitOperands` operands to the
/// loops existing `init` operands (or similar), and replace `loopOp` with
/// the new loop that has the additional init operands. The loop body of
@@ -679,6 +864,278 @@ mlir::scf::tileUsingSCF(RewriterBase &rewriter, TilingInterface op,
return scf::SCFTilingResult{tilingResult->tiledOps, loops, replacements};
}
+/// Implementation of continuous-tiling transformation of `op`
+/// that implements the `TilingInterface` using a sequence of
+/// `scf.for` loops to iterate over tiles of exponentially
+/// diminishing sizes.
+///
+/// The generated sequence of `scf.for` loops iterate over tiles of
+/// exponentially diminishing sizes as opposed to vanilla tiling scheme where
+/// only the tile sizes specified as transform parameters are used.
+/// continuous-tiling first applies regular tiling using the specified
+/// tile size, then halves the tile size and uses it to tile the leftover
+/// chunk. This process of halving the tile size and tiling the leftover
+/// chunk is repeated until tile size reaches 1. The transform parameter
+/// continuous_tiles controls which nested loop should be tiled. If all
+/// arguments in continuous_tiles are set to false, the result is identical
+/// to vanilla tiling transform.
+///
+/// When tiling a tensor of size M with tile size 8, it generates
+/// the following loop to tile
+///
+/// for (int i = 0; i < M; i += 8) {
+/// int size = min(8, M-i);
+/// // size is unknown at compile time because M is dynamic
+/// // compute using dynami...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
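For orientation, here is a small self-contained sketch (plain C++, not part of the patch) of the loop chain that continuous tiling is described as producing: tile with the requested size, split off the leftover chunk, and retry it with a smaller tile size obtained by `bit_floor`/halving, down to 1. The helper name `processTile` and the exact halving rule are illustrative assumptions based on the PR description and the `continuousLoopNestHelper` logic above.

```cpp
#include <bit>
#include <cstdint>
#include <cstdio>

// Illustrative stand-in for the tiled computation on [offset, offset + size).
static void processTile(int64_t offset, int64_t size) {
  std::printf("tile at %lld of size %lld\n", (long long)offset, (long long)size);
}

// Conceptual model of the generated loop chain: each tile size covers the
// region it divides evenly; the leftover chunk is handed to the next loop
// with an exponentially smaller tile size (bit_floor, then halving) until 1.
static void continuousTiling(int64_t lb, int64_t ub, uint64_t tileSize) {
  while (tileSize >= 1 && lb < ub) {
    // Boundary below which full tiles of `tileSize` fit exactly.
    int64_t splitBound = ub - (ub - lb) % static_cast<int64_t>(tileSize);
    for (int64_t iv = lb; iv < splitBound; iv += tileSize)
      processTile(iv, tileSize);
    lb = splitBound; // The leftover chunk [splitBound, ub) gets smaller tiles.
    uint64_t maxPower = std::bit_floor(tileSize);
    tileSize = (maxPower == tileSize) ? maxPower >> 1 : maxPower;
  }
}
```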
Force-pushed from 4c89934 to 93eae9e.
Thanks for proposing to extend tiling @muneebkhan85
Quick question, have you looked at the MultiTileSizes op?
%1:3 = transform.structured.multitile_sizes %0 { dimension = 0, target_size = 3} : (!transform.any_op) -> !transform.any_op
Can the 2 approaches be (partially) merged to have less code overall?
Also adding @ftynse for visibility.
Just putting a marker cause I need to look more deeply into this. This is a significant change and needs some time to review.
Surfacing comments as I review. More to come. This definitely is interesting; I am trying to understand the implementation. Is it possible to not duplicate all the tiling logic?
/// - `CTileVector` specifies which loop nest should be continuously tiled.
/// Note that this methods adds `scf.yield` operation for all but the innermost
/// loop. These yield the value returned by the immediately inner tail-end loop.
/// The caller is expected to add the scf.yield operation for all innermost
I have found this to be a bit of a foot gun. At some point of the implementation I did the same, where the caller would apply the `scf.yield`. But that is awkward for the caller to do. Can we instead try to generate the `scf.yield` during the transformation itself?
Sorry, but this was a leftover from the previous "copied" comment. The scf.yield is actually generated for all inner loops as part of the call itself after their creation (as you suggested). Fixed comment now.
// Always yield the result of the tail-end loop as this
// will have all the processed tiles.
if (!isa<func::ReturnOp>(currentLoop->getBlock()->back())) {
I am not sure why this is looking for a func.return
OK, perhaps the comments aren't that good here. It only inserts a YieldOp when there's no return at the end of the block. If there is a return, then we are dealing with the outermost loop and we don't want to create a YieldOp; instead, the return needs to be rewritten to return the right value. The 'if' makes sure that we are handling the first case and creating a YieldOp for the inner loops only.
The return at the end is an artifact of the test. This seems very fragile.
I have fixed this now, so that it doesn't depend on func.return to detect the outermost loop.
Just putting a marker cause I need to look more deeply into this. This is a significant change and needs some time to review.
Indeed it is.
I will add that we need the prerequisite cleanups (discussed here and here) to land first before any new code is added on these paths.
Speaking of which @MaheshRavishankar, what is the status of the required cleanup?
Fighting my backlog a bit. I am planning to start it in a couple of weeks.
High-level comment: why can't this just be a peeling transformation applied after tile-and-fuse? What is happening here seems to be peeling applied repeatedly.
I left a bunch of mostly stylistic comments that I think are useful regardless of whether this code goes in as is or not. I only commented once per kind of issue; there may be other issues of the same kind in the code, so please fix all of them.
Overall, this looks quite interesting. I agree with the other reviewers in my desire to see more code reuse. It is possible to alter the existing code to allow for more reuse if needed. If this is approached from the transform dialect perspective, it is possible to decompose this transformation into pieces similar to what was implemented for multisize tiling in:
// This implements a 2D multisize tiling with target sizes [3, 10].
/// Specify which loops in the loop nest are to be continuously tiled.
SmallVector<bool> continuousTileMappingVector = {};
SCFTilingOptions &setCTileMapping(ArrayRef<bool> ctile) {
Nit: let's use full names, `setContinuousTileMapping`.
SmallVector<bool> continuousTileMappingVector = {};
SCFTilingOptions &setCTileMapping(ArrayRef<bool> ctile) {
  continuousTileMappingVector =
      llvm::map_to_vector(ctile, [](auto attr) -> bool { return attr; });
Nit: `llvm::to_vector` works just fine, no need to have an additional `map` with identity function.
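A minimal sketch of the suggested simplification (also picking up the earlier naming nit); it assumes the member shown in the quoted snippet:

```cpp
/// Specify which loops in the loop nest are to be continuously tiled.
SmallVector<bool> continuousTileMappingVector = {};
SCFTilingOptions &setContinuousTileMapping(ArrayRef<bool> ctile) {
  // llvm::to_vector copies the range directly; no identity lambda needed.
  continuousTileMappingVector = llvm::to_vector(ctile);
  return *this;
}
```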
tilingOptions.setCTileMapping(getContinuousTiles());

FailureOr<scf::SCFTilingResult> maybeTilingResult;
if (tilingOptions.continuousTileMappingVector.empty())
Nit: please add braces if bodies don't fit into one line.
@@ -309,6 +311,188 @@ static LogicalResult generateLoopNest(RewriterBase &rewriter, Location loc,
  return rewriter.notifyMatchFailure(loc, "unhandled loop type");
}

static void continuousLoopNestHelper(
Nit: please document top-level functions appropriately.
@@ -309,6 +311,188 @@ static LogicalResult generateLoopNest(RewriterBase &rewriter, Location loc,
  return rewriter.notifyMatchFailure(loc, "unhandled loop type");
}

static void continuousLoopNestHelper(
    OpBuilder &builder, Location loc, ArrayRef<Range> loopRanges,
    SmallVector<LoopLikeOpInterface> &loops, uint64_t loopLevelIdx,
Take a (const) reference to `SmallVectorImpl` to avoid copying on-stack elements of the vector. If the callee doesn't modify the length of the vector, take a `MutableArrayRef`/`ArrayRef` instead.
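A hedged sketch of the adjusted signature, based on the parameter list in the patch; which parameters are appended to versus only read is my reading of the helper and should be double-checked:

```cpp
static void continuousLoopNestHelper(
    OpBuilder &builder, Location loc, ArrayRef<Range> loopRanges,
    SmallVectorImpl<LoopLikeOpInterface> &loops, // appended to: keep by-ref
    uint64_t loopLevelIdx, uint64_t &loopIdx, ArrayRef<OpFoldResult> tileSizes,
    ArrayRef<bool> cTileVector, // read-only: ArrayRef instead of SmallVector&
    std::map<int, OpFoldResult> &sizesMap,
    SmallVectorImpl<scf::ForOp> &innermostLoops,
    ValueRange destinationTensors = {}, bool isHeadOrInsideHeadLoop = false);
```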
currentLoop.getRegionIterArgs());
}

// Yiled results for all loops in the loop nest except for
Nit typo: yield.
// will have all the processed tiles.
if (loopLevelIdx != 0) {
  builder.setInsertionPointToEnd(currentLoop->getBlock());
  builder.create<scf::YieldOp>(currentLoop.getLoc(),
Nit: we probably want to assert somewhere that there are results. Otherwise, the default builder creates a `yield` operation and this will be the second one (yes, in hindsight, this wasn't a good choice).
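One way to follow this suggestion, sketched against the quoted snippet; the exact condition may need adjusting to however the loop results end up being threaded in the final patch:

```cpp
// Sketch: assert that there is something to yield; when the enclosing loop
// was built without iter_args, its default builder has already inserted an
// scf.yield, and creating another one here would add a second terminator.
assert(!currentLoop.getResults().empty() &&
       "expected results to yield; default builder already added scf.yield");
builder.setInsertionPointToEnd(currentLoop->getBlock());
builder.create<scf::YieldOp>(currentLoop.getLoc(), currentLoop.getResults());
```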
return rewriter.notifyMatchFailure(
    op, "missing tile size computation function");
These error messages are invisible unless the function is called from a pattern rewriting engine, which I suspect is not the case here. If you want to provide a better error description, you can either use `DiagnosedSilenceableFailure` from the Transform dialect or ask the caller to give an `llvm::raw_ostream`, defaulted to `llvm::nulls`, that will accept the error message.
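A sketch of the second suggestion, using a hypothetical helper name (`checkTilingPreconditions` is not in the patch) and assuming the surrounding file's MLIR/LLVM includes:

```cpp
// Let the caller opt in to error descriptions: by default they are discarded
// via llvm::nulls() instead of going into a match-failure nobody sees.
static LogicalResult
checkTilingPreconditions(const scf::SCFTilingOptions &options,
                         llvm::raw_ostream &errs = llvm::nulls()) {
  if (!options.tileSizeComputationFunction) {
    errs << "missing tile size computation function\n";
    return failure();
  }
  return success();
}
```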
LLVM_DEBUG({
  if (!forLoops.empty()) {
    llvm::dbgs() << "LoopNest shell :\n";
    forLoops.front()->getBlock()->dump();
You likely want to `->print(llvm::dbgs())` for the case when the error stream and the debug stream differ.
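Applied to the quoted snippet, the change could look like the following sketch:

```cpp
LLVM_DEBUG({
  if (!forLoops.empty()) {
    llvm::dbgs() << "LoopNest shell :\n";
    // Print to the chosen debug stream explicitly; dump() always writes to
    // stderr, which may differ from the configured debug output stream.
    forLoops.front()->getBlock()->print(llvm::dbgs());
  }
});
```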
/// Remove outermost tailend loop as its only use was to compute
/// replacements.
forLoops.erase(forLoops.begin());
Not quite sure what happens here: isn't this loop produced by tiling and therefore should appear in the results?
An issue here is that the static tile size is created using AffineMin at the time of creating the loop nests. This makes the issue more complicated than just having to create new tail-loops and clone regions from the original loop into them.
Hi @ftynse, @nicolasvasilache, @MaheshRavishankar. Thank you for pointing out how multisize tiling is handled and suggesting that the continuous tiling transform can be modelled after it. @muneebkhan85 and I had a discussion on how continuous tiling might be expressible using the transform dialect. The main issue we come up against is that rather than a statically known number of loops being generated, as with multisize tiling, continuous tiling can generate an arbitrary number of tiling loops. By modelling continuous tiling after multisize tiling, we came up with the following:
The above departs from the current transform dialect in three ways:
Could you let us know if the above is similar to what you had in mind? We would be happy to work on landing the above three changes. If there are other more preferable paths to supporting continuous tiling, do let us know. @ftynse, you mentioned an approach whereby the transform dialect would itself be subject to rewrites. Do you have a reference for us to look at?
SGTM in principle. I'm not quite sure how this tiling would work for multiple dimensions though.
Nit: I'd rather just have one result, and your example merges the two results anyway. If the two distinct results are necessary, we can either use
Nit: make sure to define behavior when lengths don't match, e.g.
I don't think something was committed upstream because there was no pressing need for that. However, the transform dialect is just a dialect. One should be able to write a pass that applies patterns rewriting transform dialect operations quite easily. One of these patterns may expand a hypothetical high-level op into the actual implementation as proposed above.
@ftynse Thanks for the prompt feedback on this.
It should work the same way as it does for multitile_sizes, where each dimension is tiled separately. @rolfmorel and I will work on the proposed changes.
Here's an example of how, with the proposed interface, continuous tiling can be applied to two dimensions. This has the exponentially decreasing pattern of tile sizes for the inner dimension occurring within each tiling loop of the outer dimension (where these outer loops themselves have exponentially decreasing tile sizes):
%linalg = transform.structured.match ops{["linalg.generic"]} in %payload // assume only one linalg matched
%tile_sizes_dim0, %split_points_dim0 = transform.structured.continuous_tile_sizes %linalg { dimension = 0, target_size = 9} : (!transform.any_op) -> (!transform.param<i64>, !transform.param<i64>)
%tile_sizes_dim1, %split_points_dim1 = transform.structured.continuous_tile_sizes %linalg { dimension = 1, target_size = 7} : (!transform.any_op) -> (!transform.param<i64>, !transform.param<i64>)
%linalg_dim0_splits = transform.structured.split %linalg after %split_points_dim0 { dimension = 0, multiway }
transform.foreach %linalg_dim0_splits, %tile_sizes_dim0 {
^bb1(%linalg_dim0_split: !transform.any_op, %tile_size_dim0: !transform.any_param):
%linalg_dim0_split_tiled, %tile_loop = transform.structured.tile_using_for %linalg_dim0_split [%tile_size_dim0]
%linalg_dim1_splits = transform.structured.split %linalg_dim0_split_tiled after %split_points_dim1 { dimension = 1, multiway }
transform.foreach %linalg_dim1_splits, %tile_sizes_dim1 {
^bb2(%linalg_dim1_split: !transform.any_op, %tile_size_dim1: !transform.any_param):
transform.structured.tile_using_for %linalg_dim1_split [0, %tile_size_dim1]
transform.yield
}
transform.yield
}
This pattern naturally extends to higher dimensions.
Ok, this design is fine with me.
Sorry for the delay. I am still not convinced this is the easiest way of doing this. Looking at the description, you are trying to split full tiles and partial tiles (and then applying that recursively). An easier approach to me is to do something like this:
you can first split into two
Now you can tile the individual ops. This does not need any changes to the Tiling implementation itself, and is also modular. You can take the second linalg op and apply the same procedure repeatedly to get all the different loops. That would be up to the caller to do (or wrapped in a helper function), but the core logic is just this one step.
@MaheshRavishankar Thank you for the comments. After some deliberation we have decided to take another route for this without requiring any changes to the tiling logic. Please see the discussions here together with the example posted here. The idea is to create an op `continuous_tile_sizes` that, given a target size and a dimension, computes a list of exponentially diminishing tile sizes together with the chunk sizes of the target dimension they should be applied to.
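To make the intended result concrete, here is a hedged, self-contained sketch (plain C++) of what such an op could compute for one dimension. The halving-by-`bit_floor` rule is an assumption carried over from the earlier loop-nest implementation in this PR, not necessarily the final semantics of the op.

```cpp
#include <bit>
#include <cstdint>
#include <utility>
#include <vector>

// For a dimension of extent `dimSize` and an initial target tile size,
// compute (1) exponentially diminishing tile sizes and (2) the chunk sizes
// each tile size is applied to; the chunks cover the whole dimension.
std::pair<std::vector<uint64_t>, std::vector<uint64_t>>
continuousTileSizes(uint64_t dimSize, uint64_t tileSize) {
  std::vector<uint64_t> tileSizes, chunkSizes;
  uint64_t remaining = dimSize;
  while (tileSize >= 1 && remaining > 0) {
    uint64_t chunk = remaining - remaining % tileSize; // full tiles only
    if (chunk != 0) {
      tileSizes.push_back(tileSize);
      chunkSizes.push_back(chunk);
      remaining -= chunk;
    }
    uint64_t maxPower = std::bit_floor(tileSize);
    tileSize = (maxPower == tileSize) ? maxPower >> 1 : maxPower;
  }
  return {tileSizes, chunkSizes};
}
// E.g. continuousTileSizes(/*dimSize=*/25, /*tileSize=*/9) yields tile sizes
// {9, 4, 2, 1} with chunk sizes {18, 4, 2, 1}, which sum to 25.
```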
Force-pushed from 3833975 to 646cc2a.
@ftynse @MaheshRavishankar @nicolasvasilache I have re-purposed this PR in line with our earlier discussion here, and my latest commits (1) introduce a continuous tiling operation (`continuous_tile_sizes`) and (2) add multiway splitting to `transform.structured.split`. The third part of the proposal was to modify `transform.foreach`; @rolfmorel is currently working on that and will submit it as a separate PR.
the iteration space of the specified dimension. `static_split_point` and
`dynamic_split_point` in this case is a list of chunk sizes that the given
dimension should be split into. With `multiway` it produces two handles;
"split point" accepting chunk sizes sounds misleading. We should rename it. It is okay for the two-way case to take one size, which is same as the split point.
Also, document what happens when the sum of chunk sizes does not add up to the total size of the op.
Changed to chunk sizes now, with the description saying how it works when given just one value. The description also documents what happens when the chunk sizes do not cover the entire iteration space.
the first handle is a list of the multiple parts of the structured op
after splitting, where the target dimensions for each linalg op in the
list corresponds to the chunk sizes specfied in the input split list.
The second handle is empty.
I'd consider just returning one handle always. It can be later split into parts using `split_handle`.
I have tested this out. It can surely be a single handle and can be split into multiple handles using `split_handle`, though there's a caveat that `split_handle` needs to specify the expected number of outputs (as a list of output types) depending on the number of payloads it receives in the handle.
In the multitile-size example `mlir/test/Dialect/Linalg/multisize-tiling-full.mlir`, the 2nd split op generates two lists that are used by the two subsequent `tile_using_for` ops. Creating a single output for the split op would mean merging these two lists, and `split_handle` would then see a handle of 4 payloads. This complicates things a bit in the sense that some reworking of the usage and examples is needed. Keeping it to produce two outputs for now. I think this change would be good for a separate PR.
DeclareOpInterfaceMethods<TransformOpInterface>,
ReportTrackingListenerFailuresOpTrait]> {
let description = [{
  This transform takes a linalg as target and a dimension and target size
Suggested change:
- This transform takes a linalg as target and a dimension and target size
+ This transform takes a handle to a linalg op as target and a dimension and target size
Here and below. We don't usually refer to an op as "the linalg".
Fixed.
@@ -2269,8 +2269,20 @@ SplitOp::apply(transform::TransformRewriter &rewriter,
// Collect the dynamic split points if provided.
SmallVector<Operation *> payload =
    llvm::to_vector(state.getPayloadOps(getTarget()));

bool isMultiwaySplit = getMultiway() ? true : false;
`? true : false` is a tautology. It can be omitted.
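The fix is just a direct assignment, e.g.:

```cpp
// Rely on the boolean accessor directly instead of `? true : false`.
bool isMultiwaySplit = getMultiway();
```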
Don't know what I was thinking here. Fixed.
bool isMultiwaySplit = getMultiway() ? true : false;

if (isMultiwaySplit && !llvm::hasSingleElement(payload)) {
  return emitDefiniteFailure() << "requires exactly one target when "
Let's use silenceable failure.
Fixed.
splitPoints.push_back(splitPoint);
}

auto makeOpFromValue = [&](ArrayRef<Value> values) {
Suggested change:
- auto makeOpFromValue = [&](ArrayRef<Value> values) {
+ auto getDefiningOps = [&](ArrayRef<Value> values) {
It doesn't actually make the op.
Fixed.
return llvm::to_vector(
    llvm::map_range(values, [&](Value value) -> Operation * {
map_to_vector
Fixed.
transformResults.set(cast<OpResult>(getTileSizes()),
                     makeOpFromValue(spec->tileSizes));
transformResults.set(cast<OpResult>(getSplitPoints()),
                     makeOpFromValue(splitPoints));
FYI, we have value handles. They are rather underused and I won't push for them here to maintain consistency, but it would make this cleaner.
This is a good suggestion and good to know that it can be done using Values, but skipping for consistency.
// Find the trip count of the iteration space dimension for which the tile
// sizes are computed.
SmallVector<OpFoldResult> allShapes =
    op.createFlatListOfOperandDims(b, b.getLoc());
AffineMap shapesToLoops = op.getShapesToLoopsMap();
SmallVector<OpFoldResult> loopRanges =
    makeComposedFoldedMultiResultAffineApply(b, op.getLoc(), shapesToLoops,
                                             allShapes);

Value loopRange =
    getValueOrCreateConstantIndexOp(b, op.getLoc(), loopRanges[dimension]);
Can we use/adapt `TilingInterface` for this?
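A hedged fragment of what that could look like with `TilingInterface`, assuming `op` implements the interface, `dimension` is in range, and `b` is the builder from the surrounding code:

```cpp
// Query the iteration domain generically through TilingInterface instead of
// going via Linalg-specific operand-shape and loop maps.
SmallVector<Range> iterationDomain = op.getIterationDomain(b);
Value loopRange = getValueOrCreateConstantIndexOp(
    b, op.getLoc(), iterationDomain[dimension].size);
```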
I have adapted the dynamic case `computeContinuousTileSizes` to entirely use `TilingInterface`. For consistency, I have modified the static case `computeStaticContinuousTileSizes` to take a `TilingInterface` op, but it still relies on a casted `LinalgOp` internally.
Note to self (@ftynse): I haven't double-checked the math here.
Force-pushed from f4f657a to f7a1dec.
Just a fly-by comment. I can't say for sure, but please refrain from using any of the `linalg::tile*` methods and use the `scf::tileUsingSCF*` methods. The `linalg::` ones are being removed.
Here's a suggestion for a test case. It requires the pending change to PAYLOAD_CONTAINING_ANY_LINALG_HERE
module attributes {transform.with_named_sequence} {
transform.named_sequence @tile_dim0_continuously(%linalg: !transform.any_op {transform.readonly}, %target_size: !transform.param<i64>) -> !transform.any_op, !transform.any_op {
%tile_sizes, %split_points = transform.structured.continuous_tile_sizes %linalg descending_from %target_size { dimension = 0 } : (!transform.any_op, !transform.param<i64>) -> (!transform.param<i64>, !transform.param<i64>)
%linalg_splits, %empty = transform.structured.split %linalg after %split_points { dimension = 0, multiway } : ...
%tiled_splits, %loops = transform.foreach %linalg_splits, %tile_sizes {
^bb1(%linalg_split: !transform.any_op, %tile_size: !transform.param<i64>):
%tiled_linalg_split, %dim0_loop = transform.structured.tile_using_for %linalg_split [%tile_size] : ...
transform.yield %tiled_linalg_split, %dim0_loop : ...
}
transform.yield %tiled_splits, %loops : ...
}
transform.named_sequence @__transform_main(%payload: !transform.any_op) {
%linalg = transform.structured.match ops{["linalg.generic"]} in %payload : ...
%target_size = transform.param.constant 9 : !transform.param<index>
// NB: we would need a foreach here/in the named sequence if %linalg would associate multiple linalg.generics
%tiled_linalg_splits, %tiling_loops = transform.include @tile_dim0_continuously failures(propagate) (%linalg) : (!transform.any_op) -> !transform.any_op, !transform.any_op
transform.yield
}
}
Demonstrating abstraction
The main point of interest regarding the test case is that it demonstrates how to do abstraction with the Transform dialect. That is, it shows how a higher-level transform can be implemented with the Transform dialect while the details of this implementation can be hidden. In this way, the higher-level transform's functionality is exposed basically* as if it were available like any other Transform op. (*: Note that (currently) a named sequence cannot be parametric w.r.t. attribute arguments that its contained ops take as attributes and not as params. So each op that we would like to use in named sequences - and whose attribute arguments we'd like to expose as arguments of the sequence - needs to be modified to take the relevant attributes as params.)
I'm open to suggestions here. Note that, despite people writing a lot of it directly, transform dialect is still an IR so some choices can be made for a compiler to reason more easily about the IR, such as param constants being operations similarly to all other constants in MLIR.
Force-pushed from f7a1dec to 03a0bd8.
Yes, valid point about having the param constants out in front. The more serious issue we are seeing is that the current MO is for all ops whose attribute arguments we would like to abstract over to be modified - op by op, argument by argument - to take these attributes as params as well. While this approach is feasible, it feels like there's potentially a better mechanism for addressing this need. As this is becoming off-topic for this PR, we will post on Discourse to start a discussion regarding better ways of dealing with the above.
Please do. Let me know if you need help merging this; it's approved.
Force-pushed from 03a0bd8 to 3d988d0.
@ftynse I have introduced new test cases. While the examples here test for static dimension sizes, we have encountered an issue for the dynamic case where dimension sizes are not known. Here's an example for the dynamic dimensions case.
Add continuous tiling op `structured.continuous_tile_sizes` to the transform dialect that returns as result (1) a list of exponentially diminishing tile sizes, and (2) a list of chunk sizes -- along the specified dimension of the target -- where the corresponding tile sizes from (1) can be applied. The list of chunk sizes from (2) covers the entire iteration space along the given dimension of the target.
Add functionality that enables SplitOp to do a multiway split of a target op along a given dimension. With the multiway attribute, SplitOp takes a list of chunk sizes and applies it to a single target along the given dimension to generate multiple structured ops extracted from the target.
Tests SplitOp for multiway splitting of a structured op using the result of `continuous_tile_sizes` to specify the chunk sizes that the target's specified dimension should be split into.
Force-pushed from 3d988d0 to 1a8d0b1.
@ftynse I have included a commit that solves the aforementioned issue with dynamic bounds, by introducing `zip_shortest` semantics to `transform.foreach`. If this looks like a viable solution to you, please help with merging this PR. Otherwise, please let me know if you have other suggestions.
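As an illustration only (not the actual ForeachOp implementation), the `zip_shortest` behaviour described in the commit below amounts to truncating every payload list to the shortest length before iterating them in lockstep. A self-contained sketch, with `Payload` standing in for whatever the handles are associated with:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// `Payload` is a placeholder for the entities associated with each handle.
struct Payload {};

// Shrink all payload lists to the shortest one so they can be zipped.
void zipShortest(std::vector<std::vector<Payload>> &payloadLists) {
  if (payloadLists.empty())
    return;
  size_t shortest = payloadLists.front().size();
  for (const auto &list : payloadLists)
    shortest = std::min(shortest, list.size());
  for (auto &list : payloadLists)
    list.resize(shortest); // drop trailing payloads beyond the shortest list
}
```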
Adds `zip_shortest` functionality to `foreach` so that when it takes multiple handles of varying lengths - instead of failing - it shrinks the size of all payloads to that of the shortest payload.
Introduce a full test case demonstrating the continuous tiling idea end-to-end.
Force-pushed from 1a8d0b1 to 2893716.
I have a syntactic comment, but I'll land this and let you consider that comment in a separate PR.
let results = (outs Variadic<Transform_AnyHandleOrParamType>:$results);
let regions = (region SizedRegion<1>:$body);
let assemblyFormat =
-    "$targets `:` type($targets) (`->` type($results)^)? $body attr-dict";
+    "$targets attr-dict `:` type($targets) (`->` type($results)^)? $body";
How about turning the attribute into a keyword rather than putting the entire dictionary here, which looks rather awkward?
@muneebkhan85 Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
This patch enables continuous tiling of a target structured op using diminishing tile sizes. In cases where the tensor dimensions are not exactly divisible by the tile size, we are left with leftover tensor chunks that are irregularly tiled. This approach enables tiling of the leftover chunk with a smaller tile size and repeats this process recursively using exponentially diminishing tile sizes. This eventually generates a chain of loops that apply tiling using diminishing tile sizes. Adds `continuous_tile_sizes` op to the transform dialect. This op, when given a tile size and a dimension, computes a series of diminishing tile sizes that can be used to tile the target along the given dimension. Additionally, this op also generates a series of chunk sizes that the corresponding tile sizes should be applied to along the given dimension. Adds `multiway` attribute to `transform.structured.split` that enables multiway splitting of a single target op along the given dimension, as specified in a list enumerating the chunk sizes.
Referenced in #98492: This PR addresses a comment made by @ftynse about the syntax for `ForeachOp`. The syntax was modified by @muneebkhan85 in #82792, where the attribute dictionary was moved to the middle. This patch moves it back to its original place at the end, and introduces an optional keyword for `zip_shortest`. [comment]: #82792 (review)
Referenced in #111171: Follow-up to a review comment from #82792 as a separate PR. E.g., `%0:2 = transform.structured.split` is changed to `%t = transform.structured.split` followed by `%0:2 = transform.split_handle %t`.