-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[mlir] Enable decoupling two kinds of greedy behavior. #104649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-mlir-core Author: Jacques Pienaar (jpienaar) ChangesThe greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and leads to inefficiency. E.g., cases where one need to create different phases in lowering and is required to applying patterns in specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a common requested feature that folks keep on working around locally in suboptimal ways. Full diff: https://github.com/llvm/llvm-project/pull/104649.diff 2 Files Affected:
diff --git a/mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h b/mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
index eaff85804f6b3d..061cdd4b7d4d94 100644
--- a/mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
+++ b/mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
@@ -91,6 +91,15 @@ class GreedyRewriteConfig {
/// An optional listener that should be notified about IR modifications.
RewriterBase::Listener *listener = nullptr;
+
+ // Whether this should fold while greedily rewriting.
+ //
+ // Note: greedy here generally refers to two forms, 1) greedily applying
+ // patterns based purely on benefit and applying without backtracking using
+ // default cost model, 2) greedily folding where possible while attempting to
+ // match and rewrite using the provided patterns. With this option set to
+ // false it only does the former.
+ bool fold = true;
};
//===----------------------------------------------------------------------===//
@@ -104,8 +113,8 @@ class GreedyRewriteConfig {
/// The greedy rewrite may prematurely stop after a maximum number of
/// iterations, which can be configured in the configuration parameter.
///
-/// Also performs folding and simple dead-code elimination before attempting to
-/// match any of the provided patterns.
+/// Also performs simple dead-code elimination before attempting to match any of
+/// the provided patterns.
///
/// A region scope can be set in the configuration parameter. By default, the
/// scope is set to the specified region. Only in-scope ops are added to the
@@ -117,10 +126,18 @@ class GreedyRewriteConfig {
///
/// Note: This method does not apply patterns to the region's parent operation.
LogicalResult
+applyPatternsGreedily(Region ®ion, const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config = GreedyRewriteConfig(),
+ bool *changed = nullptr);
+/// Same as `applyPatternsAndGreedily` above with folding.
+inline LogicalResult
applyPatternsAndFoldGreedily(Region ®ion,
const FrozenRewritePatternSet &patterns,
GreedyRewriteConfig config = GreedyRewriteConfig(),
- bool *changed = nullptr);
+ bool *changed = nullptr) {
+ config.fold = true;
+ return applyPatternsGreedily(region, patterns, config, changed);
+}
/// Rewrite ops nested under the given operation, which must be isolated from
/// above, by repeatedly applying the highest benefit patterns in a greedy
@@ -129,8 +146,8 @@ applyPatternsAndFoldGreedily(Region ®ion,
/// The greedy rewrite may prematurely stop after a maximum number of
/// iterations, which can be configured in the configuration parameter.
///
-/// Also performs folding and simple dead-code elimination before attempting to
-/// match any of the provided patterns.
+/// Also performs simple dead-code elimination before attempting to match any of
+/// the provided patterns.
///
/// This overload runs a separate greedy rewrite for each region of the
/// specified op. A region scope can be set in the configuration parameter. By
@@ -147,10 +164,9 @@ applyPatternsAndFoldGreedily(Region ®ion,
///
/// Note: This method does not apply patterns to the given operation itself.
inline LogicalResult
-applyPatternsAndFoldGreedily(Operation *op,
- const FrozenRewritePatternSet &patterns,
- GreedyRewriteConfig config = GreedyRewriteConfig(),
- bool *changed = nullptr) {
+applyPatternsGreedily(Operation *op, const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config = GreedyRewriteConfig(),
+ bool *changed = nullptr) {
bool anyRegionChanged = false;
bool failed = false;
for (Region ®ion : op->getRegions()) {
@@ -164,6 +180,15 @@ applyPatternsAndFoldGreedily(Operation *op,
*changed = anyRegionChanged;
return failure(failed);
}
+/// Same as `applyPatternsGreedily` above with folding.
+inline LogicalResult
+applyPatternsAndFoldGreedily(Operation *op,
+ const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config = GreedyRewriteConfig(),
+ bool *changed = nullptr) {
+ config.fold = true;
+ return applyPatternsGreedily(op, patterns, config, changed);
+}
/// Rewrite the specified ops by repeatedly applying the highest benefit
/// patterns in a greedy worklist driven manner until a fixpoint is reached.
@@ -171,8 +196,8 @@ applyPatternsAndFoldGreedily(Operation *op,
/// The greedy rewrite may prematurely stop after a maximum number of
/// iterations, which can be configured in the configuration parameter.
///
-/// Also performs folding and simple dead-code elimination before attempting to
-/// match any of the provided patterns.
+/// Also performs simple dead-code elimination before attempting to match any of
+/// the provided patterns.
///
/// Newly created ops and other pre-existing ops that use results of rewritten
/// ops or supply operands to such ops are also processed, unless such ops are
@@ -194,10 +219,19 @@ applyPatternsAndFoldGreedily(Operation *op,
/// the IR was modified at all. `allOpsErased` is set to "true" if all ops in
/// `ops` were erased.
LogicalResult
+applyOpPatternsGreedily(ArrayRef<Operation *> ops,
+ const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config = GreedyRewriteConfig(),
+ bool *changed = nullptr, bool *allErased = nullptr);
+/// Same as `applyOpPatternsGreedily` with folding.
+inline LogicalResult
applyOpPatternsAndFold(ArrayRef<Operation *> ops,
const FrozenRewritePatternSet &patterns,
GreedyRewriteConfig config = GreedyRewriteConfig(),
- bool *changed = nullptr, bool *allErased = nullptr);
+ bool *changed = nullptr, bool *allErased = nullptr) {
+ config.fold = true;
+ return applyOpPatternsGreedily(ops, patterns, config, changed, allErased);
+}
} // namespace mlir
diff --git a/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp b/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
index e0d0acd122e26b..4e8b74620da5fe 100644
--- a/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
+++ b/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
@@ -6,7 +6,7 @@
//
//===----------------------------------------------------------------------===//
//
-// This file implements mlir::applyPatternsAndFoldGreedily.
+// This file implements mlir::applyPatternsGreedily.
//
//===----------------------------------------------------------------------===//
@@ -488,7 +488,7 @@ bool GreedyPatternRewriteDriver::processWorklist() {
// infinite folding loop, as every constant op would be folded to an
// Attribute and then immediately be rematerialized as a constant op, which
// is then put on the worklist.
- if (!op->hasTrait<OpTrait::ConstantLike>()) {
+ if (config.fold && !op->hasTrait<OpTrait::ConstantLike>()) {
SmallVector<OpFoldResult> foldResults;
if (succeeded(op->fold(foldResults))) {
LLVM_DEBUG(logResultWithLine("success", "operation was folded"));
@@ -840,6 +840,11 @@ LogicalResult RegionPatternRewriteDriver::simplify(bool *changed) && {
// regions to enable more aggressive CSE'ing).
OperationFolder folder(ctx, this);
auto insertKnownConstant = [&](Operation *op) {
+ // This hoisting is to enable more folding, so skip checking if known
+ // constant, updating dense map etc if not doing folding.
+ if (!config.fold)
+ return false;
+
// Check for existing constants when populating the worklist. This avoids
// accidentally reversing the constant order during processing.
Attribute constValue;
@@ -894,9 +899,9 @@ LogicalResult RegionPatternRewriteDriver::simplify(bool *changed) && {
}
LogicalResult
-mlir::applyPatternsAndFoldGreedily(Region ®ion,
- const FrozenRewritePatternSet &patterns,
- GreedyRewriteConfig config, bool *changed) {
+mlir::applyPatternsGreedily(Region ®ion,
+ const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config, bool *changed) {
// The top-level operation must be known to be isolated from above to
// prevent performing canonicalizations on operations defined at or above
// the region containing 'op'.
@@ -1012,7 +1017,7 @@ static Region *findCommonAncestor(ArrayRef<Operation *> ops) {
return region;
}
-LogicalResult mlir::applyOpPatternsAndFold(
+LogicalResult mlir::applyOpPatternsGreedily(
ArrayRef<Operation *> ops, const FrozenRewritePatternSet &patterns,
GreedyRewriteConfig config, bool *changed, bool *allErased) {
if (ops.empty()) {
|
@llvm/pr-subscribers-mlir Author: Jacques Pienaar (jpienaar) ChangesThe greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and leads to inefficiency. E.g., cases where one need to create different phases in lowering and is required to applying patterns in specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a common requested feature that folks keep on working around locally in suboptimal ways. Full diff: https://github.com/llvm/llvm-project/pull/104649.diff 2 Files Affected:
diff --git a/mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h b/mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
index eaff85804f6b3d..061cdd4b7d4d94 100644
--- a/mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
+++ b/mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
@@ -91,6 +91,15 @@ class GreedyRewriteConfig {
/// An optional listener that should be notified about IR modifications.
RewriterBase::Listener *listener = nullptr;
+
+ // Whether this should fold while greedily rewriting.
+ //
+ // Note: greedy here generally refers to two forms, 1) greedily applying
+ // patterns based purely on benefit and applying without backtracking using
+ // default cost model, 2) greedily folding where possible while attempting to
+ // match and rewrite using the provided patterns. With this option set to
+ // false it only does the former.
+ bool fold = true;
};
//===----------------------------------------------------------------------===//
@@ -104,8 +113,8 @@ class GreedyRewriteConfig {
/// The greedy rewrite may prematurely stop after a maximum number of
/// iterations, which can be configured in the configuration parameter.
///
-/// Also performs folding and simple dead-code elimination before attempting to
-/// match any of the provided patterns.
+/// Also performs simple dead-code elimination before attempting to match any of
+/// the provided patterns.
///
/// A region scope can be set in the configuration parameter. By default, the
/// scope is set to the specified region. Only in-scope ops are added to the
@@ -117,10 +126,18 @@ class GreedyRewriteConfig {
///
/// Note: This method does not apply patterns to the region's parent operation.
LogicalResult
+applyPatternsGreedily(Region ®ion, const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config = GreedyRewriteConfig(),
+ bool *changed = nullptr);
+/// Same as `applyPatternsAndGreedily` above with folding.
+inline LogicalResult
applyPatternsAndFoldGreedily(Region ®ion,
const FrozenRewritePatternSet &patterns,
GreedyRewriteConfig config = GreedyRewriteConfig(),
- bool *changed = nullptr);
+ bool *changed = nullptr) {
+ config.fold = true;
+ return applyPatternsGreedily(region, patterns, config, changed);
+}
/// Rewrite ops nested under the given operation, which must be isolated from
/// above, by repeatedly applying the highest benefit patterns in a greedy
@@ -129,8 +146,8 @@ applyPatternsAndFoldGreedily(Region ®ion,
/// The greedy rewrite may prematurely stop after a maximum number of
/// iterations, which can be configured in the configuration parameter.
///
-/// Also performs folding and simple dead-code elimination before attempting to
-/// match any of the provided patterns.
+/// Also performs simple dead-code elimination before attempting to match any of
+/// the provided patterns.
///
/// This overload runs a separate greedy rewrite for each region of the
/// specified op. A region scope can be set in the configuration parameter. By
@@ -147,10 +164,9 @@ applyPatternsAndFoldGreedily(Region ®ion,
///
/// Note: This method does not apply patterns to the given operation itself.
inline LogicalResult
-applyPatternsAndFoldGreedily(Operation *op,
- const FrozenRewritePatternSet &patterns,
- GreedyRewriteConfig config = GreedyRewriteConfig(),
- bool *changed = nullptr) {
+applyPatternsGreedily(Operation *op, const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config = GreedyRewriteConfig(),
+ bool *changed = nullptr) {
bool anyRegionChanged = false;
bool failed = false;
for (Region ®ion : op->getRegions()) {
@@ -164,6 +180,15 @@ applyPatternsAndFoldGreedily(Operation *op,
*changed = anyRegionChanged;
return failure(failed);
}
+/// Same as `applyPatternsGreedily` above with folding.
+inline LogicalResult
+applyPatternsAndFoldGreedily(Operation *op,
+ const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config = GreedyRewriteConfig(),
+ bool *changed = nullptr) {
+ config.fold = true;
+ return applyPatternsGreedily(op, patterns, config, changed);
+}
/// Rewrite the specified ops by repeatedly applying the highest benefit
/// patterns in a greedy worklist driven manner until a fixpoint is reached.
@@ -171,8 +196,8 @@ applyPatternsAndFoldGreedily(Operation *op,
/// The greedy rewrite may prematurely stop after a maximum number of
/// iterations, which can be configured in the configuration parameter.
///
-/// Also performs folding and simple dead-code elimination before attempting to
-/// match any of the provided patterns.
+/// Also performs simple dead-code elimination before attempting to match any of
+/// the provided patterns.
///
/// Newly created ops and other pre-existing ops that use results of rewritten
/// ops or supply operands to such ops are also processed, unless such ops are
@@ -194,10 +219,19 @@ applyPatternsAndFoldGreedily(Operation *op,
/// the IR was modified at all. `allOpsErased` is set to "true" if all ops in
/// `ops` were erased.
LogicalResult
+applyOpPatternsGreedily(ArrayRef<Operation *> ops,
+ const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config = GreedyRewriteConfig(),
+ bool *changed = nullptr, bool *allErased = nullptr);
+/// Same as `applyOpPatternsGreedily` with folding.
+inline LogicalResult
applyOpPatternsAndFold(ArrayRef<Operation *> ops,
const FrozenRewritePatternSet &patterns,
GreedyRewriteConfig config = GreedyRewriteConfig(),
- bool *changed = nullptr, bool *allErased = nullptr);
+ bool *changed = nullptr, bool *allErased = nullptr) {
+ config.fold = true;
+ return applyOpPatternsGreedily(ops, patterns, config, changed, allErased);
+}
} // namespace mlir
diff --git a/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp b/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
index e0d0acd122e26b..4e8b74620da5fe 100644
--- a/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
+++ b/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
@@ -6,7 +6,7 @@
//
//===----------------------------------------------------------------------===//
//
-// This file implements mlir::applyPatternsAndFoldGreedily.
+// This file implements mlir::applyPatternsGreedily.
//
//===----------------------------------------------------------------------===//
@@ -488,7 +488,7 @@ bool GreedyPatternRewriteDriver::processWorklist() {
// infinite folding loop, as every constant op would be folded to an
// Attribute and then immediately be rematerialized as a constant op, which
// is then put on the worklist.
- if (!op->hasTrait<OpTrait::ConstantLike>()) {
+ if (config.fold && !op->hasTrait<OpTrait::ConstantLike>()) {
SmallVector<OpFoldResult> foldResults;
if (succeeded(op->fold(foldResults))) {
LLVM_DEBUG(logResultWithLine("success", "operation was folded"));
@@ -840,6 +840,11 @@ LogicalResult RegionPatternRewriteDriver::simplify(bool *changed) && {
// regions to enable more aggressive CSE'ing).
OperationFolder folder(ctx, this);
auto insertKnownConstant = [&](Operation *op) {
+ // This hoisting is to enable more folding, so skip checking if known
+ // constant, updating dense map etc if not doing folding.
+ if (!config.fold)
+ return false;
+
// Check for existing constants when populating the worklist. This avoids
// accidentally reversing the constant order during processing.
Attribute constValue;
@@ -894,9 +899,9 @@ LogicalResult RegionPatternRewriteDriver::simplify(bool *changed) && {
}
LogicalResult
-mlir::applyPatternsAndFoldGreedily(Region ®ion,
- const FrozenRewritePatternSet &patterns,
- GreedyRewriteConfig config, bool *changed) {
+mlir::applyPatternsGreedily(Region ®ion,
+ const FrozenRewritePatternSet &patterns,
+ GreedyRewriteConfig config, bool *changed) {
// The top-level operation must be known to be isolated from above to
// prevent performing canonicalizations on operations defined at or above
// the region containing 'op'.
@@ -1012,7 +1017,7 @@ static Region *findCommonAncestor(ArrayRef<Operation *> ops) {
return region;
}
-LogicalResult mlir::applyOpPatternsAndFold(
+LogicalResult mlir::applyOpPatternsGreedily(
ArrayRef<Operation *> ops, const FrozenRewritePatternSet &patterns,
GreedyRewriteConfig config, bool *changed, bool *allErased) {
if (ops.empty()) {
|
I don’t understand what this means, or how it relates to greedy, or what is the actual problem to solve right now…
I am slightly concerned that this request comes from some wrong design or incorrect root causing. |
If one has a set of patterns that have to all be applied before another set of patterns, then one needs to split it into multiple passes today (the priorities are not sufficient depending on iteration order). Now one ends up with trying to do the same folding multiple times needlessly vs if you just did it in one. This enables using different sets of patterns relatively cheaply, rather than all checks for folding being wasted until last/if ever. A lot of interface and dialect fallback hook querying until then, all wasted: 38% of total execution time spent in mlir::detail::constant_op_binder::match, of that significant time in hasTrait called from there in the profile I'm looking at. I don't think there is any folding even happening in the end, just a lot of attempts made (e.g., its not expensive folders being an issue, just actually querying values to see if they are constant and a folder can be used is). |
To make it more concrete: All passes/pipelines combined 140s
So ~50% of time is spent just checking if an operation is a constant (don't have all the line info to see time spent in repopulating constant hash map etc). The only ways to reduce overhead at the moment is to combine into larger patternsets (only sometimes possible) or not using the rewrite driver (which means you either lose the worklist management & nicer debug/tracing facilities or just create duplication). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea to make the driver more customizable, esp. if it can help with performance. By specifying patterns, users have control over which ops will be processed. But it is currently not possible to control which ops will be visited for folding purposes, so all ops are visited.
@@ -91,6 +91,15 @@ class GreedyRewriteConfig { | |||
|
|||
/// An optional listener that should be notified about IR modifications. | |||
RewriterBase::Listener *listener = nullptr; | |||
|
|||
// Whether this should fold while greedily rewriting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: tripple ///
@@ -840,6 +840,11 @@ LogicalResult RegionPatternRewriteDriver::simplify(bool *changed) && { | |||
// regions to enable more aggressive CSE'ing). | |||
OperationFolder folder(ctx, this); | |||
auto insertKnownConstant = [&](Operation *op) { | |||
// This hoisting is to enable more folding, so skip checking if known | |||
// constant, updating dense map etc if not doing folding. | |||
if (!config.fold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
insertKnownConstant
is not really doing "folding": it does not call the folder. What it does is CSE'ing constants. So I would go as far as adding a second config flag: cseConstants
(see #89552). Alternatively, update the GreedyRewriteConfig::fold
documentation and mention that setting it to "false" also disables the CSE'ing of constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am quite concern with the accumulation of flags: this seems to me like working around problems instead of addressing them, and adding complexity that I don't see manageable on the long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What problem do you think is being worked around?
For me I see this as I'm a user that wants a worklist driven driver, I have a curated pipeline where I know there isn't any folding here (e.g., the ops being created have no folding, its part of pipeline where folding is ~0% likely, so the ROI on checking if it can be folded is none), but they have no way of communicating that intent. The problem is expert users not being able to express intent without crafting own/local forking.
Now, constant CSE'ing as Matthias correctly points out is different. It is documented here as enabling more folding, so disabled too (this is cheaper here too as all constant hoisting would have happened earlier and there may be no additional chance here to produce additional constants that would need CSE'ing). But it is different so I could see controlling separately is good. And don't mind doing that. I think this is a framework for high performance, so reasonably defaults that can be tuned seems reasonable to me. I considered the maintenance cost but specialized drivers that share most of the same code seemed higher maintenance cost to me.
Is the hasTrait purely the ConstantLike trait check? or cumulative checking for the various operands/etc.? Feels like there is a lot of low hanging fruit here for optimization, even outside of decoupling folding. Also, how many iterations per driver run are you seeing? I've kind of always felt that the multiple iteration stuff was a big hammer and not optimal in many ways. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There aren't any tests.
Its cumulative. This is across all pass executions for the pipeline, so there are multiple greedy pattern applications along with other passes/walks. There 3 iterations at most for any of them (2 being minimum if mutation is made). Now, this is the entire pipeline so at the start of pipeline there may be some cleanup value. But none of the dialects matched/produced implement folders, so in this pipeline beyond the first pass, making checking for constants cheaper just makes the wasted time lower (it doesn't really have a benefit, disabling it improves compile time significantly with no negative performance impact). |
Fixed. |
// NOCSE-DAG: arith.constant 2 : i32 | ||
// NOCSE-DAG: arith.constant 1 : i32 | ||
// NOCSE-DAG: arith.constant 2 : i32 | ||
// NOCSE-DAG: arith.constant 3 : i32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really the behavior we want here? Seems like we still have some constant folding (more constants are introduced) but they aren't deduped!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused for second too :) This is the no CSE case. So we do have folding, just not constant depupe. This is flag Matthias suggested, so these two features are independent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment for fold says:
/// Whether this should fold while greedily rewriting. This also disables
/// CSE'ing constants.
bool fold = true;
Why aren't we keeping these independent then? Disabling Fold shouldn't disable CSE if the CSE option is true...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of date comment, sorry. It is/should be independent. I'll double check now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we separate the new API (applyPatternsGreedily
), switching the in-tree pases to use it, and deprecation into it into a few PRs?
I'm worried that landing all of these in one step can results in reverts that we can avoid with a gradual rollout... Especially with lots of folks disappearing for vacation and potentially not being around to investigate breakages in their areas of interest.
Won't the fold being set to true by default make this an NFC change? |
What tripped me is that in config.fold = true;
return applyPatternsGreedily(region, patterns, config, changed);
} @jpienaar could you add a tl;dr section to the PR description that lists the API-level changes and suggests how to integrate this in downstream projects? The motivation is very clear, but what to do with the PR not necessarily. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay for sparse (assuming fold remains the default, since all tests pass... ;-)
No this does not change the default: it is because this API has "Fold" in the name that we override the flag to ensure preserving the existing behavior of this API. Having this override or not is NFC in the current condition though, it seems like mostly "defensive programming" here. Folks having the deprecated message should not have anything else to do than a mass replace of |
Yes this should be a NFC change. And yes just defense (the default is to fold, but the current API with fold in the name does take an option and so someone could have disabled folding in the config, called function that says it folds and then be surprised - I mean they really shouldn't :)). Yes I updated by simply doing find and replace textually. With current default nothing else needed. Let me update description to capture this. |
The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) wrt how ops are matched, 2) folding wherever it can. These are independent forms of greedy and leads to inefficiency. E.g., cases where one need to create different phases in lowering, one is required to applying patterns in specific order/different passes. But if using the driver one ends up needlessly retrying folding or having multiple rounds of folding attempts, where one final run would have sufficed. It also is rather confusing to users that just want to apply some patterns while having all the convenience and structure to have unrelated changes to IR. Of course folks can locally avoid this behavior by just building their own, but this is also a common requested feature that folks keep on working around locally in suboptimal ways.
This partially incorporates llvm#89552. I haven't exposed it on canonicalizer pass as that could be distinct discussion.
✅ With the latest revision this PR passed the C/C++ code formatter. |
With this patch, I am getting:
Is there any way you could take a look at this? Thanks! |
As per the commit message, you should just replace call:
|
Thanks! I've checked in 9901906. |
**Context:** We update the llvm version tagged by jax 0.6.0: ``` mhlo=617a9361d186199480c080c9e8c474a5e30c22d1 llvm=179d30f8c3fddd3c85056fd2b8e877a4a8513158 ``` We also update Enzyme to the latest version, which is 0.0.180, at commit `db0181320d6e425ee963bd496ed0d8dbb615be18` **Description of the Change:** Firstly, jax recently moved from the Google github organization to its own jax-ml organization. This means the urls, and the retrieval method for the underlying llvm and mhlo git commit tags, needs to be updated. (Thanks @mehrdad2m !) Now on to the actual changes. I will list the changes in increasing complexity. 1. The new enzyme cmake target is `EnzymeStatic-21` (from 20) 2. Enzyme works with a later llvm then our target, so it has some llvm intrinsics unknown to the one we are targeting. We patch them away. They do not concern us since they are all intrinsics for nvidia backends. 3. `applyPatternsAndFoldGreedily` is removed. Drop-in replacement is `applyPatternsGreedily`. llvm/llvm-project#104649, llvm/llvm-project#126701 4. ops with `CallOpInterface` must have two new optional attributes `arg_attrs` and `res_attrs` llvm/llvm-project#123176 5. `CallInterfaceCallable` objects now must be directly casted to the callee `SymbolRefAttr`, i.e. `callee.get<SymbolRefAttr>()` -> `cast<SymbolRefAttr>(callee)` llvm/llvm-project@35e8989 6. The `lookupOrCreateFn` family of functions now return `FailureOr<funcop>` instead of just `funcop`, so a `.value()` needs to be used to retrieve the underlying `funcop`. llvm/llvm-project@e84f6b6 7. The cpp api for `OneShotBufferizePassOptions` no longer needs complicated lambdas for the type converter options. They can be set with the `mlir::bufferization::LayoutMapOption::IdentityLayoutMap` options directly. 8. The individual `match` and `rewrite` methods in pattern rewrites are removed. Use the two-in-one `matchAndRewrite` instead. llvm/llvm-project#129861 9. For rewrite patterns with 1-to-N convertions, a new `macthAndRewrite` overload with `OneToNOpAdaptor` must be used. For us, this is only the `catalyst.list*` ops. llvm/llvm-project#116470 10. The lowering of `cf::AssertOp` to llvm was split from the overall`--covert-cf-to-llvm` pass. We need to manually call this separate pattern for cf.assert duriing quantum to llvm dialect lowering, where we also convert cf to llvm. https://github.com/llvm/llvm-project/pull/120431/files 11. The new mhlo depends on a [shardy](https://github.com/openxla/shardy) dialect. Shardy is built with bazel, not cmake. Building shardy ourselves would be very difficult (not having bazel in our build ecosystem is a hard constraint, cc @mlxd ), and also not necessary (we just use mhlo for their "standard" passes). We thus patch out all shardy components. 12. Three necessary passes were removed in mhlo: `mhlo-legalize-control-flow`, `mhlo-legalize-to-std`, `hlo-legalize-sort` tensorflow/mlir-hlo@4a640be#diff-ef0d7e30da19a396ba036405a9ef636f8b1be194618b0a90f4602671fc2ef34d tensorflow/mlir-hlo@2a5e267#diff-f8c7cb07b43593403e00e0dbf9983f0186b4eb70368cc99af3b924061f1ea46f - Alongside the removal of `mhlo-legalize-to-std`, the cmake target `MhloToStandard` was removed too. We simply patch them back for now. **For the above two points, note that there will be an overall migration to the stablehlo repo, as mhlo is sunseting. Therefore, spending too much time on this isn't necessary, so we just patch.** 13. The new pattern applicator (`applyPatternsGreedily`) is more aggressive in dead code elimination, and is eliminating dead `Value`s in the adjoint gradient method. The `nodealloc` function we generate for adjoint gradient lowering used to only return the qreg, not the expval result. This causes the expval op to be eliminated since it has no users. This further causes wrong gradient results, since the entire program, all ops included (regardless of dead or not), impacts the gradient through chain rule. To avoid this, we return the expval result as well. In doing this, we implicitly assume that differentiated qnodes can only return expval. Although this assumption is true and also restricted by frontend, ideally we should not have it hard coded. We leave this as a TODO for a future feature. 14. The old `--buffer-deallocation` pass is removed. Intended replacement is `--buffer-deallocation-pipeline`. This migration is very complicated. We simply add back the old buffer deallocation pass in the catalyst dialect as a util for now. We will revisit this in #1778 . mlir lit test updates: 1. `bufferization.to_tensor/memref` updated assembly format 2. gradient adjoint lowering test returns both qreg and expval 3. Some inverse unrealized conversion cast pairs are canceled by the new pattern rewriter. 4. `llvm.mlir.undef` is deprecated, use `llvm.mlir.poison` instead. llvm/llvm-project#125629 **Benefits:** Up to date with upstream versions. [sc-92017] --------- Co-authored-by: Tzung-Han Juang <tzunghan.juang@gmail.com> Co-authored-by: Ritu Thombre <42207923+ritu-thombre99@users.noreply.github.com> Co-authored-by: Mehrdad Malekmohammadi <mehrdad.malek@xanadu.ai> Co-authored-by: Mehrdad Malek <39844030+mehrdad2m@users.noreply.github.com> Co-authored-by: David Ittah <dime10@users.noreply.github.com> Co-authored-by: Joey Carter <joseph.carter@xanadu.ai>
The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can.
These are independent forms of greedy and leads to inefficiency. E.g., cases where one need to create different phases in lowering and is required to applying patterns in specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed.
Of course folks can locally avoid this behavior by just building their own, but this is also a common requested feature that folks keep on working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating from the deprecated should just be a find and replace (e.g.,
find ./ -type f -exec sed -i 's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;
variety) as the API hasn't changed between the two.