More aggressively unify duplicate lets #8204

abadams · 2024-04-19T19:57:48Z

The simplifier can also clean up most duplicate lets, but it's harder for it
because it has to consider that other mutations may have taken place.
Making UnifyDuplicateLets more aggressive has no impact on lowering times for
most apps, but something pathological was going on for local_laplacian: at 20
pyramid levels, this speeds up lowering by 1.3x; at 50 pyramid levels, by 2.3x;
and at 100 pyramid levels, by 4.1x.

It also slightly reduces binary size.
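To make the idea concrete, here is a minimal sketch of duplicate-let unification on a toy IR. It assumes each let's value can be reduced to a canonical token list, which is a hypothetical stand-in for Halide's Expr trees; `Let` and `unify_duplicate_lets` are illustrative names, not Halide's API. Earlier rewrites are applied before each lookup, so a let that only becomes a duplicate after a renaming is also caught.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy let: `name = value`, where `value` is a token list such
// as {"x", "+", "1"}. Real lets bind Expr trees.
struct Let {
    std::string name;
    std::vector<std::string> value;
};

// Drops any let whose (rewritten) value matches an earlier let's value and
// records that later uses of the dropped name should use the kept name.
std::vector<Let> unify_duplicate_lets(const std::vector<Let> &lets) {
    std::map<std::string, std::string> rewrites;    // dropped name -> kept name
    std::map<std::string, std::string> first_with;  // value key -> kept name
    std::vector<Let> result;
    for (Let l : lets) {
        // Apply earlier rewrites first, so e.g. "b * 2" becomes "a * 2"
        // and can then match an existing "a * 2".
        std::string key;
        for (std::string &tok : l.value) {
            auto r = rewrites.find(tok);
            if (r != rewrites.end()) {
                tok = r->second;
            }
            key += tok;
            key += ' ';
        }
        // One lookup both tests for a duplicate and registers a new value.
        auto [it, inserted] = first_with.try_emplace(key, l.name);
        if (inserted) {
            result.push_back(std::move(l));
        } else {
            rewrites[l.name] = it->second;
        }
    }
    return result;
}
```

The toy treats all lets as one flat scope and returns only the surviving lets. A real pass also has to rewrite uses in the body and make sure the kept binding is still in scope wherever the dropped one was used.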


Substituting facts was O(n) for n facts. This makes it O(log(n)). The old cost was particularly bad for pipelines with lots of inputs or outputs, because those pipelines have lots of asserts, which produce lots of facts to substitute in. This speeds up lowering of local laplacian with 20 pyramid levels (which has only one input and one output) by 1.09x. It speeds up lowering of the adams 2019 cost model training pipeline (lots of weight inputs, and lots of outputs due to derivatives) by 1.5x. It speeds up lowering of resnet50 (tons of weight inputs) by 7.3x!
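A rough sketch of why an ordered lookup helps, assuming facts can be keyed by a canonical form of the condition they describe. The toy expression tree, `Facts`, and `substitute_facts` below are hypothetical, not Halide's implementation: each leaf does one O(log(n)) `std::map::find` instead of being tested against all n facts.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy boolean expression: a leaf condition such as "x < 10",
// or an "and" / "or" of two subexpressions.
struct Node;
using NodePtr = std::shared_ptr<const Node>;
struct Node {
    std::string op;    // "leaf", "and", or "or"
    std::string leaf;  // condition text when op == "leaf"
    NodePtr a, b;      // operands otherwise
};

NodePtr make_leaf(std::string s) {
    return std::make_shared<Node>(Node{"leaf", std::move(s), nullptr, nullptr});
}
NodePtr bin(std::string op, NodePtr a, NodePtr b) {
    return std::make_shared<Node>(Node{std::move(op), "", std::move(a), std::move(b)});
}

// Known truth values, kept in an ordered map: O(log(n)) per lookup.
using Facts = std::map<std::string, bool>;

NodePtr substitute_facts(const NodePtr &e, const Facts &facts) {
    if (e->op == "leaf") {
        auto it = facts.find(e->leaf);
        if (it != facts.end()) {
            return make_leaf(it->second ? "true" : "false");
        }
        return e;
    }
    NodePtr a = substitute_facts(e->a, facts);
    NodePtr b = substitute_facts(e->b, facts);
    if (a == e->a && b == e->b) {
        return e;  // nothing changed: reuse the original node
    }
    return bin(e->op, a, b);
}
```

A real version keyed on Exprs would need a total order on expressions rather than the string keys used here.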


Interval::is_single_point() used to compare the min and max only by shallow equality, i.e. to check whether they are the same Expr object. However, bounds_of_expr_in_scope works much better with deep equality, so it had a prepass that goes over the provided scope, calls equal(min, max) on everything, and fixes up any interval where deep equality holds but shallow equality doesn't. This prepass costs O(n) for n things in scope, regardless of how complex the expression being analyzed is. So if you ask for the bounds of, say, '4' in a context with lots of things in scope, it's absurdly slow. We were doing this! BoxTouched calls bounds_of_expr_in_scope many times on small index Exprs within the same very large scope.

It's better to just make Interval::is_single_point() check deep equality. This speeds up local laplacian lowering by 1.1x, and resnet50 lowering by 1.5x.

There were also places where intervals that were a single point were diverging due to carelessly written code. E.g. the interval [40*8, 40*8], where both 40*8s are the same Mul node, was being simplified like this: interval.min = simplify(interval.min); interval.max = simplify(interval.max); Not only does this do twice the necessary simplification work, but it also caused something that was a single point to diverge into not being a single point, because each round of constant folding creates a new Expr. With the new is_single_point this matters a lot less, but even so, I centralized simplification of intervals into a single helper that doesn't do the pointless double simplification for single points.

Some of these shallowly-unequal but deeply-equal Intervals were being created in bounds inference itself after the prepass, which may have been generating suboptimal bounds. This change should fix that in addition to the compile-time benefits.

Also added a simplify call in SkipStages, because I noticed that when it processed specializations it was creating things like (condition) || (!condition).
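The following toy model reproduces the [40*8, 40*8] example: simplifying min and max separately constant-folds twice and yields two distinct nodes, so a shallow (pointer) check fails while a deep check still succeeds. All types here are hypothetical stand-ins, not Halide's `Interval` or simplifier, and `simplify_interval` is an illustrative name for the kind of centralized helper described above.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical toy expression: an integer constant or a product a * b.
// Nodes are immutable and shared, so "shallow equality" is pointer equality.
struct Node;
using Expr = std::shared_ptr<const Node>;
struct Node {
    bool is_const;
    int value;  // valid when is_const
    Expr a, b;  // valid when !is_const
};

Expr constant(int v) { return std::make_shared<Node>(Node{true, v, nullptr, nullptr}); }
Expr mul(Expr a, Expr b) { return std::make_shared<Node>(Node{false, 0, std::move(a), std::move(b)}); }

// Deep (structural) equality, with a cheap same-object early-out.
bool equal(const Expr &a, const Expr &b) {
    if (a == b) return true;
    if (a->is_const != b->is_const) return false;
    if (a->is_const) return a->value == b->value;
    return equal(a->a, b->a) && equal(a->b, b->b);
}

// Constant folding allocates a fresh node on every call.
Expr simplify(const Expr &e) {
    if (e->is_const) return e;
    Expr a = simplify(e->a), b = simplify(e->b);
    if (a->is_const && b->is_const) return constant(a->value * b->value);
    return mul(a, b);
}

struct Interval {
    Expr min, max;
    // Deep equality: two distinct but identical nodes still form a point.
    bool is_single_point() const { return equal(min, max); }
};

// Simplifies a single-point interval once and shares the result.
Interval simplify_interval(const Interval &i) {
    if (i.is_single_point()) {
        Expr s = simplify(i.min);
        return {s, s};
    }
    return {simplify(i.min), simplify(i.max)};
}
```

Simplifying once also halves the work for single points, which is the other half of the motivation given above.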



abadams · 2024-04-23T19:03:14Z

Ready for review

rootjalex · 2024-04-24T19:56:18Z

src/UnifyDuplicateLets.cpp

```diff
-                rewrites[op->name] = iter->second;
+            if (simplified.as<Variable>() ||
+                simplified.as<IntImm>()) {
+                // The RHS collapsed to just a Var or a constant, so uses of
```


rootjalex: Can't there be other constant types, like UIntImm, FloatImm, or StringImm, that we care about here?

abadams: I forgot to address this directly in the review, rather than just in the update comment. Basically, no. This is at a point in lowering where LetStmts are just bounds inference expressions.
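The review snippet above is cut off after "so uses of". Assuming the intent is to substitute the trivial value directly into later uses instead of keeping the let, a minimal sketch on a toy token-based IR could look like this. `Let`, `is_var_or_int`, and `forward_trivial_lets` are hypothetical names; the single-identifier-or-integer check mirrors the snippet's Variable / IntImm test.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy let whose `value` is the already-simplified RHS as tokens.
struct Let {
    std::string name;
    std::vector<std::string> value;
};

// True if the RHS is a single identifier or integer literal, standing in
// for the Variable / IntImm cases.
bool is_var_or_int(const std::vector<std::string> &value) {
    if (value.size() != 1 || value[0].empty()) {
        return false;
    }
    for (unsigned char c : value[0]) {
        if (!std::isalnum(c) && c != '_') {
            return false;
        }
    }
    return true;
}

// Drops lets with a trivial RHS and replaces later uses of their names
// with that RHS. Chains such as d = c, e = d * 2 resolve to e = c * 2.
std::vector<Let> forward_trivial_lets(const std::vector<Let> &lets) {
    std::map<std::string, std::string> rewrites;  // let name -> trivial value
    std::vector<Let> result;
    for (Let l : lets) {
        for (std::string &tok : l.value) {
            auto r = rewrites.find(tok);
            if (r != rewrites.end()) {
                tok = r->second;
            }
        }
        if (is_var_or_int(l.value)) {
            rewrites[l.name] = l.value[0];
        } else {
            result.push_back(std::move(l));
        }
    }
    return result;
}
```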


abadams · 2024-04-24T21:54:24Z

Better comment pushed. I also fixed a double lookup of the value in the map 'scope' in the case where it wasn't already there.
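The double-lookup fix follows a common C++ pattern: a `find` followed by an insert searches the map twice, and when each key comparison is expensive (such as deep equality on Exprs) both searches are costly. This sketch uses `std::map::try_emplace` (C++17) to search once. The string-keyed `Scope` and the `CountingLess` comparator are assumptions standing in for the real map, which isn't shown in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>

// Counts key comparisons, as a proxy for an expensive deep comparison.
static int comparisons = 0;

struct CountingLess {
    bool operator()(const std::string &a, const std::string &b) const {
        ++comparisons;
        return a < b;
    }
};

using Scope = std::map<std::string, int, CountingLess>;

// Two searches: find() to test membership, then operator[] to insert.
int get_or_add_double(Scope &scope, const std::string &key, int value) {
    auto it = scope.find(key);
    if (it != scope.end()) {
        return it->second;
    }
    scope[key] = value;
    return value;
}

// One search: try_emplace() inserts only if the key is absent, and returns
// an iterator to the existing or newly inserted entry either way.
int get_or_add_single(Scope &scope, const std::string &key, int value) {
    return scope.try_emplace(key, value).first->second;
}
```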

abadams · 2024-04-28T19:40:25Z

Just needs an approval

rootjalex

Lgtm

abadams added 13 commits April 16, 2024 17:06

Rewrite IREquality to use a more compact stack instead of deep recursion

538577a

Deletes a bunch of code and speeds up lowering time of local laplacian with 20 pyramid levels by ~2.5%

clang-tidy

7a60519

Fold in the version of equal in IRMatch.h/cpp

150f5e9

Add missing switch breaks

d3efa14

Merge remote-tracking branch 'origin/abadams/rewrite_ir_equality' into abadams/faster_substitute_facts

4dfbd72

Add missing comments

22a04bd

Merge remote-tracking branch 'origin/abadams/rewrite_ir_equality' into abadams/faster_substitute_facts

26b9cc2

Elaborate on why we treat NaNs as equal

ef4b2de

Merge remote-tracking branch 'origin/abadams/rewrite_ir_equality' into abadams/faster_substitute_facts

6aebeb3

clang-tidy

b15a648

abadams mentioned this pull request Apr 19, 2024

Faster vars used tracking in simplify let visitor #8205

Merged

Merge remote-tracking branch 'origin/main' into abadams/aggressive_unify_duplicate_lets

5ada5df
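The first commit in the list, "Rewrite IREquality to use a more compact stack instead of deep recursion", replaces recursive structural comparison with an explicit stack. Below is a minimal sketch of that general technique on a toy IR (hypothetical `Node`, `flt`, `add`, and `equal`; not Halide's IREquality code). It also treats two NaN constants as equal, as the "Elaborate on why we treat NaNs as equal" commit indicates Halide does, without reproducing that commit's reasoning.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy IR: float constants and binary additions.
struct Node;
using Expr = std::shared_ptr<const Node>;
struct Node {
    enum Kind { FloatImm, Add } kind;
    double value;    // valid for FloatImm
    Expr lhs, rhs;   // valid for Add
};

Expr flt(double v) { return std::make_shared<Node>(Node{Node::FloatImm, v, nullptr, nullptr}); }
Expr add(Expr l, Expr r) { return std::make_shared<Node>(Node{Node::Add, 0.0, std::move(l), std::move(r)}); }

// Structural equality driven by an explicit stack of node pairs, so the
// depth of the tree does not translate into call-stack depth.
bool equal(const Expr &x, const Expr &y) {
    std::vector<std::pair<const Node *, const Node *>> stack;
    stack.emplace_back(x.get(), y.get());
    while (!stack.empty()) {
        auto [p, q] = stack.back();
        stack.pop_back();
        if (p == q) continue;  // same object, nothing more to compare
        if (p->kind != q->kind) return false;
        if (p->kind == Node::FloatImm) {
            // IEEE == says NaN != NaN; here two NaN constants compare equal.
            bool both_nan = std::isnan(p->value) && std::isnan(q->value);
            if (!both_nan && p->value != q->value) return false;
        } else {
            stack.emplace_back(p->lhs.get(), q->lhs.get());
            stack.emplace_back(p->rhs.get(), q->rhs.get());
        }
    }
    return true;
}
```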

rootjalex reviewed Apr 24, 2024


abadams added 2 commits April 24, 2024 14:50

Clarify comment; Avoid double-lookup into the scope

fc4229e

Looking up with an Expr key and deep equality is expensive, so this was bad.

Add a std::move

251d1fb

rootjalex approved these changes Apr 28, 2024


abadams merged commit 8202163 into main Apr 28, 2024
19 checks passed

BrewTestBot mentioned this pull request Jul 17, 2024

halide 18.0.0 Homebrew/homebrew-core#177657

Closed
