Allow optimizing with global function effects #5040

Merged (19 commits) on Sep 16, 2022

Conversation

@kripken (Member) commented Sep 14, 2022

This adds a map of function name => the effects of that function to the
PassOptions structure. That lets us compute those effects once and then
use them in multiple passes afterwards. For example, that lets us optimize
away a call to a function that has no effects:

(drop (call $nothing))

[..]

(func $nothing
  ;; .. lots of stuff but no effects, only a returned value ..
)

Vacuum will remove that dropped call if we tell it that the called function has
no effects. Note that a nice result of adding this to the PassOptions struct
is that all passes will use the extra info automatically.
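
To make the mechanism concrete, here is a minimal standalone C++ sketch of that lookup (simplified, assumed types rather than Binaryen's actual declarations): the map lives on the pass options, and the effect analysis for a call consults it, falling back to conservative assumptions when there is no entry.

#include <map>
#include <memory>
#include <string>

// Simplified stand-ins for Binaryen types (assumptions, not the real classes).
struct Effects {
  bool throws = false;
  bool writesState = false;
};

struct PassOptions {
  // Absent by default; filled in by --generate-global-effects.
  std::shared_ptr<std::map<std::string, Effects>> funcEffectsMap;
};

// When analyzing a call, prefer the precomputed per-function effects;
// otherwise assume the call may do anything.
Effects effectsOfCall(const PassOptions& options, const std::string& target) {
  if (options.funcEffectsMap) {
    auto it = options.funcEffectsMap->find(target);
    if (it != options.funcEffectsMap->end()) {
      return it->second; // precise, precomputed info
    }
  }
  Effects conservative;
  conservative.throws = true;
  conservative.writesState = true;
  return conservative;
}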

This is not enabled by default as the benefits seem rather minor, though it
does help in a small but noticeable way on J2Wasm code, where we use
call.without.effects and have situations like this:

(func $foo
  (call $bar)
)
(func $bar
  (call.without.effects ..)
)

The call to bar looks like it has effects, normally, but with global effect info
we know it actually doesn't.

To use this, one would do

--generate-global-effects [.. some passes that use the effects ..] --discard-global-effects

Discarding is not necessary, but if there is a pass later that adds effects, then not
discarding could lead to bugs, since we'd think there are fewer effects than there are.
(However, normal optimization passes never add effects, only remove them.)
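
For example, a concrete run following that pattern might look like this (hypothetical file names; --vacuum runs the Vacuum pass mentioned above):

wasm-opt input.wasm --generate-global-effects --vacuum --discard-global-effects -o output.wasm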

It's also possible to call this multiple times:

--generate-global-effects -O3 --generate-global-effects -O3

That computes effects after the first -O3, and may find fewer effects than before.

This doesn't compute the full transitive closure of the effects across functions. That is,
when computing a function's effects, we don't look into its own calls. The simple case
so far is enough to handle the call.without.effects example from before (though it
may take multiple optimization cycles).

@kripken requested review from tlively and aheejin on September 14, 2022 22:01
return;
}
}

parent.calls = true;
// When EH is enabled, any call can throw.
if (parent.features.hasExceptionHandling() && parent.tryDepth == 0) {
@kripken (Member Author):

Hmm, actually maybe we can use tryDepth even with the global effect info...

Member:

Can you elaborate?

@kripken (Member Author):

I was thinking that if tryDepth > 0 then even if the call's effects say "I can throw", we could ignore that (since it will be caught). I added a testcase for that now, see a173e67 (and the commit before it adds the optimization).

Member:

Yeah I think this is handled already, because we don't even set throws_ in case it's within a try-catch.

@kripken (Member Author) commented Sep 14, 2022

Oh, I measured overhead, and this PR has no cost by itself.

@aheejin (Member) left a comment:

How can the pass in the middle make use of the pre-generated effects?

Comment on lines +1275 to +1277
# note that no pass we run here should add effects to a function, so it is
# ok to run this pass and let the passes after it use the effects to
# optimize
Member:

How is this guaranteed? No optimization passes ever generate any effects, calls, or global writes?

@kripken (Member Author):

I believe that's the case, yes. No pass that we fuzz, at least, adds new effects. That is mainly because these are optimization passes, so they don't change observable behavior.


namespace wasm {

struct GenerateGlobalEffects : public Pass {
Member:

Is this intentionally not function-parallel?

@kripken (Member Author):

The main work could be, yeah. The problem is we need to clear the global info first, and we can't fill it in in parallel. So we have sequential work at the beginning and end. I opted to use ParallelFunctionAnalysis in the middle here, and make the pass as a whole not parallel - seems simplest?
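
To illustrate that shape, here is a minimal standalone C++ sketch (simplified, assumed types; std::thread stands in for Binaryen's ParallelFunctionAnalysis helper) of the sequential / parallel / sequential structure described above:

#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Simplified stand-ins for Binaryen types (assumptions, not the real classes).
struct Effects { bool throws = false; bool writesState = false; };
struct Function { std::string name; /* body omitted */ };
struct PassOptions {
  std::shared_ptr<std::map<std::string, Effects>> funcEffectsMap;
};

// Stand-in for the real per-function effect analysis.
Effects computeEffects(const Function&) { return Effects{}; }

void generateGlobalEffects(PassOptions& options, const std::vector<Function>& funcs) {
  // 1) Sequential: clear any stale global info first.
  options.funcEffectsMap.reset();
  // 2) Parallel: analyze each function independently.
  //    (A real implementation would use a thread pool, not a thread per function.)
  auto results = std::make_shared<std::map<std::string, Effects>>();
  std::mutex mutex;
  std::vector<std::thread> workers;
  for (size_t i = 0; i < funcs.size(); i++) {
    workers.emplace_back([&, i] {
      Effects effects = computeEffects(funcs[i]);
      std::lock_guard<std::mutex> lock(mutex); // serialize only the map write
      (*results)[funcs[i].name] = effects;
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  // 3) Sequential: publish the results so later passes can read them.
  options.funcEffectsMap = std::move(results);
}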

Member:

Oh, I missed that the main work is done in parallel. Thanks.

src/ir/effects.h Outdated
Comment on lines 431 to 439
if (parent.funcEffectsMap) {
auto iter = parent.funcEffectsMap->find(curr->target);
if (iter != parent.funcEffectsMap->end()) {
// We have effect information for this call target, and can just use
// that.
parent.mergeIn(iter->second);
return;
}
}
Member:

This is to do some transitive computation, right? For example, if A calls B, B calls C, and C has only a nop, then if we examine functions in C->B->A order we can deduce that A doesn't have any effects. But this may not be so when analyses run in parallel. I notice GlobalEffects is not function-parallel, though... is it not function-parallel because of this? If so, that benefit seems a little... minor?

@kripken (Member Author):

But this may not be so when analyses run in parallel.

Yes, atm this runs in parallel, and each function doesn't look any deeper than itself.

We could do a transitive closure operation afterwards. But I'm not sure it's worth it. This simple version should already handle all the real-world cases I can think of (with enough cycles of the optimizer, since each time it makes progress - there are no "cycles" to worry about here, as a cycle would be a potential infinite recursion anyhow, which we can't remove).
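
As a point of comparison, the transitive closure mentioned here would be a monotonic fixed-point over the call graph. A minimal standalone C++ sketch of that idea (simplified, assumed types; not code from this PR, which deliberately stops at one level of calls):

#include <map>
#include <set>
#include <string>

// Simplified effect summary; mergeIn returns true if anything changed.
struct Effects {
  bool throws = false;
  bool writesState = false;
  bool mergeIn(const Effects& other) {
    bool changed = (other.throws && !throws) || (other.writesState && !writesState);
    throws = throws || other.throws;
    writesState = writesState || other.writesState;
    return changed;
  }
};

using CallGraph = std::map<std::string, std::set<std::string>>; // caller -> callees

// Merge callee effects into callers until nothing changes. Cycles in the
// call graph are fine: the merge only ever grows a summary, so the loop
// terminates once a fixed point is reached.
void propagateTransitively(std::map<std::string, Effects>& effects,
                           const CallGraph& calls) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto& [caller, callees] : calls) {
      for (const auto& callee : callees) {
        auto it = effects.find(callee);
        if (it != effects.end()) {
          changed = effects[caller].mergeIn(it->second) || changed;
        }
      }
    }
  }
}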

Member:

What if the target's effects are in the process of being generated? Is it safe to merge them in here before they have been completed?

@kripken (Member Author):

funcEffectsMap is empty during the parallel computation (we clear it before, and only write to it after the parallel part). So we'd never reach this code path during the parallel computation.

@gkdn (Contributor) commented Sep 15, 2022

@kripken I haven't reviewed the CL, nor do I understand the general architecture in Binaryen, but I wanted to double-check the behavior around something.

Let's say we run generate-global-effects, remove call.without.effects at some point, and then run generate-global-effects again. Will the effects of call.without.effects disappear with that order, or will it work incrementally, accounting for the previous knowledge about the calls?

For the record, in JsCompiler we actually run side-effect recalculation after every iteration of our <clinit> pruning pass, since removal of <clinit> is one of the common ways of removing side effects (almost every static method has side effects due to the implicit <clinit>).
Another common scenario: we have development-only checks that only throw in development mode. Those checks usually take a couple of iterations to get rid of, and later make the methods using them side-effect free as well. For those it is usually enough to run the recalculation after a couple of iterations.

@gkdn (Contributor) commented Sep 15, 2022

Ah, I just saw discard-global-effects in the description, so it seems like that is not the default behavior, which probably answers my question.

@kripken (Member Author) commented Sep 15, 2022

@aheejin

How can the pass in the middle make use of the pre-generated effects?

The effects are all stored in PassOptions on the PassRunner. We pass that into EffectAnalyzer anyhow all the time now (I think you added that actually) so it's very convenient - when EffectAnalyzer visits a call we just check on PassOptions if we have global info for that function. So no change is needed to any pass code.

@gkdn

Let's say we run generate-global-effects, remove call.without.effects at some point, and then run generate-global-effects again. Will the effects of call.without.effects disappear with that order, or will it work incrementally, accounting for the previous knowledge about the calls?

Each time it starts from scratch when it computes the effects. So no previous information is saved. That means that the global effects become less effective after lowering the call.without.effects intrinsic. All the benefit of that intrinsic must be utilized before the lowering, in other words.

@gkdn (Contributor) commented Sep 15, 2022

Each time it starts from scratch when it computes the effects. So no previous information is saved. That means that the global effects become less effective after lowering the call.without.effects intrinsic. All the benefit of that intrinsic must be utilized before the lowering, in other words.

The intrinsic lowering is the one that is handling that, right? So we need to make sure it is not executed until almost the very end?

OOC, why does it always calculate from scratch? Shouldn't no-side-effects stick? Is there any value in recalculating the side effects of a call/method that is already determined to be side-effect free, unless someone calls discard-global-effects?

Comment on lines +435 to +446
// that. The one change we may want to make is to remove throws_, if
// the target function throws and we know that will be caught anyhow,
// the same as the code below for the general path.
const auto& targetEffects = iter->second;
if (targetEffects.throws_ && parent.tryDepth > 0) {
auto filteredEffects = targetEffects;
filteredEffects.throws_ = false;
parent.mergeIn(filteredEffects);
} else {
// Just merge in all the effects.
parent.mergeIn(targetEffects);
}
Member:

Not sure if this is necessary. We don't set throws_ if tryDepth > 0 in the first place.

binaryen/src/ir/effects.h, lines 368 to 371 in 2fdb22b:

if (self->parent.delegateTargets.count(curr->name) &&
self->parent.tryDepth == 0) {
self->parent.throws_ = true;
}

@kripken (Member Author):

That code just handles delegates, I think?

In the testcase I added there isn't a delegate, but the problem still happens. Basically the visitCall logic should not set throws_ if we are in a try-catch-all. We have logic for that in the old code here:

binaryen/src/ir/effects.h, lines 426 to 429 in 2fdb22b:

// When EH is enabled, any call can throw.
if (parent.features.hasExceptionHandling() && parent.tryDepth == 0) {
parent.throws_ = true;
}

The new code above just does the same for when we use global effects.

Member:

Ah right, I was confused. I thought the tryDepth was the target's...

@kripken (Member Author) commented Sep 15, 2022

@gkdn

why does it always calculate from scratch?

Just simplicity and to avoid possible bugs. A normal optimization pass doesn't add new effects, but some instrumentation passes do, so this sequence is risky:

  • calculate effects
  • run pass that adds logging (calls to imports that log)
  • optimize

The last step might think a function has no effects when it does, and optimize out some logging.

We could say that it's the responsibility of a logging pass to clear the effects, but there's a risk a pass author might forget.

Anyhow, I think there is little harm to this approach, since leaving lower-intrinsics til near the end is the right thing anyhow.

@gkdn (Contributor) commented Sep 15, 2022

@kripken

I think there is little harm to this approach

Wouldn't this also affect the performance of this pass and prevent us from running it frequently?

We could say that it's the responsibility of a logging pass to clear the effects, but there's a risk a pass author might forget.

Maybe this could be pointed out when enabling the pass?
e.g. --propagate-global-effects would do an incremental generation that we could use, vs. another project that is adding logging passes could use the other one?

Anyway just some food for thought, thanks for the patch.

@kripken (Member Author) commented Sep 15, 2022

@gkdn

e.g. --propagate-global-effects

Ah, interesting idea. Yes, we could add a variation like that that builds on/refines the previous info. I'd need to think about how that works with the extension to do the transitive closure, though, which I think is also worth looking into.

For now though does this PR look like a reasonable start? Or would you prefer I look into the propagate before landing?

Wouldn't this also affect the performance of this pass and prevent us from running it frequently?

Not sure I follow this. I think running --generate-global-effects can be done frequently, and should have good results, if it's done before intrinsic lowering. If you run enough iterations before intrinsic lowering I think all the benefit could be achieved, unless I'm missing something?

(Inlining won't work on an intrinsic, so that is a possible concern, but I don't think inlining interacts badly with the specific optimization here of removing calls to code without effects. But again, maybe I'm missing something.)

@gkdn (Contributor) commented Sep 16, 2022

Sorry, by "performance of the pass" I was referring to the total runtime of the pass, not its effectiveness.

And now I noticed that this is not working on the transitive closure :)
When analyzing the transitive closure, it is kind of natural not to re-traverse a function if you already know that it doesn't have side effects. Also, because a transitive analysis is costly, you may want to look into all the shortcuts that are available. In JsCompiler, side-effect analysis is one of the things that ends up taking the most time, but it is also very effective in reducing the final code size, especially for J2CL, so we bite the bullet there. I don't really know the impact on execution performance, but looking at the current regression in the benchmark, it looks like V8 is not doing much w.r.t. removing calls to methods without side effects.

Anyway, long story short, the pass as it is might have very limited effect, as it only covers the more trivial scenarios, and I expect that we will get benefit from more sophisticated stuff. That being said, my instincts on wasm/binaryen so far haven't been super accurate, since we have generally different dynamics at play here. For example, Binaryen inlines aggressively, which might be enabling it to see more stuff, potentially revealing more things that don't have side effects with a simpler analysis. On the other hand, I'm thinking we should eventually be less aggressive on inlining.

Maybe we can discuss this stuff further offline.

(Regardless I don't see any harm in submitting this version of the patch.)

@aheejin (Member) commented Sep 16, 2022

Just my two cents... I tried to do some transitive analysis to remove try-catch in #4618, with a very limited gain. It was mostly only checking whether a call can throw or not, but there was a very limited number of calls that we could rule out as "not throwing", and being able to throw qualifies as a global effect anyway. The reason we had so few of them was that many functions eventually call library functions, most of which can throw, or imported functions, which we should assume can throw.

I tried that on a C++ program, so Java programs may have different characteristics though.

@gkdn (Contributor) commented Sep 16, 2022

I think the limiting factor in the try-catch scenario is the number of try/catches. For example, if you had a try-catch on each statement, you would get a ton of benefit.
In the case of side effects, each unused statement is a candidate. I expect unused statements to be rather more frequent than try/catches due to dead code paths. To give an example: if you have a logging statement and the logging is disabled, then all the arguments become unused, including the complex ones that generate some debug text. However, you cannot drop them unless you can prove that they don't have side effects.

I think imports are an important point. We earlier discussed surrounding them with call.without.effects where applicable, but we haven't done that yet.

@kripken (Member Author) commented Sep 16, 2022

@gkdn Yes, I do think it's worth investigating the improvements. My reasoning, though, is that even this simple implementation can optimize all the possible things - it just will take more optimization cycles. I've thought about this for a while and I can't see new potential that is unlocked by either the transitive closure or refining known effects, just a reduction in cycles.

And this simple implementation runs in linear time. Together with the fact that in my tests the depth of call chains that are optimized here is pretty shallow, I think few optimization cycles are needed, and so I suspect this simple linear approach may be enough. But that could be wrong, and if we see many cycles are needed then we should definitely look into either or both of the transitive closure and refining known effects.

@kripken (Member Author) commented Sep 16, 2022

(but yes, for now let's land this and do more experimenting)

@kripken merged commit 9894890 into main on Sep 16, 2022
@kripken deleted the global.func.effects branch on September 16, 2022 15:22