Allow optimizing with global function effects #5040

Merged (19 commits) on Sep 16, 2022

Conversation

@kripken (Member) commented Sep 14, 2022

This adds a map of function name => the effects of that function to the
PassOptions structure. That lets us compute those effects once and then
use them in multiple passes afterwards. For example, that lets us optimize
away a call to a function that has no effects:

(drop (call $nothing))

[..]

(func $nothing
  ;; .. lots of stuff but no effects, only a returned value ..
)

Vacuum will remove that dropped call if we tell it that the called function has
no effects. Note that a nice result of adding this to the PassOptions struct
is that all passes will use the extra info automatically.
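
To make the mechanism concrete, here is a minimal standalone C++ sketch of that lookup (simplified, assumed types rather than Binaryen's actual declarations): the map lives on the pass options, and the effect analysis for a call consults it, falling back to conservative assumptions when there is no entry.

#include <map>
#include <memory>
#include <string>

// Simplified stand-ins for Binaryen types (assumptions, not the real classes).
struct Effects {
  bool throws = false;
  bool writesState = false;
};

struct PassOptions {
  // Absent by default; filled in by --generate-global-effects.
  std::shared_ptr<std::map<std::string, Effects>> funcEffectsMap;
};

// When analyzing a call, prefer the precomputed per-function effects;
// otherwise assume the call may do anything.
Effects effectsOfCall(const PassOptions& options, const std::string& target) {
  if (options.funcEffectsMap) {
    auto it = options.funcEffectsMap->find(target);
    if (it != options.funcEffectsMap->end()) {
      return it->second; // precise, precomputed info
    }
  }
  Effects conservative;
  conservative.throws = true;
  conservative.writesState = true;
  return conservative;
}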

This is not enabled by default as the benefits seem rather minor, though it
does help in a small but noticeable way on J2Wasm code, where we use
call.without.effects and have situations like this:

(func $foo
  (call $bar)
)
(func $bar
  (call.without.effects ..)
)

The call to bar looks like it has effects, normally, but with global effect info
we know it actually doesn't.

To use this, one would do

--generate-global-effects [.. some passes that use the effects ..] --discard-global-effects

Discarding is not necessary, but if there is a pass later that adds effects, then not
discarding could lead to bugs, since we'd think there are fewer effects than there are.
(However, normal optimization passes never add effects, only remove them.)
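
For example, a concrete run following that pattern might look like this (hypothetical file names; --vacuum runs the Vacuum pass mentioned above):

wasm-opt input.wasm --generate-global-effects --vacuum --discard-global-effects -o output.wasm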

It's also possible to call this multiple times:

--generate-global-effects -O3 --generate-global-effects -O3

That computes effects after the first -O3, and may find fewer effects than before.

This doesn't compute the full transitive closure of the effects across functions. That is,
when computing a function's effects, we don't look into its own calls. The simple case
so far is enough to handle the call.without.effects example from before (though it
may take multiple optimization cycles).

@kripken requested review from tlively and aheejin on September 14, 2022 22:01
return;
}
}

parent.calls = true;
// When EH is enabled, any call can throw.
if (parent.features.hasExceptionHandling() && parent.tryDepth == 0) {
@kripken (Member Author):

Hmm, actually maybe we can use tryDepth even with the global effect info...

Member:

Can you elaborate?

@kripken (Member Author):

I was thinking that if tryDepth > 0 then even if the call's effects say "I can throw", we could ignore that (since it will be caught). I added a testcase for that now, see a173e67 (and the commit before it adds the optimization).

Member:

Yeah I think this is handled already, because we don't even set throws_ in case it's within a try-catch.

@kripken (Member Author) commented Sep 14, 2022

Oh, I measured overhead, and this PR has no cost by itself.

@aheejin (Member) left a comment:

How can the pass in the middle make use of the pre-generated effects?

Comment on lines +1275 to +1277
# note that no pass we run here should add effects to a function, so it is
# ok to run this pass and let the passes after it use the effects to
# optimize
Member:

How is this guaranteed? No optimization passes ever generate any effects, calls, or global writes?

@kripken (Member Author):

I believe that's the case, yes. No pass that we fuzz, at least, adds new effects. That is mainly because these are optimization passes, so they don't change observable behavior.


namespace wasm {

struct GenerateGlobalEffects : public Pass {
Member:

Is this intentionally not function-parallel?

@kripken (Member Author):

The main work could be, yeah. The problem is we need to clear the global info first, and we can't fill it in in parallel. So we have sequential work at the beginning and end. I opted to use ParallelFunctionAnalysis in the middle here, and make the pass as a whole not parallel - seems simplest?
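
To illustrate that shape, here is a minimal standalone C++ sketch (simplified, assumed types; std::thread stands in for Binaryen's ParallelFunctionAnalysis helper) of the sequential / parallel / sequential structure described above:

#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Simplified stand-ins for Binaryen types (assumptions, not the real classes).
struct Effects { bool throws = false; bool writesState = false; };
struct Function { std::string name; /* body omitted */ };
struct PassOptions {
  std::shared_ptr<std::map<std::string, Effects>> funcEffectsMap;
};

// Stand-in for the real per-function effect analysis.
Effects computeEffects(const Function&) { return Effects{}; }

void generateGlobalEffects(PassOptions& options, const std::vector<Function>& funcs) {
  // 1) Sequential: clear any stale global info first.
  options.funcEffectsMap.reset();
  // 2) Parallel: analyze each function independently.
  //    (A real implementation would use a thread pool, not a thread per function.)
  auto results = std::make_shared<std::map<std::string, Effects>>();
  std::mutex mutex;
  std::vector<std::thread> workers;
  for (size_t i = 0; i < funcs.size(); i++) {
    workers.emplace_back([&, i] {
      Effects effects = computeEffects(funcs[i]);
      std::lock_guard<std::mutex> lock(mutex); // serialize only the map write
      (*results)[funcs[i].name] = effects;
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  // 3) Sequential: publish the results so later passes can read them.
  options.funcEffectsMap = std::move(results);
}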

Member:

Oh, I missed that the main work is done in parallel. Thanks.

src/ir/effects.h Outdated
Comment on lines 431 to 439
if (parent.funcEffectsMap) {
auto iter = parent.funcEffectsMap->find(curr->target);
if (iter != parent.funcEffectsMap->end()) {
// We have effect information for this call target, and can just use
// that.
parent.mergeIn(iter->second);
return;
}
}
Member:

This is to do some transitive computation, right? For example, if A calls B, B calls C, and C has only a nop, then if we examine functions in C->B->A order we can deduce that A doesn't have any effects. But this may not be so when analyses run in parallel. I notice GlobalEffects is not function-parallel, though... is it not function-parallel because of this? If so, that benefit seems a little... minor?

@kripken (Member Author):

But this may not be so when analyses run in parallel.

Yes, atm this runs in parallel, and each function doesn't look any deeper than itself.

We could do a transitive closure operation afterwards. But I'm not sure it's worth it. This simple version should already handle all the real-world cases I can think of (with enough cycles of the optimizer, since each time it makes progress - there are no "cycles" to worry about here, as a cycle would be a potential infinite recursion anyhow, which we can't remove).
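
As a point of comparison, the transitive closure mentioned here would be a monotonic fixed-point over the call graph. A minimal standalone C++ sketch of that idea (simplified, assumed types; not code from this PR, which deliberately stops at one level of calls):

#include <map>
#include <set>
#include <string>

// Simplified effect summary; mergeIn returns true if anything changed.
struct Effects {
  bool throws = false;
  bool writesState = false;
  bool mergeIn(const Effects& other) {
    bool changed = (other.throws && !throws) || (other.writesState && !writesState);
    throws = throws || other.throws;
    writesState = writesState || other.writesState;
    return changed;
  }
};

using CallGraph = std::map<std::string, std::set<std::string>>; // caller -> callees

// Merge callee effects into callers until nothing changes. Cycles in the
// call graph are fine: the merge only ever grows a summary, so the loop
// terminates once a fixed point is reached.
void propagateTransitively(std::map<std::string, Effects>& effects,
                           const CallGraph& calls) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto& [caller, callees] : calls) {
      for (const auto& callee : callees) {
        auto it = effects.find(callee);
        if (it != effects.end()) {
          changed = effects[caller].mergeIn(it->second) || changed;
        }
      }
    }
  }
}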

Member:

What if the target's effects are in the process of being generated? Is it safe to merge them in here before they have been completed?

@kripken (Member Author):

funcEffectsMap is empty during the parallel computation (we clear it before, and only write to it after the parallel part). So we'd never reach this code path during the parallel computation.

@gkdn (Contributor) commented Sep 15, 2022

@kripken I haven't reviewed the CL, nor do I understand the general architecture in Binaryen, but I wanted to double-check the behavior around something.

Let's say we run generate-global-effects, remove call.without.effects at some point, and then run generate-global-effects again. Will the effects of call.without.effects disappear with that order, or will it work incrementally, accounting for the previous knowledge about the calls?

For the record, in JsCompiler we actually run side-effect recalculation after every iteration of our <clinit> pruning pass, since removal of <clinit> is one of the common ways of removing side effects (almost every static method has side effects due to the implicit <clinit>).
Another common scenario: we have development-only checks that only throw in development mode. Those checks usually take a couple of iterations to get rid of, and later make the methods using them side-effect free as well. For those it is usually enough to run the recalculation after a couple of iterations.

@gkdn (Contributor) commented Sep 15, 2022

Ah, I just saw discard-global-effects in the description, so it seems like that is not the default behavior, which probably answers my question.

@kripken (Member Author) commented Sep 15, 2022

@aheejin

How can the pass in the middle make use of the pre-generated effects?

The effects are all stored in PassOptions on the PassRunner. We pass that into EffectAnalyzer anyhow all the time now (I think you added that actually) so it's very convenient - when EffectAnalyzer visits a call we just check on PassOptions if we have global info for that function. So no change is needed to any pass code.

@gkdn

Let's say we run generate-global-effects, remove call.without.effects at some point, and then run generate-global-effects again. Will the effects of call.without.effects disappear with that order, or will it work incrementally, accounting for the previous knowledge about the calls?

Each time it starts from scratch when it computes the effects. So no previous information is saved. That means that the global effects become less effective after lowering the call.without.effects intrinsic. All the benefit of that intrinsic must be utilized before the lowering, in other words.

@gkdn (Contributor) commented Sep 15, 2022

Each time it starts from scratch when it computes the effects. So no previous information is saved. That means that the global effects become less effective after lowering the call.without.effects intrinsic. All the benefit of that intrinsic must be utilized before the lowering, in other words.

The intrinsic lowering is the one that is handling that, right? So we need to make sure it is not executed until almost the very end?

OOC, why does it always calculate from scratch? Shouldn't no-side-effects stick? Is there any value in recalculating the side effects of a call/method that is already determined to be side-effect free, unless someone calls discard-global-effects?

Comment on lines +435 to +446
// that. The one change we may want to make is to remove throws_, if
// the target function throws and we know that will be caught anyhow,
// the same as the code below for the general path.
const auto& targetEffects = iter->second;
if (targetEffects.throws_ && parent.tryDepth > 0) {
auto filteredEffects = targetEffects;
filteredEffects.throws_ = false;
parent.mergeIn(filteredEffects);
} else {
// Just merge in all the effects.
parent.mergeIn(targetEffects);
}
Member:

Not sure if this is necessary. We don't set throws_ if tryDepth > 0 in the first place.

binaryen/src/ir/effects.h, lines 368 to 371 in 2fdb22b:

if (self->parent.delegateTargets.count(curr->name) &&
self->parent.tryDepth == 0) {
self->parent.throws_ = true;
}

@kripken (Member Author):

That code just handles delegates, I think?

In the testcase I added there isn't a delegate, but the problem still happens. Basically the visitCall logic should not set throws_ if we are in a try-catch-all. We have logic for that in the old code here:

binaryen/src/ir/effects.h, lines 426 to 429 in 2fdb22b:

// When EH is enabled, any call can throw.
if (parent.features.hasExceptionHandling() && parent.tryDepth == 0) {
parent.throws_ = true;
}

The new code above just does the same for when we use global effects.

Member:

Ah right, I was confused. I thought the tryDepth was the target's...

@kripken (Member Author) commented Sep 15, 2022

@gkdn

why does it always calculate from scratch?

Just simplicity and to avoid possible bugs. A normal optimization pass doesn't add new effects, but some instrumentation passes do, so this sequence is risky:

  • calculate effects
  • run pass that adds logging (calls to imports that log)
  • optimize

The last step might think a function has no effects when it does, and optimize out some logging.

We could say that it's the responsibility of a logging pass to clear the effects, but there's a risk a pass author might forget.

Anyhow, I think there is little harm to this approach, since leaving lower-intrinsics til near the end is the right thing anyhow.

@gkdn (Contributor) commented Sep 15, 2022

@kripken

I think there is little harm to this approach

Wouldn't this also affect the performance of this pass and prevent us from running it frequently?

We could say that it's the responsibility of a logging pass to clear the effects, but there's a risk a pass author might forget.

Maybe this could be pointed out when enabling the pass?
e.g. --propagate-global-effects would do an incremental generation that we could use, vs. another project that is adding logging passes could use the other one?

Anyway just some food for thought, thanks for the patch.

@kripken (Member Author) commented Sep 15, 2022

@gkdn

e.g. --propagate-global-effects

Ah, interesting idea. Yes, we could add a variation like that that builds on/refines the previous info. I'd need to think about how that works with the extension to do the transitive closure, though, which I think is also worth looking into.

For now though does this PR look like a reasonable start? Or would you prefer I look into the propagate before landing?

Wouldn't this also affect the performance of this pass and prevent us from running it frequently?

Not sure I follow this. I think running --generate-global-effects can be done frequently, and should have good results, if it's done before intrinsic lowering. If you run enough iterations before intrinsic lowering I think all the benefit could be achieved, unless I'm missing something?

(Inlining won't work on an intrinsic, so that is a possible concern, but I don't think inlining interacts badly with the specific optimization here of removing calls to code without effects. But again, maybe I'm missing something.)

@gkdn (Contributor) commented Sep 16, 2022

Sorry, by "performance of the pass" I was referring to the total runtime of the pass, not its effectiveness.

And now I noticed that this is not working on the transitive closure :)
When analyzing the transitive closure, it is kind of natural not to re-traverse a function if you already know that it doesn't have side effects. Also, because a transitive analysis is costly, you may want to look into all the shortcuts that are available. In JsCompiler, side-effect analysis is one of the things that ends up taking the most time, but it is also very effective in reducing the final code size, especially for J2CL, so we bite the bullet there. I don't really know the impact on execution performance, but looking at the current regression in the benchmark, it looks like V8 is not doing much w.r.t. removing calls to methods without side effects.

Anyway, long story short, the pass as it is might have very limited effect, as it only covers the more trivial scenarios, and I expect that we will get benefit from more sophisticated stuff. That being said, my instincts on wasm/binaryen so far haven't been super accurate, since we have generally different dynamics at play here. For example, Binaryen inlines aggressively, which might be enabling it to see more stuff, potentially revealing more things that don't have side effects with a simpler analysis. On the other hand, I'm thinking we should eventually be less aggressive on inlining.

Maybe we can discuss this stuff further offline.

(Regardless I don't see any harm in submitting this version of the patch.)

@aheejin (Member) commented Sep 16, 2022

Just my two cents... I tried to do some transitive analysis to remove try-catch in #4618, with a very limited gain. It was mostly only checking whether a call can throw or not, but there was a very limited number of calls that we could rule out as "not throwing", and being able to throw qualifies as a global effect anyway. The reason we had so few of them was that many functions eventually call library functions, most of which can throw, or imported functions, which we should assume can throw.

I tried that on a C++ program, so Java programs may have different characteristics though.

@gkdn (Contributor) commented Sep 16, 2022

I think the limiting factor in the try-catch scenario is the number of try/catches. For example, if you had a try-catch on each statement, you would get a ton of benefit.
In the case of side effects, each unused statement is a candidate. I expect unused statements to be rather more frequent than try/catches due to dead code paths. To give an example: if you have a logging statement and the logging is disabled, then all the arguments become unused, including the complex ones that generate some debug text. However, you cannot drop them unless you can prove that they don't have side effects.

I think imports are an important point. We earlier discussed surrounding them with call.without.effects where applicable, but we haven't done that yet.

@kripken (Member Author) commented Sep 16, 2022

@gkdn Yes, I do think it's worth investigating the improvements. My reasoning, though, is that even this simple implementation can optimize all the possible things - it just will take more optimization cycles. I've thought about this for a while and I can't see new potential that is unlocked by either the transitive closure or refining known effects, just a reduction in cycles.

And this simple implementation runs in linear time. Together with the fact that in my tests the depth of call chains that are optimized here is pretty shallow, I think few optimization cycles are needed, and so I suspect this simple linear approach may be enough. But that could be wrong, and if we see many cycles are needed then we should definitely look into either or both of the transitive closure and refining known effects.

@kripken (Member Author) commented Sep 16, 2022

(but yes, for now let's land this and do more experimenting)

@kripken merged commit 9894890 into main on Sep 16, 2022
@kripken deleted the global.func.effects branch on September 16, 2022 15:22