Allow optimizing with global function effects #5040
Conversation
```cpp
    return;
  }
}

parent.calls = true;
// When EH is enabled, any call can throw.
if (parent.features.hasExceptionHandling() && parent.tryDepth == 0) {
```
Hmm, actually maybe we can use `tryDepth` even with the global effect info...
Can you elaborate?
I was thinking that if `tryDepth > 0`, then even if the call's effects say "I can throw", we could ignore that (since it will be caught). I added a testcase for that now, see a173e67 (and the commit before it adds the optimization).
Yeah, I think this is handled already, because we don't even set `throws_` in case it's within a `try`-`catch`.

Oh, I measured overhead, and this PR has no cost by itself.
How can the pass in the middle make use of the pre-generated effects?
```
# note that no pass we run here should add effects to a function, so it is
# ok to run this pass and let the passes after it use the effects to
# optimize
```
How is this guaranteed? No optimization passes ever generate any effects, calls, or global writes?
I believe that's the case, yes. No pass that we fuzz, at least, adds new effects, mainly because these are optimization passes, so they don't change observable behavior.
```cpp
namespace wasm {

struct GenerateGlobalEffects : public Pass {
```
Is it intentional that this is not function-parallel?
The main work could be, yeah. The problem is we need to clear the global info first, and we can't fill it in in parallel. So we have sequential work at the beginning and end. I opted to use `ParallelFunctionAnalysis` in the middle here, and make the pass as a whole not parallel - seems simplest?
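(To illustrate the shape being described here - sequential setup, a parallel analysis over functions, then a sequential merge - below is a minimal, self-contained sketch. It uses plain `std::thread` and made-up types rather than binaryen's `ParallelFunctionAnalysis`, so take it as an outline of the structure, not the actual code in this PR.)

```cpp
#include <map>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for illustration only.
struct FuncInfo {
  std::string name;
  // (function body omitted)
};
struct Effects {
  bool hasEffects = false;
};

std::map<std::string, Effects>
computeGlobalEffects(const std::vector<FuncInfo>& funcs) {
  // Sequential: start from a clean slate (the "clear the global info" step).
  std::map<std::string, Effects> result;

  // Parallel: analyze each function independently. Each worker writes only
  // to its own slot of a pre-sized vector, so no locking is needed and the
  // shared map is never touched here.
  std::vector<Effects> perFunc(funcs.size());
  std::vector<std::thread> workers;
  for (size_t i = 0; i < funcs.size(); i++) {
    workers.emplace_back([&perFunc, i] {
      // The real per-function effect analysis would go here.
      perFunc[i] = Effects{};
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }

  // Sequential: fill in the shared map at the end.
  for (size_t i = 0; i < funcs.size(); i++) {
    result[funcs[i].name] = perFunc[i];
  }
  return result;
}
```

(In practice a fixed-size thread pool would be used rather than one thread per function; the sketch only shows the sequential-parallel-sequential split.)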
Oh, I missed that the main work is done in parallel. Thanks.
src/ir/effects.h
```cpp
if (parent.funcEffectsMap) {
  auto iter = parent.funcEffectsMap->find(curr->target);
  if (iter != parent.funcEffectsMap->end()) {
    // We have effect information for this call target, and can just use
    // that.
    parent.mergeIn(iter->second);
    return;
  }
}
```
Is this meant to do some transitive computation? For example, if A calls B, B calls C, and C has only a `nop`, then if we examine functions in C->B->A order, we can deduce that A doesn't have any effects. But this may not be so when analyses run in parallel. However, I notice `GlobalEffects` is not function-parallel... is that because of this? If so, that benefit seems a little minor?
> But this may not be so when analyses run in parallel.
Yes, atm this runs in parallel, and each function's analysis doesn't look any deeper than the function itself.

We could do a transitive closure operation afterwards, but I'm not sure it's worth it. This simple version should already handle all the real-world cases I can think of (given enough cycles of the optimizer, since it makes progress each time - there are no "cycles" to worry about here, as a cycle would be a potential infinite recursion anyhow, which we can't remove).
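(For reference, such a "transitive closure operation afterwards" could look roughly like the fixed-point propagation sketched below. The types are made up for illustration; this is not code from this PR.)

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical per-function summary for illustration.
struct Summary {
  bool hasEffects = false;        // effects of the function's own body
  std::set<std::string> callees;  // direct call targets
};

// Propagate effects along call edges until a fixed point: a caller of an
// effectful (or unknown) function is itself marked as having effects.
// Cycles (potential recursion) are handled naturally - the loop simply
// stops once no summary changes anymore.
void propagateEffects(std::map<std::string, Summary>& summaries) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (auto& [name, summary] : summaries) {
      if (summary.hasEffects) {
        continue;
      }
      for (const auto& callee : summary.callees) {
        auto iter = summaries.find(callee);
        // Unknown callees (e.g. imports) are treated conservatively.
        if (iter == summaries.end() || iter->second.hasEffects) {
          summary.hasEffects = true;
          changed = true;
          break;
        }
      }
    }
  }
}
```

(Without such a step, the same information is still discovered, just one call-graph level per optimization cycle, which matches the "makes progress each time" point above.)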
What if the target's effects are still in the process of being generated? Is it safe to merge them here before that is done?
`funcEffectsMap` is empty during the parallel computation (we clear it before, and only write to it after the parallel part). So we'd never reach this code path during the parallel computation.
@kripken I haven't reviewed the CL nor do I understand the general architecture in binaryen, but I wanted to double check the behavior around something. Let's say we did run generate-global-effects and remove

For the record, in JsCompiler we actually run side-effect recalculation after every iteration of our
Ah, I just saw
The effects are all stored in the `funcEffectsMap` in `PassOptions`.
Each time it starts from scratch when it computes the effects, so no previous information is saved. That means that the global effects become less effective after lowering the `call.without.effects` intrinsics.
OOC, why does it always calculate from scratch? Shouldn't "no side effects" stick? Is there any value in recalculating the side effects of a call/method if it is already determined to be side-effect free, unless someone calls
```cpp
// that. The one change we may want to make is to remove throws_, if
// the target function throws and we know that will be caught anyhow,
// the same as the code below for the general path.
const auto& targetEffects = iter->second;
if (targetEffects.throws_ && parent.tryDepth > 0) {
  auto filteredEffects = targetEffects;
  filteredEffects.throws_ = false;
  parent.mergeIn(filteredEffects);
} else {
  // Just merge in all the effects.
  parent.mergeIn(targetEffects);
}
```
Not sure if this is necessary. We don't set `throws_` if `tryDepth > 0` in the first place.
Lines 368 to 371 in 2fdb22b

```cpp
if (self->parent.delegateTargets.count(curr->name) &&
    self->parent.tryDepth == 0) {
  self->parent.throws_ = true;
}
```
That code just handles delegates, I think?

In the testcase I added there isn't a delegate, but the problem still happens. Basically the `visitCall` logic should not set `throws_` if we are in a try-catch-all. We have logic for that in the old code here:
Lines 426 to 429 in 2fdb22b

```cpp
// When EH is enabled, any call can throw.
if (parent.features.hasExceptionHandling() && parent.tryDepth == 0) {
  parent.throws_ = true;
}
```
The new code above just does the same for when we use global effects.
Ah right, I was confused. I thought the `tryDepth` was the target's...
Just simplicity and to avoid possible bugs. A normal optimization pass doesn't add new effects, but some instrumentation passes do, so this sequence is risky:

1. Compute the global effects.
2. Run an instrumentation pass that adds, say, logging calls.
3. Run optimization passes that use the (now stale) effects.

The last step might think a function has no effects when it does, and optimize out some logging. We could say that it's the responsibility of a logging pass to clear the effects, but there's a risk a pass author might forget. Anyhow, I think there is little harm to this approach, since leaving
Wouldn't this also affect the performance of this pass, preventing us from running it frequently?
Maybe this can be pointed out when enabling the pass? Anyway, just some food for thought - thanks for the patch.
Ah, interesting idea. Yes, we could add a variation like that which builds on/refines the previous info. I'd need to think about how that works with the extension to do the transitive closure, though, which I think is also worth looking into. For now, though, does this PR look like a reasonable start? Or would you prefer I look into the propagation before landing?
Not sure I follow this. I think running

(Inlining won't work on an intrinsic, so that is a possible concern, but I don't think inlining interacts badly with the specific optimization here of removing calls to code without effects. But again, maybe I'm missing something.)
Sorry, with "performance of the pass" I was referring to the total runtime of the pass, not its effectiveness. And now I noticed that this is not working on the transitive closure :) Anyway, long story short, the pass as it is might have very limited effect, as it only covers the more trivial scenarios, and I expect that we will get benefit from more sophisticated stuff. That being said, my instincts so far on wasm/binaryen haven't been super accurate, since we generally have different dynamics at play here. For example, binaryen inlines aggressively, which might be enabling it to see more stuff, potentially revealing more things that don't have side effects with a simpler analysis. On the other hand, I'm thinking we should eventually be less aggressive on inlining. Maybe we can discuss this stuff further offline. (Regardless, I don't see any harm in submitting this version of the patch.)
Just my two cents: I tried to do some transitive analysis to remove

I tried that on a C++ program, so Java programs may have different characteristics though.
I think the limiting factor in the try-catch scenario is the number of try-catches. For example, if you had a try-catch on each statement, you would get a ton of benefit. I think the imports are an important point. We earlier discussed surrounding them with
@gkdn Yes, I do think it's worth investigating the improvements. My reasoning, though, is that even this simple implementation can optimize all the possible things - it will just take more optimization cycles. I've thought about this for a while and I can't see new potential that is unlocked by either the transitive closure or refining known effects, just a reduction in cycles. And this simple implementation runs in linear time. Together with the fact that in my tests the depth of call chains that are optimized here is pretty shallow, I think few optimization cycles are needed, and so I suspect this simple linear approach may be enough. But that could be wrong, and if we see many cycles are needed then we should definitely look into either or both of the transitive closure and refining known effects.
(But yes, for now let's land this and do more experimenting.)
This adds a map of function name => the effects of that function to the `PassOptions` structure. That lets us compute those effects once and then use them in multiple passes afterwards. For example, it lets us optimize away a dropped call to a function that has no effects: Vacuum will remove that dropped call if we tell it that the called function has no effects. Note that a nice result of adding this to the `PassOptions` struct is that all passes will use the extra info automatically.
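(As an illustration of the shape of this data, here is a hypothetical, simplified sketch - not binaryen's actual declarations: an optional shared map from function name to that function's summarized effects lives on the pass options, and a call site consults it if present, mirroring the `src/ir/effects.h` snippet quoted earlier.)

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins for illustration only.
struct FuncEffects {
  bool writesGlobals = false;
  bool calls = false;
  bool throws_ = false;
};

using FuncEffectsMap = std::unordered_map<std::string, FuncEffects>;

struct PassOptionsLike {
  // Null means "no global effect info has been generated"; passes then fall
  // back to the usual conservative assumptions about calls.
  std::shared_ptr<FuncEffectsMap> funcEffectsMap;
};

// How a call site would use the map, if it is present.
FuncEffects effectsOfCall(const PassOptionsLike& options,
                          const std::string& target) {
  if (options.funcEffectsMap) {
    auto iter = options.funcEffectsMap->find(target);
    if (iter != options.funcEffectsMap->end()) {
      return iter->second; // precise, precomputed effects
    }
  }
  // Conservative default: assume the call can do anything.
  return FuncEffects{true, true, true};
}
```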
This is not enabled by default, as the benefits seem rather minor, though it does help in a small but noticeable way on J2Wasm code, where we use `call.without.effects` and have situations where a call to a function like `bar` looks like it has effects, normally, but with global effect info we know it actually doesn't.
To use this, one would run the generate-global-effects pass before the passes that should benefit from the effect info, and optionally discard the computed effects afterwards. Discarding is not necessary, but if there is a pass later that adds effects, then not discarding could lead to bugs, since we'd think there are fewer effects than there really are. (However, normal optimization passes never add effects, only remove them.)
It's also possible to run this multiple times, interleaved with optimization: that recomputes the effects after the first `-O3` and may find fewer effects than earlier.

This doesn't compute the full transitive closure of the effects across functions. That is, when computing a function's effects, we don't look into its own calls. The simple case so far is enough to handle the `call.without.effects` example from before (though it may take multiple optimization cycles).