Skip to content

[EH] Optimize out trys whose body does not throw (DO NOT LAND) #4618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Apr 26, 2022

This performs an interprocedural analysis using the call graph to decide
whether each function can throw or not, and use the info to optimize out
trys whose body does not throw.

About half of the code is for printing information and not for actual
optimization.

I experimented with 'wasm-opt' wasm binary built with -O2, and while
this pass was able to analyze 6975/7968 (87.5%) functions for
throwability, this ended up optimizing out only 162/9790 (1.7%) trys,
resulting in a mere 0.02% improvement in code size. Not analyzable
functions were considered throwable.

The main reason was there are not very many functinons that can't throw.
Most functions call either imported library functions or C++ standard
library functions. Imported functions are considered throwable, and most
C++ library functions can throw. And many library functions also contain
indirect calls, which we consider throwable. Maybe we can try searching
all modules to figure out if there is any functions with the specific
signature that can throw to figure out whether a call_indirect throws
or not, but this pass doesn't do it yet. Also we can think of a way to
break more call graph cycles to make more functions analyzable, but I
doubt if it will bring meaningful improvement given the current stat.

All imported functions should be considered throwable. But as an
experiment, I also tried to measure the possible improvement in code
size when we assume all imported functions can't throw. Even if this
incorrectly optimistic experiment, the pass was able to optimize
412/9790 (4.2%) trys, resulting in 0.09% improvement.

I'm planning to run this on other binaries too, but at the moment, due
to the disappointing results, I'm not necessarily planning to land this.
But I'm uploading this anyway for future reference, in case I, or
someone else, gives another try at optimizing this.

This performs an interprocedural analysis using the call graph to decide
whether each function can throw or not, and use the info to optimize out
trys whose body does not throw.

About half of the code is for printing information and not for actual
optimization.

I experimented with 'wasm-opt' wasm binary built with -O2, and while
this pass was able to analyze 6975/7968 (87.5%) functions for
throwability, this ended up optimizing out only 162/9790 (1.7%) trys,
resulting in a mere 0.02% improvement in code size. Not analyzable
functions were considered throwable.

The main reason was there are not very many functinons that can't throw.
Most functions call either imported library functions or C++ standard
library functions. Imported functions are considered throwable, and most
C++ library functions can throw. And many library functions also contain
indirect calls, which we consider throwable. Maybe we can try searching
all modules to figure out if there is any functions with the specific
signature that can throw to figure out whether a `call_indirect` throws
or not, but this pass doesn't do it yet. Also we can think of a way to
break more call graph cycles to make more functions analyzable, but I'm
doubt if it will bring meaningful improvement given the current stat.

All imported functions should be considered throwable. But as an
experiment, I also tried to measure the possible improvement in code
size when we assume all imported functions can't throw. Even if this
incorrectly optimistic experiment, the pass was able to optimize
412/9790 (4.2%) trys, resulting in 0.09% improvement.

I'm planning to run this on other binaries too, but at the moment, due
to the disappointing results, I'm not necessarily planning to land this.
But I'm uploading this anyway for future reference, in case I, or
someone else, gives another try at optimizing this.
@kripken
Copy link
Member

kripken commented Apr 27, 2022

Ah, that is surprising... I expected this type of optimization to help more.

I tested this on Dart GC and J2Wasm output, which both have small amounts of try-catch in them. I don't see a benefit there either.

Maybe my intuitions are wrong, then, but I would expect this to help a lot more... simple RAII patterns in C++ should force clang to emit try-catches, like here:

https://gist.github.com/kripken/0b4e39a9f2943887ff122df5557c252c

The try-catch there is unnecessary, and I verified that this pass removes it perfectly.

So I wonder if that situation is just more rare than I think?

Imported functions are considered throwable

I think removing 4.2% of all try-catch is pretty good. It could impact performance at least in baseline compilers (which do not have zero-cost exceptions). And I think maybe we can get close to the assumption that imports do not throw - no JS import will throw a C++ exception, so the only danger is a JS import that calls back into C++. We actually do have that information when we do metadce, so in principle we could use it.

most C++ library functions can throw

Ah, yes, this is worrying. Even std::vector operations can throw...

I wonder if there are real-world applications that want to catch their own exceptions but don't care about C++ library code? Binaryen itself fits that, actually. If we have users that could benefit from it, maybe we could have an option for that somehow, that is, "build with exceptions but build libc++ without exceptions" - ?

And many library functions also contain indirect calls, which we consider throwable. Maybe we can try searching all modules to figure out if there is any functions with the specific signature that can throw to figure out whether a call_indirect throws or not, but this pass doesn't do it yet.

I think that is worth doing in general, separately from this PR - improving CallGraphPropertyAnalysis to connect call_indirect only to relevant function types could be pretty useful for all the users of that utility. That is a major limitation atm I think.

Separately from all this, nice work on the pass and the tests! Even if this does not end up a big win, it is important to investigate. But I am still hopeful this PR has significant potential that we can unlock.

@aheejin
Copy link
Member Author

aheejin commented Apr 28, 2022

Thanks for the feedback!

First off, the experiment was done on top of a binary built with -O2, so the trivial cases, e.g., try body without any calls or throws, have already been removed by Vacuum.

Ah, that is surprising... I expected this type of optimization to help more.

I tested this on Dart GC and J2Wasm output, which both have small amounts of try-catch in them. I don't see a benefit there either.

Maybe my intuitions are wrong, then, but I would expect this to help a lot more... simple RAII patterns in C++ should force clang to emit try-catches, like here:

https://gist.github.com/kripken/0b4e39a9f2943887ff122df5557c252c

The try-catch there is unnecessary, and I verified that this pass removes it perfectly.

So I wonder if that situation is just more rare than I think?

Imported functions are considered throwable

I think removing 4.2% of all try-catch is pretty good. It could impact performance at least in baseline compilers (which do not have zero-cost exceptions). And I think maybe we can get close to the assumption that imports do not throw - no JS import will throw a C++ exception, so the only danger is a JS import that calls back into C++. We actually do have that information when we do metadce, so in principle we could use it.

But we have to handle JS exceptions in case of catch_all (which are used for destructors), no?

And this pass is not function parallel (the optimization part is parallel, but the analysis is not), and also does some fixed point thing, so I'm not sure if this is worth it to get that amount of benefit, especially given that its code size improvement is so tiny. Maybe OK for -O3/-O4..? This pass took around 5 secs for wasm-opt.wasm binary, but can be longer for bigger binaries. Not sure how this compares to other passes. Do we have many non-parallel opt passes?

most C++ library functions can throw

Ah, yes, this is worrying. Even std::vector operations can throw...

I wonder if there are real-world applications that want to catch their own exceptions but don't care about C++ library code? Binaryen itself fits that, actually. If we have users that could benefit from it, maybe we could have an option for that somehow, that is, "build with exceptions but build libc++ without exceptions" - ?

Hmm, interesting, but I'm not sure if it can be a solution that can be applied in general, including our partners..

And many library functions also contain indirect calls, which we consider throwable. Maybe we can try searching all modules to figure out if there is any functions with the specific signature that can throw to figure out whether a call_indirect throws or not, but this pass doesn't do it yet.

I think that is worth doing in general, separately from this PR - improving CallGraphPropertyAnalysis to connect call_indirect only to relevant function types could be pretty useful for all the users of that utility. That is a major limitation atm I think.

As an experiment I tried to assume all call_indirect cannot throw, just to see the upper bound we can reach with this analysis. About 19% of trys were able to be removed. But this is very unlikely, given that the analyzed functions were around 9x more likely to throw than not throw. Even with 19% of trys can be removed, the unrealistic upper bound, the code size improvement was only 0.27%, which is somewhat depressing because when I (incorrectly) let the pass optimize out all trys, the code size saving was 5.3%. And here we are able to optimize about 20% or trys, but saving is not 1% but 0.27%. I guess remaining trys have bigger catch bodies or something in this case.

Separately from all this, nice work on the pass and the tests! Even if this does not end up a big win, it is important to investigate. But I am still hopeful this PR has significant potential that we can unlock.

Thanks!

I haven't tried to come up with any call graph cycle-breaking mechanism so far. But given that around 90% of analyzable functions were throwable, I think the upper limit we can achieve with this pass is, for this binary, around 0.5%, given that the code size improvement when we assumed every try can be removed was 5%. (This can be slightly higher if we can use metadce info to rule out non-throwing imported functions) Other binaries can exhibit different stats for sure though.

@kripken
Copy link
Member

kripken commented Apr 28, 2022

But we have to handle JS exceptions in case of catch_all (which are used for destructors), no?

Maybe I'm wrong, but I think we don't need to. AFAIK JS exceptions that unwind through wasm are a sign of breakage already, and not something we can recover from. Generally such an exception will set the global JS flag ABORT which should prevent any more code from running anyhow.

(There is one sort-of exception, where we throw a JS exception in order to simulate an infinite loop, in emscripten_set_main_loop for example. This is a hack. But it is actually also a case where we do not want to notice the JS exception: we want to just escape outside, leaving the C stack where it was, and I guess C++ destructors should also not run.)

Maybe OK for -O3/-O4..? This pass took around 5 secs for wasm-opt.wasm binary, but can be longer for bigger binaries. Not sure how this compares to other passes. Do we have many non-parallel opt passes?

We have a few non-parallel passes, and some I'm fairly sure are slower than this, so I'm not too worried. But yes, as you said, maybe we'd want to limit this to maybe -O3 etc., and probably we'd not run it in -O1 (which we try to keep minimal / linear).

As an experiment I tried to assume all call_indirect cannot throw, just to see the upper bound we can reach with this analysis. About 19% of trys were able to be removed. But this is very unlikely, given that the analyzed functions were around 9x more likely to throw than not throw. Even with 19% of trys can be removed, the unrealistic upper bound, the code size improvement was only 0.27%, which is somewhat depressing because when I (incorrectly) let the pass optimize out all trys, the code size saving was 5.3%.

Yes, disappointing...

I wonder if ignoring all imports and ignoring all indirect calls would be better than the separate measurements have shown? There may be some functions that do both things, and only ignoring both factors at once would let us get rid of. In fact I think even printf may be an example of that, as it does indirect calls for musl file stuff, and calls an import to actually print in JS. Specifically on wasm-opt, I'd expect this (unrealistic) combination to do well since we have so few places that actually throw an exception.

@aheejin
Copy link
Member Author

aheejin commented Apr 30, 2022

(There is one sort-of exception, where we throw a JS exception in order to simulate an infinite loop, in emscripten_set_main_loop for example. This is a hack. But it is actually also a case where we do not want to notice the JS exception: we want to just escape outside, leaving the C stack where it was, and I guess C++ destructors should also not run.)

Hmm, this can be a problem when used with Wasm EH, because catch_all catches everything, no? Here we were talking about optimization opportunities, but in the original binary, catch_all is going to catch them? Are we currently using these two together?

I wonder if ignoring all imports and ignoring all indirect calls would be better than the separate measurements have shown? There may be some functions that do both things, and only ignoring both factors at once would let us get rid of. In fact I think even printf may be an example of that, as it does indirect calls for musl file stuff, and calls an import to actually print in JS. Specifically on wasm-opt, I'd expect this (unrealistic) combination to do well since we have so few places that actually throw an exception.

Tried that. 21.4% of trys removed, code size improvement 0.34%. 😞

@kripken
Copy link
Member

kripken commented May 2, 2022

Hmm, this can be a problem when used with Wasm EH, because catch_all catches everything, no?

Does catch_all catch non-wasm exceptions too? (So wasm could catch and possibly ignore a JS exception?) I haven't followed that part of the spec closely.

Tried that. 21.4% of trys removed, code size improvement 0.34%. disappointed

Hmm... too bad. Thanks for checking. Well, then I think what you said before about the STL is probably the problem in Binaryen, every std::vector operation might throw...

Though that makes me wonder if maybe other codebases using exceptions use their own vectors etc. which might not have exceptions. If so then this might do a lot better on them.

Separately, this seems like more reason to get to removing exceptions from Binaryen (#2917).

@aheejin
Copy link
Member Author

aheejin commented May 2, 2022

Does catch_all catch non-wasm exceptions too?

Yes

(So wasm could catch and possibly ignore a JS exception?)

In theory yes. To do that we have to define the JS exception tag (which is one of the pending items in EH). But this will increase code size because we have to augment every single catch_all with one more catch and rethrow. Is emscripten_set_main_loop used often, or any of our EH-using partners? Are their programs that use EH currently misbehave?

@kripken
Copy link
Member

kripken commented May 2, 2022

Ah, interesting, I didn't know that.

This is probably not a common problem. Even when emscripten_set_main_loop is used with the "simulate infinite loop" option, most of the time there isn't anything interesting on the stack that needs to stay alive. In the worst case, if a user gets a bug here, then ASan should be able to detect a use after free (the object on the stack was freed when it will still be used later), and the recommendation for the user would be to avoid the "simulate infinite loop" option. So I think we don't need to do anything for this, but maybe we should document it somewhere (either on emscripten_set_main_loop, or on wasm exceptions).

@aheejin
Copy link
Member Author

aheejin commented May 3, 2022

Added a warning note for emscripten_set_main_loop in emscripten-core/emscripten#16871.

Since we don't have a near-term plan to land this, will close this PR for the moment. May reopen if it becomes necessary later.

@aheejin aheejin closed this May 3, 2022
@aheejin aheejin deleted the exception_opts branch May 3, 2022 23:00
@kripken kripken mentioned this pull request May 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants