[EH] Optimize out trys whose body does not throw (DO NOT LAND) #4618
Conversation
This performs an interprocedural analysis using the call graph to decide whether each function can throw or not, and uses that info to optimize out trys whose body does not throw. About half of the code is for printing information, not for the actual optimization.

I experimented with the 'wasm-opt' wasm binary built with -O2, and while this pass was able to analyze 6975/7968 (87.5%) functions for throwability, it ended up optimizing out only 162/9790 (1.7%) trys, resulting in a mere 0.02% improvement in code size. Non-analyzable functions were considered throwable.

The main reason is that there are not very many functions that can't throw. Most functions call either imported library functions or C++ standard library functions. Imported functions are considered throwable, and most C++ library functions can throw. Many library functions also contain indirect calls, which we consider throwable. Maybe we could search all modules to figure out whether any function with the matching signature can throw, to decide whether a `call_indirect` throws or not, but this pass doesn't do that yet. We could also think of a way to break more call graph cycles to make more functions analyzable, but I doubt it would bring a meaningful improvement given the current stats.

All imported functions should be considered throwable. But as an experiment, I also measured the possible improvement in code size when we assume all imported functions can't throw. Even in this incorrectly optimistic experiment, the pass was only able to optimize 412/9790 (4.2%) trys, resulting in a 0.09% improvement.

I'm planning to run this on other binaries too, but at the moment, due to the disappointing results, I'm not necessarily planning to land this. I'm uploading it anyway for future reference, in case I, or someone else, gives optimizing this another try.
Ah, that is surprising... I expected this type of optimization to help more. I tested this on Dart GC and J2Wasm output, which both have small amounts of try-catch in them. I don't see a benefit there either. Maybe my intuitions are wrong, then, but I would expect this to help a lot more... simple RAII patterns in C++ should force clang to emit try-catches, like here: https://gist.github.com/kripken/0b4e39a9f2943887ff122df5557c252c The try-catch there is unnecessary, and I verified that this pass removes it perfectly. So I wonder if that situation is just more rare than I think?
I think removing 4.2% of all try-catch is pretty good. It could impact performance at least in baseline compilers (which do not have zero-cost exceptions). And I think maybe we can get close to the assumption that imports do not throw - no JS import will throw a C++ exception, so the only danger is a JS import that calls back into C++. We actually do have that information when we do metadce, so in principle we could use it.
Ah, yes, this is worrying. Even std::vector operations can throw... I wonder if there are real-world applications that want to catch their own exceptions but don't care about C++ library code? Binaryen itself fits that, actually. If we have users that could benefit from it, maybe we could have an option for that somehow, that is, "build with exceptions but build libc++ without exceptions" - ?
I think that is worth doing in general, separately from this PR.

Separately from all this, nice work on the pass and the tests! Even if this does not end up a big win, it is important to investigate. But I am still hopeful this PR has significant potential that we can unlock.
Thanks for the feedback! First off, the experiment was done on top of a binary built with
But we have to handle JS exceptions in case of

And this pass is not function-parallel (the optimization part is parallel, but the analysis is not), and it also does a fixed-point iteration, so I'm not sure it is worth it for that amount of benefit, especially given that its code size improvement is so tiny. Maybe OK for
Hmm, interesting, but I'm not sure it can be a solution that applies in general, including to our partners.
As an experiment I tried to assume all
Thanks! I haven't tried to come up with any call-graph cycle-breaking mechanism so far. But given that around 90% of analyzable functions were throwable, I think the upper limit we can achieve with this pass is, for this binary, around 0.5%, given that the code size improvement when we assumed every
Maybe I'm wrong, but I think we don't need to. AFAIK JS exceptions that unwind through wasm are a sign of breakage already, and not something we can recover from. Generally such an exception will set the global JS flag (There is one sort-of exception, where we throw a JS exception in order to simulate an infinite loop, in
We have a few non-parallel passes, and some I'm fairly sure are slower than this, so I'm not too worried. But yes, as you said, maybe we'd want to limit this to maybe
Yes, disappointing... I wonder if ignoring all imports and ignoring all indirect calls at once would do better than the separate measurements have shown? There may be some functions that do both things, and only ignoring both factors at once would let us get rid of. In fact I think even
Hmm, this can be a problem when used with Wasm EH, because
Tried that. 21.4% of
Does
Hmm... too bad. Thanks for checking. Well, then I think what you said before about the STL is probably the problem: in Binaryen, every

Though that makes me wonder if maybe other codebases using exceptions use their own vectors etc., which might not use exceptions. If so, this might do a lot better on them. Separately, this seems like more reason to get to removing exceptions from Binaryen (#2917).
Yes
In theory yes. To do that we have to define the JS exception tag (which is one of the pending items in EH). But this will increase code size because we have to augment every single
Ah, interesting, I didn't know that. This is probably not a common problem. Even when
Added a warning note for

Since we don't have a near-term plan to land this, I will close this PR for the moment. May reopen if it becomes necessary later.