-
Notifications
You must be signed in to change notification settings - Fork 5k
[wasm] Jiterpreter monitoring phase take 2 #83489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to 'arch-wasm': @lewing Issue DetailsThis PR adds a monitoring phase like #83432 , but instead of measuring the displacement returned by the trace, it monitors it (using the cheapest kind of instrumentation I could come up with) to track whether the trace bails out for specific reasons like calls early on. A more ideal form of instrumentation would actually count opcodes or have a loop counter, but we don't have the luxury of compiling traces multiple times, so we have to keep the instrumentation as cheap as possible. The read + increment for a loop counter would make tight loops like reversing or comparing spans much slower, I think.
|
In a local run, this doesn't appear to reject any important traces from browser-bench, but it's hard to say for sure without performance runs. |
…average distance (in bytes) they travel Then reject traces that have a low average distance after the monitoring period Cleanup Cleanup Do monitoring in terms of opcode count, and record the location of specific bailout types Also track whether a backward branch was taken during a trace so that it can be factored in Cleanup Cleanup Cleanup Cleanup
… distance For the monitoring opcode counter, use (opcodes_to_first_branch + opcodes_from_last_branch_target) Fix traces aborting for monitor points
66989da
to
c78801f
Compare
…eshold to the long one
c78801f
to
962d137
Compare
Reworked the value being measured based on some suggestions, it now assigns a 'penalty score' to trace executions depending on how far they got before bailing out. This removes the need for a dummy distance value for traces that didn't bail out, and allows smoothly ramping up and down by setting two (short and long) distance thresholds. Then you set an average penalty threshold, where traces above that threshold are rejected, between 0 and 200. I also changed the opcode count it uses so that instead of the number of opcodes compiled, it's the number of opcodes we can be certain have run at this point - the number of opcodes leading up to the first branch, and the number of opcodes since the last branch target. We could do some static analysis to come up with a better number than this, but it seems to be good enough as a starting point so far. In local testing this reduces Perf_Dictionary.ContainsValue from 1.0sec to around 650ms, by rejecting ContainsValue. It's a massive branchy trace that doesn't actually run much code and always bails out for a call before looping, so the previous two metrics (displacement & average number of opcodes) didn't work. |
Don't assert when hitting the traceinfo limit, just stop compiling traces and print a warning Raise traceinfo limit to make sure it's not realistic to hit it
Test failures are unrelated. |
This PR adds a monitoring phase like #83432 , but instead of measuring the displacement returned by the trace, it monitors it (using the cheapest kind of instrumentation I could come up with) to track whether the trace bails out for specific reasons like calls early on.
All traces are now passed a pointer to a small 'jiterpreter call info' blob, that contains a "has taken backward branch" flag along with an offset field (in # of compiled 'high value' jiterpreter opcodes) that is written to if an undesirable bailout happens (referred to as an 'exit' in the trace generator). Undesirable bailouts are things that, if they happen frequently, indicate that the trace may be unproductive. A bailout for a failed null check or for a 'throw' opcode is not undesirable, since those are both by nature unlikely events and it's not important to measure them. A method call or delegate call is something the jiterpreter can't handle and if it happens frequently early in a trace, that trace is probably just calling a method early on by design.
If the backward branch flag is set, we assume that the trace ran at least once to completion before looping, since most backward branches are going to be loops. It's fine for a loop to eventually bail out due to a function call, since it almost certainly got work done.
A more ideal form of instrumentation would actually count opcodes or have a loop counter, but we don't have the luxury of compiling traces multiple times, so we have to keep the instrumentation as cheap as possible. The read + increment for a loop counter would make tight loops like reversing or comparing spans much slower, I think.
EDIT: No merge because I want to delay this until after P3.