Skip to content

[wasm] Jiterpreter monitoring phase take 2 #83489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 21, 2023

Conversation

kg
Copy link
Member

@kg kg commented Mar 16, 2023

This PR adds a monitoring phase like #83432 , but instead of measuring the displacement returned by the trace, it monitors it (using the cheapest kind of instrumentation I could come up with) to track whether the trace bails out for specific reasons like calls early on.
All traces are now passed a pointer to a small 'jiterpreter call info' blob, that contains a "has taken backward branch" flag along with an offset field (in # of compiled 'high value' jiterpreter opcodes) that is written to if an undesirable bailout happens (referred to as an 'exit' in the trace generator). Undesirable bailouts are things that, if they happen frequently, indicate that the trace may be unproductive. A bailout for a failed null check or for a 'throw' opcode is not undesirable, since those are both by nature unlikely events and it's not important to measure them. A method call or delegate call is something the jiterpreter can't handle and if it happens frequently early in a trace, that trace is probably just calling a method early on by design.
If the backward branch flag is set, we assume that the trace ran at least once to completion before looping, since most backward branches are going to be loops. It's fine for a loop to eventually bail out due to a function call, since it almost certainly got work done.

A more ideal form of instrumentation would actually count opcodes or have a loop counter, but we don't have the luxury of compiling traces multiple times, so we have to keep the instrumentation as cheap as possible. The read + increment for a loop counter would make tight loops like reversing or comparing spans much slower, I think.

EDIT: No merge because I want to delay this until after P3.

@ghost
Copy link

ghost commented Mar 16, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR adds a monitoring phase like #83432 , but instead of measuring the displacement returned by the trace, it monitors it (using the cheapest kind of instrumentation I could come up with) to track whether the trace bails out for specific reasons like calls early on.
All traces are now passed a pointer to a small 'jiterpreter call info' blob, that contains a "has taken backward branch" flag along with an offset field (in # of compiled 'high value' jiterpreter opcodes) that is written to if an undesirable bailout happens (referred to as an 'exit' in the trace generator). Undesirable bailouts are things that, if they happen frequently, indicate that the trace may be unproductive. A bailout for a failed null check or for a 'throw' opcode is not undesirable, since those are both by nature unlikely events and it's not important to measure them. A method call or delegate call is something the jiterpreter can't handle and if it happens frequently early in a trace, that trace is probably just calling a method early on by design.
If the backward branch flag is set, we assume that the trace ran at least once to completion before looping, since most backward branches are going to be loops. It's fine for a loop to eventually bail out due to a function call, since it almost certainly got work done.

A more ideal form of instrumentation would actually count opcodes or have a loop counter, but we don't have the luxury of compiling traces multiple times, so we have to keep the instrumentation as cheap as possible. The read + increment for a loop counter would make tight loops like reversing or comparing spans much slower, I think.

Author: kg
Assignees: -
Labels:

arch-wasm, area-Codegen-Jiterpreter-mono

Milestone: -

@kg
Copy link
Member Author

kg commented Mar 16, 2023

In a local run, this doesn't appear to reject any important traces from browser-bench, but it's hard to say for sure without performance runs.

kg added 4 commits March 20, 2023 17:05
…average distance (in bytes) they travel

Then reject traces that have a low average distance after the monitoring period

Cleanup

Cleanup

Do monitoring in terms of opcode count, and record the location of specific bailout types
Also track whether a backward branch was taken during a trace so that it can be factored in

Cleanup

Cleanup

Cleanup

Cleanup
… distance

For the monitoring opcode counter, use (opcodes_to_first_branch + opcodes_from_last_branch_target)
Fix traces aborting for monitor points
@kg kg force-pushed the wasm-jiterpreter-monitoring-phase-2 branch from 66989da to c78801f Compare March 21, 2023 07:37
@kg kg force-pushed the wasm-jiterpreter-monitoring-phase-2 branch from c78801f to 962d137 Compare March 21, 2023 07:38
@kg
Copy link
Member Author

kg commented Mar 21, 2023

Reworked the value being measured based on some suggestions, it now assigns a 'penalty score' to trace executions depending on how far they got before bailing out. This removes the need for a dummy distance value for traces that didn't bail out, and allows smoothly ramping up and down by setting two (short and long) distance thresholds. Then you set an average penalty threshold, where traces above that threshold are rejected, between 0 and 200.

I also changed the opcode count it uses so that instead of the number of opcodes compiled, it's the number of opcodes we can be certain have run at this point - the number of opcodes leading up to the first branch, and the number of opcodes since the last branch target. We could do some static analysis to come up with a better number than this, but it seems to be good enough as a starting point so far.

In local testing this reduces Perf_Dictionary.ContainsValue from 1.0sec to around 650ms, by rejecting ContainsValue. It's a massive branchy trace that doesn't actually run much code and always bails out for a call before looping, so the previous two metrics (displacement & average number of opcodes) didn't work.

kg added 4 commits March 21, 2023 01:05
Don't assert when hitting the traceinfo limit, just stop compiling traces and print a warning
Raise traceinfo limit to make sure it's not realistic to hit it
@kg
Copy link
Member Author

kg commented Mar 21, 2023

Test failures are unrelated.
From local testing this shouldn't cause any browser-bench regressions, but it's worth checking it in now to see what kind of deltas we get in dotnet/performance. I expect to see some improvements and some regressions there, and then I'll be able to dig in to those and figure out whether this is actually an acceptable iteration of the heuristic.

@kg kg merged commit 2ea6288 into dotnet:main Mar 21, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-wasm WebAssembly architecture NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants