-
Notifications
You must be signed in to change notification settings - Fork 5.2k
JIT: Add post-morph profile repair phase #113896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Add post-morph profile repair phase #113896
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot wasn't able to review any files in this pull request.
Files not reviewed (3)
- src/coreclr/jit/compiler.cpp: Language not supported
- src/coreclr/jit/compiler.h: Language not supported
- src/coreclr/jit/fgbasic.cpp: Language not supported
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Collections are a bit impoverished at the moment; I'll wait for the next collection run |
Looking at LoopsFoundDuringOpts : 2
LoopsInverted : 0
- LoopsCloned : 1
+ LoopsCloned : 0
LoopsUnrolled : 0 And for LoopsFoundDuringOpts : 1
LoopsInverted : 0
- LoopsCloned : 1
+ LoopsCloned : 0
LoopsUnrolled : 0 There are plenty of instances where we clone more as well, like in LoopsFoundDuringOpts : 1
LoopsInverted : 2
- LoopsCloned : 0
+ LoopsCloned : 1
LoopsUnrolled : 0 (Note that lexical loop inversion found more loop candidates than the graph-based approach did.) Also LoopsFoundDuringOpts : 1
LoopsInverted : 0
- LoopsCloned : 0
+ LoopsCloned : 1
LoopsUnrolled : 0 Multiple collections have duplicate contexts with diffs in cloning, thus inflating the size diffs in both directions. We also have smaller diffs due to IV opts and CSEs being newly (un)profitable. For example, in - LoopsIVWidened : 1
- WidenedIVs : 1
- UnusedIVsRemoved : 0
- CseCount : 1
+ LoopsIVWidened : 2
+ WidenedIVs : 2
+ UnusedIVsRemoved : 1
+ CseCount : 3 Because synthesis does not mark handler blocks as rarely run, I'm seeing optimizations that use |
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs show a TP cost of up to 0.5% (some of this is probably from some opts like loop cloning kicking in more often now). I plan to disable |
Can you look at The GDV-based cloning heuristics are profile sensitive, and I'd like to be sure we think the new decision here is the right one. |
src/coreclr/jit/compiler.cpp
Outdated
|
||
// Re-establish profile consistency, now that inlining and morph have run. | ||
// | ||
DoPhase(this, PHASE_REPAIR_PROFILE, &Compiler::fgRepairProfile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally use unique phase IDs for each phase, so can you add a new one for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
Sure, thanks for pointing this particular example out: I noticed that for the post-importation synthesis calls, if the profile is still inconsistent after recomputing the block weights, we defer the post-phase checks. The importer has a check for re-enabling these checks under the assumption that bad IL can create flow that trips up synthesis, so any code that passes importation should have a reconcilable profile. If later synthesis calls fail to make the profile consistent, the deferred checks are never re-enabled since that's an importer-specific quirk. It turns out the later profile repair phases have been quietly failing to make the profile consistent in several cases, including the above method. In particular, the flow into a call-finally pair doesn't always make it out, and this loss of flow is then propagated to downstream blocks. This might be why we're no longer finding loop cloning profitable in so many cases. Let me try fixing this in another PR first, since this will affect the pre-layout profile repair phase too. |
|
/azp run runtime-coreclr libraries-pgo |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Diffs are still big, though the TP impact is lessened somewhat by synthesis being able to reuse the flowgraph annotations from previous phases. |
Part of #107749. The latter half of the JIT frontend contains numerous optimizations that rely on block weights to compute the profitability of their transformations, and may benefit from more consistent profile data. Morph and flow opts frequently redirect flow out of blocks and into other ones, and the corresponding tweaks to the profile are usually too local to reduce/increase flow along affected paths. This gives downstream phases inaccurate ideas about which blocks are newly cold/hot, and other profile transformations (such as
fgExpandRarelyRunBlocks
) can further propagate inaccuracies. Thus, re-running synthesis shortly after morph seems like an opportune and cheap place to fix the profile. Like the late profile synthesis run, we aren't interested in changing edge likelihoods here -- we just want to propagate changes in block weights through the flowgraph.I also included some dead code cleanup I meant to do in a previous PR.