-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Description
Morph, like many JIT phases, visits all the blocks in a method by following the bbNext
chain. There's a missed opportunity here to cheaply propagate some information from block to block.
For "forward" phases like morph it is often preferable to visit the blocks in reverse post-order (RPO). An RPO ensures that for most blocks all the predecessors of the block have been visited before the block itself.
Currently value numbering implements an RPO visit. It's also possible to create an RPO using fgDfsReversePostorder
.
The initial goal of this work is to modify morph to rely on RPO, and then to enable a simple form of global assertion prop for Morph (aka "cross-block local assertion prop") that can push facts forward across block boundaries. #86822 has a prototype implementation. The main remaining challenges there are to make the RPO efficient and to properly handle cases where control flow is altered.
- (Q4 23) Implement RPO for Morph, and enable cross-block assertion prop
- Stop allocating new basic blocks in morph
- JIT: defer creating throw helper blocks until simple lower #93371
- (follow-on fix) JIT: fix bug introduced by recent throw helper block changes #93897
- (follow-on TP win) JIT: remove simple lowering #93704
- (follow-on cleanup): JIT: clean up unused throw helpers #93948
- JIT: create recursive tail call scratch block early in morph #93764
- (follow-on fix) JIT: add missing recursive tail call detection logic #93892
- JIT: stop creating gc poll blocks in morph #93816
- JIT: defer creating throw helper blocks until simple lower #93371
- Come up with a strategy to ensure that the remaining control flow changes in morph are "safe" in that they do not invalidate assumptions made earlier about the flow into a block
- The idea here is to forbid or curtail
fgAddRefPred
. Removing preds should be OK as it won't invalidate the RPO, but adding preds is problematic. Trying this out, one violation (given the above PRs to stop adding new blocks) is to add edges togenReturnBB
. Along with those edges, IR is added to copy return values and that IR is morphed "out of sequence" -- meaning it would also be problematic for assertion prop. Seems like all that flow merging and copying could happen earlier, say infgAddInternal
; in fact there may be some benefit as perhaps some struct copies can also be merged.- this mostly works out, except for tail calls. If we merge all returns early, we end up creating space after the tail call for phases to add new IR; this is an attractive nuisance. Tail calls may or may not end up as
BBJ_RETURN
(plusBBF_HAS_JMP
) orBBJ_THROW
; a failed tail call will end up asBBJ_ALWAYS
, and a tail call helper dispatched call likewise. So we can't merge early and we can't rely on not changing flow later. - The current plan is to merge all non-tail call returns early (when there is a
genReturnBB
) and then during morph, if there are tail calls, treat thegenReturnBB
as a flagged block (similar to what we propose for the tail to loop below). - Then enhance tail merge to merge tails for identical
BBJ_RETURN
blocks -- this would just fall out if we could merge all returns earlier, but since we can't, we need to do extra work here. Work in progress: JIT: move return merging earlier #93997no longer needed?- Related cleanup: JIT: update tail call IR validity checks #94130
- this mostly works out, except for tail calls. If we merge all returns early, we end up creating space after the tail call for phases to add new IR; this is an attractive nuisance. Tail calls may or may not end up as
- With that out of the way, the only other flow edge addition comes from the tail call to loop transformation. This introduces new edges into the second block of the flow graph. Since this is a back-edge no facts are going to propagate forward in a one-pass RPO, so the plan here is to flag that block as special during RPO and have the RPO walker pass the "has unvisited pred" flag to the visitor (like it would for any other loop entry). The RPO walker can then safely allow new edges to this block to appear as it is visiting the graph.
- The idea here is to forbid or curtail
- Revise morph to visit in RPO. If we can get to the point where we're guaranteed not to have any problematic flow updates then either a static (precompute the RPO) or dynamic (evolve the RPO, ala ValueNumberState) approach should work; the latter may be preferable.
- JIT: make global morph its own phase #94185
- upon further analysis the dynamic approach can't work in morph, as it relies on the loop table, which is not available.
- JIT: morph blocks in RPO #94247
- Changes to local assertion prop (see notes below for more details)
- Revise local assertion prop to track live assertions via bit vectors:
- Revise local assertion prop to not clear the assertion table at start of each block, implement the assertion merging algorithm, allow cross-block assertion prop. See notes below. This will go in as off by default.
- Set up perf lab runs and look at performance
- Work on minimizing the (bad) diffs
- Work in reducing the TP impact.
- JIT: Improve local assertion prop throughput #94597
- it looks like there are some places in morph where we can consult assertions before emitting IR, in particular in places we seem to emit explicit null checks and then almost immediately remove them.
- Enable cross-block assertion prop by default.
- Jakob points out in JIT: morph blocks in RPO #94247 that we can likely make the RPO more efficient in various ways, eg we likely don't need the up-front enter blocks or dom start nodes, and we can (eventually) prune away unreachable blocks.
- (stretch) review parts of assertion prop that are currently disabled for local assertion prop, and perhaps enable some of them
- In particular the "edge" EQ/NEQ assertions from JTRUE are likely worth enabling. In our current setup we'd need to have two assertion out sets for BBJ_COND blocks... since we're not using
bbAssertionIn
orbbAssertionGen
we could hijack one of those slots to hold the data. Then when looking at pred info block would have to know the pred has two sets and pick the right one. Seems quite doable.- JIT: enable complementary jtrue assertions for cross-block local ap #94741
- JIT: avoid local jtrue assertions involving floating types #94935
- Tanner points out in the above we can be a bit more aggressive if we know the values being compared are not zeros or nans.
- is there any value in propagating addresses of locals? Jakob says it's too late already...
- In particular the "edge" EQ/NEQ assertions from JTRUE are likely worth enabling. In our current setup we'd need to have two assertion out sets for BBJ_COND blocks... since we're not using
- (stretch) avoid visiting unreachable blocks, and drop them from the RPO (perhaps delete them eagerly)
- I did some prototyping where morph could remove statically (pre-morph) unreachable blocks; this wasn't that impactful, and some blocks are difficult to remove (notably try entries). So maybe not worth the trouble.
- Another idea along those lines is to just disable local assertion prop for those blocks, since it takes time and possibly generates useless assertions that steal space
- (stretch) refine RPO on the fly to avoid visiting blocks that become unreachable because of cross-block assertion prop
- Similar to the idea above, but now for blocks that become unreachable mid-morph, because we removed edges. If a block was reachable in the original RPO, but no longer is, we can remove it rather than morph it. Similar complications ensue. I have not prototyped this.
- (stretch) do
QMARK
expansion before morph, remove bespoke handling in morph forQMARK
assertions (see JIT: expand qmarks earlier #86778)
- Stop allocating new basic blocks in morph
- (stretch) Implement an efficient, reusable RPO traversal; look for other phases that can benefit from RPO
- JIT: Factor SSA's DFS and profile synthesis's loop finding #95251 has a reusable RPO (covering reachable blocks)
- (stretch) Rework value numbering to use the reusable traversal
- Instead of this VN is now re-using the RPO generated by SSA building: JIT: Use post order computed by SSA in VN #94623
There will be extra TP cost from the RPO and from the assertion changes; the hope is that these are paid back by improved optimizations and that this entire change can be zero-cost.
Note for "backwards" phases post-order provides similar benefits (all successors likely visited before a block is visited).
cc @dotnet/jit-contrib
Metadata
Metadata
Assignees
Labels
Type
Projects
Status