JIT: use reverse post-order (RPO) traversal for morph

Morph, like many JIT phases, visits all the blocks in a method by following the `bbNext` chain. There's a missed opportunity here to cheaply propagate some information from block to block.

For "forward" phases like morph it is often preferable to visit the blocks in reverse post-order (RPO). An RPO ensures that for most blocks all the predecessors of the block have been visited before the block itself.

Currently value numbering implements an RPO visit. It's also possible to create an RPO using `fgDfsReversePostorder`.

The initial goal of this work is to modify morph to rely on RPO, and then to enable a simple form of global assertion prop for Morph (aka "cross-block local assertion prop") that can push facts forward across block boundaries. https://github.com/dotnet/runtime/pull/86822 has a prototype implementation. The main remaining challenges there are to make the RPO efficient and to properly handle cases where control flow is altered.

- [x] (Q4 23) Implement RPO for Morph, and enable cross-block assertion prop
  - [x] Stop allocating new basic blocks in morph
    - https://github.com/dotnet/runtime/pull/93371
      - (follow-on fix) https://github.com/dotnet/runtime/pull/93897
      - (follow-on TP win) https://github.com/dotnet/runtime/pull/93704
      - (follow-on cleanup): https://github.com/dotnet/runtime/issues/93948
    - https://github.com/dotnet/runtime/pull/93764
      - (follow-on fix) https://github.com/dotnet/runtime/pull/93892
    - https://github.com/dotnet/runtime/pull/93816
  - [x] Come up with a strategy to ensure that the remaining control flow changes in morph are "safe" in that they do not invalidate assumptions made earlier about the flow into a block
    - The idea here is to forbid or curtail `fgAddRefPred`. Removing preds should be OK as it won't invalidate the RPO, but adding preds is problematic. Trying this out, one violation (given the above PRs to stop adding new blocks) is to add edges to `genReturnBB`. Along with those edges, IR is added to copy return values and that IR is morphed "out of sequence" -- meaning it would also be problematic for assertion prop. Seems like all that flow merging and copying could happen earlier, say in `fgAddInternal`; in fact there may be some benefit as perhaps some struct copies can also be merged.
      - this mostly works out, except for tail calls. If we merge all returns early, we end up creating space after the tail call for phases to add new IR; this is an attractive nuisance. Tail calls may or may not end up as `BBJ_RETURN` (plus `BBF_HAS_JMP`) or `BBJ_THROW`; a failed tail call will end up as `BBJ_ALWAYS`, and a tail call helper dispatched call likewise. So we can't merge early and we can't rely on not changing flow later.
      - The current plan is to merge all non-tail call returns early (when there is a `genReturnBB`) and then during morph, if there are tail calls, treat the `genReturnBB` as a flagged block (similar to what we propose for the tail to loop below).
      - Then enhance tail merge to merge tails for identical `BBJ_RETURN` blocks -- this would just fall out if we could merge all returns earlier, but since we can't, we need to do extra work here.
      - ~~Work in progress: https://github.com/dotnet/runtime/pull/93997~~ no longer needed?
      - Related cleanup: https://github.com/dotnet/runtime/pull/94130
    - With that out of the way, the only other flow edge addition comes from the tail call to loop transformation. This introduces new edges into the second block of the flow graph. Since this is a back-edge no facts are going to propagate forward in a one-pass RPO, so the plan here is to flag that block as special during RPO and have the RPO walker pass the "has unvisited pred" flag to the visitor (like it would for any other loop entry). The RPO walker can then safely allow new edges to this block to appear as it is visiting the graph.
  - [x] Revise morph to visit in RPO. If we can get to the point where we're guaranteed not to have any problematic flow updates then either a static (precompute the RPO) or dynamic (evolve the RPO, ala ValueNumberState) approach should work; the latter may be preferable.
    - https://github.com/dotnet/runtime/pull/94185
    - [upon further analysis](https://github.com/dotnet/runtime/issues/93246#issuecomment-1787432449) the dynamic approach can't work in morph, as it relies on the loop table, which is not available.
    - https://github.com/dotnet/runtime/pull/94247
  - [x] Changes to local assertion prop (see [notes below](https://github.com/dotnet/runtime/issues/93246#issuecomment-1789956574) for more details)
    - Revise local assertion prop to track live assertions via bit vectors: 
      - https://github.com/dotnet/runtime/pull/94322
    - Revise local assertion prop to not clear the assertion table at start of each block, implement the assertion merging algorithm, allow cross-block assertion prop.  See [notes](https://github.com/dotnet/runtime/issues/93246#issuecomment-1799589554) below. This will go in as off by default.
      - https://github.com/dotnet/runtime/pull/94363
      - https://github.com/dotnet/runtime/pull/94608
      - https://github.com/dotnet/runtime/pull/94885
      - #94978
    - Set up perf lab runs and look at performance
      - https://github.com/dotnet/performance/issues/3482
    - Work on minimizing the (bad) diffs
      - https://github.com/dotnet/runtime/pull/94690
    - Work in reducing the TP impact.
      - https://github.com/dotnet/runtime/pull/94597
      - it looks like there are some places in morph where we can consult assertions before emitting IR, in particular in places we seem to emit explicit null checks and then almost immediately remove them.
    - Enable cross-block assertion prop by default.
      -  https://github.com/dotnet/runtime/pull/94689
  - [x]  Jakob points out in #94247 that we can likely make the RPO more efficient in various ways, eg we likely don't need the up-front enter blocks or dom start nodes, and we can (eventually) prune away unreachable blocks.
    - https://github.com/dotnet/runtime/pull/94497
  - [x] (stretch) review parts of assertion prop that are currently disabled for local assertion prop, and perhaps enable some of them
    - In particular the "edge" EQ/NEQ assertions from JTRUE are likely worth enabling. In our current setup we'd need to have two assertion out sets for BBJ_COND blocks... since we're not using `bbAssertionIn` or `bbAssertionGen` we could hijack one of those slots to hold the data. Then when looking at pred info block would have to know the pred has two sets and pick the right one. Seems quite doable.
      - https://github.com/dotnet/runtime/pull/94741
      - https://github.com/dotnet/runtime/pull/94935
        - Tanner points out in the above we can be a bit more aggressive if we know the values being compared are not zeros or nans.
    - is there any value in propagating addresses of locals? Jakob says it's too late already...
  - [x] (stretch) avoid visiting unreachable blocks, and drop them from the RPO (perhaps delete them eagerly)
    - I did some prototyping where morph could remove statically (pre-morph) unreachable blocks; this wasn't that impactful, and some blocks are difficult to remove (notably try entries). So maybe not worth the trouble.
    - Another idea along those lines is to just disable local assertion prop for those blocks, since it takes time and possibly generates useless assertions that steal space
  - [ ] (stretch) refine RPO on the fly to avoid visiting blocks that become unreachable because of cross-block assertion prop
    - Similar to the idea above, but now for blocks that become unreachable mid-morph, because we removed edges. If a block was reachable in the original RPO, but no longer is, we can remove it rather than morph it. Similar complications ensue. I have not prototyped this.
  - [ ] (stretch) do `QMARK` expansion before morph, remove bespoke handling in morph for `QMARK` assertions (see https://github.com/dotnet/runtime/pull/86778)
    - see also https://github.com/dotnet/runtime/pull/94690#pullrequestreview-1729755595
- [ ] (stretch) Implement an efficient, reusable RPO traversal; look for other phases that can benefit from RPO
  - #95251 has a reusable RPO (covering reachable blocks)
- [x] (stretch) Rework value numbering to use the reusable traversal
  - Instead of this VN is now re-using the RPO generated by SSA building: #94623

There will be extra TP cost from the RPO and from the assertion changes; the hope is that these are paid back by improved optimizations and that this entire change can be zero-cost.

Note for "backwards" phases post-order provides similar benefits (all successors likely visited before a block is visited).

cc @dotnet/jit-contrib 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

JIT: use reverse post-order (RPO) traversal for morph #93246

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

JIT: use reverse post-order (RPO) traversal for morph #93246

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions