On Stack Replacement Next Steps

#### Possible next steps now that #32969 is merged, in rough order of priority.

- [x] implement a regularly scheduled test run that enables OSR for pri1 tests on x64 windows and x64 Linux (both default OSR and with "OSR stress") (done, #33709)
- [x] fix issues with OSR creating side entries into try regions (#35687, also seen during #34522). PR: #59784
- [x] fix failures in jit-experimental runs with OSR (#43534, #47532, #51057)
- [x] fix interaction of OSR and PGO (#47942 fixed via PR #61453; PR #62263)
- [x] run tests with OSR and some GC and JIT stress modes (PR #61934)
- [x] investigate assert seen in the weekly OSR tests `Assert failure(PID 7028 [0x00001b74], Thread: 7084 [0x1bac]): ppInfo->m_osrMethodCode == NULL` -- likely the logic guarding against threads racing to build the patchpoint method needs adjusting (likely fixed by #38165)
- [x] partial method compilation (eg don't initially jit exceptional paths, prototyped in #34522; PR #60791)
- [x] support OSR from synchronous methods (#61712)
- [x] don't do OSR in methods that are asking for debug codegen (fine as is, since such methods aren't eligible for tiering)
- [x] look at how OSR performs for powershell startup (which uses QJFL=1) (see [note 4 below](https://github.com/dotnet/runtime/issues/33658#issuecomment-973683253))
- [x] implement variant of QJFL that bails out of methods that can't be OSR'd (see [note 1 below](https://github.com/dotnet/runtime/issues/33658#issuecomment-969293801)) (https://github.com/dotnet/runtime/pull/61851)
- [x] run ASP.NET perf tests and verify startup improvements are as expected / no steady state losses
- [x] ~~look at how debuggers handle OSR frames; if the double-RBP restore is too confusing, think about relying on the original method's RBP (will still need split save areas). On further thought, it seems like (for x64) we can pass the tier0 method caller's RBP to the osr method and just have one unwind restore. This is what I'm doing for arm64 and it seems to be working out ok.~~ (new plan is to revise arm64 to conform with how x64 will work, see below)
- [x] arm64 platform support (see [note 7 below](https://github.com/dotnet/runtime/issues/33658#issuecomment-985786573)). (PR: https://github.com/dotnet/runtime/pull/62831)
- [x] sort out complications with OSR and ALTJIT (see [note 6 below](https://github.com/dotnet/runtime/issues/33658#issuecomment-985182460)). Addressed in the Arm64 PR.
- [x] Exclude CI tests of crossgen2 from using OSR as it is run via a 6.0 runtime and so hits some old OSR bugs (https://github.com/dotnet/runtime/pull/62968).
- [x] Fix issues uncovered by OSR stress (https://github.com/dotnet/runtime/pull/62980, https://github.com/dotnet/runtime/pull/64116)
- [x] fix problem with broken epilog unwind on x64 (see [note](https://github.com/dotnet/runtime/pull/64116#issuecomment-1022795565)) https://github.com/dotnet/runtime/pull/65609
- [x] re-enable struct promotion https://github.com/dotnet/runtime/pull/65903
- [x] Track OSR impact on Techempower (data now available on the PGO tab) https://github.com/aspnet/Benchmarks/pull/1726
- [x] Run debugger tests
- [x] Ensure BenchmarkDotNet optimizes key auto-generated methods to avoid holding on to GC references (see [notes](https://github.com/dotnet/performance/issues/2214#issuecomment-1052893465)), also https://github.com/dotnet/BenchmarkDotNet/issues/1934 and https://github.com/dotnet/BenchmarkDotNet/pull/1935
- [x] Update dotnet/performance with new BDN version and verify everything's ok. Some of the tests I expected to improve did, others did not -- see [note](https://github.com/dotnet/runtime/issues/33658#issuecomment-1066155430) below.
- [x] Look more closely at interaction of OSR and loop optimizer (see [notes below](https://github.com/dotnet/runtime/issues/33658#issuecomment-1054587182)) https://github.com/dotnet/runtime/pull/66208
- [x] Update perfview / traceevent to properly parse new jit type (data there already) https://github.com/microsoft/perfview/pull/1584
- [x] Update DAC/SOS to properly understand new native code versions for methods (see [notes](https://github.com/dotnet/runtime/issues/33658#issuecomment-1063155928)) https://github.com/dotnet/diagnostics/pull/2928 and https://github.com/dotnet/runtime/pull/66507
- [x] Fix issue with large OSR funclet frames on arm64 https://github.com/dotnet/runtime/issues/65996 (via https://github.com/dotnet/runtime/pull/66124)
- [x] Update BDN iteration strategy for long running benchmarks. https://github.com/dotnet/BenchmarkDotNet/pull/1949 and https://github.com/dotnet/performance/pull/2323
- [x] Fix bad interaction of OSR and the more general loop cloning introduced in #66257 (see [note](https://github.com/dotnet/runtime/issues/33658#issuecomment-1076537562)) https://github.com/dotnet/runtime/pull/67067
- [x] Fix stress failure https://github.com/dotnet/runtime/issues/67078 (via https://github.com/dotnet/runtime/pull/67131)
- [x] run perf test suite with OSR and investigate any regressions versus current default (see [notes](https://github.com/dotnet/runtime/pull/61934#issuecomment-990031427), [more notes](https://github.com/dotnet/runtime/issues/33658#issuecomment-1050043091)). We have temporarily co-opted the regularly scheduled "no pgo" perf lab runs for windows x64 to actually run OSR; results are [here](https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows%2010.0.18362_PGOType%3Dnopgo/AllTestindex.html). We also enabled autofiling, so OSR perf results that differ from the old "no pgo" results are reported as regressions/improvements. [Example of regressions](https://github.com/dotnet/perf-autofiling-issues/issues/3862).
- [x] Investigate test failure https://github.com/dotnet/runtime/issues/67215 (likely unrelated, see #66924)
- [x] Fix stress failure https://github.com/dotnet/runtime/issues/67152 (https://github.com/dotnet/runtime/pull/67274)
- [x] Enable QJFL and OSR by default for x64/arm64 ~~#61934~~ ~~#63642~~ https://github.com/dotnet/runtime/pull/65675
- [x] Enable use of sparse edge instrumentation in OSR methods (https://github.com/dotnet/runtime/issues/47942). #80481
- [x] Import entire method initially and trim unneeded parts once we are done with morph. This fixes lingering issues with computing local exposure: #83910 
- [x] Ensure Tier0-exposed locals are normalize-on-load in the OSR method: #84000
- [ ] Run diagnostic tests (blocked; they're not yet updated for the .NET 7 branch)

#### Issues and fixes after OSR was enabled

- [x] https://github.com/dotnet/runtime/issues/67488 (fixed by https://github.com/dotnet/runtime/pull/67680)
- [x] https://github.com/dotnet/runtime/issues/67668 (fixed by https://github.com/dotnet/runtime/pull/67678)
- [x] https://github.com/dotnet/runtime/issues/67410 (fixed by https://github.com/dotnet/runtime/pull/67884)
- [x] https://github.com/dotnet/runtime/issues/68003 (fixed by https://github.com/dotnet/runtime/pull/68048)
- [x] https://github.com/dotnet/runtime/issues/68170 (fixed by https://github.com/dotnet/runtime/pull/68198)
- [x] https://github.com/dotnet/runtime/issues/68194 (fixed by https://github.com/dotnet/runtime/pull/68202)
- [x] https://github.com/dotnet/runtime/issues/70263 (fixed by https://github.com/dotnet/runtime/pull/70916)
- [x] https://github.com/dotnet/runtime/issues/71005 (fixed by https://github.com/dotnet/runtime/pull/71245)
- [x] https://github.com/dotnet/runtime/issues/75828 (fixed by https://github.com/dotnet/runtime/pull/75922)
- [x] https://github.com/dotnet/runtime/issues/83783 (fixed by #83910)

#### Performance Regressions

- [x] https://github.com/dotnet/runtime/issues/67594
- [x] https://github.com/dotnet/runtime/issues/78127
- [x] #78110 
- [ ] #80210
- [x] #80757 

#### Other ideas: enhancements or optimizations

- [ ] Update arm64 to use the same split callee-save technique we now use on x64, and pass Tier0 FP to the OSR method. This gives arm64 methods standard epilogs.
- [x] Revise Arm64 frame layout to put PSPSym above callee-saves, so that OSR method can share Tier0 PSP, and OSR funclets don't need to pad their frames with the Tier0 frame (see [notes starting here](https://github.com/dotnet/runtime/pull/62831#issuecomment-995296856)) and ([more notes](https://github.com/dotnet/runtime/pull/62831#issuecomment-1007655181)). Or, revise the OSR method so it shares the PSP slot with the Tier0 frame (requires split callee-save above). (PSP sym no longer exists: #114630)
- [ ] look into enabling more independent promotion in OSR methods. Right now we use the Tier0 address exposure data and this is very conservative. Also see https://github.com/dotnet/runtime/pull/67131. 
  * Possibly addressed by https://github.com/dotnet/runtime/pull/83910
  * Possibly addressed by https://github.com/dotnet/runtime/pull/83388
- [ ] support OSR in methods with stackalloc (see [note 2 below](https://github.com/dotnet/runtime/issues/33658#issuecomment-971029913) and [further notes](https://github.com/dotnet/runtime/issues/33658#issuecomment-1520861264))
- [ ] support OSR for reverse pinvoke methods (see [note 3 below](https://github.com/dotnet/runtime/issues/33658#issuecomment-971089184))
- [ ] support OSR from methods that make explicit tail calls (see [note 5 below](https://github.com/dotnet/runtime/issues/33658#issuecomment-974526420))
- [ ] implement aggressive frame trimming (reduce original method frame to just live extent)
- [ ] look into viability of backpatching the patchpoint call with a jump to the OSR method instead
- [ ] look into how to support limited Tier0 opts with OSR
- [ ] look into emitting more compact patchpoint code sequences
- [ ] look into emitting more compact patchpoint info blobs
- [ ] think about asynchronous creation of OSR methods
- [ ] look into the feasibility of having one OSR method cover all the patchpoints
- [ ] look into using the "mutator" tool in jitutils to inject loops into methods that don't have them, so that we can trigger OSR in more cases. 
   * Note that random patchpoint placement and fast OSR triggers can accomplish something similar without needing to alter tests. There's nothing saying a patchpoint has to be within a loop. https://github.com/dotnet/runtime/pull/62980
- [ ] OSR + GS -- perhaps OSR method should have its own cookie (if needed) in addition to the Tier0 cookie, and check them both on exit? Currently we just check the Tier0 cookie, but if the OSR frame holds saved LR/FP we might miss an overrun.  Note: not needed if we move arm64 to the new x64 plan, as there's just one save area that gets restored, and it is in the Tier0 frame.
- [ ] update runtime strategy to support "slow" OSR method creation, but quick transitions when OSR methods exist
- [ ] support for mid-block patchpoints (where IL stack is empty). Among other things, this would let us do "source" patchpoint targeting more often.
- [ ] support for patchpoints at non-stack empty IL offsets. Would require custom per-site patchpoint descriptors and more.
- [ ] defer altering control flow for OSR until much later. Currently we do it very early and need to protect the original method entry specially in case we want to branch there during morph (see https://github.com/dotnet/runtime/pull/94597#issuecomment-1807163840).
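
For readers less familiar with the terminology, several of the patchpoint-related ideas above can be pictured with a toy model. This is only a simplified sketch and not the runtime's implementation: `ToyOsrRuntime`, `PatchpointState`, `hit_limit`, and the OSR method naming format are all hypothetical. It assumes each patchpoint (a method plus an IL offset) gets its own OSR method, that the OSR method is only created after a hit threshold is reached, and that later hits at the same patchpoint transition immediately.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PatchpointState:
    """Toy per-patchpoint bookkeeping (hypothetical, not the runtime's layout)."""
    hits: int = 0
    osr_method: Optional[str] = None


@dataclass
class ToyOsrRuntime:
    # Hypothetical threshold: hits needed before an OSR method is requested.
    hit_limit: int = 3
    # One entry per (method, IL offset) pair, so each patchpoint
    # gets its own OSR method in this model.
    patchpoints: Dict[Tuple[str, int], PatchpointState] = field(default_factory=dict)

    def on_patchpoint(self, method: str, il_offset: int) -> Optional[str]:
        """Return the OSR method to transition to, or None to stay in Tier0 code."""
        state = self.patchpoints.setdefault((method, il_offset), PatchpointState())
        if state.osr_method is not None:
            # Quick path: an OSR method already exists for this patchpoint.
            return state.osr_method
        state.hits += 1
        if state.hits >= self.hit_limit:
            # Slow path: "jit" the OSR method, keyed by the patchpoint's IL offset.
            state.osr_method = f"{method}@osr[IL_{il_offset:04x}]"
            return state.osr_method
        return None


rt = ToyOsrRuntime(hit_limit=3)
for _ in range(4):
    print(rt.on_patchpoint("Main", 0x10))
```

In this model a second patchpoint in the same method (say at IL offset `0x20`) starts its own count and would get a separate OSR method, which is the cost that the "one OSR method covers all the patchpoints" idea would try to avoid.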

cc @dotnet/jit-contrib 

category:cq
theme:osr
skill-level:expert
cost:extra-large
