Open
Description
Possible next steps now that #32969 is merged, in rough order of priority.
- implement a regularly scheduled test run that enables OSR for pri1 tests on x64 windows and x64 Linux (both default OSR and with "OSR stress") (done, Add pipeline setup for jit offbydefault extra testing #33709)
- fix issues with OSR creating side entries into try regions (JIT\opt\OSR\tailrecursetry\tailrecursetry.cmd fails with COMPlus_FastTailCalls=0 #35687, also seen during Initial support for partially jitted methods #34522). PR: JIT: better support for patchpoints in try regions #59784
- fix failures in jit-experimental runs with OSR (OSR Test Failures #43534, OSR method may require kept-alive this in cases where base method didn't #47532, JIT: fix osr gc info this reporting #51057)
- fix interaction of OSR and PGO (JIT: resolve issues with OSR and PGO #47942 fixed via PR Update how OSR and PGO interact #61453; PR JIT: handle interaction of OSR, PGO, and tail calls #62263)
- run tests with OSR and some GC and JIT stress modes (PR Enable QJFL and OSR by default for x64 #61934)
- investigate assert seen in the weekly OSR tests
Assert failure(PID 7028 [0x00001b74], Thread: 7084 [0x1bac]): ppInfo->m_osrMethodCode == NULL
-- likely the logic guarding against threads racing to build the patchpoint method needs adjusting (likely fixed by A couple of small OSR fixes #38165) - partial method compilation (eg don't initially jit exceptional paths, prototyped in Initial support for partially jitted methods #34522; PR Initial support for partially jitted methods #60791)
- support OSR from synchronous methods (JIT: support OSR for synchronized methods #61712)
- don't do OSR in methods that are asking for debug codegen (fine as is, since such methods aren't eligible for tiering)
- look at how OSR performs for powershell startup ([which uses QJFL=1]) (see note 4 below)
- implement variant of QJFL that bails out of methods that can't be OSR'd (see note 1 below) (JIT: refactor to allow OSR to switch to optimized #61851)
- run ASP.NET perf tests and verify startup improvements are as expected / no steady state losses
-
look at how debuggers handle OSR frames; if the double-RBP restore is too confusing, think about relying on the original method's RBP (will still need split save areas). On further thought, it seems like (for x64) we can pass the tier0 method caller's RBP to the osr method and just have one unwind restore. This is what I'm doing for arm64 and it seems to be working out ok.(new plan is to revise arm64 to conform with how x64 will work, see below) - arm64 platform support (see note 7 below). (PR: OSR support for Arm64 #62831)
- sort out complications with OSR and ALTJIT (see note 6 below). Addressed Arm64 PR.
- Exclude CI tests of crossgen2 from using OSR as it is run via a 6.0 runtime and so hits some old OSR bugs (Suppress OSR for crossgen2 execution #62968).
- Fix issues uncovered by OSR stress (JIT: yet another OSR stress mode #62980, enable osr stress tests for libraries and add fixes #64116)
- fix problem with broken epilog unwind on x64 (see note) JIT: revise approach for x64 OSR epilogs #65609
- re-enable struct promotion JIT: OSR jitstress fixes; enable struct promotion #65903
- Track OSR impact on Techempower (data now available on the PGO tab) add osr to pgo modes aspnet/Benchmarks#1726
- Run debugger tests
- Ensure BenchmarkDotNet optimizes key auto-generated methods to avoid holding on to GC references (see notes), also Ensure WorkloadActionUnroll and similar are optimized if possible BenchmarkDotNet#1934 and Fix optimization of action methods for coreclr BenchmarkDotNet#1935
- Update dotnet/performance with new BDN version and verify everything's ok. Some of the tests I expected to improve did, others did not -- see note below.
- Look more closely at interaction of OSR and loop optimizer (see notes below) JIT: add OSR patchpoint strategy, inhibit tail duplication #66208
- Update perfview / traceevent to properly parse new jit type (data there already) Add support for OSR jitted code microsoft/perfview#1584
- Update DAC/SOS to properly understand new native code versions for methods (see notes) Add OSR to DacpTieredVersionData diagnostics#2928 and Add OSR to DacpTieredVersionData #66507
- Fix ARM64 issue with large OSR funclet frames on arm64 Assertion failed 'genFuncletInfo.fiSpDelta1 >= -240' during 'Generate code' #65996 (via JIT: handle large OSR funclet frame initial SP delta for arm64 type 5… #66124)
- Update BDN iteration strategy for long running benchmarks. Revise heuristic for initial jitting. BenchmarkDotNet#1949 and Bump BDN to 0.13.1.1728, to pick up fixes for wasm/aot runs performance#2323
- Fix bad interaction of OSR and the more general loop cloning introduced in Cloning improvements #66257 (see note) Fix loop cloning for OSR #67067
- Fix stress failure OSR stress failure in System.Diagnostics.DiagnosticsSource.Tests #67078 (via JIT: block some struct promotion for OSR #67131)
- run perf test suite with OSR and investigate any regressions versus current default (see notes, more notes). We have temporarily co-opted the regularly scheduled "no pgo" perf lab runs for windows x64 to actually run OSR. Results here. And also enabled autofiling so OSR perf results that differ from the old no pgo perf are reported as regressions/improvements. Example of regressions.
- Investigate test failure Test failure JIT/opt/OSR/tailrecurse/tailrecurse.sh #67215 (likely unrelated, see Test failure JIT/opt/Devirtualization/structreturningstruct/structreturningstruct.sh #66924)
- Fix stress failure OSR failure in System.Text.Json.Tests #67152 (JIT: Use small register types for some enregisterable locals #67274)
- Enable QJFL and OSR by default for x64/arm64
Enable QJFL and OSR by default for x64 #61934Enable QJFL and OSR by default for x64 and arm64 #63642Enable QJFL and OSR by default for x64 and arm64 #65675 - Enable use of sparse edge instrumentation in OSR methods (JIT: resolve issues with OSR and PGO #47942). JIT: Enable edge based profiles for all scenarios #80481
- Import entire method initially and trim unneeded parts once we are done with morph. This fixes lingering issues with computing local exposure: JIT: import entire method for OSR, prune unneeded parts later #83910
- Ensure Tier0-exposed locals are normalize on load in the OSR method: JIT: small OSR locals must be normalize on load #84000
- Run diagnostic tests (blocked; they're not yet updated for the .NET 7 branch)
Issues and fixes after OSR was enabled
- Assertion failed 'codeGen->isFramePointerUsed() || varDsc->GetStackOffset() >= 0' #67488 (fixed by JIT: fix OSR handling for pinned locals #67680)
- OSR variants may not report pinned locals as pinned #67668 (fixed by JIT: Fix OSR local detection for implicit byref promotion #67678)
- Test failure DataContractSerializerTests.DCS_MyPersonSurrogate_Stress #67410 (fixed by JIT: fix interaction of OSR and shadowed parameters #67884)
- Test failure JIT\\opt\\OSR\\largefuncletframe\\largefuncletframe.cmd #68003 (fixed by JIT: fix assert seen in some OSR cases #68048)
- Assertion failed 'isValidGeneralDatasize(size)' during 'Generate code' #68170 (fixed by JIT: fix issue with invalid operand size in OSR prolog on arm64 #68198)
- OSR gets wrong offset for memory arg on arm64 OSX #68194 (fixed by JIT: fix patchpoint offset encoding #68202)
- Test failure: JIT\Regression\JitBlue\GitHub_21990\GitHub_21990\GitHub_21990.cmd #70263 (fixed by JIT: arm64 osr funclet frame probing #70916)
- [libraries-pgo] arm64 failure in System.Tests.DoubleTests.ParsePatterns #71005 (fixed by JIT: fix double reporting of some GC frame slots #71245)
- [libraries-pgo] Issues related to overlapped IO #75828 (fixed by Preserve last error for patchpoint helpers #75922)
- JIT: OSR is not conservative enough about exposing struct locals #83783 (fixed by JIT: import entire method for OSR, prune unneeded parts later #83910)
Performance Regressions
- Regressions due to QJFL and OSR #67594
- The frame times of my toy ray tracer increased from 24ms to 30ms #78127
- Performance of my Maze Generator went from 2.1 seconds to 3.8 after upgrading to .NET 7.0 #78110
- QuickJitForLoops causing regressions #80210
- I accidentaly found performance reggresion using"mandelbrot algorithm" #80757
Other ideas: enhancements or optimizations
- Update arm64 to use the same split callee-save technique we now use on x64, and pass Tier0 FP to the OSR method. This gives arm64 methods standard epilogs.
- Revise Arm64 frame layout to put PSPSym above callee-saves, so that OSR method can share Tier0 PSP, and OSR funclets don't need to pad their frames with the Tier0 frame (see notes starting here) and (more notes). Or, revise the OSR method so it shares the PSP slot with the TIer0 frame (requires split callee-save above). (PSP sym no longer exists: Abolish PSPSym from ABI #114630)
- look into enabling more independent promotion in OSR methods. Right now we use the Tier0 address exposure data and this is very conservative. Also see JIT: block some struct promotion for OSR #67131.
- Possibly addressed by JIT: import entire method for OSR, prune unneeded parts later #83910
- Possibly addressed by JIT: Add a (disabled) prototype for a generalized promotion pass #83388
- support OSR in methods with stackalloc (see note 2 below) and [further notes]
- support OSR for reverse pinvoke methods (see note 3 below)
- support OSR from methods that make explicit tail calls (see note 5 below)
- implement aggressive frame trimming (reduce original method frame to just live extent)
- look into viability of backpatching the patchpoint call with a jump to the OSR method instead
- look into how to support limited Tier0 opts with OSR
- look into emitting more compact patchpoint code sequences
- look into emitting more compact patchpoint info blobs
- think about asynchronous creation of OSR methods
- look into the feasibility of having one OSR method cover all the patchpoints
- look into using the "mutator" tool in jitutils to inject loops into methods that don't have them, so that we can trigger OSR in more cases.
- Note that random patchpoint placement and fast OSR triggers can accomplish something similar without needing to alter tests. There's nothing saying a patchpoint has to be within a loop. JIT: yet another OSR stress mode #62980
- OSR + GS -- perhaps OSR method should have its own cookie (if needed) in addition to the Tier0 cookie, and check them both on exit? Currently we just check the Tier0 cookie, but if the OSR frame holds saved LR/FP we might miss an overrun. Note: not needed if we move arm64 to the new x64 plan, as there's just one save area that gets restored, and it is in the Tier0 frame.
- update runtime strategy to support "slow" OSR method creation, but quick transitions when OSR methods exist
- support for mid-block patchpoints (where IL stack is empty). Among other things, this would let us do "source" patchpoint targeting more often.
- support for patchpoints at non-stack empty IL offsets. Would require custom per-site patchpoint descriptors and more.
- defer altering control flow for OSR until much later. Currently we do it very early and need to protect the original method entry specially in case we want to branch there during morph (see JIT: Improve local assertion prop throughput #94597 (comment)).
cc @dotnet/jit-contrib
category:cq
theme:osr
skill-level:expert
cost:extra-large