Skip to content

JIT: Avoid byref strength reductions in loops with suspension points #115605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 16, 2025

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented May 15, 2025

Reducing to a byref type (e.g. when iterating through an array) in a
loop with suspension points is not valid, since the new byref IV will be
live across the suspension point. In those cases instead strength reduce
to the index.

Fix #115260

Example diff in test
 ; Assembly listing for method StrengthReductionTest:StrengthReduction(int[]):int (FullOpts)
 ; Emitting BLENDED_CODE for X64 with AVX - Windows
 ; FullOpts code
 ; async
 ; optimized code
 ; rsp based frame
 ; partially interruptible
 ; No PGO data
 ; 0 inlinees with PGO data; 4 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 AsyncCont    [V00,T04] (  4,  3   )     ref  ->  rcx         single-def "Async continuation arg"
-;  V01 arg1         [V01,T05] (  3,  3   )     ref  ->  rdx         class-hnd single-def <int[]>
-;  V02 loc1         [V02,T02] (  6, 10   )     int  ->  rsi        
-;  V03 loc2         [V03,T06] (  3,  2.25)     ref  ->  rdx         class-hnd single-def <int[]>
+;  V00 AsyncCont    [V00,T05] (  5,  3   )     ref  ->  rcx         single-def "Async continuation arg"
+;  V01 arg1         [V01,T06] (  3,  3   )     ref  ->  rdx         class-hnd single-def <int[]>
+;  V02 loc1         [V02,T02] (  6, 10   )     int  ->  rbx        
+;  V03 loc2         [V03,T04] (  5,  6   )     ref  ->  rsi         class-hnd single-def <int[]>
 ;* V04 loc3         [V04    ] (  0,  0   )     int  ->  zero-ref   
 ;* V05 loc4         [V05    ] (  0,  0   )     int  ->  zero-ref   
 ;* V06 loc5         [V06    ] (  0,  0   )  struct ( 8) zero-ref    do-not-enreg[SF] ld-addr-op <System.Runtime.CompilerServices.YieldAwaitable+YieldAwaiter>
 ;* V07 loc6         [V07    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op <System.Runtime.CompilerServices.YieldAwaitable>
 ;  V08 OutArgs      [V08    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
 ;* V09 tmp1         [V09    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op "Inline ldloca(s) first use temp" <System.Runtime.CompilerServices.YieldAwaitable>
 ;* V10 tmp2         [V10    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op "Inline ldloca(s) first use temp" <System.Runtime.CompilerServices.YieldAwaitable+YieldAwaiter>
 ;  V11 cse0         [V11,T07] (  3,  2.25)     int  ->  rdi         "CSE #01: aggressive"
-;  V12 rat0         [V12,T01] (  4, 12.25)   byref  ->  rbx         must-init "Strength reduced derived IV"
-;  V13 rat1         [V13,T00] (  6, 12.25)     int  ->  rdi         "Trip count IV"
+;  V12 rat0         [V12,T00] (  6, 12.25)    long  ->  rbp         "Strength reduced derived IV"
+;  V13 rat1         [V13,T01] (  6, 12.25)     int  ->  rdi         "Trip count IV"
 ;  V14 rat2         [V14,T03] (  3,  8   )     ref  ->  rcx         "returned continuation"
-;  V15 rat3         [V15,T08] (  5,  0   )     ref  ->  rax         "new continuation"
-;  V16 rat4         [V16,T09] (  3,  0   )     ref  ->  rcx         "byte[] for continuation"
-;  V17 rat5         [V17,T10] (  3,  0   )     ref  ->  rdx         "byte[] for continuation"
+;  V15 rat3         [V15,T08] (  6,  0   )     ref  ->  r14         "new continuation"
+;  V16 rat4         [V16,T11] (  2,  0   )     ref  ->  rcx         "object[] for continuation"
+;  V17 rat5         [V17,T09] (  4,  0   )     ref  ->  rax         "byte[] for continuation"
+;  V18 rat6         [V18,T10] (  4,  0   )     ref  ->  rdx         "byte[] for continuation"
+;  V19 rat7         [V19,T12] (  2,  0   )     ref  ->  rdx         "object[] for continuation"
 ;
 ; Lcl frame size = 32
 
 G_M17835_IG01:
+       push     r14
        push     rdi
        push     rsi
+       push     rbp
        push     rbx
        sub      rsp, 32
-       xor      ebx, ebx
-						;; size=9 bbWeight=1 PerfScore 3.50
+						;; size=10 bbWeight=1 PerfScore 5.25
 G_M17835_IG02:
        test     rcx, rcx
-       jne      SHORT G_M17835_IG10
-       xor      esi, esi
-       mov      edi, dword ptr [rdx+0x08]
+       jne      G_M17835_IG10
+       xor      ebx, ebx
+       mov      rsi, rdx
+       mov      edi, dword ptr [rsi+0x08]
        test     edi, edi
        jle      SHORT G_M17835_IG06
-						;; size=14 bbWeight=1 PerfScore 4.75
+						;; size=21 bbWeight=1 PerfScore 5.00
 G_M17835_IG03:
-       lea      rbx, bword ptr [rdx+0x10]
-						;; size=4 bbWeight=0.25 PerfScore 0.12
+       mov      ebp, 16
+						;; size=5 bbWeight=0.25 PerfScore 0.06
 G_M17835_IG04:
-       add      esi, dword ptr [rbx]
+       add      ebx, dword ptr [rsi+rbp]
        xor      edx, edx
        xor      rcx, rcx
        call     [System.Runtime.CompilerServices.AsyncHelpers:UnsafeAwaitAwaiter[System.Runtime.CompilerServices.YieldAwaitable+YieldAwaiter](System.Runtime.CompilerServices.YieldAwaitable+YieldAwaiter)]
        test     rcx, rcx
        jne      SHORT G_M17835_IG08
-						;; size=17 bbWeight=4 PerfScore 31.00
+						;; size=18 bbWeight=4 PerfScore 31.00
 G_M17835_IG05:
-       add      rbx, 4
+       add      rbp, 4
        dec      edi
        jne      SHORT G_M17835_IG04
 						;; size=8 bbWeight=4 PerfScore 6.00
 G_M17835_IG06:
-       mov      eax, esi
+       mov      eax, ebx
        xor      ecx, ecx
 						;; size=4 bbWeight=1 PerfScore 0.50
 G_M17835_IG07:
        add      rsp, 32
        pop      rbx
+       pop      rbp
        pop      rsi
        pop      rdi
+       pop      r14
        ret      
-						;; size=8 bbWeight=1 PerfScore 2.75
+						;; size=11 bbWeight=1 PerfScore 3.75
 G_M17835_IG08:
-       xor      edx, edx
-       mov      r8d, 8
+       mov      edx, 1
+       mov      r8d, 16
        call     [CORINFO_HELP_ALLOC_CONTINUATION]
+       mov      r14, rax
        mov      rcx, 0xD1FFAB1E      ; code for (dynamicClass):IL_STUB_AsyncResume_StrengthReduction_Optimized(System.Object):System.Object
-       mov      qword ptr [rax+0x20], rcx
+       mov      qword ptr [r14+0x20], rcx
        xor      ecx, ecx
-       mov      qword ptr [rax+0x28], rcx
-       mov      rcx, gword ptr [rax+0x10]
-       mov      dword ptr [rcx+0x10], esi
-       mov      dword ptr [rcx+0x14], edi
-       mov      rcx, rax
-						;; size=47 bbWeight=0 PerfScore 0.00
+       mov      qword ptr [r14+0x28], rcx
+       mov      rcx, gword ptr [r14+0x18]
+       lea      rcx, bword ptr [rcx+0x10]
+       mov      rdx, rsi
+       call     CORINFO_HELP_ASSIGN_REF
+       mov      rax, gword ptr [r14+0x10]
+       mov      qword ptr [rax+0x10], rbp
+       mov      dword ptr [rax+0x18], ebx
+       mov      dword ptr [rax+0x1C], edi
+       mov      rcx, r14
+						;; size=73 bbWeight=0 PerfScore 0.00
 G_M17835_IG09:
        add      rsp, 32
        pop      rbx
+       pop      rbp
        pop      rsi
        pop      rdi
+       pop      r14
        ret      
-						;; size=8 bbWeight=0 PerfScore 0.00
+						;; size=11 bbWeight=0 PerfScore 0.00
 G_M17835_IG10:
        mov      rdx, gword ptr [rcx+0x10]
-       mov      esi, dword ptr [rdx+0x10]
-       mov      edi, dword ptr [rdx+0x14]
-       jmp      SHORT G_M17835_IG05
-						;; size=12 bbWeight=0 PerfScore 0.00
+       mov      rbp, qword ptr [rdx+0x10]
+       mov      ebx, dword ptr [rdx+0x18]
+       mov      edi, dword ptr [rdx+0x1C]
+       mov      rdx, gword ptr [rcx+0x18]
+       mov      rsi, gword ptr [rdx+0x10]
+       jmp      G_M17835_IG05
+						;; size=27 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 131, prolog size 7, PerfScore 48.62, instruction count 48, allocated bytes for code 131 (MethodHash=4bf7ba54) for method StrengthReductionTest:StrengthReduction(int[]):int (FullOpts)
+; Total bytes of code 188, prolog size 10, PerfScore 51.56, instruction count 63, allocated bytes for code 188 (MethodHash=4bf7ba54) for method StrengthReductionTest:StrengthReduction(int[]):int (FullOpts)
 ; ============================================================
 
-System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.)
- ---> System.NullReferenceException: Object reference not set to an instance of an object.
-   at StrengthReductionTest.StrengthReduction(Int32[] arr) in C:\dev\dotnet\runtimelab\src\tests\async\strength-reduction.cs:line 21
-   at System.Runtime.CompilerServices.AsyncHelpers.DispatchContinuations(Continuation continuation)
-   at System.Runtime.CompilerServices.AsyncHelpers.FinalizeTaskReturningThunk[T](Continuation continuation)
-   --- End of inner exception stack trace ---
-   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
-   at StrengthReductionTest.TestEntryPoint() in C:\dev\dotnet\runtimelab\src\tests\async\strength-reduction.cs:line 15
-   at __GeneratedMainWrapper.Main() in C:\dev\dotnet\runtimelab\artifacts\tests\coreclr\obj\windows.x64.Release\Managed\async\strength-reduction\XUnitWrapperGenerator\XUnitWrapperGenerator.XUnitWrapperGenerator\SimpleRunner.g.cs:line 7

Reducing to a byref type (e.g. when iterating through an array) in a
loop with suspension points is not valid, since the new byref IV will be
live across the suspension point. In those cases instead strength reduce
to the index.
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 15, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch jakobbotsch marked this pull request as ready for review May 15, 2025 16:19
@Copilot Copilot AI review requested due to automatic review settings May 15, 2025 16:19
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @AndyAyersMS

No diffs. Very minor TP regressions.

@jakobbotsch jakobbotsch requested a review from AndyAyersMS May 15, 2025 16:19
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses issue #115260 by ensuring that loops containing suspension points use an index‐based induction variable rather than a byref type. Key changes include updating test code to verify the new behavior, renaming and refactoring LoopLocalOccurrences to PerLoopInfo throughout the JIT, and adjusting register and variable type assignments accordingly.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
src/tests/async/strength-reduction/strength-reduction.csproj Added project file for the strength reduction test with optimizations enabled.
src/tests/async/strength-reduction/strength-reduction.cs Test case updated; note the questionable use of “async2” in the method modifier.
src/coreclr/jit/inductionvariableopts.cpp Refactored internal local occurrences tracking from LoopLocalOccurrences to PerLoopInfo; register adjustments and inline comments updated.
src/coreclr/jit/compiler.h Header updated to reflect renaming from LoopLocalOccurrences to PerLoopInfo.
Comments suppressed due to low confidence (1)

src/tests/async/strength-reduction/strength-reduction.cs:19

  • The 'async2' modifier appears non-standard and may result in a compilation error. Please verify if this is intentional or if the standard 'async' keyword should be used instead.
private static async2 Task<int> StrengthReduction(int[] arr)

@jakobbotsch jakobbotsch merged commit b85dd69 into dotnet:main May 16, 2025
114 of 116 checks passed
@jakobbotsch jakobbotsch deleted the fix-115260 branch May 16, 2025 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RuntimeAsync] Disallow introduction of captured byrefs in strength reduction
2 participants