[RISC-V] Utilize Zba extension instructions #113999

Merged (21 commits) on Apr 18, 2025

Conversation

fuad1502
Contributor

@fuad1502 fuad1502 commented Mar 28, 2025

Main changes:

  • In morph.cpp, group arrayRef and index together, since (arrayRef + scaled index) followed by a load with an immediate offset is the natural addressing pattern for RISC-V.
  • In lower.cpp, try to lower:
    • ADD(LSH) into a new SH(X)ADD(_UW) node.
    • ADD(CAST) into a new ADD_UW node.
    • LSH(CAST) into a new SLLI_UW node.
  • If possible, GT_INDEX_ADDR emits an sh(x)add(.uw) instruction.
  • Zero extension uses add.uw rd, rs, zero (the zext.w pseudo-instruction).
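As a rough illustration of the lowering rules listed above, here is a minimal sketch of the pattern matching on a toy expression tree. The `Node`/`Op` types and the `mk` and `lowerNode` helpers are hypothetical: they are not the CoreCLR JIT's `GenTree` API or the code in lower.cpp, and only the node names loosely mirror the PR text. Since `sh(x)add rd, rs1, rs2` computes `rs2 + (rs1 << x)`, the sketch keeps the (unshifted) index as operand `a` and the base as operand `b`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified IR; NOT the CoreCLR JIT's GenTree API.
enum class Op { Lcl, Cns, Add, Lsh, CastU32ToU64, ShAdd, ShAddUw, AddUw, SlliUw };

struct Node {
    Op op;
    int64_t imm;                 // constant value, or shift amount for ShAdd*/SlliUw
    std::shared_ptr<Node> a, b;  // operands (nullptr when unused)
};
using NodePtr = std::shared_ptr<Node>;

NodePtr mk(Op op, NodePtr a = nullptr, NodePtr b = nullptr, int64_t imm = 0) {
    return std::make_shared<Node>(Node{op, imm, a, b});
}

// True if n zero-extends a 32-bit value to 64 bits (the CAST in the PR text).
static bool isZext(const NodePtr& n) { return n->op == Op::CastU32ToU64; }

// Rewrites one node following the three rules in the PR description:
//   ADD(LSH(x, 1..3), y) -> shNadd / shNadd.uw form
//   ADD(CAST(x), y)      -> add.uw form
//   LSH(CAST(x), c)      -> slli.uw form
NodePtr lowerNode(const NodePtr& n) {
    if (n->op == Op::Add) {
        // ADD is commutative, so try both operand orders.
        for (int k = 0; k < 2; k++) {
            const NodePtr& op1 = k == 0 ? n->a : n->b;
            const NodePtr& op2 = k == 0 ? n->b : n->a;
            if (op1->op == Op::Lsh && op1->b->op == Op::Cns &&
                op1->b->imm >= 1 && op1->b->imm <= 3) {
                const NodePtr& x = op1->a;
                if (isZext(x))  // fold the zero-extension into the .uw variant
                    return mk(Op::ShAddUw, x->a, op2, op1->b->imm);
                return mk(Op::ShAdd, x, op2, op1->b->imm);
            }
        }
        for (int k = 0; k < 2; k++) {
            const NodePtr& op1 = k == 0 ? n->a : n->b;
            const NodePtr& op2 = k == 0 ? n->b : n->a;
            if (isZext(op1))
                return mk(Op::AddUw, op1->a, op2);
        }
    }
    if (n->op == Op::Lsh && isZext(n->a) && n->b->op == Op::Cns &&
        n->b->imm >= 0 && n->b->imm <= 63)
        return mk(Op::SlliUw, n->a->a, nullptr, n->b->imm);
    return n;  // no Zba pattern matched
}
```

A shift amount outside 1..3 has no `shNadd` encoding, so such an ADD is left alone in this sketch.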

For the following C# code:

public static int Fun1(int[] a, int i) {
    return a[i];
}

public static ulong Fun2(ulong[] a, ulong i) {
    return a[i];
}

public static byte Fun3(byte[] a, int i) {
    return a[i];
}

public static long Fun4(uint a) {
    return a;
}

public static long Fun5(uint a) {
    long temp = a;
    return temp << 1;
}

Before patch:

; Fun 1
slli           a1, a1, 32
srli           a1, a1, 32
slli           a1, a1, 2
add            a2, a0, a1
lw             a0, 16(a2)

; Fun 2
slli           a1, a1, 3
add            a2, a0, a1
ld             a0, 16(a2)

; Fun 3
slli           a1, a1, 32
srli           a1, a1, 32
add            a2, a0, a1
lbu            a0, 16(a2)

; Fun 4
slli           a0, a0, 32
srli           a0, a0, 32

; Fun 5
slli           a0, a0, 32
srli           a0, a0, 32
slli           a0, a0, 1

After patch:

; Fun 1
sh2add.uw      a0, a1, a0
lw             a0, 16(a0)

; Fun 2
sh3add         a0, a1, a0
ld             a0, 16(a0)

; Fun 3
add.uw         a0, a1, a0
lbu            a0, 16(a0)

; Fun 4
zext.w         a0, a0

; Fun 5
slli.uw        a0, a0, 1
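To check that each after-patch sequence computes the same value as the before-patch one, the sketch below models the RV64 shifts and the Zba instructions on 64-bit register values. The function names are hypothetical; the semantics follow the Zba definitions (for example, `sh2add.uw rd, rs1, rs2` computes `rs2 + (zext32(rs1) << 2)`). The `16` displacement stays in the load immediate in both versions, so only the address computation is modeled.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low 32 bits of a 64-bit register value.
static uint64_t zext32(uint64_t x) { return x & 0xFFFFFFFFull; }

// Base RV64I shifts used by the pre-patch sequences.
static uint64_t slli(uint64_t x, unsigned s) { return x << s; }
static uint64_t srli(uint64_t x, unsigned s) { return x >> s; }

// Zba instructions; the return value is rd.
static uint64_t shNadd(unsigned n, uint64_t rs1, uint64_t rs2) { return rs2 + (rs1 << n); }
static uint64_t shNaddUw(unsigned n, uint64_t rs1, uint64_t rs2) { return rs2 + (zext32(rs1) << n); }
static uint64_t addUw(uint64_t rs1, uint64_t rs2) { return rs2 + zext32(rs1); }
static uint64_t slliUw(uint64_t rs1, unsigned shamt) { return zext32(rs1) << shamt; }
static uint64_t zextW(uint64_t rs) { return addUw(rs, 0); }  // zext.w rd, rs == add.uw rd, rs, zero

// a0/a1 hold the arguments on entry, as in the listings above.
uint64_t fun1Before(uint64_t a0, uint64_t a1) { return a0 + slli(srli(slli(a1, 32), 32), 2); }
uint64_t fun1After(uint64_t a0, uint64_t a1)  { return shNaddUw(2, a1, a0); }
uint64_t fun2Before(uint64_t a0, uint64_t a1) { return a0 + slli(a1, 3); }
uint64_t fun2After(uint64_t a0, uint64_t a1)  { return shNadd(3, a1, a0); }
uint64_t fun3Before(uint64_t a0, uint64_t a1) { return a0 + srli(slli(a1, 32), 32); }
uint64_t fun3After(uint64_t a0, uint64_t a1)  { return addUw(a1, a0); }
uint64_t fun4Before(uint64_t a0) { return srli(slli(a0, 32), 32); }
uint64_t fun4After(uint64_t a0)  { return zextW(a0); }
uint64_t fun5Before(uint64_t a0) { return slli(srli(slli(a0, 32), 32), 1); }
uint64_t fun5After(uint64_t a0)  { return slliUw(a0, 1); }
```

The `.uw` forms matter because the upper 32 bits of a register holding a 32-bit value are not guaranteed to be zero (RV64 `*w` instructions sign-extend their results), so the index must be zero-extended before it is scaled.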

P.S. See #113678 for a previous approach to utilizing sh(x)add(.uw) instructions.

Part of #84834, cc @dotnet/samsung

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 28, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 28, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@risc-vv

risc-vv commented Mar 28, 2025

RISC-V Release-CLR-VF2: 9524 / 9544 (99.79%)
=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 14min 48s 946ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: f8907d2bb2a8ca10ca166599e6387cc48374009c
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/utilize-shxadd
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 3304 / 9544 (34.62%)
=======================
      passed: 3304
      failed: 6223
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 46min 51s 603ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: f8907d2bb2a8ca10ca166599e6387cc48374009c
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/utilize-shxadd
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 0 / 258 (0.00%)
=======================
      passed: 0
      failed: 0
     skipped: 0
      killed: 258
------------------------
  TOTAL libs: 258
 TOTAL tests: 258
   REAL time: 10min 17s 189ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: f8907d2bb2a8ca10ca166599e6387cc48374009c
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/utilize-shxadd
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 631024 / 667795 (94.49%)
=======================
      passed: 631024
      failed: 692
     skipped: 1539
      killed: 36079
------------------------
  TOTAL libs: 258
 TOTAL tests: 669334
   REAL time: 2h 52min 58s 730ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: f8907d2bb2a8ca10ca166599e6387cc48374009c
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/utilize-shxadd
CONFIG: Release
LIB_CONFIG: Release

@fuad1502 fuad1502 mentioned this pull request Mar 28, 2025
@fuad1502
Contributor Author

The regressions are caused by grouping arrRef with index in morph.cpp, which can sometimes hinder CSE. However, I've confirmed that this grouping on its own still yields an overall size reduction.
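The `EMFloatClass:Run()` example diff further down (+8 bytes) shows this effect. The sketch below is a simplified model of that diff, not of the JIT's CSE: three arrays indexed with the same index each have their element address materialized in a register. Before the patch, CSE shares `(i << 3) + dataOffset`; after it, only `i << 3` is shared and each array needs its own `addi`. The function and type names are hypothetical, and `dataOffset = 16` in the test is an assumed value, since the diff shows the immediate only as the placeholder `0xD1FFAB1E`. The counts cover only the instructions that differ in the diff (bounds checks and the zero-extension are omitted).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Addrs {
    std::array<uint64_t, 3> addr;  // materialized element addresses
    int instrs;                    // instructions spent computing them
};

// Before the patch: offset-first grouping, so ((i << 3) + dataOffset) is one
// CSE shared by all three arrays, and each array needs a single add.
Addrs offsetFirst(const std::array<uint64_t, 3>& bases, uint32_t i, uint64_t dataOffset) {
    Addrs r{};
    uint64_t off = (uint64_t{i} << 3) + dataOffset;  // slli + addi, computed once
    r.instrs = 2;
    for (int k = 0; k < 3; k++) {
        r.addr[k] = bases[k] + off;                   // add
        r.instrs += 1;
    }
    return r;
}

// After the patch: (base + (i << 3)) is grouped first, so only (i << 3) is
// shared and each array pays its own addi to apply dataOffset.
Addrs baseFirst(const std::array<uint64_t, 3>& bases, uint32_t i, uint64_t dataOffset) {
    Addrs r{};
    uint64_t scaled = uint64_t{i} << 3;               // slli, computed once
    r.instrs = 1;
    for (int k = 0; k < 3; k++) {
        r.addr[k] = (bases[k] + scaled) + dataOffset; // add + addi
        r.instrs += 2;
    }
    return r;
}
```

Both groupings produce identical addresses; the model gives 5 versus 7 instructions, matching the +2 instructions (355 to 357) and +8 bytes reported for that method. When the address feeds a load or store directly, the offset folds into the memory instruction's immediate instead, which is where the grouping pays off.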

superpmi asmdiffs results for commit f8907d2:

Diffs are based on 12,624 contexts (10,243 MinOpts, 2,381 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 2 (0.02%)

Overall (-20,492 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| test.mch | 6,930,664 | -20,492 | -0.48% |

MinOpts (-7,496 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| test.mch | 5,385,844 | -7,496 | -0.31% |

FullOpts (-12,996 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| test.mch | 1,544,820 | -12,996 | -0.69% |

Example diffs
test.mch
-12 (-12.00%) : 3142.dasm - System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
@@ -33,13 +33,11 @@ G_M52328_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byre
             sext.w         a3, a1
             sext.w         a4, a2
             bgeu           a4, a3, G_M52328_IG04
-            slli           a1, a2, 32
-            srli           a1, a1, 32
-            slli           a1, a1, 3
-            add            a2, a0, a1
-            ; byrRegs +[a2]
-            ld             a0, 0xD1FFAB1E(a2)
-						;; size=40 bbWeight=1 PerfScore 12.50
+            sh3add.uw      a0, a2, a0
+            ; gcrRegs -[a0]
+            ld             a0, 0xD1FFAB1E(a0)
+            ; gcrRegs +[a0]
+						;; size=28 bbWeight=1 PerfScore 11.00
 G_M52328_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
@@ -47,7 +45,6 @@ G_M52328_IG03:        ; bbWeight=1, epilog, nogc, extend
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 G_M52328_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
             ; gcrRegs -[a0]
-            ; byrRegs -[a2]
             lui            a0, 0xD1FFAB1E
             addiw          a0, a0, 0xD1FFAB1E
             slli           a0, a0, 11
@@ -57,7 +54,7 @@ G_M52328_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             ebreak
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 100, prolog size 16, PerfScore 29.00, instruction count 25, allocated bytes for code 100 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
+; Total bytes of code 88, prolog size 16, PerfScore 27.50, instruction count 22, allocated bytes for code 88 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -68,7 +65,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 25 (0x00019) Actual length = 100 (0x000064)
+  Function Length   : 22 (0x00016) Actual length = 88 (0x000058)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-12 (-11.54%) : 3944.dasm - System.Collections.Immutable.ImmutableArray`1+Enumerator[System.__Canon]:get_Current():System.__Canon:this (Tier1)
@@ -34,14 +34,10 @@ G_M46720_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byre
             sext.w         a3, a2
             sext.w         a4, a0
             bgeu           a4, a3, G_M46720_IG04
-            slli           a0, a0, 32
-            srli           a0, a0, 32
-            slli           a0, a0, 3
-            add            a2, a1, a0
-            ; byrRegs +[a2]
-            ld             a0, 0xD1FFAB1E(a2)
+            sh3add.uw      a0, a0, a1
+            ld             a0, 0xD1FFAB1E(a0)
             ; gcrRegs +[a0]
-						;; size=44 bbWeight=1 PerfScore 14.50
+						;; size=32 bbWeight=1 PerfScore 13.00
 G_M46720_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
@@ -49,7 +45,6 @@ G_M46720_IG03:        ; bbWeight=1, epilog, nogc, extend
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 G_M46720_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
             ; gcrRegs -[a0-a1]
-            ; byrRegs -[a2]
             lui            a0, 0xD1FFAB1E
             addiw          a0, a0, 0xD1FFAB1E
             slli           a0, a0, 11
@@ -59,7 +54,7 @@ G_M46720_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             ebreak
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 104, prolog size 16, PerfScore 31.00, instruction count 26, allocated bytes for code 104 (MethodHash=ea0e497f) for method System.Collections.Immutable.ImmutableArray`1+Enumerator[System.__Canon]:get_Current():System.__Canon:this (Tier1)
+; Total bytes of code 92, prolog size 16, PerfScore 29.50, instruction count 23, allocated bytes for code 92 (MethodHash=ea0e497f) for method System.Collections.Immutable.ImmutableArray`1+Enumerator[System.__Canon]:get_Current():System.__Canon:this (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -70,7 +65,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 26 (0x0001a) Actual length = 104 (0x000068)
+  Function Length   : 23 (0x00017) Actual length = 92 (0x00005c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-32 (-10.13%) : 323.dasm - NumericSortJagged:NumSift(int[],int,int) (Tier1)
@@ -15,10 +15,10 @@
 ;  V04 loc1         [V04,T11] (  2, 16.07)     int  ->   a6        
 ;# V05 OutArgs      [V05    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V06 tmp1         [V06,T04] (  2, 32.14)     int  ->   a5         "Strict ordering of exceptions for Array store"
-;  V07 cse0         [V07,T06] (  3, 24.57)     int  ->   a6         "CSE #07: aggressive"
-;  V08 cse1         [V08,T07] (  3, 24.57)     int  ->   a5         "CSE #12: aggressive"
-;  V09 cse2         [V09,T08] (  3, 24.57)    long  ->   a4         "CSE #05: aggressive"
-;  V10 cse3         [V10,T09] (  3, 24.57)    long  ->   a1         "CSE #10: aggressive"
+;  V07 cse0         [V07,T06] (  3, 24.57)     int  ->   a6         "CSE #06: aggressive"
+;  V08 cse1         [V08,T07] (  3, 24.57)     int  ->   a5         "CSE #10: aggressive"
+;  V09 cse2         [V09,T08] (  3, 24.57)    long  ->   a4         "CSE #04: aggressive"
+;  V10 cse3         [V10,T09] (  3, 24.57)    long  ->   a1         "CSE #08: aggressive"
 ;  V11 cse4         [V11,T02] (  6, 49.57)     int  ->   a4         multi-def "CSE #01: aggressive"
 ;  V12 cse5         [V12,T05] (  4, 29.17)     int  ->   a6         "CSE #02: aggressive"
 ;
@@ -60,34 +60,36 @@ G_M30577_IG06:        ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
             slli           a1, a1, 32
             srli           a1, a1, 32
             slli           a1, a1, 2
-            addi           a1, a1, 0xD1FFAB1E
-            add            t6, a0, a1
-            ; byrRegs +[t6]
-            lw             a5, 0xD1FFAB1E(t6)
+            add            a5, a0, a1
+            ; byrRegs +[a5]
+            lw             a5, 0xD1FFAB1E(a5)
+            ; byrRegs -[a5]
             sext.w         a6, a4
             sext.w         a7, a3
             bgeu           a7, a6, G_M30577_IG11
             slli           a4, a3, 32
             srli           a4, a4, 32
             slli           a4, a4, 2
-            addi           a4, a4, 0xD1FFAB1E
-            add            t6, a0, a4
-            lw             a6, 0xD1FFAB1E(t6)
+            add            a6, a0, a4
+            ; byrRegs +[a6]
+            lw             a6, 0xD1FFAB1E(a6)
+            ; byrRegs -[a6]
             slliw          ra, a5, 0
             slliw          t6, a6, 0
-            ; byrRegs -[t6]
             bge            ra, t6, G_M30577_IG04
-						;; size=88 bbWeight=8.27 PerfScore 202.53
+						;; size=80 bbWeight=8.27 PerfScore 194.27
 G_M30577_IG07:        ; bbWeight=8.04, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
-            add            t6, a0, a4
-            ; byrRegs +[t6]
-            sw             a5, 0xD1FFAB1E(t6)
-            add            t6, a0, a1
-            sw             a6, 0xD1FFAB1E(t6)
+            add            a4, a0, a4
+            ; byrRegs +[a4]
+            sw             a5, 0xD1FFAB1E(a4)
+            add            a1, a0, a1
+            ; byrRegs +[a1]
+            sw             a6, 0xD1FFAB1E(a1)
             sext.w         a1, a3
+            ; byrRegs -[a1]
 						;; size=20 bbWeight=8.04 PerfScore 76.34
 G_M30577_IG08:        ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
-            ; byrRegs -[t6]
+            ; byrRegs -[a4]
             slliw          a3, a1, 1
             slliw          ra, a3, 0
             slliw          t6, a2, 0
@@ -104,31 +106,21 @@ G_M30577_IG10:        ; bbWeight=8.26, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
             sext.w         a5, a4
             sext.w         a6, a3
             bgeu           a6, a5, G_M30577_IG11
-            slli           a5, a3, 32
-            srli           a5, a5, 32
-            slli           a5, a5, 2
-            add            a6, a0, a5
-            ; byrRegs +[a6]
-            lw             a5, 0xD1FFAB1E(a6)
+            sh2add.uw      a5, a3, a0
+            lw             a5, 0xD1FFAB1E(a5)
             addiw          a6, a3, 0xD1FFAB1E
-            ; byrRegs -[a6]
             sext.w         t0, a4
             sext.w         a7, a6
             bgeu           a7, t0, G_M30577_IG11
-            slli           a4, a6, 32
-            srli           a4, a4, 32
-            slli           a4, a4, 2
-            add            a7, a0, a4
-            ; byrRegs +[a7]
-            lw             a4, 0xD1FFAB1E(a7)
+            sh2add.uw      a4, a6, a0
+            lw             a4, 0xD1FFAB1E(a4)
             slliw          ra, a5, 0
             slliw          t6, a4, 0
             bge            ra, t6, G_M30577_IG06
             j              G_M30577_IG05
-						;; size=88 bbWeight=8.26 PerfScore 210.59
+						;; size=64 bbWeight=8.26 PerfScore 185.81
 G_M30577_IG11:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs -[a0]
-            ; byrRegs -[a7]
             lui            a0, 0xD1FFAB1E
             addiw          a0, a0, 0xD1FFAB1E
             slli           a0, a0, 11
@@ -139,7 +131,7 @@ G_M30577_IG11:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ebreak
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 316, prolog size 16, PerfScore 596.28, instruction count 79, allocated bytes for code 316 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
+; Total bytes of code 284, prolog size 16, PerfScore 563.24, instruction count 71, allocated bytes for code 284 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -150,7 +142,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 79 (0x0004f) Actual length = 316 (0x00013c)
+  Function Length   : 71 (0x00047) Actual length = 284 (0x00011c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+0.49%) : 631.dasm - EMFloatClass:Run():double:this (Tier1)
@@ -30,10 +30,10 @@
 ;  V18 tmp10        [V18,T06] (  2, 668.27)     ref  ->   t4         class-hnd exact "NewArr temp" <<unknown class>>
 ;  V19 tmp11        [V19,T07] (  2, 668.27)     ref  ->   t4         class-hnd exact "NewArr temp" <<unknown class>>
 ;  V20 tmp12        [V20,T20] (  2,   0   )     ref  ->   a1         single-def "argument with side effect"
-;  V21 cse0         [V21,T04] (  4, 668.27)    long  ->   s8         "CSE #05: aggressive"
-;  V22 cse1         [V22,T15] (  4,   4   )    long  ->   s3         "CSE #02: aggressive"
+;  V21 cse0         [V21,T15] (  4,   4   )    long  ->   s3         "CSE #02: aggressive"
+;  V22 cse1         [V22,T04] (  4, 668.27)    long  ->   s8         "CSE #04: aggressive"
 ;  V23 cse2         [V23,T14] (  5, 171.07)     int  ->   s2         "CSE #01: aggressive"
-;  V24 cse3         [V24,T18] (  2,  65.29)  double  ->  fs7         hoist "CSE #06: aggressive"
+;  V24 cse3         [V24,T18] (  2,  65.29)  double  ->  fs7         hoist "CSE #05: aggressive"
 ;  V25 rat0         [V25,T11] (  3, 385.71)    long  ->   a0         "ReplaceWithLclVar is creating a new local variable"
 ;
 ; Lcl frame size = 0
@@ -176,10 +176,10 @@ G_M34029_IG03:        ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
             bgeu           t4, t0, G_M34029_IG16
             slli           t3, s6, 32
             srli           t3, t3, 32
-            slli           t3, t3, 3
-            addi           s8, t3, 0xD1FFAB1E
+            slli           s8, t3, 3
             add            t3, s4, s8
             ; byrRegs +[t3]
+            addi           t3, t3, 0xD1FFAB1E
             mv             t4, s7, 
             ; gcrRegs +[t4]
             lui            t2, 0xD1FFAB1E
@@ -249,6 +249,7 @@ G_M34029_IG03:        ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
             bgeu           t4, t0, G_M34029_IG16
             add            t3, s5, s8
             ; byrRegs +[t3]
+            addi           t3, t3, 0xD1FFAB1E
             mv             t4, s7, 
             ; gcrRegs +[t4]
             lui            t2, 0xD1FFAB1E
@@ -318,6 +319,7 @@ G_M34029_IG03:        ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
             bgeu           t4, t0, G_M34029_IG16
             add            t3, s3, s8
             ; byrRegs +[t3]
+            addi           t3, t3, 0xD1FFAB1E
             mv             t4, s7, 
             ; gcrRegs +[t4]
             lui            t2, 0xD1FFAB1E
@@ -334,7 +336,7 @@ G_M34029_IG03:        ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
             slliw          ra, s6, 0
             slliw          t6, s2, 0
             blt            ra, t6, G_M34029_IG03
-						;; size=692 bbWeight=167.07 PerfScore 30990.97
+						;; size=700 bbWeight=167.07 PerfScore 31158.04
 G_M34029_IG04:        ; bbWeight=1.00, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRegs=0000 {}, byref
             sext.w         a3, s2
             mv             a0, s4, 
@@ -567,7 +569,7 @@ RWD00  	dq	3FF0000000000000h	;            1
 RWD08  	dq	408F400000000000h	;         1000
 
 
-; Total bytes of code 1628, prolog size 60, PerfScore 33424.09, instruction count 355, allocated bytes for code 1628 (MethodHash=bd287b12) for method EMFloatClass:Run():double:this (Tier1)
+; Total bytes of code 1636, prolog size 60, PerfScore 33591.16, instruction count 357, allocated bytes for code 1636 (MethodHash=bd287b12) for method EMFloatClass:Run():double:this (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -578,7 +580,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 407 (0x00197) Actual length = 1628 (0x00065c)
+  Function Length   : 409 (0x00199) Actual length = 1636 (0x000664)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 12432.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SymbolExtensions:GetTypeOrReturnType(Microsoft.CodeAnalysis.CSharp.Symbol,byref,byref,byref) (Tier0)

No diffs found?

+0 (0.00%) : 11328.dasm - Microsoft.Cci.MetadataWriter:SerializePrimitiveType(System.Reflection.Metadata.Ecma335.CustomAttributeElementTypeEncoder,int) (Tier0)

No diffs found?


Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| test.mch | 1,454 | 830 | 1 | 623 | -20,500 | +8 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| test.mch | 1,454 | 783 | 6 | 665 | -0.88% | +0.41% | -0.1819% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| test.mch | 12,624 | 10,243 | 2,381 | 0 (0.00%) | 2 (0.02%) |


@risc-vv

risc-vv commented Mar 28, 2025

e46b97e is being scheduled for building and testing

GIT: e46b97e8d5568393f9e99c0e8819454cb6944749
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/utilize-shxadd

@risc-vv

risc-vv commented Mar 29, 2025

RISC-V Release-CLR-VF2: 9524 / 9544 (99.79%)
=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 17min 21s 368ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 0273889eae729b1c57b66f3bceeb381a77ef0bb8
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/utilize-shxadd
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9524 / 9544 (99.79%)
=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 47min 57s 651ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 0273889eae729b1c57b66f3bceeb381a77ef0bb8
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/utilize-shxadd
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 625306 / 650316 (96.15%)
=======================
      passed: 625306
      failed: 872
     skipped: 1703
      killed: 24138
------------------------
  TOTAL libs: 258
 TOTAL tests: 652019
   REAL time: 1h 8min 11s 835ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 0273889eae729b1c57b66f3bceeb381a77ef0bb8
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/utilize-shxadd
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 633660 / 669510 (94.65%)
=======================
      passed: 633660
      failed: 688
     skipped: 1517
      killed: 35162
------------------------
  TOTAL libs: 258
 TOTAL tests: 671027
   REAL time: 3h 38min 4s 349ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 0273889eae729b1c57b66f3bceeb381a77ef0bb8
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/utilize-shxadd
CONFIG: Release
LIB_CONFIG: Release

@fuad1502
Contributor Author

superpmi asmdiffs results for commit 0273889:

Diffs are based on 12,626 contexts (10,243 MinOpts, 2,383 FullOpts).

Overall (-17,260 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| test.mch | 6,935,716 | -17,260 | -0.43% |

MinOpts (-7,028 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| test.mch | 5,385,844 | -7,028 | -0.29% |

FullOpts (-10,232 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| test.mch | 1,549,872 | -10,232 | -0.63% |

Example diffs
test.mch
-12 (-12.00%) : 3142.dasm - System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
@@ -33,13 +33,11 @@ G_M52328_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byre
             sext.w         a3, a1
             sext.w         a4, a2
             bgeu           a4, a3, G_M52328_IG04
-            slli           a1, a2, 32
-            srli           a1, a1, 32
-            slli           a1, a1, 3
-            add            a2, a0, a1
-            ; byrRegs +[a2]
-            ld             a0, 0xD1FFAB1E(a2)
-						;; size=40 bbWeight=1 PerfScore 12.50
+            sh3add.uw      a0, a2, a0
+            ; gcrRegs -[a0]
+            ld             a0, 0xD1FFAB1E(a0)
+            ; gcrRegs +[a0]
+						;; size=28 bbWeight=1 PerfScore 11.00
 G_M52328_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
@@ -47,7 +45,6 @@ G_M52328_IG03:        ; bbWeight=1, epilog, nogc, extend
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 G_M52328_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
             ; gcrRegs -[a0]
-            ; byrRegs -[a2]
             lui            a0, 0xD1FFAB1E
             addiw          a0, a0, 0xD1FFAB1E
             slli           a0, a0, 11
@@ -57,7 +54,7 @@ G_M52328_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             ebreak
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 100, prolog size 16, PerfScore 29.00, instruction count 25, allocated bytes for code 100 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
+; Total bytes of code 88, prolog size 16, PerfScore 27.50, instruction count 22, allocated bytes for code 88 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -68,7 +65,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 25 (0x00019) Actual length = 100 (0x000064)
+  Function Length   : 22 (0x00016) Actual length = 88 (0x000058)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-12 (-11.54%) : 3944.dasm - System.Collections.Immutable.ImmutableArray`1+Enumerator[System.__Canon]:get_Current():System.__Canon:this (Tier1)
@@ -34,14 +34,10 @@ G_M46720_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byre
             sext.w         a3, a2
             sext.w         a4, a0
             bgeu           a4, a3, G_M46720_IG04
-            slli           a0, a0, 32
-            srli           a0, a0, 32
-            slli           a0, a0, 3
-            add            a2, a1, a0
-            ; byrRegs +[a2]
-            ld             a0, 0xD1FFAB1E(a2)
+            sh3add.uw      a0, a0, a1
+            ld             a0, 0xD1FFAB1E(a0)
             ; gcrRegs +[a0]
-						;; size=44 bbWeight=1 PerfScore 14.50
+						;; size=32 bbWeight=1 PerfScore 13.00
 G_M46720_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
@@ -49,7 +45,6 @@ G_M46720_IG03:        ; bbWeight=1, epilog, nogc, extend
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 G_M46720_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
             ; gcrRegs -[a0-a1]
-            ; byrRegs -[a2]
             lui            a0, 0xD1FFAB1E
             addiw          a0, a0, 0xD1FFAB1E
             slli           a0, a0, 11
@@ -59,7 +54,7 @@ G_M46720_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             ebreak
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 104, prolog size 16, PerfScore 31.00, instruction count 26, allocated bytes for code 104 (MethodHash=ea0e497f) for method System.Collections.Immutable.ImmutableArray`1+Enumerator[System.__Canon]:get_Current():System.__Canon:this (Tier1)
+; Total bytes of code 92, prolog size 16, PerfScore 29.50, instruction count 23, allocated bytes for code 92 (MethodHash=ea0e497f) for method System.Collections.Immutable.ImmutableArray`1+Enumerator[System.__Canon]:get_Current():System.__Canon:this (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -70,7 +65,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 26 (0x0001a) Actual length = 104 (0x000068)
+  Function Length   : 23 (0x00017) Actual length = 92 (0x00005c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-32 (-10.13%) : 323.dasm - NumericSortJagged:NumSift(int[],int,int) (Tier1)
@@ -15,10 +15,10 @@
 ;  V04 loc1         [V04,T11] (  2, 16.07)     int  ->   a6        
 ;# V05 OutArgs      [V05    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V06 tmp1         [V06,T04] (  2, 32.14)     int  ->   a5         "Strict ordering of exceptions for Array store"
-;  V07 cse0         [V07,T06] (  3, 24.57)     int  ->   a6         "CSE #07: aggressive"
-;  V08 cse1         [V08,T07] (  3, 24.57)     int  ->   a5         "CSE #12: aggressive"
-;  V09 cse2         [V09,T08] (  3, 24.57)    long  ->   a4         "CSE #05: aggressive"
-;  V10 cse3         [V10,T09] (  3, 24.57)    long  ->   a1         "CSE #10: aggressive"
+;  V07 cse0         [V07,T06] (  3, 24.57)     int  ->   a6         "CSE #06: aggressive"
+;  V08 cse1         [V08,T07] (  3, 24.57)     int  ->   a5         "CSE #10: aggressive"
+;  V09 cse2         [V09,T08] (  3, 24.57)    long  ->   a4         "CSE #04: aggressive"
+;  V10 cse3         [V10,T09] (  3, 24.57)    long  ->   a1         "CSE #08: aggressive"
 ;  V11 cse4         [V11,T02] (  6, 49.57)     int  ->   a4         multi-def "CSE #01: aggressive"
 ;  V12 cse5         [V12,T05] (  4, 29.17)     int  ->   a6         "CSE #02: aggressive"
 ;
@@ -60,34 +60,36 @@ G_M30577_IG06:        ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
             slli           a1, a1, 32
             srli           a1, a1, 32
             slli           a1, a1, 2
-            addi           a1, a1, 0xD1FFAB1E
-            add            t6, a0, a1
-            ; byrRegs +[t6]
-            lw             a5, 0xD1FFAB1E(t6)
+            add            a5, a0, a1
+            ; byrRegs +[a5]
+            lw             a5, 0xD1FFAB1E(a5)
+            ; byrRegs -[a5]
             sext.w         a6, a4
             sext.w         a7, a3
             bgeu           a7, a6, G_M30577_IG11
             slli           a4, a3, 32
             srli           a4, a4, 32
             slli           a4, a4, 2
-            addi           a4, a4, 0xD1FFAB1E
-            add            t6, a0, a4
-            lw             a6, 0xD1FFAB1E(t6)
+            add            a6, a0, a4
+            ; byrRegs +[a6]
+            lw             a6, 0xD1FFAB1E(a6)
+            ; byrRegs -[a6]
             slliw          ra, a5, 0
             slliw          t6, a6, 0
-            ; byrRegs -[t6]
             bge            ra, t6, G_M30577_IG04
-						;; size=88 bbWeight=8.27 PerfScore 202.53
+						;; size=80 bbWeight=8.27 PerfScore 194.27
 G_M30577_IG07:        ; bbWeight=8.04, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
-            add            t6, a0, a4
-            ; byrRegs +[t6]
-            sw             a5, 0xD1FFAB1E(t6)
-            add            t6, a0, a1
-            sw             a6, 0xD1FFAB1E(t6)
+            add            a4, a0, a4
+            ; byrRegs +[a4]
+            sw             a5, 0xD1FFAB1E(a4)
+            add            a1, a0, a1
+            ; byrRegs +[a1]
+            sw             a6, 0xD1FFAB1E(a1)
             sext.w         a1, a3
+            ; byrRegs -[a1]
 						;; size=20 bbWeight=8.04 PerfScore 76.34
 G_M30577_IG08:        ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
-            ; byrRegs -[t6]
+            ; byrRegs -[a4]
             slliw          a3, a1, 1
             slliw          ra, a3, 0
             slliw          t6, a2, 0
@@ -104,31 +106,21 @@ G_M30577_IG10:        ; bbWeight=8.26, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
             sext.w         a5, a4
             sext.w         a6, a3
             bgeu           a6, a5, G_M30577_IG11
-            slli           a5, a3, 32
-            srli           a5, a5, 32
-            slli           a5, a5, 2
-            add            a6, a0, a5
-            ; byrRegs +[a6]
-            lw             a5, 0xD1FFAB1E(a6)
+            sh2add.uw      a5, a3, a0
+            lw             a5, 0xD1FFAB1E(a5)
             addiw          a6, a3, 0xD1FFAB1E
-            ; byrRegs -[a6]
             sext.w         t0, a4
             sext.w         a7, a6
             bgeu           a7, t0, G_M30577_IG11
-            slli           a4, a6, 32
-            srli           a4, a4, 32
-            slli           a4, a4, 2
-            add            a7, a0, a4
-            ; byrRegs +[a7]
-            lw             a4, 0xD1FFAB1E(a7)
+            sh2add.uw      a4, a6, a0
+            lw             a4, 0xD1FFAB1E(a4)
             slliw          ra, a5, 0
             slliw          t6, a4, 0
             bge            ra, t6, G_M30577_IG06
             j              G_M30577_IG05
-						;; size=88 bbWeight=8.26 PerfScore 210.59
+						;; size=64 bbWeight=8.26 PerfScore 185.81
 G_M30577_IG11:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs -[a0]
-            ; byrRegs -[a7]
             lui            a0, 0xD1FFAB1E
             addiw          a0, a0, 0xD1FFAB1E
             slli           a0, a0, 11
@@ -139,7 +131,7 @@ G_M30577_IG11:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ebreak
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 316, prolog size 16, PerfScore 596.28, instruction count 79, allocated bytes for code 316 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
+; Total bytes of code 284, prolog size 16, PerfScore 563.24, instruction count 71, allocated bytes for code 284 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -150,7 +142,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 79 (0x0004f) Actual length = 316 (0x00013c)
+  Function Length   : 71 (0x00047) Actual length = 284 (0x00011c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
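The NumSift diff above replaces a base-ISA sequence with a single `sh2add.uw`. The old sequence is `slli; srli; slli; add`: zero-extend the 32-bit index, scale it by 4, then add the array base.

The C++ below is a hedged sketch that models the Zba instructions appearing in these diffs on 64-bit register values. It follows the operand order of the assembly (`op rd, rs1, rs2`) and can check that the old and new sequences compute the same address. The helper names are illustrative, not JIT code.

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of the RV64 Zba instructions seen in these diffs.
// Operand order follows the assembly syntax: "op rd, rs1, rs2".

// Zero-extend the low 32 bits (zext.w is an alias of add.uw rd, rs, zero).
static uint64_t zext_w(uint64_t rs) { return rs & 0xFFFFFFFFull; }

// add.uw rd, rs1, rs2  =>  rd = rs2 + zext32(rs1)
static uint64_t add_uw(uint64_t rs1, uint64_t rs2) { return rs2 + zext_w(rs1); }

// shNadd rd, rs1, rs2  =>  rd = rs2 + (rs1 << N), with N in {1, 2, 3}
static uint64_t shNadd(unsigned n, uint64_t rs1, uint64_t rs2) {
    assert(n >= 1 && n <= 3);
    return rs2 + (rs1 << n);
}

// shNadd.uw rd, rs1, rs2  =>  rd = rs2 + (zext32(rs1) << N)
static uint64_t shNadd_uw(unsigned n, uint64_t rs1, uint64_t rs2) {
    assert(n >= 1 && n <= 3);
    return rs2 + (zext_w(rs1) << n);
}

// slli.uw rd, rs1, shamt  =>  rd = zext32(rs1) << shamt
static uint64_t slli_uw(uint64_t rs1, unsigned shamt) {
    assert(shamt < 64);
    return zext_w(rs1) << shamt;
}

// The base-ISA "before" sequence for an int[] element address:
//   slli a1, a1, 32 ; srli a1, a1, 32 ; slli a1, a1, 2 ; add a2, a0, a1
static uint64_t base_index_addr(uint64_t a0, uint64_t a1) {
    a1 <<= 32;
    a1 >>= 32;  // logical shift on an unsigned value clears the upper 32 bits
    a1 <<= 2;
    return a0 + a1;
}
```

Take an index register holding `0xFFFFFFFF00000005`, with garbage in the upper half. Both `base_index_addr(arr, idx)` and `shNadd_uw(2, idx, arr)` yield `arr + 20`. This is why one `sh2add.uw` can stand in for the four-instruction sequence.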
+8 (+0.49%) : 631.dasm - EMFloatClass:Run():double:this (Tier1)
@@ -30,10 +30,10 @@
 ;  V18 tmp10        [V18,T06] (  2, 668.27)     ref  ->   t4         class-hnd exact "NewArr temp" <<unknown class>>
 ;  V19 tmp11        [V19,T07] (  2, 668.27)     ref  ->   t4         class-hnd exact "NewArr temp" <<unknown class>>
 ;  V20 tmp12        [V20,T20] (  2,   0   )     ref  ->   a1         single-def "argument with side effect"
-;  V21 cse0         [V21,T04] (  4, 668.27)    long  ->   s8         "CSE #05: aggressive"
-;  V22 cse1         [V22,T15] (  4,   4   )    long  ->   s3         "CSE #02: aggressive"
+;  V21 cse0         [V21,T15] (  4,   4   )    long  ->   s3         "CSE #02: aggressive"
+;  V22 cse1         [V22,T04] (  4, 668.27)    long  ->   s8         "CSE #04: aggressive"
 ;  V23 cse2         [V23,T14] (  5, 171.07)     int  ->   s2         "CSE #01: aggressive"
-;  V24 cse3         [V24,T18] (  2,  65.29)  double  ->  fs7         hoist "CSE #06: aggressive"
+;  V24 cse3         [V24,T18] (  2,  65.29)  double  ->  fs7         hoist "CSE #05: aggressive"
 ;  V25 rat0         [V25,T11] (  3, 385.71)    long  ->   a0         "ReplaceWithLclVar is creating a new local variable"
 ;
 ; Lcl frame size = 0
@@ -176,10 +176,10 @@ G_M34029_IG03:        ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
             bgeu           t4, t0, G_M34029_IG16
             slli           t3, s6, 32
             srli           t3, t3, 32
-            slli           t3, t3, 3
-            addi           s8, t3, 0xD1FFAB1E
+            slli           s8, t3, 3
             add            t3, s4, s8
             ; byrRegs +[t3]
+            addi           t3, t3, 0xD1FFAB1E
             mv             t4, s7, 
             ; gcrRegs +[t4]
             lui            t2, 0xD1FFAB1E
@@ -249,6 +249,7 @@ G_M34029_IG03:        ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
             bgeu           t4, t0, G_M34029_IG16
             add            t3, s5, s8
             ; byrRegs +[t3]
+            addi           t3, t3, 0xD1FFAB1E
             mv             t4, s7, 
             ; gcrRegs +[t4]
             lui            t2, 0xD1FFAB1E
@@ -318,6 +319,7 @@ G_M34029_IG03:        ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
             bgeu           t4, t0, G_M34029_IG16
             add            t3, s3, s8
             ; byrRegs +[t3]
+            addi           t3, t3, 0xD1FFAB1E
             mv             t4, s7, 
             ; gcrRegs +[t4]
             lui            t2, 0xD1FFAB1E
@@ -334,7 +336,7 @@ G_M34029_IG03:        ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
             slliw          ra, s6, 0
             slliw          t6, s2, 0
             blt            ra, t6, G_M34029_IG03
-						;; size=692 bbWeight=167.07 PerfScore 30990.97
+						;; size=700 bbWeight=167.07 PerfScore 31158.04
 G_M34029_IG04:        ; bbWeight=1.00, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRegs=0000 {}, byref
             sext.w         a3, s2
             mv             a0, s4, 
@@ -567,7 +569,7 @@ RWD00  	dq	3FF0000000000000h	;            1
 RWD08  	dq	408F400000000000h	;         1000
 
 
-; Total bytes of code 1628, prolog size 60, PerfScore 33424.09, instruction count 355, allocated bytes for code 1628 (MethodHash=bd287b12) for method EMFloatClass:Run():double:this (Tier1)
+; Total bytes of code 1636, prolog size 60, PerfScore 33591.16, instruction count 357, allocated bytes for code 1636 (MethodHash=bd287b12) for method EMFloatClass:Run():double:this (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -578,7 +580,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 407 (0x00197) Actual length = 1628 (0x00065c)
+  Function Length   : 409 (0x00199) Actual length = 1636 (0x000664)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 12432.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SymbolExtensions:GetTypeOrReturnType(Microsoft.CodeAnalysis.CSharp.Symbol,byref,byref,byref) (Tier0)

No diffs found?

+0 (0.00%) : 11328.dasm - Microsoft.Cci.MetadataWriter:SerializePrimitiveType(System.Reflection.Metadata.Ecma335.CustomAttributeElementTypeEncoder,int) (Tier0)

No diffs found?

Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| test.mch | 1,350 | 718 | 1 | 631 | -17,268 | +8 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| test.mch | 1,350 | 675 | 6 | 669 | -0.86% | +0.41% | -0.1466% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| test.mch | 12,626 | 10,243 | 2,383 | 0 (0.00%) | 0 (0.00%) |
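The numbers above can be sanity-checked with simple arithmetic:

- A per-method percentage is the byte delta divided by the base code size.
- The improvement, regression, and same counts should add up to the contexts with diffs.
- MinOpts plus FullOpts should equal the diffed contexts.

A small C++ sketch of those checks follows. The helper names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Relative size change in percent: (diff - base) / base * 100.
static double pct_change(double base_bytes, double diff_bytes) {
    return (diff_bytes - base_bytes) / base_bytes * 100.0;
}

// True when x agrees with a value reported to two decimal places.
static bool matches_2dp(double x, double reported) {
    return std::fabs(x - reported) < 0.005;
}

// Improvements + regressions + unchanged should account for every context with diffs.
static bool partitions(long total, long improved, long regressed, long same) {
    return improved + regressed + same == total;
}
```

Two examples from the diffs above:

- `pct_change(316, 284)` is about -10.13, the NumSift entry.
- `pct_change(1628, 1636)` is about +0.49, the EMFloatClass entry.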

jit-analyze output

fuad1502 marked this pull request as ready for review March 29, 2025 06:52
BruceForstall self-requested a review April 2, 2025 22:10
@@ -3160,12 +3160,17 @@ GenTree* Compiler::fgMorphIndexAddr(GenTreeIndexAddr* indexAddr)
//
// 1) "arrRef + (index + elemOffset)"
// 2) "(arrRef + elemOffset) + index"
// 3) "(arrRef + index) + elemOffset"
Contributor

This -- your new option 3 -- is fundamentally unsound for GC, and in fact, is exactly what the comments here are trying to prevent. There is no guarantee that "index" is >= 0 and < the size of the array object. That means that arrRef + index can point outside the array, and thus won't be properly handled by the garbage collector. See dotnet/coreclr#17524, where these restrictions and comments were originally added.

Contributor Author

fuad1502, Apr 4, 2025

Thanks for pointing it out, and sorry, I didn’t understand the warning properly before.

After reading the original PR, I can now see why it might point outside the object. However, I now don’t understand why form 2 for ARM, which has an (arrRef + offset) BYREF ADD, is valid. From your example (a + (i - 10) * 4 + 8 = a + (i * 4) - 32), offset can be negative, and therefore (arrRef + offset) might point outside the object, right? Or am I misunderstanding something here?

Contributor Author

I’ve looked at the PR introducing form 2 (#61293), but I somehow still can’t see how the first BYREF ADD won’t point outside the object, especially for the example above.

Member

jakobbotsch, Apr 6, 2025

offset in the example is always 8 (for arm32); it represents the distance from the object reference to the start of the element data in the array.

However, I'm also confused by the comment. Certainly reordering the formation of the byref with the bounds check would be an illegal transformation by the JIT, but if we have already bounds checked that index is within the array, then I do not see why (arrRef + index) + elemOffset would cause problems.

Problem 2 fixed in dotnet/coreclr#17524 seems like the "actual" problem to me, since it looked like we were assuming addition between byrefs and integers was associative in general.

Contributor

I don't understand why form 2 does not cause problems. It's possible that our expression optimizations are not sufficiently capable to convert:

a + 8 + (i - 10) * 4
=>
a + 8 + i * 4 - 40
=>
a - 32 + i * 4
=>
a + i * 4 - 32

which was the problematic original case. I don't see why this would be prevented. Does the JIT actually prevent expression optimization in byref expression trees? @EgorBo do you remember why case 2 here is sound?

The original problem case was:

a + (i - 10) * 4 + 8
=>
a + i * 4 - 40 + 8
=>
a + i * 4 - 32

Perhaps either the original fix helped prevent this, or something in the interim years did.

> Problem 2 fixed in dotnet/coreclr#17524 seems like the "actual" problem to me, since it looked like we were assuming addition between byrefs and integers was associative in general.

Without that fix, the "actual fix" (fix 1: changing the types of the array index morphed terms) would be "un-done" by the subsequent morph.

> Certainly reordering the formation of the byref with the bounds check would be an illegal transformation by the JIT, but if we have already bounds checked that index is within the array, then I do not see why (arrRef + index) + elemOffset would cause problems.

The problem is not the full expression, or whether it is bounds checked, it is when a partially computed byref expression is reported in a register, and that partial computation does not point within the object.
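Here is a minimal C++ sketch of that concern under stated assumptions:

- Addresses are plain integers.
- The array object sits at a made-up address 1000 with length 5.
- The arm32 layout from this thread applies: element data starts 8 bytes past the object reference, and elements are 4 bytes.
- "Inside the object" is simplified to the half-open range [a, a + 8 + 4 * len).

All three orderings compute the same final address. Only the intermediate byref differs, and in the two reassociated orders it falls outside the object. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified arm32-style int[] layout (an assumption for illustration).
constexpr int64_t kDataOffset = 8;
constexpr int64_t kElemSize = 4;

// Is p inside the object [a, a + kDataOffset + len * kElemSize)?
static bool inside(int64_t a, int64_t len, int64_t p) {
    return p >= a && p < a + kDataOffset + len * kElemSize;
}

// Each function returns the byref values formed, in order; the last is the element address.
// (a + 8) + (i - 10) * 4, i.e. "(arrRef + elemOffset) + index".
static std::vector<int64_t> form2(int64_t a, int64_t i) {
    int64_t t = a + kDataOffset;
    return {t, t + (i - 10) * kElemSize};
}

// Reassociated as (a - 32) + i * 4.
static std::vector<int64_t> reassoc_const_first(int64_t a, int64_t i) {
    int64_t t = a + kDataOffset - 10 * kElemSize;
    return {t, t + i * kElemSize};
}

// Reassociated as (a + i * 4) - 32.
static std::vector<int64_t> reassoc_scaled_first(int64_t a, int64_t i) {
    int64_t t = a + i * kElemSize;
    return {t, t + kDataOffset - 10 * kElemSize};
}

// True when every byref in the sequence points inside the object.
static bool all_inside(int64_t a, int64_t len, const std::vector<int64_t>& refs) {
    for (int64_t p : refs) {
        if (!inside(a, len, p)) return false;
    }
    return true;
}
```

Take a = 1000, len = 5 and i = 14, so i - 10 = 4 passes the bounds check. Each variant ends at 1024. Only `form2` keeps every intermediate inside [1000, 1028). The other two form 968 and 1056, respectively.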

Member

jakobbotsch, Apr 8, 2025

> The problem is not the full expression, or whether it is bounds checked, it is when a partially computed byref expression is reported in a register, and that partial computation does not point within the object

Right, but none of these patterns produce a byref outside the array object if the index has been bounds checked. They all look sound to me. If we have expression transformations that generally transform gc+(x+y) addition into (gc+x)+y, then those transformations are illegal, not the morphing that happens here.
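That claim can be checked mechanically under a simplified model. This is an assumption, not the runtime's actual GC reporting rules: a byref counts as valid if it points into [arrRef, arrRef + elemOffset + length * elemSize).

The C++ sketch below enumerates every bounds-checked index for small arrays. It confirms that none of the three shapes in the fgMorphIndexAddr comment forms an out-of-object intermediate. It also confirms that form 3 only escapes when the index has not been bounds checked. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified 64-bit array layout: element data begins 16 bytes after the
// object reference (the CNS_INT 16 that appears in the IR dumps in this thread).
constexpr int64_t kElemOffset = 16;

static bool inside(int64_t a, int64_t len, int64_t elemSize, int64_t p) {
    return p >= a && p < a + kElemOffset + len * elemSize;
}

// Byrefs formed by each shape for a byte offset `index` (already scaled).
static std::vector<int64_t> byrefs(int form, int64_t a, int64_t index) {
    switch (form) {
        case 1: return {a + (index + kElemOffset)};                // arrRef + (index + elemOffset)
        case 2: return {a + kElemOffset, a + kElemOffset + index}; // (arrRef + elemOffset) + index
        case 3: return {a + index, a + index + kElemOffset};       // (arrRef + index) + elemOffset
        default: assert(false && "unknown form"); return {};
    }
}

// For arrays of length 1..maxLen and every index 0 <= i < len (the bounds
// check passed), returns true if no byref formed by any shape leaves the object.
static bool all_forms_sound(int64_t a, int64_t maxLen, int64_t elemSize) {
    for (int64_t len = 1; len <= maxLen; ++len) {
        for (int64_t i = 0; i < len; ++i) {
            for (int form = 1; form <= 3; ++form) {
                for (int64_t p : byrefs(form, a, i * elemSize)) {
                    if (!inside(a, len, elemSize, p)) return false;
                }
            }
        }
    }
    return true;
}
```

In this model, form 3's partial byref `arrRef + index` is at most `arrRef + (len - 1) * elemSize`, which is always inside the object. That holds only if the byref is formed after the bounds check, which matches the point about not reordering the two.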

Contributor Author

How about adding an assertion on offset (e.g. offset > 0) to check that our assumptions are correct? CI should catch it if it isn't true, right? Sorry if I somehow misunderstood.

Contributor Author

fuad1502, Apr 17, 2025

Hi all,

I tested the following C# functions on both ARM64 and RISCV64 (after introducing form 3):

[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe int Fun(int[] a, long b)
{
      fixed (int* ptr = a) {
              return ptr[b - 10];
      }
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static int FunChecked(int[] a, long b)
{
      return a[b - 10];
}

Here's the disassembly for ARM64:

;; Fun
G_M48370_IG05:  ;; offset=0x0024
            add     x0, x0, x1,  LSL #2
            ldr     w0, [x0, #-0x28]

;; FunChecked
G_M57461_IG02:  ;; offset=0x0008
            sub     x2, x1, #10
            ldr     w3, [x0, #0x08]
            cmp     x2, x3
            bhs     G_M57461_IG04
            add     x0, x0, #16
            lsl     x1, x1, #2
            sub     x1, x1, #40
            ldr     w0, [x0, x1]

And for RISCV64:

;; Fun
G_M48370_IG05:  ;; offset=0x0030
            sh2add         a0, a1, a0
            lw             a0, -40(a0)

;; FunChecked
G_M57461_IG02:  ;; offset=0x0010
            addi           a2, a1, -10
            lw             a3, 8(a0)
            zext.w         a3, a3
            bgeu           a2, a3, G_M57461_IG04
            slli           a1, a1, 2
            addi           a1, a1, -40
            add            a0, a0, a1
            lw             a0, 16(a0)
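The RISCV64 FunChecked sequence above can be traced with a small C++ model. Register names follow the listing, and the array address and length are made-up inputs. The model shows two things:

- The single unsigned `bgeu` rejects both negative and too-large values of `b - 10`.
- Once that check passes, the intermediate byref `a0 = arr + 4 * (b - 10)` already lies inside the object.

The trailing `16(a0)` displacement makes this the `(arrRef + index) + elemOffset` shape. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Values observed while running the modeled FunChecked sequence.
struct FunCheckedTrace {
    uint64_t partialByref;  // a0 after "add a0, a0, a1"
    uint64_t loadAddress;   // effective address of "lw a0, 16(a0)"
};

// Step-by-step model of the RISCV64 FunChecked code.
// arr: address of the int[] object; len: its length field; b: the long argument.
// Returns std::nullopt where the real code branches to the range-check throw block.
static std::optional<FunCheckedTrace> fun_checked(uint64_t arr, uint32_t len, int64_t b) {
    uint64_t a1 = static_cast<uint64_t>(b);
    uint64_t a2 = a1 - 10;                // addi   a2, a1, -10
    uint64_t a3 = len;                    // lw a3, 8(a0) ; zext.w a3, a3
    if (a2 >= a3) return std::nullopt;    // bgeu   a2, a3, G_M57461_IG04 (unsigned compare)
    a1 = a1 << 2;                         // slli   a1, a1, 2
    a1 = a1 - 40;                         // addi   a1, a1, -40
    uint64_t a0 = arr + a1;               // add    a0, a0, a1  (byref formed here)
    return FunCheckedTrace{a0, a0 + 16};  // lw     a0, 16(a0)
}
```

Take arr = 0x1000 and len = 5. With b = 14, the partial byref is arr + 16 and the load reads arr + 32. With b = 9, `b - 10` wraps to a huge unsigned value, so the same `bgeu` that catches b = 15 also rejects it.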

As additional info, I'll put the IR before rationalization here too. For ARM64:

// Fun
STMT00003 ( 0x018[E-] ... 0x024 )
N009 ( 11, 10) [000019] ---XG+-----                         *  RETURN    int    $207
N008 ( 10,  9) [000018] ---XG+-----                         \--*  IND       int    <l:$184, c:$185>
N007 (  9, 10) [000017] -----+-N---                            \--*  ADD       long   $304
N005 (  7,  7) [000016] -----+-N---                               +--*  ADD       long   $303
N001 (  3,  2) [000009] -----+-----                               |  +--*  LCL_VAR   long   V02 loc0         u:4 (last use) $340
N004 (  3,  4) [000013] -----+-----                               |  \--*  LSH       long   $302
N002 (  1,  1) [000010] -----+-----                               |     +--*  LCL_VAR   long   V01 arg1         u:1 (last use) $c0
N003 (  1,  2) [000012] -----+-----                               |     \--*  CNS_INT   long   2 $102
N006 (  1,  2) [000015] -----+-----                               \--*  CNS_INT   long   -40 $103

// FunChecked
STMT00000 ( 0x000[E-] ... 0x008 )
N020 ( 23, 30) [000008] ---XG+-----                         *  RETURN    int    $205
N019 ( 22, 29) [000023] ---XG+-----                         \--*  COMMA     int    <l:$241, c:$242>
N007 ( 11, 16) [000016] ---X-+-----                            +--*  BOUNDS_CHECK_Rng void   $205
N003 (  3,  4) [000004] -----+-----                            |  +--*  ADD       long   $180
N001 (  1,  1) [000001] -----+-----                            |  |  +--*  LCL_VAR   long   V01 arg1         u:1 $c0
N002 (  1,  2) [000003] -----+-----                            |  |  \--*  CNS_INT   long   -10 $140
N006 (  4,  5) [000015] ---X-+---U-                            |  \--*  CAST      long <- uint $182
N005 (  3,  3) [000014] ---X-+-----                            |     \--*  ARR_LENGTH int    $240
N004 (  1,  1) [000000] -----+-----                            |        \--*  LCL_VAR   ref    V00 arg0         u:1 $80
N018 ( 11, 13) [000024] n---G+-----                            \--*  IND       int    <l:$300, c:$1c1>
N017 (  8, 11) [000022] -----+-----                               \--*  ARR_ADDR  byref int[] $2c0
N016 (  8, 11) [000021] -----+-N---                                  \--*  ADD       byref  $281
N010 (  3,  4) [000020] -----+-----                                     +--*  ADD       byref  $280
N008 (  1,  1) [000009] -----+-----                                     |  +--*  LCL_VAR   ref    V00 arg0         u:1 (last use) $80
N009 (  1,  2) [000019] -----+-----                                     |  \--*  CNS_INT   long   16 $142
N015 (  5,  7) [000018] -----+-----                                     \--*  ADD       long   $184
N013 (  3,  4) [000011] -----+-----                                        +--*  LSH       long   $183
N011 (  1,  1) [000012] -----+-----                                        |  +--*  LCL_VAR   long   V01 arg1         u:1 (last use) $c0
N012 (  1,  2) [000013] -----+-----                                        |  \--*  CNS_INT   long   2 $143
N014 (  1,  2) [000017] -----+-N---                                        \--*  CNS_INT   long   -40 $144

For RISCV64:

// Fun
N009 ( 11, 12) [000019] ---XG+-----                         *  RETURN    int    $207
N008 ( 10, 11) [000018] ---XG+-----                         \--*  IND       int    <l:$184, c:$185>
N007 (  9, 14) [000017] -----+-N---                            \--*  ADD       long   $304
N005 (  7,  9) [000016] -----+-N---                               +--*  ADD       long   $303
N001 (  3,  2) [000009] -----+-----                               |  +--*  LCL_VAR   long   V02 loc0         u:4 (last use) $340
N004 (  3,  6) [000013] -----+-----                               |  \--*  LSH       long   $302
N002 (  1,  1) [000010] -----+-----                               |     +--*  LCL_VAR   long   V01 arg1         u:1 (last use) $c0
N003 (  1,  4) [000012] -----+-----                               |     \--*  CNS_INT   long   2 $102
N006 (  1,  4) [000015] -----+-----                               \--*  CNS_INT   long   -40 $103

// FunChecked
N020 ( 22, 36) [000008] ---XG+-----                         *  RETURN    int    $205
N019 ( 21, 35) [000023] ---XG+-----                         \--*  COMMA     int    <l:$241, c:$242>
N007 ( 11, 20) [000016] ---X-+-----                            +--*  BOUNDS_CHECK_Rng void   $205
N003 (  3,  6) [000004] -----+-----                            |  +--*  ADD       long   $180
N001 (  1,  1) [000001] -----+-----                            |  |  +--*  LCL_VAR   long   V01 arg1         u:1 $c0
N002 (  1,  4) [000003] -----+-----                            |  |  \--*  CNS_INT   long   -10 $140
N006 (  4,  7) [000015] ---X-+---U-                            |  \--*  CAST      long <- uint $182
N005 (  3,  3) [000014] ---X-+-----                            |     \--*  ARR_LENGTH int    $240
N004 (  1,  1) [000000] -----+-----                            |        \--*  LCL_VAR   ref    V00 arg0         u:1 $80
N018 ( 10, 15) [000024] n---G+-----                            \--*  IND       int    <l:$300, c:$1c1>
N017 (  7, 13) [000022] -----+-----                               \--*  ARR_ADDR  byref int[] $2c0
N016 (  7, 13) [000021] -----+-N---                                  \--*  ADD       byref  $281
N014 (  7, 13) [000020] -----+-N---                                     +--*  ADD       byref  $280
N008 (  1,  1) [000009] -----+-----                                     |  +--*  LCL_VAR   ref    V00 arg0         u:1 (last use) $80
N013 (  5, 11) [000018] -----+-----                                     |  \--*  ADD       long   $184
N011 (  3,  6) [000011] -----+-----                                     |     +--*  LSH       long   $183
N009 (  1,  1) [000012] -----+-----                                     |     |  +--*  LCL_VAR   long   V01 arg1         u:1 (last use) $c0
N010 (  1,  4) [000013] -----+-----                                     |     |  \--*  CNS_INT   long   2 $142
N012 (  1,  4) [000017] -----+-N---                                     |     \--*  CNS_INT   long   -40 $143
N015 (  1,  4) [000019] -----+-----                                     \--*  CNS_INT   long   16 $144

So, in conclusion, offset can indeed be negative when lowering to LEA, but not when morphing (I was mistaken in my previous comment), because the expression is morphed again after fgMorphIndexAddr (EDIT: see the additional IR dumps I put at the end). Does that still mean we're creating a BYREF outside of the object? From the final pre-rationalization IR, it looks like byrefs are only marked for the final expression.

> I don't understand why form 2 does not cause problems. It's possible that our expression optimizations are not sufficiently capable to convert:
>
> a + 8 + (i - 10) * 4
> =>
> a + 8 + i * 4 - 40
> =>
> a - 32 + i * 4
> =>
> a + i * 4 - 32

It seems that it only optimizes it till here:

// ARM64
(a + 16) + ((i - 10) * 4)
=>
(a + 16) + (i * 4 - 40)

// RISCV64
16 + (a + ((i - 10) * 4))
=>
16 + (a + (i * 4 - 40))

If that's the case, is it still problematic? (I think this looks safe?)

Any suggestion on what we should do here? Maybe we should mark the index (in fgMorphIndexAddr) so that it won't be morphed after fgMorphIndexAddr? Or maybe we should open up a separate issue for this?

Any input on how we should proceed regarding this finding would be very much appreciated! 😀 In the meantime I'll try to understand this issue further 🙏

EDIT:

More IR dumps (for ARM64 FunChecked only):

fgMorphIndexAddr (before remorph):
               [000023] ---X-O-----                         *  COMMA     byref 
               [000016] ---X-O-----                         +--*  BOUNDS_CHECK_Rng void  
               [000005] ---X-------                         |  +--*  CAST_ovfl long <- long
               [000004] -----------                         |  |  \--*  SUB       long  
               [000001] -----------                         |  |     +--*  LCL_VAR   long   V01 arg1          (last use)
               [000003] -----------                         |  |     \--*  CNS_INT   long   10
               [000015] ---X-----U-                         |  \--*  CAST      long <- uint
               [000014] ---X-------                         |     \--*  ARR_LENGTH int   
               [000000] -----------                         |        \--*  LCL_VAR   ref    V00 arg0          (last use)
               [000022] ---X-O-----                         \--*  ARR_ADDR  byref int[]
               [000021] ---X-------                            \--*  ADD       byref 
               [000020] -----------                               +--*  ADD       byref 
               [000009] -----------                               |  +--*  LCL_VAR   ref    V00 arg0          (last use)
               [000019] -----------                               |  \--*  CNS_INT   long   16
               [000018] ---X-------                               \--*  MUL       long  
               [000010] ---X-------                                  +--*  CAST_ovfl long <- long
               [000011] -----------                                  |  \--*  SUB       long  
               [000012] -----------                                  |     +--*  LCL_VAR   long   V01 arg1          (last use)
               [000013] -----------                                  |     \--*  CNS_INT   long   10
               [000017] -------N---                                  \--*  CNS_INT   long   4
GenTreeNode creates assertion:
               [000014] ---X-+-----                         *  ARR_LENGTH int   
In BB01 New Local Constant Assertion: V00 != null, index = #01
BB01 requires throw helper block for SCK_RNGCHK_FAIL, sharing ACD0 (data 0x00000000)
fgMorphIndexAddr (after remorph):
               [000023] ---X-+-----                         *  COMMA     byref 
               [000016] ---X-+-----                         +--*  BOUNDS_CHECK_Rng void  
               [000004] -----+-----                         |  +--*  ADD       long  
               [000001] -----+-----                         |  |  +--*  LCL_VAR   long   V01 arg1          (last use)
               [000003] -----+-----                         |  |  \--*  CNS_INT   long   -10
               [000015] ---X-+---U-                         |  \--*  CAST      long <- uint
               [000014] ---X-+-----                         |     \--*  ARR_LENGTH int   
               [000000] -----+-----                         |        \--*  LCL_VAR   ref    V00 arg0          (last use)
               [000022] -----+-----                         \--*  ARR_ADDR  byref int[]
               [000021] -----+-----                            \--*  ADD       byref 
               [000020] -----+-----                               +--*  ADD       byref 
               [000009] -----+-----                               |  +--*  LCL_VAR   ref    V00 arg0          (last use)
               [000019] -----+-----                               |  \--*  CNS_INT   long   16
               [000018] -----+-----                               \--*  ADD       long  
               [000011] -----+-----                                  +--*  LSH       long  
               [000012] -----+-----                                  |  +--*  LCL_VAR   long   V01 arg1          (last use)
               [000013] -----+-----                                  |  \--*  CNS_INT   long   2
               [000017] -----+-N---                                  \--*  CNS_INT   long   -40
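The remorph shown above makes two rewrites, and the long-to-long CAST_ovfl no longer appears in the remorphed tree:

- The address operand `MUL(CAST_ovfl(SUB(b, 10)), 4)` becomes `ADD(LSH(b, 2), -40)`.
- The bounds-check operand `SUB(b, 10)` becomes `ADD(b, -10)`.

The C++ sketch below checks that the before and after trees agree on sample values, including the extremes of the range. It uses unsigned 64-bit arithmetic as a stand-in for the wrap-around semantics of `long` nodes, and it omits the CAST_ovfl. Distributing the scale over the subtraction is where the separate -40 constant in the index comes from. Function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Address operand before remorph: MUL(SUB(b, 10), 4).
static uint64_t index_before(uint64_t b) { return (b - 10) * 4; }

// Address operand after remorph: ADD(LSH(b, 2), CNS_INT -40).
static uint64_t index_after(uint64_t b) { return (b << 2) + static_cast<uint64_t>(-40); }

// Bounds-check operand before remorph: SUB(b, 10).
static uint64_t bounds_before(uint64_t b) { return b - 10; }

// Bounds-check operand after remorph: ADD(b, CNS_INT -10).
static uint64_t bounds_after(uint64_t b) { return b + static_cast<uint64_t>(-10); }
```

Both pairs are equal modulo 2^64 for every input, since 4 * (b - 10) = 4b - 40 holds in wrap-around arithmetic.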

Contributor

Sorry for the slow response, and thanks for the continued investigation. After some investigation and internal conversation it seems like the problem that led to the avoidance of form (3) may no longer be an issue, such that your change here is ok. As an experiment, I implemented form (3) for all platforms and ran stress, including GCStress, and there were no failures (besides known failures). (I kicked it off again just to see it succeed more cleanly with our currently (hopefully) cleaner CI test runs: #114388.)

Contributor Author

Hi, no problem, and thanks for looking into it! 👍

fuad1502 changed the title from "[RISC-V] Utilize sh(x)add(.uw) instructions" to "[RISC-V] Utilize Zba extension instructions" Apr 9, 2025

risc-vv commented Apr 9, 2025

8c5ac35 is being scheduled for building and testing

GIT: 8c5ac35dfa3b8a32adea19593129fc0675872c50
REPO: dotnet/runtime
BRANCH: main

Contributor Author

fuad1502 commented Apr 9, 2025

superpmi asmdiffs results for commit 0fd562e:

Diffs are based on 12,626 contexts (10,243 MinOpts, 2,383 FullOpts).

Overall (-31,764 bytes)
| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| test.mch | 6,935,716 | -31,764 | -0.74% |
MinOpts (-9,132 bytes)
| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| test.mch | 5,385,844 | -9,132 | -0.36% |
FullOpts (-22,632 bytes)
| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| test.mch | 1,549,872 | -22,632 | -1.10% |
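The Overall row should be the sum of the MinOpts and FullOpts rows. The tables report PerfScore percentages but not the relative code-size change, which can be derived from the base and diff sizes. A small C++ sketch of both checks follows. The struct and helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// One row of the asmdiffs summary: base code size and byte delta.
struct SizeRow {
    int64_t baseBytes;
    int64_t diffBytes;
};

// Relative code-size change in percent (negative means smaller code).
static double size_change_pct(const SizeRow& r) {
    return 100.0 * static_cast<double>(r.diffBytes) / static_cast<double>(r.baseBytes);
}

// True when the Overall row equals MinOpts + FullOpts in both columns.
static bool overall_is_sum(const SizeRow& overall, const SizeRow& minOpts, const SizeRow& fullOpts) {
    return overall.baseBytes == minOpts.baseBytes + fullOpts.baseBytes &&
           overall.diffBytes == minOpts.diffBytes + fullOpts.diffBytes;
}
```

With the figures above, the Overall row is exactly the sum of the other two. The code-size change works out to roughly -0.46% overall, -0.17% for MinOpts and -1.46% for FullOpts. These ratios are separate from the PerfScore percentages in the last column.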
Example diffs
test.mch
-48 (-15.19%) : 323.dasm - NumericSortJagged:NumSift(int[],int,int) (Tier1)
@@ -15,10 +15,10 @@
 ;  V04 loc1         [V04,T11] (  2, 16.07)     int  ->   a6        
 ;# V05 OutArgs      [V05    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V06 tmp1         [V06,T04] (  2, 32.14)     int  ->   a5         "Strict ordering of exceptions for Array store"
-;  V07 cse0         [V07,T06] (  3, 24.57)     int  ->   a6         "CSE #07: aggressive"
-;  V08 cse1         [V08,T07] (  3, 24.57)     int  ->   a5         "CSE #12: aggressive"
-;  V09 cse2         [V09,T08] (  3, 24.57)    long  ->   a4         "CSE #05: aggressive"
-;  V10 cse3         [V10,T09] (  3, 24.57)    long  ->   a1         "CSE #10: aggressive"
+;  V07 cse0         [V07,T06] (  3, 24.57)     int  ->   a6         "CSE #06: aggressive"
+;  V08 cse1         [V08,T07] (  3, 24.57)     int  ->   a5         "CSE #10: aggressive"
+;  V09 cse2         [V09,T08] (  3, 24.57)    long  ->   a4         "CSE #04: aggressive"
+;  V10 cse3         [V10,T09] (  3, 24.57)    long  ->   a1         "CSE #08: aggressive"
 ;  V11 cse4         [V11,T02] (  6, 49.57)     int  ->   a4         multi-def "CSE #01: aggressive"
 ;  V12 cse5         [V12,T05] (  4, 29.17)     int  ->   a6         "CSE #02: aggressive"
 ;
@@ -57,37 +57,35 @@ G_M30577_IG06:        ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
             sext.w         a5, a4
             sext.w         a6, a1
             bgeu           a6, a5, G_M30577_IG11
-            slli           a1, a1, 32
-            srli           a1, a1, 32
-            slli           a1, a1, 2
-            addi           a1, a1, 0xD1FFAB1E
-            add            t6, a0, a1
-            ; byrRegs +[t6]
-            lw             a5, 0xD1FFAB1E(t6)
+            slli.uw        a1, a1, 2
+            add            a5, a0, a1
+            ; byrRegs +[a5]
+            lw             a5, 0xD1FFAB1E(a5)
+            ; byrRegs -[a5]
             sext.w         a6, a4
             sext.w         a7, a3
             bgeu           a7, a6, G_M30577_IG11
-            slli           a4, a3, 32
-            srli           a4, a4, 32
-            slli           a4, a4, 2
-            addi           a4, a4, 0xD1FFAB1E
-            add            t6, a0, a4
-            lw             a6, 0xD1FFAB1E(t6)
+            slli.uw        a4, a3, 2
+            add            a6, a0, a4
+            ; byrRegs +[a6]
+            lw             a6, 0xD1FFAB1E(a6)
+            ; byrRegs -[a6]
             slliw          ra, a5, 0
             slliw          t6, a6, 0
-            ; byrRegs -[t6]
             bge            ra, t6, G_M30577_IG04
-						;; size=88 bbWeight=8.27 PerfScore 202.53
+						;; size=64 bbWeight=8.27 PerfScore 177.73
 G_M30577_IG07:        ; bbWeight=8.04, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
-            add            t6, a0, a4
-            ; byrRegs +[t6]
-            sw             a5, 0xD1FFAB1E(t6)
-            add            t6, a0, a1
-            sw             a6, 0xD1FFAB1E(t6)
+            add            a4, a0, a4
+            ; byrRegs +[a4]
+            sw             a5, 0xD1FFAB1E(a4)
+            add            a1, a0, a1
+            ; byrRegs +[a1]
+            sw             a6, 0xD1FFAB1E(a1)
             sext.w         a1, a3
+            ; byrRegs -[a1]
 						;; size=20 bbWeight=8.04 PerfScore 76.34
 G_M30577_IG08:        ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
-            ; byrRegs -[t6]
+            ; byrRegs -[a4]
             slliw          a3, a1, 1
             slliw          ra, a3, 0
             slliw          t6, a2, 0
@@ -104,31 +102,25 @@ G_M30577_IG10:        ; bbWeight=8.26, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
             sext.w         a5, a4
             sext.w         a6, a3
             bgeu           a6, a5, G_M30577_IG11
-            slli           a5, a3, 32
-            srli           a5, a5, 32
-            slli           a5, a5, 2
-            add            a6, a0, a5
-            ; byrRegs +[a6]
-            lw             a5, 0xD1FFAB1E(a6)
+            sh2add.uw      a5, a3, a0
+            ; byrRegs +[a5]
+            lw             a5, 0xD1FFAB1E(a5)
+            ; byrRegs -[a5]
             addiw          a6, a3, 0xD1FFAB1E
-            ; byrRegs -[a6]
             sext.w         t0, a4
             sext.w         a7, a6
             bgeu           a7, t0, G_M30577_IG11
-            slli           a4, a6, 32
-            srli           a4, a4, 32
-            slli           a4, a4, 2
-            add            a7, a0, a4
-            ; byrRegs +[a7]
-            lw             a4, 0xD1FFAB1E(a7)
+            sh2add.uw      a4, a6, a0
+            ; byrRegs +[a4]
+            lw             a4, 0xD1FFAB1E(a4)
+            ; byrRegs -[a4]
             slliw          ra, a5, 0
             slliw          t6, a4, 0
             bge            ra, t6, G_M30577_IG06
             j              G_M30577_IG05
-						;; size=88 bbWeight=8.26 PerfScore 210.59
+						;; size=64 bbWeight=8.26 PerfScore 185.81
 G_M30577_IG11:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs -[a0]
-            ; byrRegs -[a7]
             lui            a0, 0xD1FFAB1E
             addiw          a0, a0, 0xD1FFAB1E
             slli           a0, a0, 11
@@ -139,7 +131,7 @@ G_M30577_IG11:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ebreak
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 316, prolog size 16, PerfScore 596.28, instruction count 79, allocated bytes for code 316 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
+; Total bytes of code 268, prolog size 16, PerfScore 546.71, instruction count 67, allocated bytes for code 268 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -150,7 +142,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 79 (0x0004f) Actual length = 316 (0x00013c)
+  Function Length   : 67 (0x00043) Actual length = 268 (0x00010c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
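In the NumSift diff, the RV64I zero-extend-and-scale idiom (`slli 32`, `srli 32`, `slli 2`) becomes `slli.uw`. Where the add into the array base can also be folded, it becomes a single `sh2add.uw`. The following is a minimal Python model of the register semantics as defined by the Zba extension. The helper names are hypothetical and are not JIT code. It checks that the rewrites produce the same 64-bit values, including for a negative `int` index whose upper 32 bits hold sign-extension bits:

```python
MASK64 = (1 << 64) - 1  # RV64 general-purpose registers are 64 bits wide


def slli(rs, sh):
    """RV64I slli: logical left shift, truncated to 64 bits."""
    return (rs << sh) & MASK64


def srli(rs, sh):
    """RV64I srli: logical right shift (zero-filling)."""
    return (rs & MASK64) >> sh


def add(rs1, rs2):
    """RV64I add: wraps modulo 2**64."""
    return (rs1 + rs2) & MASK64


def slli_uw(rs1, shamt):
    """Zba slli.uw: zero-extend the low 32 bits of rs1, then shift left."""
    return slli(rs1 & 0xFFFF_FFFF, shamt)


def sh2add_uw(rs1, rs2):
    """Zba sh2add.uw: rs2 + (zero-extended low word of rs1 << 2)."""
    return add(rs2, slli(rs1 & 0xFFFF_FFFF, 2))


def old_index_offset(idx):
    """Pre-patch sequence: slli idx, 32; srli idx, 32; slli idx, 2."""
    return slli(srli(slli(idx, 32), 32), 2)


def old_element_addr(base, idx):
    """Pre-patch sequence followed by add base, idx."""
    return add(base, old_index_offset(idx))


if __name__ == "__main__":
    base = 0x0000_7F00_1234_0000           # hypothetical array address in a0
    for idx in (0, 1, 5, 0x7FFF_FFFF,
                0xFFFF_FFFF_8000_0000):    # sign-extended int.MinValue
        assert slli_uw(idx, 2) == old_index_offset(idx)
        assert sh2add_uw(idx, base) == old_element_addr(base, idx)
    print("slli.uw and sh2add.uw match the RV64I sequences")
```

Because `sh2add.uw` zero-extends and scales in a single instruction, each `int`-indexed `int[]` element address in IG10 needs one instruction instead of four.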
-12 (-12.50%) : 3104.dasm - System.Reflection.Internal.MemoryBlock:CheckBounds(int,int):this (Tier1)
@@ -24,16 +24,13 @@ G_M49493_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M49493_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byref
             ; byrRegs +[a0]
-            slli           a1, a1, 32
-            srli           a1, a1, 32
-            slli           a2, a2, 32
-            srli           a2, a2, 32
-            add            a1, a1, a2
+            zext.w         a2, a2
+            add.uw         a1, a1, a2
             lw             a0, 0xD1FFAB1E(a0)
             ; byrRegs -[a0]
             slliw          a0, a0, 0
             bltu           a0, a1, G_M49493_IG04
-						;; size=32 bbWeight=1 PerfScore 8.50
+						;; size=20 bbWeight=1 PerfScore 7.00
 G_M49493_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
@@ -50,7 +47,7 @@ G_M49493_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             ebreak
 						;; size=32 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 96, prolog size 16, PerfScore 25.00, instruction count 24, allocated bytes for code 96 (MethodHash=09153eaa) for method System.Reflection.Internal.MemoryBlock:CheckBounds(int,int):this (Tier1)
+; Total bytes of code 84, prolog size 16, PerfScore 23.50, instruction count 21, allocated bytes for code 84 (MethodHash=09153eaa) for method System.Reflection.Internal.MemoryBlock:CheckBounds(int,int):this (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -61,7 +58,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 24 (0x00018) Actual length = 96 (0x000060)
+  Function Length   : 21 (0x00015) Actual length = 84 (0x000054)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
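In CheckBounds, a `zext.w` is still emitted before the `add.uw`. The reason is that `add.uw rd, rs1, rs2` zero-extends only `rs1`, so the second `uint` operand must already be zero-extended. The following is a minimal Python sketch of the Zba semantics (hypothetical helper names). It checks that the two-instruction form equals the old four-shift-plus-add form:

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def zext_w(rs):
    """zext.w rd, rs: assembler alias for add.uw rd, rs, zero."""
    return rs & MASK32


def add_uw(rs1, rs2):
    """Zba add.uw: rs2 + zero-extended low word of rs1 (only rs1 is extended)."""
    return (rs2 + (rs1 & MASK32)) & MASK64


def old_sum(a1, a2):
    """Pre-patch: zero-extend both via slli/srli 32, then add."""
    za1 = ((a1 << 32) & MASK64) >> 32
    za2 = ((a2 << 32) & MASK64) >> 32
    return (za1 + za2) & MASK64


def new_sum(a1, a2):
    """Post-patch: zext.w a2, a2 ; add.uw a1, a1, a2."""
    return add_uw(a1, zext_w(a2))


if __name__ == "__main__":
    neg_one = MASK64                         # int -1, sign-extended in a register
    for a1, a2 in ((0, 0), (7, 9), (neg_one, 1), (neg_one, neg_one)):
        assert new_sum(a1, a2) == old_sum(a1, a2)
    # The 64-bit sum cannot wrap: two uint32 values add up to at most 2**33 - 2.
    print(hex(new_sum(neg_one, neg_one)))   # 0x1fffffffe
```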
-12 (-12.00%) : 3142.dasm - System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
@@ -33,13 +33,13 @@ G_M52328_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byre
             sext.w         a3, a1
             sext.w         a4, a2
             bgeu           a4, a3, G_M52328_IG04
-            slli           a1, a2, 32
-            srli           a1, a1, 32
-            slli           a1, a1, 3
-            add            a2, a0, a1
-            ; byrRegs +[a2]
-            ld             a0, 0xD1FFAB1E(a2)
-						;; size=40 bbWeight=1 PerfScore 12.50
+            sh3add.uw      a0, a2, a0
+            ; gcrRegs -[a0]
+            ; byrRegs +[a0]
+            ld             a0, 0xD1FFAB1E(a0)
+            ; gcrRegs +[a0]
+            ; byrRegs -[a0]
+						;; size=28 bbWeight=1 PerfScore 11.00
 G_M52328_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
@@ -47,7 +47,6 @@ G_M52328_IG03:        ; bbWeight=1, epilog, nogc, extend
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 G_M52328_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
             ; gcrRegs -[a0]
-            ; byrRegs -[a2]
             lui            a0, 0xD1FFAB1E
             addiw          a0, a0, 0xD1FFAB1E
             slli           a0, a0, 11
@@ -57,7 +56,7 @@ G_M52328_IG04:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             ebreak
 						;; size=28 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 100, prolog size 16, PerfScore 29.00, instruction count 25, allocated bytes for code 100 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
+; Total bytes of code 88, prolog size 16, PerfScore 27.50, instruction count 22, allocated bytes for code 88 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
 ; ============================================================
 
 Unwind Info:
@@ -68,7 +67,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 25 (0x00019) Actual length = 100 (0x000064)
+  Function Length   : 22 (0x00016) Actual length = 88 (0x000058)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
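`ImmutableArray<T>.get_Item` indexes 8-byte references, so the shift is 3 and the JIT picks `sh3add.uw`. A 64-bit index, such as `ulong` in the PR description's `Fun2`, would use plain `sh3add`. The following is a generic Python sketch of the `shNadd` / `shNadd.uw` family as described by Zba (the `shadd` helper is hypothetical). The array data offset is then applied as the `ld` immediate, which is masked as `0xD1FFAB1E` in these diffable dumps:

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def shadd(n, rs1, rs2, uw=False):
    """Model of Zba sh1add/sh2add/sh3add (n = 1, 2, 3) and their .uw forms.

    rd = rs2 + (rs1 << n); the .uw forms first zero-extend the low 32 bits of rs1.
    """
    if n not in (1, 2, 3):
        raise ValueError("Zba only defines shift amounts 1, 2 and 3")
    index = rs1 & MASK32 if uw else rs1
    return (rs2 + (index << n)) & MASK64


def old_get_item_addr(base, idx):
    """Pre-patch: slli 32; srli 32; slli 3; add."""
    scaled = ((((idx << 32) & MASK64) >> 32) << 3) & MASK64
    return (base + scaled) & MASK64


if __name__ == "__main__":
    base = 0x0000_7F00_2000_0000   # hypothetical array reference in a0
    for idx in (0, 1, 2, 0x7FFF_FFFF):
        assert shadd(3, idx, base, uw=True) == old_get_item_addr(base, idx)
    # The load then adds the array data offset as the ld immediate.
    print(hex(shadd(3, 2, base, uw=True)))   # 0x7f0020000010
```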
+0 (0.00%) : 10416.dasm - Microsoft.CodeAnalysis.PEModule:TryGetUnmanagedCallersOnlyAttribute(System.Reflection.Metadata.EntityHandle,Microsoft.CodeAnalysis.IAttributeNamedArgumentDecoder,System.Func`4[System.String,Microsoft.CodeAnalysis.TypedConstant,ubyte,System.ValueTuple`2[ubyte,System.Collections.Immutable.ImmutableHashSet`1[Microsoft.CodeAnalysis.Symbols.INamedTypeSymbolInternal]]]):Microsoft.CodeAnalysis.UnmanagedCallersOnlyAttributeData:this (Instrumented Tier0)

No diffs found?

+0 (0.00%) : 8912.dasm - Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator:.ctor(Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol,Microsoft.CodeAnalysis.CSharp.BoundStatement,Microsoft.CodeAnalysis.CodeGen.ILBuilder,Microsoft.CodeAnalysis.CSharp.Emit.PEModuleBuilder,Microsoft.CodeAnalysis.CSharp.BindingDiagnosticBag,int,ubyte):this (Tier0)

No diffs found?

+0 (0.00%) : 8752.dasm - Microsoft.CodeAnalysis.CSharp.LocalRewriter:Rewrite(Microsoft.CodeAnalysis.CSharp.CSharpCompilation,Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol,int,Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol,Microsoft.CodeAnalysis.CSharp.BoundStatement,Microsoft.CodeAnalysis.CSharp.TypeCompilationState,Microsoft.CodeAnalysis.CSharp.SynthesizedSubmissionFields,ubyte,Microsoft.CodeAnalysis.Emit.MethodInstrumentation,Microsoft.CodeAnalysis.CodeGen.DebugDocumentProvider,Microsoft.CodeAnalysis.CSharp.BindingDiagnosticBag,byref,byref,byref,byref):Microsoft.CodeAnalysis.CSharp.BoundStatement (Tier0)

No diffs found?


Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| test.mch | 1,738 | 1,455 | 0 | 283 | -31,764 | +0 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| test.mch | 1,738 | 1,396 | 2 | 340 | -0.92% | +0.47% | -0.4051% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| test.mch | 12,626 | 10,243 | 2,383 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

@risc-vv

risc-vv commented Apr 9, 2025

0fd562e is being scheduled for building and testing

GIT: 0fd562e2dee6e00c5088372ef3ee4eccb4c78bba
REPO: dotnet/runtime
BRANCH: main

Release-build FAILED

buildinfo.json
Compilation failed during core build

@fuad1502
Contributor Author

fuad1502 commented Apr 9, 2025

Should I rebase my commits or merge main to resolve the conflicts?

@am11
Member

am11 commented Apr 9, 2025

Either one is fine. Maintainers usually squash and merge PRs, so what lands in the main branch history is a single commit. That is good in the long term, because once something is in the history, W.I.P.-style commit messages are undesirable. The benefit of rebasing into a single commit yourself is that you avoid ending up with this kind of verbose and useless commit description (2129f3f) when the box that shows up for the maintainer during squash+merge isn't cleared. We have the PR link for the related conversation, so this text is redundant and void of context. It is basically a bad default choice from GitHub; a better choice would have been the PR description.

@fuad1502 fuad1502 force-pushed the riscv-jit-opt/utilize-shxadd branch from 0fd562e to 8d42b94 Compare April 9, 2025 12:10
@risc-vv

risc-vv commented Apr 9, 2025

RISC-V Release-CLR-VF2: 9532 / 9552 (99.79%)
=======================
      passed: 9532
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9658
 TOTAL tests: 9658
   REAL time: 2h 15min 28s 373ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 8d42b945f71a92488ff672b88fd8b01c1de49e61
CI: 09909bfe3d23ad26455327811013bcbb48915255
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9532 / 9552 (99.79%)
=======================
      passed: 9532
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9658
 TOTAL tests: 9658
   REAL time: 2h 47min 56s 462ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 8d42b945f71a92488ff672b88fd8b01c1de49e61
CI: 09909bfe3d23ad26455327811013bcbb48915255
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@BruceForstall
Contributor

@fuad1502 Please resolve the merge conflicts.

Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
@risc-vv

risc-vv commented Apr 18, 2025

RISC-V Release-CLR-VF2: 9701 / 9750 (99.50%)
=======================
      passed: 9701
      failed: 29
     skipped: 70
      killed: 20
------------------------
  TOTAL libs: 9820
 TOTAL tests: 9820
   REAL time: 2h 0min 16s 814ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 876e58b6c8c6f20221b64fdadd0beb0bcd93c5bf
CI: 2d916d20de463f9bba05ae71b3d1f37d439a8cb1
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release
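The denominator in the risc-vv headers (9552, 9750) is not printed in the breakdown. In both runs it appears to equal passed + failed + killed, which is the same as TOTAL minus skipped. The following is a small Python sketch that recomputes the headline percentage under that assumption (the `pass_rate` helper is hypothetical, not part of the CI tooling):

```python
def pass_rate(passed, failed, killed, skipped, total):
    """Recompute the risc-vv header figure, assuming skipped tests are excluded.

    The denominator passed + failed + killed equals total - skipped in these logs.
    """
    ran = passed + failed + killed
    assert ran == total - skipped, "counts do not add up"
    return passed, ran, round(100 * passed / ran, 2)


if __name__ == "__main__":
    print(pass_rate(9532, 3, 17, 106, 9658))   # (9532, 9552, 99.79)
    print(pass_rate(9701, 29, 20, 70, 9820))   # (9701, 9750, 99.5)
```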

@risc-vv

risc-vv commented Apr 18, 2025

1522ba2 is being scheduled for building and testing

GIT: 1522ba20c131fa4a6ab799f9e8b39d641f7c33b5
REPO: dotnet/runtime
BRANCH: main

@BruceForstall
Contributor

/ba-g unrelated tests

@BruceForstall BruceForstall merged commit 7e297e6 into dotnet:main Apr 18, 2025
107 of 113 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators May 19, 2025
Labels
arch-riscv: Related to the RISC-V architecture
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member
6 participants