[RISC-V] Utilize Zba extension instructions #113999
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
RISC-V Release-CLR-VF2: 9524 / 9544 (99.79%)
Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz
RISC-V Release-CLR-QEMU: 3304 / 9544 (34.62%)
Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz
RISC-V Release-FX-QEMU: 0 / 258 (0.00%)
Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz
RISC-V Release-FX-VF2: 631024 / 667795 (94.49%)
Regressions are due to grouping
superpmi asmdiffs result for commit f8907d2: Diffs are based on 12,624 contexts (10,243 MinOpts, 2,381 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 2 (0.02%)
Overall (-20,492 bytes)
MinOpts (-7,496 bytes)
FullOpts (-12,996 bytes)
Example diffs
test.mch
-12 (-12.00%) : 3142.dasm - System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
@@ -33,13 +33,11 @@ G_M52328_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byre
sext.w a3, a1
sext.w a4, a2
bgeu a4, a3, G_M52328_IG04
- slli a1, a2, 32
- srli a1, a1, 32
- slli a1, a1, 3
- add a2, a0, a1
- ; byrRegs +[a2]
- ld a0, 0xD1FFAB1E(a2)
- ;; size=40 bbWeight=1 PerfScore 12.50
+ sh3add.uw a0, a2, a0
+ ; gcrRegs -[a0]
+ ld a0, 0xD1FFAB1E(a0)
+ ; gcrRegs +[a0]
+ ;; size=28 bbWeight=1 PerfScore 11.00
G_M52328_IG03: ; bbWeight=1, epilog, nogc, extend
ld ra, 8(sp)
ld fp, 0(sp)
@@ -47,7 +45,6 @@ G_M52328_IG03: ; bbWeight=1, epilog, nogc, extend
ret ;; size=16 bbWeight=1 PerfScore 7.50
G_M52328_IG04: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
; gcrRegs -[a0]
- ; byrRegs -[a2]
lui a0, 0xD1FFAB1E
addiw a0, a0, 0xD1FFAB1E
slli a0, a0, 11
@@ -57,7 +54,7 @@ G_M52328_IG04: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
ebreak
;; size=28 bbWeight=0 PerfScore 0.00
-; Total bytes of code 100, prolog size 16, PerfScore 29.00, instruction count 25, allocated bytes for code 100 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
+; Total bytes of code 88, prolog size 16, PerfScore 27.50, instruction count 22, allocated bytes for code 88 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
; ============================================================
Unwind Info:
@@ -68,7 +65,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 25 (0x00019) Actual length = 100 (0x000064)
+ Function Length : 22 (0x00016) Actual length = 88 (0x000058)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-12 (-11.54%) : 3944.dasm - System.Collections.Immutable.ImmutableArray`1+Enumerator[System.__Canon]:get_Current():System.__Canon:this (Tier1)
@@ -34,14 +34,10 @@ G_M46720_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byre
sext.w a3, a2
sext.w a4, a0
bgeu a4, a3, G_M46720_IG04
- slli a0, a0, 32
- srli a0, a0, 32
- slli a0, a0, 3
- add a2, a1, a0
- ; byrRegs +[a2]
- ld a0, 0xD1FFAB1E(a2)
+ sh3add.uw a0, a0, a1
+ ld a0, 0xD1FFAB1E(a0)
; gcrRegs +[a0]
- ;; size=44 bbWeight=1 PerfScore 14.50
+ ;; size=32 bbWeight=1 PerfScore 13.00
G_M46720_IG03: ; bbWeight=1, epilog, nogc, extend
ld ra, 8(sp)
ld fp, 0(sp)
@@ -49,7 +45,6 @@ G_M46720_IG03: ; bbWeight=1, epilog, nogc, extend
ret ;; size=16 bbWeight=1 PerfScore 7.50
G_M46720_IG04: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
; gcrRegs -[a0-a1]
- ; byrRegs -[a2]
lui a0, 0xD1FFAB1E
addiw a0, a0, 0xD1FFAB1E
slli a0, a0, 11
@@ -59,7 +54,7 @@ G_M46720_IG04: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
ebreak
;; size=28 bbWeight=0 PerfScore 0.00
-; Total bytes of code 104, prolog size 16, PerfScore 31.00, instruction count 26, allocated bytes for code 104 (MethodHash=ea0e497f) for method System.Collections.Immutable.ImmutableArray`1+Enumerator[System.__Canon]:get_Current():System.__Canon:this (Tier1)
+; Total bytes of code 92, prolog size 16, PerfScore 29.50, instruction count 23, allocated bytes for code 92 (MethodHash=ea0e497f) for method System.Collections.Immutable.ImmutableArray`1+Enumerator[System.__Canon]:get_Current():System.__Canon:this (Tier1)
; ============================================================
Unwind Info:
@@ -70,7 +65,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 26 (0x0001a) Actual length = 104 (0x000068)
+ Function Length : 23 (0x00017) Actual length = 92 (0x00005c)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-32 (-10.13%) : 323.dasm - NumericSortJagged:NumSift(int[],int,int) (Tier1)
@@ -15,10 +15,10 @@
; V04 loc1 [V04,T11] ( 2, 16.07) int -> a6
;# V05 OutArgs [V05 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V06 tmp1 [V06,T04] ( 2, 32.14) int -> a5 "Strict ordering of exceptions for Array store"
-; V07 cse0 [V07,T06] ( 3, 24.57) int -> a6 "CSE #07: aggressive"
-; V08 cse1 [V08,T07] ( 3, 24.57) int -> a5 "CSE #12: aggressive"
-; V09 cse2 [V09,T08] ( 3, 24.57) long -> a4 "CSE #05: aggressive"
-; V10 cse3 [V10,T09] ( 3, 24.57) long -> a1 "CSE #10: aggressive"
+; V07 cse0 [V07,T06] ( 3, 24.57) int -> a6 "CSE #06: aggressive"
+; V08 cse1 [V08,T07] ( 3, 24.57) int -> a5 "CSE #10: aggressive"
+; V09 cse2 [V09,T08] ( 3, 24.57) long -> a4 "CSE #04: aggressive"
+; V10 cse3 [V10,T09] ( 3, 24.57) long -> a1 "CSE #08: aggressive"
; V11 cse4 [V11,T02] ( 6, 49.57) int -> a4 multi-def "CSE #01: aggressive"
; V12 cse5 [V12,T05] ( 4, 29.17) int -> a6 "CSE #02: aggressive"
;
@@ -60,34 +60,36 @@ G_M30577_IG06: ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
slli a1, a1, 32
srli a1, a1, 32
slli a1, a1, 2
- addi a1, a1, 0xD1FFAB1E
- add t6, a0, a1
- ; byrRegs +[t6]
- lw a5, 0xD1FFAB1E(t6)
+ add a5, a0, a1
+ ; byrRegs +[a5]
+ lw a5, 0xD1FFAB1E(a5)
+ ; byrRegs -[a5]
sext.w a6, a4
sext.w a7, a3
bgeu a7, a6, G_M30577_IG11
slli a4, a3, 32
srli a4, a4, 32
slli a4, a4, 2
- addi a4, a4, 0xD1FFAB1E
- add t6, a0, a4
- lw a6, 0xD1FFAB1E(t6)
+ add a6, a0, a4
+ ; byrRegs +[a6]
+ lw a6, 0xD1FFAB1E(a6)
+ ; byrRegs -[a6]
slliw ra, a5, 0
slliw t6, a6, 0
- ; byrRegs -[t6]
bge ra, t6, G_M30577_IG04
- ;; size=88 bbWeight=8.27 PerfScore 202.53
+ ;; size=80 bbWeight=8.27 PerfScore 194.27
G_M30577_IG07: ; bbWeight=8.04, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
- add t6, a0, a4
- ; byrRegs +[t6]
- sw a5, 0xD1FFAB1E(t6)
- add t6, a0, a1
- sw a6, 0xD1FFAB1E(t6)
+ add a4, a0, a4
+ ; byrRegs +[a4]
+ sw a5, 0xD1FFAB1E(a4)
+ add a1, a0, a1
+ ; byrRegs +[a1]
+ sw a6, 0xD1FFAB1E(a1)
sext.w a1, a3
+ ; byrRegs -[a1]
;; size=20 bbWeight=8.04 PerfScore 76.34
G_M30577_IG08: ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
- ; byrRegs -[t6]
+ ; byrRegs -[a4]
slliw a3, a1, 1
slliw ra, a3, 0
slliw t6, a2, 0
@@ -104,31 +106,21 @@ G_M30577_IG10: ; bbWeight=8.26, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
sext.w a5, a4
sext.w a6, a3
bgeu a6, a5, G_M30577_IG11
- slli a5, a3, 32
- srli a5, a5, 32
- slli a5, a5, 2
- add a6, a0, a5
- ; byrRegs +[a6]
- lw a5, 0xD1FFAB1E(a6)
+ sh2add.uw a5, a3, a0
+ lw a5, 0xD1FFAB1E(a5)
addiw a6, a3, 0xD1FFAB1E
- ; byrRegs -[a6]
sext.w t0, a4
sext.w a7, a6
bgeu a7, t0, G_M30577_IG11
- slli a4, a6, 32
- srli a4, a4, 32
- slli a4, a4, 2
- add a7, a0, a4
- ; byrRegs +[a7]
- lw a4, 0xD1FFAB1E(a7)
+ sh2add.uw a4, a6, a0
+ lw a4, 0xD1FFAB1E(a4)
slliw ra, a5, 0
slliw t6, a4, 0
bge ra, t6, G_M30577_IG06
j G_M30577_IG05
- ;; size=88 bbWeight=8.26 PerfScore 210.59
+ ;; size=64 bbWeight=8.26 PerfScore 185.81
G_M30577_IG11: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcrRegs -[a0]
- ; byrRegs -[a7]
lui a0, 0xD1FFAB1E
addiw a0, a0, 0xD1FFAB1E
slli a0, a0, 11
@@ -139,7 +131,7 @@ G_M30577_IG11: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
ebreak
;; size=28 bbWeight=0 PerfScore 0.00
-; Total bytes of code 316, prolog size 16, PerfScore 596.28, instruction count 79, allocated bytes for code 316 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
+; Total bytes of code 284, prolog size 16, PerfScore 563.24, instruction count 71, allocated bytes for code 284 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
; ============================================================
Unwind Info:
@@ -150,7 +142,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 79 (0x0004f) Actual length = 316 (0x00013c)
+ Function Length : 71 (0x00047) Actual length = 284 (0x00011c)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+0.49%) : 631.dasm - EMFloatClass:Run():double:this (Tier1)
@@ -30,10 +30,10 @@
; V18 tmp10 [V18,T06] ( 2, 668.27) ref -> t4 class-hnd exact "NewArr temp" <<unknown class>>
; V19 tmp11 [V19,T07] ( 2, 668.27) ref -> t4 class-hnd exact "NewArr temp" <<unknown class>>
; V20 tmp12 [V20,T20] ( 2, 0 ) ref -> a1 single-def "argument with side effect"
-; V21 cse0 [V21,T04] ( 4, 668.27) long -> s8 "CSE #05: aggressive"
-; V22 cse1 [V22,T15] ( 4, 4 ) long -> s3 "CSE #02: aggressive"
+; V21 cse0 [V21,T15] ( 4, 4 ) long -> s3 "CSE #02: aggressive"
+; V22 cse1 [V22,T04] ( 4, 668.27) long -> s8 "CSE #04: aggressive"
; V23 cse2 [V23,T14] ( 5, 171.07) int -> s2 "CSE #01: aggressive"
-; V24 cse3 [V24,T18] ( 2, 65.29) double -> fs7 hoist "CSE #06: aggressive"
+; V24 cse3 [V24,T18] ( 2, 65.29) double -> fs7 hoist "CSE #05: aggressive"
; V25 rat0 [V25,T11] ( 3, 385.71) long -> a0 "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 0
@@ -176,10 +176,10 @@ G_M34029_IG03: ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
bgeu t4, t0, G_M34029_IG16
slli t3, s6, 32
srli t3, t3, 32
- slli t3, t3, 3
- addi s8, t3, 0xD1FFAB1E
+ slli s8, t3, 3
add t3, s4, s8
; byrRegs +[t3]
+ addi t3, t3, 0xD1FFAB1E
mv t4, s7,
; gcrRegs +[t4]
lui t2, 0xD1FFAB1E
@@ -249,6 +249,7 @@ G_M34029_IG03: ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
bgeu t4, t0, G_M34029_IG16
add t3, s5, s8
; byrRegs +[t3]
+ addi t3, t3, 0xD1FFAB1E
mv t4, s7,
; gcrRegs +[t4]
lui t2, 0xD1FFAB1E
@@ -318,6 +319,7 @@ G_M34029_IG03: ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
bgeu t4, t0, G_M34029_IG16
add t3, s3, s8
; byrRegs +[t3]
+ addi t3, t3, 0xD1FFAB1E
mv t4, s7,
; gcrRegs +[t4]
lui t2, 0xD1FFAB1E
@@ -334,7 +336,7 @@ G_M34029_IG03: ; bbWeight=167.07, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRe
slliw ra, s6, 0
slliw t6, s2, 0
blt ra, t6, G_M34029_IG03
- ;; size=692 bbWeight=167.07 PerfScore 30990.97
+ ;; size=700 bbWeight=167.07 PerfScore 31158.04
G_M34029_IG04: ; bbWeight=1.00, gcrefRegs=380200 {s1 s3 s4 s5}, byrefRegs=0000 {}, byref
sext.w a3, s2
mv a0, s4,
@@ -567,7 +569,7 @@ RWD00 dq 3FF0000000000000h ; 1
RWD08 dq 408F400000000000h ; 1000
-; Total bytes of code 1628, prolog size 60, PerfScore 33424.09, instruction count 355, allocated bytes for code 1628 (MethodHash=bd287b12) for method EMFloatClass:Run():double:this (Tier1)
+; Total bytes of code 1636, prolog size 60, PerfScore 33591.16, instruction count 357, allocated bytes for code 1636 (MethodHash=bd287b12) for method EMFloatClass:Run():double:this (Tier1)
; ============================================================
Unwind Info:
@@ -578,7 +580,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 407 (0x00197) Actual length = 1628 (0x00065c)
+ Function Length : 409 (0x00199) Actual length = 1636 (0x000664)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 12432.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SymbolExtensions:GetTypeOrReturnType(Microsoft.CodeAnalysis.CSharp.Symbol,byref,byref,byref) (Tier0)
No diffs found?
+0 (0.00%) : 11328.dasm - Microsoft.Cci.MetadataWriter:SerializePrimitiveType(System.Reflection.Metadata.Ecma335.CustomAttributeElementTypeEncoder,int) (Tier0)
No diffs found?
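For readers unfamiliar with Zba, the example diffs above replace a four-instruction slli/srli/slli/add sequence with a single sh2add.uw or sh3add.uw. The following is a minimal C++ sketch of that equivalence, not JIT code: the function names scaled_add_baseline and shadd_uw are hypothetical, and the operand order follows the listings (first source is the index, second is the base).

```cpp
#include <cassert>
#include <cstdint>

// Baseline sequence from the diffs: slli 32, srli 32, slli N, add.
// (Hypothetical helper name; models the instructions on 64-bit registers.)
constexpr std::uint64_t scaled_add_baseline(std::uint64_t index, std::uint64_t base, unsigned shift)
{
    std::uint64_t t = index << 32; // slli t, index, 32
    t >>= 32;                      // srli t, t, 32   (zero-extends the low 32 bits)
    t <<= shift;                   // slli t, t, shift
    return base + t;               // add  rd, base, t
}

// Zba form: shNadd.uw rd, rs1, rs2 computes rs2 + (zext32(rs1) << N).
constexpr std::uint64_t shadd_uw(std::uint64_t rs1, std::uint64_t rs2, unsigned shift)
{
    return rs2 + (static_cast<std::uint64_t>(static_cast<std::uint32_t>(rs1)) << shift);
}

// sh3add.uw with index 5 and base 0x1000: 0x1000 + 5 * 8.
static_assert(shadd_uw(5, 0x1000, 3) == 0x1028, "scaled add");
// Garbage in the upper 32 bits of the index register is ignored by both forms.
static_assert(shadd_uw(0xFFFFFFFF00000002ULL, 0x1000, 2) ==
              scaled_add_baseline(0xFFFFFFFF00000002ULL, 0x1000, 2), "same result");
```

Both forms produce the same address, so the saving is purely in instruction count: three fewer instructions (12 bytes) per array access in the get_Item and get_Current diffs.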
e46b97e is being scheduled for building and testing
RISC-V Release-CLR-VF2: 9524 / 9544 (99.79%)
Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz
RISC-V Release-CLR-QEMU: 9524 / 9544 (99.79%)
Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz
RISC-V Release-FX-QEMU: 625306 / 650316 (96.15%)
Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz
RISC-V Release-FX-VF2: 633660 / 669510 (94.65%)
superpmi asmdiffs result for commit 0273889: Diffs are based on 12,626 contexts (10,243 MinOpts, 2,383 FullOpts).
Overall (-17,260 bytes)
MinOpts (-7,028 bytes)
FullOpts (-10,232 bytes)
@@ -3160,12 +3160,17 @@ GenTree* Compiler::fgMorphIndexAddr(GenTreeIndexAddr* indexAddr)
//
// 1) "arrRef + (index + elemOffset)"
// 2) "(arrRef + elemOffset) + index"
// 3) "(arrRef + index) + elemOffset"
This -- your new option 3 -- is fundamentally unsound for GC, and in fact, is exactly what the comments here are trying to prevent. There is no guarantee that "index" is >0 and < the size of the array object. That means that arrRef + index can point outside the array, and thus won't be properly handled by the garbage collector. You can see dotnet/coreclr#17524 where these restrictions/comments were originally added.
Thanks for pointing it out. Sorry, I didn't understand the warning properly before.
After reading the original PR, I can now see why it might point outside the object. However, I now don't understand why form 2 for ARM, which has an (arrRef + offset) BYREF ADD, is valid. From your example (a + (i - 10) * 4 + 8 = a + (i * 4) - 32), offset can be negative, and therefore (arrRef + offset) might point outside the object, right? Or am I misunderstanding something here?
I’ve looked at the PR introducing form 2 (#61293), but I somehow still can’t see how the first BYREF ADD won’t point outside the object, especially for the example above.
offset in the example is always 8 (for arm32); it represents the distance from the object reference to the start of the element data in the array.
However, I'm also confused by the comment. Certainly reordering the formation of the byref with the bounds check would be an illegal transformation by the JIT, but if we have already bounds checked that index is within the array, then I do not see why (arrRef + index) + elemOffset would cause problems.
Problem 2 fixed in dotnet/coreclr#17524 seems like the "actual" problem to me, since it looked like we were assuming addition between byrefs and integers was associative in general.
I don't understand why form 2 does not cause problems. It's possible that our expression optimizations are not sufficiently capable to convert:
a + 8 + (i - 10) * 4
=>
a + 8 + i * 4 - 40
=>
a - 32 + i * 4
=>
a + i * 4 - 32
which was the problematic original case. I don't see why this would be prevented. Does the JIT actually prevent expression optimization in byref expression trees? @EgorBo do you remember why case 2 here is sound?
The original problem case was:
a + (i - 10) * 4 + 8
=>
a + i * 4 - 40 + 8
=>
a + i * 4 - 32
Perhaps either the original fix helped prevent this, or something in the interim years did.
> Problem 2 fixed in dotnet/coreclr#17524 seems like the "actual" problem to me, since it looked like we were assuming addition between byrefs and integers was associative in general.
Without that fix, the "actual fix" (fix 1: changing the types of the array index morphed terms) would be "un-done" by the subsequent morph.
> Certainly reordering the formation of the byref with the bounds check would be an illegal transformation by the JIT, but if we have already bounds checked that index is within the array, then I do not see why (arrRef + index) + elemOffset would cause problems.
The problem is not the full expression, or whether it is bounds checked, it is when a partially computed byref expression is reported in a register, and that partial computation does not point within the object.
> The problem is not the full expression, or whether it is bounds checked, it is when a partially computed byref expression is reported in a register, and that partial computation does not point within the object
Right, but none of these patterns produce a byref outside the array object if the index has been bounds checked. They all look sound to me. If we have expression transformations that generally transform gc+(x+y) addition into (gc+x)+y, then those transformations are illegal, not the morphing that happens here.
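The two positions above can be checked with made-up numbers. The sketch below is plain C++ arithmetic, not JIT code, and assumes a hypothetical int[4] at address 0x1000 whose element data starts 16 bytes after the object reference (the offset visible in the 64-bit listings later in this thread). All names are illustrative. It compares the first intermediate address of form 2, of form 3 with a bounds-checked index, and of the reassociated expression a + i * 4 - 40 + 16 for the source index i - 10.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, chosen only for illustration.
constexpr std::uint64_t kBase = 0x1000;      // made-up object address
constexpr std::uint64_t kDataOffset = 16;    // object reference to first element
constexpr std::uint64_t kElemSize = 4;       // sizeof(int)
constexpr std::uint64_t kLength = 4;         // int[4]
constexpr std::uint64_t kObjectEnd = kBase + kDataOffset + kLength * kElemSize;

// True when p lies inside the array object, header included.
constexpr bool points_into_object(std::uint64_t p)
{
    return p >= kBase && p < kObjectEnd;
}

// Form 2: the first add is arrRef + elemOffset.
constexpr std::uint64_t form2_partial()
{
    return kBase + kDataOffset;
}

// Form 3: the first add is arrRef + (bounds-checked index * element size).
constexpr std::uint64_t form3_partial(std::uint64_t checkedIndex)
{
    return kBase + checkedIndex * kElemSize;
}

// Reassociated a + i * 4 - 40 + 16: the first add uses the unadjusted i.
constexpr std::uint64_t reassociated_partial(std::uint64_t i)
{
    return kBase + i * kElemSize;
}

// Source index i - 10 with i = 13, so the checked index is 3 (in bounds).
static_assert(points_into_object(form2_partial()), "0x1010 is inside");
static_assert(points_into_object(form3_partial(3)), "0x100C is inside");
static_assert(!points_into_object(reassociated_partial(13)), "0x1034 is past the end");
static_assert(reassociated_partial(13) - 40 + kDataOffset == form3_partial(3) + kDataOffset,
              "both orders reach the same final address");
```

Under these assumptions, only the reassociated order yields an intermediate value outside the object, which is consistent with the view that the hazard comes from reassociating byref arithmetic rather than from form 3 itself. This does not settle whether a later phase could perform such a reassociation on form 3.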
How about adding an assertion on offset (e.g. offset > 0) to check that our assumptions are correct? CI should catch it if it weren't true, right? Sorry if I somehow misunderstood.
Hi all,
I tested the following C# functions both on ARM64 and RISCV64 (after introducing form 3):
[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe int Fun(int[] a, long b)
{
    fixed (int* ptr = a)
    {
        return ptr[b - 10];
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static int FunChecked(int[] a, long b)
{
    return a[b - 10];
}
Here's the disassembly for ARM64:
;; Fun
G_M48370_IG05: ;; offset=0x0024
add x0, x0, x1, LSL #2
ldr w0, [x0, #-0x28]
;; FunChecked
G_M57461_IG02: ;; offset=0x0008
sub x2, x1, #10
ldr w3, [x0, #0x08]
cmp x2, x3
bhs G_M57461_IG04
add x0, x0, #16
lsl x1, x1, #2
sub x1, x1, #40
ldr w0, [x0, x1]
And for RISCV64:
;; Fun
G_M48370_IG05: ;; offset=0x0030
sh2add a0, a1, a0
lw a0, -40(a0)
;; FunChecked
G_M57461_IG02: ;; offset=0x0010
addi a2, a1, -10
lw a3, 8(a0)
zext.w a3, a3
bgeu a2, a3, G_M57461_IG04
slli a1, a1, 2
addi a1, a1, -40
add a0, a0, a1
lw a0, 16(a0)
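The range check at the top of both FunChecked listings (sub/cmp/bhs on ARM64, addi/zext.w/bgeu on RISCV64) can be read as one unsigned comparison. A minimal C++ sketch of that reading follows; index_in_bounds is a hypothetical name, and the constant 10 comes from the test function above.

```cpp
#include <cassert>
#include <cstdint>

// Models the check guarding a[b - 10]: the adjusted index is compared as an
// unsigned 64-bit value against the zero-extended 32-bit array length, so a
// negative index wraps to a huge value and fails the same comparison.
constexpr bool index_in_bounds(std::int64_t b, std::uint32_t length)
{
    // Subtract in unsigned arithmetic to avoid signed overflow in the model.
    const std::uint64_t index = static_cast<std::uint64_t>(b) - 10;  // addi a2, a1, -10
    return index < static_cast<std::uint64_t>(length);              // bgeu a2, a3 branches when this is false
}

static_assert(index_in_bounds(10, 4), "index 0 is valid");
static_assert(index_in_bounds(13, 4), "index 3 is valid");
static_assert(!index_in_bounds(14, 4), "index 4 is one past the end");
static_assert(!index_in_bounds(9, 4), "index -1 wraps and is rejected");
```

Only after this check passes do the listings form the element address, which is why the scaled index is known to be non-negative and less than the data size at that point.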
As additional info, I'll put the IR before rationalization here too. For ARM64:
// Fun
STMT00003 ( 0x018[E-] ... 0x024 )
N009 ( 11, 10) [000019] ---XG+----- * RETURN int $207
N008 ( 10, 9) [000018] ---XG+----- \--* IND int <l:$184, c:$185>
N007 ( 9, 10) [000017] -----+-N--- \--* ADD long $304
N005 ( 7, 7) [000016] -----+-N--- +--* ADD long $303
N001 ( 3, 2) [000009] -----+----- | +--* LCL_VAR long V02 loc0 u:4 (last use) $340
N004 ( 3, 4) [000013] -----+----- | \--* LSH long $302
N002 ( 1, 1) [000010] -----+----- | +--* LCL_VAR long V01 arg1 u:1 (last use) $c0
N003 ( 1, 2) [000012] -----+----- | \--* CNS_INT long 2 $102
N006 ( 1, 2) [000015] -----+----- \--* CNS_INT long -40 $103
// FunChecked
STMT00000 ( 0x000[E-] ... 0x008 )
N020 ( 23, 30) [000008] ---XG+----- * RETURN int $205
N019 ( 22, 29) [000023] ---XG+----- \--* COMMA int <l:$241, c:$242>
N007 ( 11, 16) [000016] ---X-+----- +--* BOUNDS_CHECK_Rng void $205
N003 ( 3, 4) [000004] -----+----- | +--* ADD long $180
N001 ( 1, 1) [000001] -----+----- | | +--* LCL_VAR long V01 arg1 u:1 $c0
N002 ( 1, 2) [000003] -----+----- | | \--* CNS_INT long -10 $140
N006 ( 4, 5) [000015] ---X-+---U- | \--* CAST long <- uint $182
N005 ( 3, 3) [000014] ---X-+----- | \--* ARR_LENGTH int $240
N004 ( 1, 1) [000000] -----+----- | \--* LCL_VAR ref V00 arg0 u:1 $80
N018 ( 11, 13) [000024] n---G+----- \--* IND int <l:$300, c:$1c1>
N017 ( 8, 11) [000022] -----+----- \--* ARR_ADDR byref int[] $2c0
N016 ( 8, 11) [000021] -----+-N--- \--* ADD byref $281
N010 ( 3, 4) [000020] -----+----- +--* ADD byref $280
N008 ( 1, 1) [000009] -----+----- | +--* LCL_VAR ref V00 arg0 u:1 (last use) $80
N009 ( 1, 2) [000019] -----+----- | \--* CNS_INT long 16 $142
N015 ( 5, 7) [000018] -----+----- \--* ADD long $184
N013 ( 3, 4) [000011] -----+----- +--* LSH long $183
N011 ( 1, 1) [000012] -----+----- | +--* LCL_VAR long V01 arg1 u:1 (last use) $c0
N012 ( 1, 2) [000013] -----+----- | \--* CNS_INT long 2 $143
N014 ( 1, 2) [000017] -----+-N--- \--* CNS_INT long -40 $144
For RISCV64:
// Fun
N009 ( 11, 12) [000019] ---XG+----- * RETURN int $207
N008 ( 10, 11) [000018] ---XG+----- \--* IND int <l:$184, c:$185>
N007 ( 9, 14) [000017] -----+-N--- \--* ADD long $304
N005 ( 7, 9) [000016] -----+-N--- +--* ADD long $303
N001 ( 3, 2) [000009] -----+----- | +--* LCL_VAR long V02 loc0 u:4 (last use) $340
N004 ( 3, 6) [000013] -----+----- | \--* LSH long $302
N002 ( 1, 1) [000010] -----+----- | +--* LCL_VAR long V01 arg1 u:1 (last use) $c0
N003 ( 1, 4) [000012] -----+----- | \--* CNS_INT long 2 $102
N006 ( 1, 4) [000015] -----+----- \--* CNS_INT long -40 $103
// FunChecked
N020 ( 22, 36) [000008] ---XG+----- * RETURN int $205
N019 ( 21, 35) [000023] ---XG+----- \--* COMMA int <l:$241, c:$242>
N007 ( 11, 20) [000016] ---X-+----- +--* BOUNDS_CHECK_Rng void $205
N003 ( 3, 6) [000004] -----+----- | +--* ADD long $180
N001 ( 1, 1) [000001] -----+----- | | +--* LCL_VAR long V01 arg1 u:1 $c0
N002 ( 1, 4) [000003] -----+----- | | \--* CNS_INT long -10 $140
N006 ( 4, 7) [000015] ---X-+---U- | \--* CAST long <- uint $182
N005 ( 3, 3) [000014] ---X-+----- | \--* ARR_LENGTH int $240
N004 ( 1, 1) [000000] -----+----- | \--* LCL_VAR ref V00 arg0 u:1 $80
N018 ( 10, 15) [000024] n---G+----- \--* IND int <l:$300, c:$1c1>
N017 ( 7, 13) [000022] -----+----- \--* ARR_ADDR byref int[] $2c0
N016 ( 7, 13) [000021] -----+-N--- \--* ADD byref $281
N014 ( 7, 13) [000020] -----+-N--- +--* ADD byref $280
N008 ( 1, 1) [000009] -----+----- | +--* LCL_VAR ref V00 arg0 u:1 (last use) $80
N013 ( 5, 11) [000018] -----+----- | \--* ADD long $184
N011 ( 3, 6) [000011] -----+----- | +--* LSH long $183
N009 ( 1, 1) [000012] -----+----- | | +--* LCL_VAR long V01 arg1 u:1 (last use) $c0
N010 ( 1, 4) [000013] -----+----- | | \--* CNS_INT long 2 $142
N012 ( 1, 4) [000017] -----+-N--- | \--* CNS_INT long -40 $143
N015 ( 1, 4) [000019] -----+----- \--* CNS_INT long 16 $144
So, in conclusion, `offset` can indeed be negative when lowering to `LEA`, but not when morphing (I was mistaken in my previous comment), since after `fgMorphIndexAddr` the expression is morphed again (EDIT: see the additional IR dump I put at the end). Does that still mean we're creating a `BYREF` outside of the object? From the final pre-rationalization IR, it looks like `byref`s are only marked for the final expression.
I don't understand why form 2 does not cause problems. It's possible that our expression optimizations are not capable of converting:
a + 8 + (i - 10) * 4 => a + 8 + i * 4 - 40 => a - 32 + i * 4 => a + i * 4 - 32
It seems that they only get this far:
// ARM64
(a + 16) + ((i - 10) * 4)
=>
(a + 16) + (i * 4 - 40)
// RISCV64
16 + (a + ((i - 10) * 4))
=>
16 + (a + (i * 4 - 40))
If that's the case, is it still problematic? (I think this looks safe?)
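To make the question concrete, here is a minimal sketch (not JIT code) that evaluates the intermediate addresses each grouping would produce for `FunChecked`. It assumes the layout visible in the IR dumps (element data at offset 16, 4-byte elements) and treats the array object as the range `[base, base + 16 + 4 * length)`. All helper names and the base address are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, read off the IR dumps: data starts 16 bytes into the
// array object and each int element occupies 4 bytes.
constexpr int64_t kDataOffset = 16;
constexpr int64_t kElemSize   = 4;

// True if addr lies inside the array object [base, base + 16 + 4 * length).
constexpr bool InsideObject(int64_t base, int64_t length, int64_t addr)
{
    return addr >= base && addr < base + kDataOffset + kElemSize * length;
}

// Intermediate byref of the RISCV64 grouping: a + (i * 4 - 40).
constexpr int64_t Riscv64Intermediate(int64_t base, int64_t i)
{
    return base + (i * kElemSize - 40);
}

// Intermediate value of a hypothetical full reassociation: (a + i * 4) - 24.
constexpr int64_t ReassociatedIntermediate(int64_t base, int64_t i)
{
    return base + i * kElemSize;
}

// Final element address for a[i - 10], identical for every grouping.
constexpr int64_t FinalAddress(int64_t base, int64_t i)
{
    return base + kDataOffset + (i - 10) * kElemSize;
}

// Walk every index that passes the bounds check for a 4-element array.
inline void CheckGroupings()
{
    const int64_t base = 1000, length = 4;
    for (int64_t i = 10; i < 10 + length; i++)
    {
        assert(InsideObject(base, length, Riscv64Intermediate(base, i)));
        assert(InsideObject(base, length, FinalAddress(base, i)));
        assert(!InsideObject(base, length, ReassociatedIntermediate(base, i)));
    }
}
```

Under these assumptions, every bounds-checked `i` keeps `a + (i * 4 - 40)` inside the object, while `a + i * 4` on its own lands past the end of a 4-element array. That only matters if such an intermediate value is typed as a byref.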
Any suggestion on what we should do here? Maybe we should mark the index (in `fgMorphIndexAddr`) so that it won't be morphed after `fgMorphIndexAddr`? Or maybe we should open up a separate issue for this?
Any input on how we should proceed regarding this finding would be very much appreciated! 😀 In the meantime I'll try to understand this issue further 🙏
EDIT:
More IR dumps (for ARM64 `FunChecked` only):
fgMorphIndexAddr (before remorph):
[000023] ---X-O----- * COMMA byref
[000016] ---X-O----- +--* BOUNDS_CHECK_Rng void
[000005] ---X------- | +--* CAST_ovfl long <- long
[000004] ----------- | | \--* SUB long
[000001] ----------- | | +--* LCL_VAR long V01 arg1 (last use)
[000003] ----------- | | \--* CNS_INT long 10
[000015] ---X-----U- | \--* CAST long <- uint
[000014] ---X------- | \--* ARR_LENGTH int
[000000] ----------- | \--* LCL_VAR ref V00 arg0 (last use)
[000022] ---X-O----- \--* ARR_ADDR byref int[]
[000021] ---X------- \--* ADD byref
[000020] ----------- +--* ADD byref
[000009] ----------- | +--* LCL_VAR ref V00 arg0 (last use)
[000019] ----------- | \--* CNS_INT long 16
[000018] ---X------- \--* MUL long
[000010] ---X------- +--* CAST_ovfl long <- long
[000011] ----------- | \--* SUB long
[000012] ----------- | +--* LCL_VAR long V01 arg1 (last use)
[000013] ----------- | \--* CNS_INT long 10
[000017] -------N--- \--* CNS_INT long 4
GenTreeNode creates assertion:
[000014] ---X-+----- * ARR_LENGTH int
In BB01 New Local Constant Assertion: V00 != null, index = #01
BB01 requires throw helper block for SCK_RNGCHK_FAIL, sharing ACD0 (data 0x00000000)
fgMorphIndexAddr (after remorph):
[000023] ---X-+----- * COMMA byref
[000016] ---X-+----- +--* BOUNDS_CHECK_Rng void
[000004] -----+----- | +--* ADD long
[000001] -----+----- | | +--* LCL_VAR long V01 arg1 (last use)
[000003] -----+----- | | \--* CNS_INT long -10
[000015] ---X-+---U- | \--* CAST long <- uint
[000014] ---X-+----- | \--* ARR_LENGTH int
[000000] -----+----- | \--* LCL_VAR ref V00 arg0 (last use)
[000022] -----+----- \--* ARR_ADDR byref int[]
[000021] -----+----- \--* ADD byref
[000020] -----+----- +--* ADD byref
[000009] -----+----- | +--* LCL_VAR ref V00 arg0 (last use)
[000019] -----+----- | \--* CNS_INT long 16
[000018] -----+----- \--* ADD long
[000011] -----+----- +--* LSH long
[000012] -----+----- | +--* LCL_VAR long V01 arg1 (last use)
[000013] -----+----- | \--* CNS_INT long 2
[000017] -----+-N--- \--* CNS_INT long -40
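The remorph step above rewrites the scaled index `MUL(SUB(i, 10), 4)` as `ADD(LSH(i, 2), -40)`. As a sanity check of that arithmetic only, here is a minimal sketch with hypothetical helper names. It uses unsigned 64-bit arithmetic so that both sides wrap identically, and it does not model the `CAST_ovfl` node.

```cpp
#include <cassert>
#include <cstdint>

// Scaled index as it appears before remorph: MUL(SUB(i, 10), 4).
constexpr uint64_t BeforeRemorph(uint64_t i)
{
    return (i - 10) * 4;
}

// Scaled index as it appears after remorph: ADD(LSH(i, 2), -40).
constexpr uint64_t AfterRemorph(uint64_t i)
{
    return (i << 2) + static_cast<uint64_t>(-40);
}

// Compare the two forms over a range of index values.
inline void CheckRemorphIdentity()
{
    for (uint64_t i = 0; i < 1000; i++)
    {
        assert(BeforeRemorph(i) == AfterRemorph(i));
    }
}
```

The two forms agree modulo 2^64 for every `i`, so the rewrite changes only how the offset is grouped, not the final address.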
Sorry for the slow response, and thanks for the continued investigation. After some investigation and internal conversation it seems like the problem that led to the avoidance of form (3) may no longer be an issue, such that your change here is ok. As an experiment, I implemented form (3) for all platforms and ran stress, including GCStress, and there were no failures (besides known failures). (I kicked it off again just to see it succeed more cleanly with our currently (hopefully) cleaner CI test runs: #114388.)
Hi, no problem, and thanks for looking into it! 👍
Title changed from "[RISC-V] Utilize `sh(x)add(.uw)` instructions" to "[RISC-V] Utilize `Zba` extension instructions".
8c5ac35 is being scheduled for building and testing
superpmi asmdiffs result for commit 0fd562e:
Diffs are based on 12,626 contexts (10,243 MinOpts, 2,383 FullOpts).
Overall (-31,764 bytes)
MinOpts (-9,132 bytes)
FullOpts (-22,632 bytes)
Example diffs
test.mch
-48 (-15.19%) : 323.dasm - NumericSortJagged:NumSift(int[],int,int) (Tier1)
@@ -15,10 +15,10 @@
; V04 loc1 [V04,T11] ( 2, 16.07) int -> a6
;# V05 OutArgs [V05 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V06 tmp1 [V06,T04] ( 2, 32.14) int -> a5 "Strict ordering of exceptions for Array store"
-; V07 cse0 [V07,T06] ( 3, 24.57) int -> a6 "CSE #07: aggressive"
-; V08 cse1 [V08,T07] ( 3, 24.57) int -> a5 "CSE #12: aggressive"
-; V09 cse2 [V09,T08] ( 3, 24.57) long -> a4 "CSE #05: aggressive"
-; V10 cse3 [V10,T09] ( 3, 24.57) long -> a1 "CSE #10: aggressive"
+; V07 cse0 [V07,T06] ( 3, 24.57) int -> a6 "CSE #06: aggressive"
+; V08 cse1 [V08,T07] ( 3, 24.57) int -> a5 "CSE #10: aggressive"
+; V09 cse2 [V09,T08] ( 3, 24.57) long -> a4 "CSE #04: aggressive"
+; V10 cse3 [V10,T09] ( 3, 24.57) long -> a1 "CSE #08: aggressive"
; V11 cse4 [V11,T02] ( 6, 49.57) int -> a4 multi-def "CSE #01: aggressive"
; V12 cse5 [V12,T05] ( 4, 29.17) int -> a6 "CSE #02: aggressive"
;
@@ -57,37 +57,35 @@ G_M30577_IG06: ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
sext.w a5, a4
sext.w a6, a1
bgeu a6, a5, G_M30577_IG11
- slli a1, a1, 32
- srli a1, a1, 32
- slli a1, a1, 2
- addi a1, a1, 0xD1FFAB1E
- add t6, a0, a1
- ; byrRegs +[t6]
- lw a5, 0xD1FFAB1E(t6)
+ slli.uw a1, a1, 2
+ add a5, a0, a1
+ ; byrRegs +[a5]
+ lw a5, 0xD1FFAB1E(a5)
+ ; byrRegs -[a5]
sext.w a6, a4
sext.w a7, a3
bgeu a7, a6, G_M30577_IG11
- slli a4, a3, 32
- srli a4, a4, 32
- slli a4, a4, 2
- addi a4, a4, 0xD1FFAB1E
- add t6, a0, a4
- lw a6, 0xD1FFAB1E(t6)
+ slli.uw a4, a3, 2
+ add a6, a0, a4
+ ; byrRegs +[a6]
+ lw a6, 0xD1FFAB1E(a6)
+ ; byrRegs -[a6]
slliw ra, a5, 0
slliw t6, a6, 0
- ; byrRegs -[t6]
bge ra, t6, G_M30577_IG04
- ;; size=88 bbWeight=8.27 PerfScore 202.53
+ ;; size=64 bbWeight=8.27 PerfScore 177.73
G_M30577_IG07: ; bbWeight=8.04, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
- add t6, a0, a4
- ; byrRegs +[t6]
- sw a5, 0xD1FFAB1E(t6)
- add t6, a0, a1
- sw a6, 0xD1FFAB1E(t6)
+ add a4, a0, a4
+ ; byrRegs +[a4]
+ sw a5, 0xD1FFAB1E(a4)
+ add a1, a0, a1
+ ; byrRegs +[a1]
+ sw a6, 0xD1FFAB1E(a1)
sext.w a1, a3
+ ; byrRegs -[a1]
;; size=20 bbWeight=8.04 PerfScore 76.34
G_M30577_IG08: ; bbWeight=8.27, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
- ; byrRegs -[t6]
+ ; byrRegs -[a4]
slliw a3, a1, 1
slliw ra, a3, 0
slliw t6, a2, 0
@@ -104,31 +102,25 @@ G_M30577_IG10: ; bbWeight=8.26, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, b
sext.w a5, a4
sext.w a6, a3
bgeu a6, a5, G_M30577_IG11
- slli a5, a3, 32
- srli a5, a5, 32
- slli a5, a5, 2
- add a6, a0, a5
- ; byrRegs +[a6]
- lw a5, 0xD1FFAB1E(a6)
+ sh2add.uw a5, a3, a0
+ ; byrRegs +[a5]
+ lw a5, 0xD1FFAB1E(a5)
+ ; byrRegs -[a5]
addiw a6, a3, 0xD1FFAB1E
- ; byrRegs -[a6]
sext.w t0, a4
sext.w a7, a6
bgeu a7, t0, G_M30577_IG11
- slli a4, a6, 32
- srli a4, a4, 32
- slli a4, a4, 2
- add a7, a0, a4
- ; byrRegs +[a7]
- lw a4, 0xD1FFAB1E(a7)
+ sh2add.uw a4, a6, a0
+ ; byrRegs +[a4]
+ lw a4, 0xD1FFAB1E(a4)
+ ; byrRegs -[a4]
slliw ra, a5, 0
slliw t6, a4, 0
bge ra, t6, G_M30577_IG06
j G_M30577_IG05
- ;; size=88 bbWeight=8.26 PerfScore 210.59
+ ;; size=64 bbWeight=8.26 PerfScore 185.81
G_M30577_IG11: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcrRegs -[a0]
- ; byrRegs -[a7]
lui a0, 0xD1FFAB1E
addiw a0, a0, 0xD1FFAB1E
slli a0, a0, 11
@@ -139,7 +131,7 @@ G_M30577_IG11: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
ebreak
;; size=28 bbWeight=0 PerfScore 0.00
-; Total bytes of code 316, prolog size 16, PerfScore 596.28, instruction count 79, allocated bytes for code 316 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
+; Total bytes of code 268, prolog size 16, PerfScore 546.71, instruction count 67, allocated bytes for code 268 (MethodHash=6405888e) for method NumericSortJagged:NumSift(int[],int,int) (Tier1)
; ============================================================
Unwind Info:
@@ -150,7 +142,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 79 (0x0004f) Actual length = 316 (0x00013c)
+ Function Length : 67 (0x00043) Actual length = 268 (0x00010c)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-12 (-12.50%) : 3104.dasm - System.Reflection.Internal.MemoryBlock:CheckBounds(int,int):this (Tier1)
@@ -24,16 +24,13 @@ G_M49493_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=16 bbWeight=1 PerfScore 9.00
G_M49493_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byref
; byrRegs +[a0]
- slli a1, a1, 32
- srli a1, a1, 32
- slli a2, a2, 32
- srli a2, a2, 32
- add a1, a1, a2
+ zext.w a2, a2
+ add.uw a1, a1, a2
lw a0, 0xD1FFAB1E(a0)
; byrRegs -[a0]
slliw a0, a0, 0
bltu a0, a1, G_M49493_IG04
- ;; size=32 bbWeight=1 PerfScore 8.50
+ ;; size=20 bbWeight=1 PerfScore 7.00
G_M49493_IG03: ; bbWeight=1, epilog, nogc, extend
ld ra, 8(sp)
ld fp, 0(sp)
@@ -50,7 +47,7 @@ G_M49493_IG04: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
ebreak
;; size=32 bbWeight=0 PerfScore 0.00
-; Total bytes of code 96, prolog size 16, PerfScore 25.00, instruction count 24, allocated bytes for code 96 (MethodHash=09153eaa) for method System.Reflection.Internal.MemoryBlock:CheckBounds(int,int):this (Tier1)
+; Total bytes of code 84, prolog size 16, PerfScore 23.50, instruction count 21, allocated bytes for code 84 (MethodHash=09153eaa) for method System.Reflection.Internal.MemoryBlock:CheckBounds(int,int):this (Tier1)
; ============================================================
Unwind Info:
@@ -61,7 +58,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 24 (0x00018) Actual length = 96 (0x000060)
+ Function Length : 21 (0x00015) Actual length = 84 (0x000054)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-12 (-12.00%) : 3142.dasm - System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
@@ -33,13 +33,13 @@ G_M52328_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0400 {a0}, byre
sext.w a3, a1
sext.w a4, a2
bgeu a4, a3, G_M52328_IG04
- slli a1, a2, 32
- srli a1, a1, 32
- slli a1, a1, 3
- add a2, a0, a1
- ; byrRegs +[a2]
- ld a0, 0xD1FFAB1E(a2)
- ;; size=40 bbWeight=1 PerfScore 12.50
+ sh3add.uw a0, a2, a0
+ ; gcrRegs -[a0]
+ ; byrRegs +[a0]
+ ld a0, 0xD1FFAB1E(a0)
+ ; gcrRegs +[a0]
+ ; byrRegs -[a0]
+ ;; size=28 bbWeight=1 PerfScore 11.00
G_M52328_IG03: ; bbWeight=1, epilog, nogc, extend
ld ra, 8(sp)
ld fp, 0(sp)
@@ -47,7 +47,6 @@ G_M52328_IG03: ; bbWeight=1, epilog, nogc, extend
ret ;; size=16 bbWeight=1 PerfScore 7.50
G_M52328_IG04: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
; gcrRegs -[a0]
- ; byrRegs -[a2]
lui a0, 0xD1FFAB1E
addiw a0, a0, 0xD1FFAB1E
slli a0, a0, 11
@@ -57,7 +56,7 @@ G_M52328_IG04: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
ebreak
;; size=28 bbWeight=0 PerfScore 0.00
-; Total bytes of code 100, prolog size 16, PerfScore 29.00, instruction count 25, allocated bytes for code 100 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
+; Total bytes of code 88, prolog size 16, PerfScore 27.50, instruction count 22, allocated bytes for code 88 (MethodHash=cb333397) for method System.Collections.Immutable.ImmutableArray`1[System.__Canon]:get_Item(int):System.__Canon:this (Tier1)
; ============================================================
Unwind Info:
@@ -68,7 +67,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 25 (0x00019) Actual length = 100 (0x000064)
+ Function Length : 22 (0x00016) Actual length = 88 (0x000058)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 10416.dasm - Microsoft.CodeAnalysis.PEModule:TryGetUnmanagedCallersOnlyAttribute(System.Reflection.Metadata.EntityHandle,Microsoft.CodeAnalysis.IAttributeNamedArgumentDecoder,System.Func`4[System.String,Microsoft.CodeAnalysis.TypedConstant,ubyte,System.ValueTuple`2[ubyte,System.Collections.Immutable.ImmutableHashSet`1[Microsoft.CodeAnalysis.Symbols.INamedTypeSymbolInternal]]]):Microsoft.CodeAnalysis.UnmanagedCallersOnlyAttributeData:this (Instrumented Tier0)
No diffs found?
+0 (0.00%) : 8912.dasm - Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator:.ctor(Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol,Microsoft.CodeAnalysis.CSharp.BoundStatement,Microsoft.CodeAnalysis.CodeGen.ILBuilder,Microsoft.CodeAnalysis.CSharp.Emit.PEModuleBuilder,Microsoft.CodeAnalysis.CSharp.BindingDiagnosticBag,int,ubyte):this (Tier0)
No diffs found?
+0 (0.00%) : 8752.dasm - Microsoft.CodeAnalysis.CSharp.LocalRewriter:Rewrite(Microsoft.CodeAnalysis.CSharp.CSharpCompilation,Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol,int,Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol,Microsoft.CodeAnalysis.CSharp.BoundStatement,Microsoft.CodeAnalysis.CSharp.TypeCompilationState,Microsoft.CodeAnalysis.CSharp.SynthesizedSubmissionFields,ubyte,Microsoft.CodeAnalysis.Emit.MethodInstrumentation,Microsoft.CodeAnalysis.CodeGen.DebugDocumentProvider,Microsoft.CodeAnalysis.CSharp.BindingDiagnosticBag,byref,byref,byref,byref):Microsoft.CodeAnalysis.CSharp.BoundStatement (Tier0)
No diffs found?
Details
Size improvements/regressions per collection
PerfScore improvements/regressions per collection
Context information
jit-analyze output
0fd562e is being scheduled for building and testing. Release-build FAILED (buildinfo.json)
Should I rebase my commits or merge main to resolve the conflicts?
Either one is fine. Maintainers usually squash and merge PRs so what lands in main branch history is a single commit; good for long-term as once it's a history, |
Branch force-pushed from 0fd562e to 8d42b94.
RISC-V Release-CLR-VF2: 9532 / 9552 (99.79%)
Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz
RISC-V Release-CLR-QEMU: 9532 / 9552 (99.79%)
Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz
@fuad1502 Please resolve the merge conflicts.
Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
RISC-V Release-CLR-VF2: 9701 / 9750 (99.50%)
Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz
1522ba2 is being scheduled for building and testing
/ba-g unrelated tests
Main changes:
- In `morph.cpp`, group `arrayRef` and `index` together, since this is the natural addressing mode for RISC-V.
- In `lower.cpp`, try to lower:
  - `ADD(LSH)` into a new `SH(X)ADD(_UW)` node.
  - `ADD(CAST)` into a new `ADD_UW` node.
  - `LSH(CAST)` into a new `SLLI_UW` node.
- `GT_INDEX_ADDR` emits `sh(x)add(.uw)` instruction.
- `add.uw rs rs zero` instruction.

With the following C# code:
Before patch:
After patch:

P.S. See #113678 for the previous approach to utilizing `sh(x)add(.uw)` instructions.

Part of #84834, cc @dotnet/samsung
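For readers unfamiliar with `Zba`, the following is a minimal C++ model of the instruction semantics these lowerings target, checked against the base-ISA `slli`/`srli`/`slli`/`add` sequence from the "before" side of the diffs. It models the instructions as defined by the RISC-V `Zba` extension, not the JIT nodes themselves, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend the low 32 bits (what zext.w, i.e. add.uw rd, rs, zero, computes).
constexpr uint64_t ZextW(uint64_t x)
{
    return x & 0xFFFFFFFFull;
}

// shNadd rd, rs1, rs2: rd = rs2 + (rs1 << N), with N in {1, 2, 3}.
constexpr uint64_t ShNAdd(unsigned n, uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << n);
}

// shNadd.uw rd, rs1, rs2: rd = rs2 + (zext32(rs1) << N).
constexpr uint64_t ShNAddUw(unsigned n, uint64_t rs1, uint64_t rs2)
{
    return rs2 + (ZextW(rs1) << n);
}

// add.uw rd, rs1, rs2: rd = rs2 + zext32(rs1).
constexpr uint64_t AddUw(uint64_t rs1, uint64_t rs2)
{
    return rs2 + ZextW(rs1);
}

// slli.uw rd, rs1, shamt: rd = zext32(rs1) << shamt.
constexpr uint64_t SlliUw(uint64_t rs1, unsigned shamt)
{
    return ZextW(rs1) << shamt;
}

// Base-ISA sequence replaced in the diffs:
//   slli t, idx, 32 ; srli t, t, 32 ; slli t, t, n ; add rd, base, t
constexpr uint64_t BaseIsaIndexAddr(unsigned n, uint64_t idx, uint64_t base)
{
    uint64_t t = idx << 32; // discard the upper 32 bits
    t = t >> 32;            // logical shift back: zero-extended index
    t = t << n;             // scale by the element size
    return base + t;
}

// The four-instruction sequence and the single Zba instruction agree,
// even when the index register has garbage in its upper half.
inline void CheckZbaEquivalence()
{
    const uint64_t dirtyIndex = 0xFFFFFFFF00000005ull;
    for (unsigned n = 1; n <= 3; n++)
    {
        assert(ShNAddUw(n, dirtyIndex, 0x1000) == BaseIsaIndexAddr(n, dirtyIndex, 0x1000));
    }
}
```

In this model, `sh3add.uw a0, a2, a0` corresponds to `ShNAddUw(3, a2, a0)`, which is why a single instruction can replace the four-instruction address computation in the `ImmutableArray` example.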