[RISC-V] Optimize loading 64 bit constant with new algorithm implementation and using emitDataConst #113250

Merged: 15 commits, Apr 2, 2025

fuad1502
Contributor

@fuad1502 fuad1502 commented Mar 7, 2025

In this PR, a new algorithm is implemented to reduce the number of instructions generated for loading constants into registers. Additionally, when the number of instructions would still exceed 5, the constant is instead loaded from the data section using emitDataConst.

See how clang loads 64-bit constants on RISC-V with godbolt.

With the following C# function:

[MethodImpl(MethodImplOptions.NoInlining)]
public ulong Fun() {
	return 0xFFFFFFEFAFFFFFFF;
	// return 0x3FFFFFFFFFFFFFFE;
	// return 0xABCDABCDABCDABCD;
}

Before patch:

; Load FFFFFFEFAFFFFFFF
lui            a0, -524288
addiw          a0, a0, -9
slli           a0, a0, 11
addi           a0, a0, 1727
slli           a0, a0, 11
addi           a0, a0, 2047
slli           a0, a0, 11
addi           a0, a0, 2047

; Load 3FFFFFFFFFFFFFFE
lui            a0, -524288
addiw          a0, a0, -1
slli           a0, a0, 11
addi           a0, a0, 2047
slli           a0, a0, 11
addi           a0, a0, 2047
slli           a0, a0, 9
addi           a0, a0, 510

; Load ABCDABCDABCDABCD
lui            a0, 351853
addiw          a0, a0, 1510
slli           a0, a0, 11
addi           a0, a0, 1711
slli           a0, a0, 11
addi           a0, a0, 437
slli           a0, a0, 11
addi           a0, a0, 973

After patch:

; Load FFFFFFEFAFFFFFFF
addiw          a0, zero, -261
slli           a0, a0, 28
addi           a0, a0, -1 ; Example of utilizing subtraction

; Load 3FFFFFFFFFFFFFFE
addiw          a0, zero, -8
srli           a0, a0, 2 ; Example of SRLI utilization

; Load ABCDABCDABCDABCD
G_M16745_IG02:  ;; offset=0x0010
            auipc          t6, 0
            ld             a0, 24(t6)
                                                ;; size=8 bbWeight=1 PerfScore 0.00
G_M16745_IG03:  ;; offset=0x0018
            ld             ra, 8(sp)
            ld             fp, 0(sp)
            addi           sp, sp, 16
            ret                                         ;; size=16 bbWeight=1 PerfScore 0.00
RWD00   dq      ABCDABCDABCDABCDh ; Example of utilizing data section
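
The shorter sequences above can be checked with a small simulator. The following Python sketch is not part of the PR or of the JIT: the `run` helper and its tuple encoding are made up for illustration, and it models only the handful of RV64 instructions that appear in these listings. It confirms that the old 8-instruction sequence and the new 3- and 2-instruction sequences produce the same constants.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def run(program):
    """Execute (op, rd, operands...) tuples; registers hold unsigned 64-bit values."""
    regs = {"zero": 0}
    for op, rd, *args in program:
        if op == "lui":      # rd = (imm20 << 12), sign-extended from 32 bits
            result = sext(args[0] << 12, 32)
        elif op == "addi":   # rd = rs1 + imm12 (full 64-bit add)
            result = regs[args[0]] + args[1]
        elif op == "addiw":  # rd = 32-bit add, result sign-extended to 64 bits
            result = sext(regs[args[0]] + args[1], 32)
        elif op == "slli":   # logical shift left
            result = regs[args[0]] << args[1]
        elif op == "srli":   # logical shift right (zero fill)
            result = regs[args[0]] >> args[1]
        else:
            raise ValueError(f"unsupported instruction: {op}")
        regs[rd] = result & MASK64
    return regs


# "Before patch" sequence for 0xFFFFFFEFAFFFFFFF.
BEFORE = [
    ("lui", "a0", -524288), ("addiw", "a0", "a0", -9),
    ("slli", "a0", "a0", 11), ("addi", "a0", "a0", 1727),
    ("slli", "a0", "a0", 11), ("addi", "a0", "a0", 2047),
    ("slli", "a0", "a0", 11), ("addi", "a0", "a0", 2047),
]
# "After patch" sequences.
AFTER_SUBTRACT = [("addiw", "a0", "zero", -261), ("slli", "a0", "a0", 28),
                  ("addi", "a0", "a0", -1)]
AFTER_SRLI = [("addiw", "a0", "zero", -8), ("srli", "a0", "a0", 2)]

for name, program in [("before", BEFORE), ("after/sub", AFTER_SUBTRACT),
                      ("after/srli", AFTER_SRLI)]:
    print(f"{name:10s} {len(program)} instructions -> {run(program)['a0']:#018x}")
```

The simulator keeps registers as unsigned 64-bit integers, so `srli` is a plain right shift and sign extension only happens explicitly in `lui` and `addiw`.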

Note: @tomeksowi pointed out that there is one additional optimization that GCC is able to do but clang cannot (see the godbolt link above), which uses a temporary register to exploit instruction-level parallelism. However, I won't be covering that optimization in this PR.

Part of #84834, cc @dotnet/samsung

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 7, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 7, 2025
@risc-vv

risc-vv commented Mar 7, 2025

RISC-V Release-CLR-VF2: 9465 / 9541 (99.20%)
=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 8min 55s 782ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 1e0135bfa363ed925aa72482226b24259b934918
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9465 / 9541 (99.20%)
=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 45min 34s 169ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 1e0135bfa363ed925aa72482226b24259b934918
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 630930 / 658679 (95.79%)
=======================
      passed: 630930
      failed: 316
     skipped: 1453
      killed: 27433
------------------------
  TOTAL libs: 258
 TOTAL tests: 660132
   REAL time: 2h 27min 14s 537ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 1e0135bfa363ed925aa72482226b24259b934918
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 436825 / 464641 (94.01%)
=======================
      passed: 436825
      failed: 141
     skipped: 1480
      killed: 27675
------------------------
  TOTAL libs: 258
 TOTAL tests: 466121
   REAL time: 2h 53min 22s 694ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 1e0135bfa363ed925aa72482226b24259b934918
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

@am11 am11 added the arch-riscv Related to the RISC-V architecture label Mar 7, 2025
@am11
Member

am11 commented Mar 7, 2025

See how clang loads 64-bit constants on RISC-V with godbolt

With -O2, it gets more interesting https://godbolt.org/z/7PojfWjba.

@risc-vv

risc-vv commented Mar 9, 2025

eb1c702 is being scheduled for building and testing

GIT: eb1c702046f014631564f5d8afd658555bcbc3c6
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis

Release-build FAILED

buildinfo.json
Compilation failed during core build

@fuad1502 fuad1502 changed the title [RISC-V] Optimize loading 64 bit constant with emitDataConst [RISC-V] Optimize loading 64 bit constant with new algorithm implementation and using emitDataConst Mar 9, 2025
@fuad1502 fuad1502 force-pushed the riscv-jit-opt/pointer-synthesis branch from eb1c702 to 3b52234 Compare March 9, 2025 07:28
@risc-vv

risc-vv commented Mar 9, 2025

RISC-V Release-CLR-VF2: 9464 / 9541 (99.19%)
=======================
      passed: 9464
      failed: 60
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 8min 4s 684ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 3b5223488d79a4e72a3f0a6ad67e327e745f1042
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9464 / 9541 (99.19%)
=======================
      passed: 9464
      failed: 60
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 45min 34s 658ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 3b5223488d79a4e72a3f0a6ad67e327e745f1042
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 429741 / 457637 (93.90%)
=======================
      passed: 429741
      failed: 147
     skipped: 1478
      killed: 27749
------------------------
  TOTAL libs: 258
 TOTAL tests: 459115
   REAL time: 2h 50min 20s 28ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 3b5223488d79a4e72a3f0a6ad67e327e745f1042
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 659033 / 696638 (94.60%)
=======================
      passed: 659033
      failed: 321
     skipped: 1453
      killed: 37284
------------------------
  TOTAL libs: 258
 TOTAL tests: 698091
   REAL time: 2h 26min 56s 959ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 3b5223488d79a4e72a3f0a6ad67e327e745f1042
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

@fuad1502
Contributor Author

fuad1502 commented Mar 9, 2025

Hi @tomeksowi, sorry for bothering you on the weekend. I was wondering: is the risc-vv test paused on weekends, or did my test simply hang? It has been almost 4 hours since it was scheduled for build + test, yet no test result has shown up so far 😅

@filipnavara
Member

It has been almost 4 hours since it was scheduled for build + test, yet no test result has shown up so far 😅

It often takes several hours, especially with the previous builds in a queue. Also, I think this is more in @sirntar's turf.

@fuad1502
Contributor Author

fuad1502 commented Mar 9, 2025

It often takes several hours, especially with the previous builds in a queue. Also, I think this is more in @sirntar's turf.

Alright, thanks for the info! 👍

@tomeksowi
Contributor

Hi @tomeksowi, sorry for bothering you on the weekend. I was wondering: is the risc-vv test paused on weekends, or did my test simply hang? It has been almost 4 hours since it was scheduled for build + test, yet no test result has shown up so far 😅

There was a maintenance power-off in the SRPOL office over the weekend; maybe something didn't restart properly. I'm away, so I don't have access; we'll check on Monday.

@am11
Member

am11 commented Mar 10, 2025

#113250 (comment) is updated. I noticed that it sometimes takes a while (a few hours to days), but eventually it updates the comment.

@filipnavara
Member

One of the failures could be relevant:

[1.760s] JIT.Directed.ConstantFolding.value_numbering_checked_arithmetic_with_constants_ro.value_numbering_checked_arithmetic_with_constants_ro
    value_numbering_checked_arithmetic_with_constants_ro.sh
    [exitcode_101]: 
    Unknown exit code 101.
    03
   at Xunit.Assert.Equal[T](T expected, T actual, IEqualityComparer`1 comparer) in /_/src/Microsoft.DotNet.XUnitAssert/src/EqualityAsserts.cs:line 174
   at Xunit.Assert.Equal[T](T expected, T actual) in /_/src/Microsoft.DotNet.XUnitAssert/src/EqualityAsserts.cs:line 96
   at __GeneratedMainWrapper.Main()

@fuad1502
Contributor Author

One of the failures could be relevant:

[1.760s] JIT.Directed.ConstantFolding.value_numbering_checked_arithmetic_with_constants_ro.value_numbering_checked_arithmetic_with_constants_ro
    value_numbering_checked_arithmetic_with_constants_ro.sh
    [exitcode_101]: 
    Unknown exit code 101.
    03
   at Xunit.Assert.Equal[T](T expected, T actual, IEqualityComparer`1 comparer) in /_/src/Microsoft.DotNet.XUnitAssert/src/EqualityAsserts.cs:line 174
   at Xunit.Assert.Equal[T](T expected, T actual) in /_/src/Microsoft.DotNet.XUnitAssert/src/EqualityAsserts.cs:line 96
   at __GeneratedMainWrapper.Main()

Yup, it caught an edge case I didn't handle properly, working on it now 👍

@filipnavara
Member

Yup, it caught an edge case I didn't handle properly, working on it now 👍

Out of curiosity, is it something like the sign bit not propagated correctly because you omit one of the instructions? (I didn't look at the code, so just wildly guessing.)

@fuad1502
Contributor Author

fuad1502 commented Mar 10, 2025

Out of curiosity, is it something like the sign bit not propagated correctly because you omit one of the instructions? (I didn't look at the code, so just wildly guessing.)

Yes. Sometimes I add 1 to the lui operand. I had already made sure the original operand has the expected sign bit to be extended, but for one particular operand value (0x7FFFFFFF), adding 1 changes the sign bit, which causes unintended sign extension.

I've figured out a fix, but I realized something else: clang cleverly uses the srli instruction for this particular value, which I don't utilize at all in this algorithm, so my sequence ends up 1 instruction longer.
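
One way this kind of edge case can arise is reproducible with plain integer arithmetic. The sketch below is an illustration under assumptions, not the PR's code: it assumes the usual hi20/lo12 split in which the lui operand is incremented whenever the low 12 bits are negative as a signed immediate, and all helper names are hypothetical. For 0x7FFFFFFF the incremented lui operand becomes 0x80000, so lui sign-extends a set bit 31 into the upper word; a 64-bit addi keeps that extension, whereas addiw wraps back to the intended value.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split_hi_lo(value32):
    """Split a 32-bit pattern into (lui operand, signed 12-bit addend)."""
    lo12 = sext(value32, 12)
    # Subtracting a negative lo12 is what adds 1 to the upper 20 bits.
    hi20 = ((value32 - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def lui(hi20):
    """RV64 lui: (hi20 << 12) sign-extended from 32 to 64 bits."""
    return sext(hi20 << 12, 32) & MASK64


def with_addi(value32):
    """lui followed by a full 64-bit addi."""
    hi20, lo12 = split_hi_lo(value32)
    return (lui(hi20) + lo12) & MASK64


def with_addiw(value32):
    """lui followed by addiw, which re-sign-extends the 32-bit sum."""
    hi20, lo12 = split_hi_lo(value32)
    return sext(lui(hi20) + lo12, 32) & MASK64


for value in (0x12345678, 0x7FFFFFFF):
    hi20, lo12 = split_hi_lo(value)
    print(f"{value:#x}: lui {hi20:#x}, imm {lo12} -> "
          f"addi {with_addi(value):#x}, addiw {with_addiw(value):#x}")
```

In this model only the second value differs between the two variants: the addi path yields 0xFFFFFFFF7FFFFFFF while the addiw path yields 0x7FFFFFFF.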

@tomeksowi
Contributor

tomeksowi commented Mar 10, 2025

; Load 0xABCDABCDABCDABCD
IN0006: 000018      ld             ra, 8(sp)
IN0007: 00001C      ld             fp, 0(sp)
IN0008: 000020      addi           sp, sp, 16
IN0009: 000024      ret                                         ;; size=16 bbWeight=1 PerfScore 0.00
Emitting data sections: 8 total bytes

RWD00   dq      ABCDABCDABCDABCDh
  section   0, size  8, RWD 0:  cd ab cd ab cd ab cd ab 

I don't know if you're going to cover sequences with a temporary register in this PR, but in this case it could detect that the immediate bits can be split into addable halves:

lui temp, 0xABCDA
addi temp, temp, 0xBCD
slli dest, temp, 32
add dest, dest, temp

If so, and you have microbenchmarks at hand, it would be worth checking whether having > 5 instructions in the general case of the above would still be faster than a load, thanks to a more parallelizable workload, e.g. for 0x12345'678'98765'432:

lui temp, 0x12345
lui dest, 0x98765
addi temp, temp, 0x678
addi dest, dest, 0x432
slli temp, temp, 32
add dest, dest, temp

EDIT: GCC does it so probably it is faster.

BTW, in the asm examples I forgot to incorporate addi's sign bit into the lui bits so it's a bit more complicated. Still doable.
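
The sign compensation mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names are hypothetical, and the code models register values rather than emitting instructions. Because lui + addiw leaves a sign-extended 32-bit value in the register, a lower half with bit 31 set contributes an all-ones upper word, so the upper half has to be incremented by one before the shift and add.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def lui_addiw(value32):
    """Register contents after a lui + addiw pair targeting a 32-bit pattern."""
    lo12 = sext(value32, 12)
    hi20 = ((value32 - lo12) >> 12) & 0xFFFFF   # carry folded into the lui operand
    after_lui = sext(hi20 << 12, 32)
    return sext(after_lui + lo12, 32) & MASK64  # sign-extended to 64 bits


def split_for_temp_register(value):
    """Return (upper, lower) 32-bit patterns such that
    (upper << 32) + sign_extend(lower) == value modulo 2**64."""
    lower = sext(value, 32)
    upper = ((value - lower) >> 32) & 0xFFFFFFFF  # +1 when lower is negative
    return upper, lower & 0xFFFFFFFF


def load_with_temp(value):
    """Model: lui/addiw temp; lui/addiw dest; slli temp, temp, 32; add dest, dest, temp."""
    upper, lower = split_for_temp_register(value)
    temp = lui_addiw(upper)
    dest = lui_addiw(lower)
    temp = (temp << 32) & MASK64
    return (dest + temp) & MASK64


for constant in (0xABCDABCDABCDABCD, 0x1234567898765432):
    upper, lower = split_for_temp_register(constant)
    print(f"{constant:#018x}: upper={upper:#x} lower={lower:#x} "
          f"-> {load_with_temp(constant):#018x}")
```

For 0xABCDABCDABCDABCD the compensated upper half comes out as 0xABCDABCE, so in this model the two 32-bit halves differ by one after compensation.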

@fuad1502
Contributor Author

I don't know if you're going to cover sequences with a temporary register in this PR, but in this case it could detect that the immediate bits can be split into addable halves:

@tomeksowi nice suggestion! But I think I'll leave that to another PR. We can add that path later to replace cases where it would generate emitDataConst. What do you think?

@risc-vv

risc-vv commented Mar 11, 2025

RISC-V Release-CLR-VF2: 9465 / 9541 (99.20%)
=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 5min 55s 583ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: e124349509ac3acf346453dfc8f16aff68212ecb
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 429341 / 467750 (91.79%)
=======================
      passed: 429341
      failed: 135
     skipped: 1478
      killed: 38274
------------------------
  TOTAL libs: 258
 TOTAL tests: 469228
   REAL time: 2h 43min 40s 215ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: e124349509ac3acf346453dfc8f16aff68212ecb
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 660039 / 687745 (95.97%)
=======================
      passed: 660039
      failed: 319
     skipped: 1429
      killed: 27387
------------------------
  TOTAL libs: 258
 TOTAL tests: 689174
   REAL time: 2h 24min 58s 596ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: e124349509ac3acf346453dfc8f16aff68212ecb
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

@tomeksowi
Contributor

@tomeksowi nice suggestion! But I think I'll leave that to another PR. We can add that path later to replace cases where it would generate emitDataConst. What do you think?

Fine with me.

@risc-vv

risc-vv commented Mar 12, 2025

RISC-V Release-CLR-VF2: 9465 / 9541 (99.20%)
=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 8min 3s 784ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 3998bd37e1f3a27b345a08873a2620d3748364ac
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9465 / 9541 (99.20%)
=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 45min 52s 600ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 3998bd37e1f3a27b345a08873a2620d3748364ac
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 687315 / 712141 (96.51%)
=======================
      passed: 687315
      failed: 672
     skipped: 1481
      killed: 24154
------------------------
  TOTAL libs: 258
 TOTAL tests: 713622
   REAL time: 2h 44min 23s 742ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 3998bd37e1f3a27b345a08873a2620d3748364ac
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 630219 / 658569 (95.70%)
=======================
      passed: 630219
      failed: 861
     skipped: 1455
      killed: 27489
------------------------
  TOTAL libs: 258
 TOTAL tests: 660024
   REAL time: 2h 25min 22s 679ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 3998bd37e1f3a27b345a08873a2620d3748364ac
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

@risc-vv

risc-vv commented Mar 20, 2025

RISC-V Release-FX-VF2: 0 / 258 (0.00%)
=======================
      passed: 0
      failed: 0
     skipped: 0
      killed: 258
------------------------
  TOTAL libs: 258
 TOTAL tests: 258
   REAL time: 42s 436ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 5214b8ffb741f624ca2cb3880a1f10791186dd9d
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-VF2: 9468 / 9544 (99.20%)
=======================
      passed: 9468
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 11min 11s 339ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 5214b8ffb741f624ca2cb3880a1f10791186dd9d
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9468 / 9544 (99.20%)
=======================
      passed: 9468
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 45min 34s 355ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 5214b8ffb741f624ca2cb3880a1f10791186dd9d
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 0 / 258 (0.00%)
=======================
      passed: 0
      failed: 0
     skipped: 0
      killed: 258
------------------------
  TOTAL libs: 258
 TOTAL tests: 258
   REAL time: 42s 867ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 5214b8ffb741f624ca2cb3880a1f10791186dd9d
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@risc-vv

risc-vv commented Mar 21, 2025

RISC-V Release-CLR-VF2: 9524 / 9544 (99.79%)
=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 6min 3s 631ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: db65e0a4ac05bcee54d374f0c6ef81b8e3969e92
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9524 / 9544 (99.79%)
=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 46min 28s 673ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: db65e0a4ac05bcee54d374f0c6ef81b8e3969e92
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 436721 / 465185 (93.88%)
=======================
      passed: 436721
      failed: 682
     skipped: 1513
      killed: 27782
------------------------
  TOTAL libs: 258
 TOTAL tests: 466698
   REAL time: 2h 57min 50s 481ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: db65e0a4ac05bcee54d374f0c6ef81b8e3969e92
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@risc-vv

risc-vv commented Mar 25, 2025

RISC-V Release-CLR-VF2: 9524 / 9544 (99.79%)
=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 13min 52s 452ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: b66f7be633a7ebed00773b3fe2f1e935b0259f79
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9524 / 9544 (99.79%)
=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 47min 40s 47ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: b66f7be633a7ebed00773b3fe2f1e935b0259f79
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 641743 / 665283 (96.46%)
=======================
      passed: 641743
      failed: 889
     skipped: 1705
      killed: 22651
------------------------
  TOTAL libs: 258
 TOTAL tests: 666988
   REAL time: 2h 32min 19s 661ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: b66f7be633a7ebed00773b3fe2f1e935b0259f79
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 435375 / 470663 (92.50%)
=======================
      passed: 435375
      failed: 664
     skipped: 1538
      killed: 34624
------------------------
  TOTAL libs: 258
 TOTAL tests: 472201
   REAL time: 2h 52min 39s 816ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: b66f7be633a7ebed00773b3fe2f1e935b0259f79
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@fuad1502
Contributor Author

Diffs are based on 12,734 contexts (10,221 MinOpts, 2,513 FullOpts).

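Overall (-882,648 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|------------|-------------------|-------------------|--------------------|
| test.mch   | 6,984,916         | -882,648          | -1.62%             |

MinOpts (-684,524 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|------------|-------------------|-------------------|--------------------|
| test.mch   | 5,347,656         | -684,524          | -1.52%             |

FullOpts (-198,124 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|------------|-------------------|-------------------|--------------------|
| test.mch   | 1,637,260         | -198,124          | -2.06%             |
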
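
The summary lists absolute byte counts, while its percentage column refers to PerfScore. As a quick cross-check, the relative code-size change can be derived from the base and diff sizes. The snippet below is a small worked calculation using only the numbers reported above; the function name is made up for illustration.

```python
def size_change_percent(base_bytes, diff_bytes):
    """Relative code-size change as a percentage of the base size."""
    return 100.0 * diff_bytes / base_bytes


# (base size, diff size) in bytes, copied from the summary above.
SUMMARY = {
    "Overall": (6_984_916, -882_648),
    "MinOpts": (5_347_656, -684_524),
    "FullOpts": (1_637_260, -198_124),
}

for name, (base, diff) in SUMMARY.items():
    print(f"{name:8s} {size_change_percent(base, diff):+.2f}% code size")
```

All three collections shrink by roughly 12 to 13 percent in code size, which is a much larger relative change than the PerfScore improvement shown in the last column.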
Example diffs
test.mch
-48 (-38.71%) : 198.dasm - System.ConsolePal:InvalidateCachedCursorPosition() (Tier0)
@@ -19,36 +19,27 @@ G_M58234_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M58234_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, zero, 0xD1FFAB1E
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             sw             a0, 0xD1FFAB1E(a1)
-            lui            a0, 0xD1FFAB1E
-            addiw          a0, a0, 0xD1FFAB1E
-            slli           a0, a0, 11
-            addi           a0, a0, 0xD1FFAB1E
-            slli           a0, a0, 5
-            addi           a0, a0, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a0, 0xD1FFAB1E(t6)
             lw             a0, 0xD1FFAB1E(a0)
             addiw          a0, a0, 0xD1FFAB1E
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             sw             a0, 0xD1FFAB1E(a1)
-						;; size=92 bbWeight=1 PerfScore 20.00
+						;; size=44 bbWeight=1 PerfScore 17.00
 G_M58234_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
+RWD00  	dq	00007E19B463B30Ch
+RWD08  	dq	00007E19B463B308h
 
-; Total bytes of code 124, prolog size 16, PerfScore 36.50, instruction count 31, allocated bytes for code 124 (MethodHash=d4221c85) for method System.ConsolePal:InvalidateCachedCursorPosition() (Tier0)
+
+; Total bytes of code 76, prolog size 16, PerfScore 33.50, instruction count 16, allocated bytes for code 76 (MethodHash=d4221c85) for method System.ConsolePal:InvalidateCachedCursorPosition() (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -59,7 +50,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 31 (0x0001f) Actual length = 124 (0x00007c)
+  Function Length   : 19 (0x00013) Actual length = 76 (0x00004c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-48 (-36.36%) : 667.dasm - System.ConsolePal:InvalidateTerminalSettings() (FullOpts)
@@ -23,38 +23,30 @@ G_M52800_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M52800_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, fp, -16
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             jalr           a1		// CORINFO_HELP_JIT_REVERSE_PINVOKE_ENTER
             addi           a0, zero, 0xD1FFAB1E
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             fence          3, 3
             sw             a0, 0xD1FFAB1E(a1)
             addi           a0, fp, -16
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             jalr           a1		// CORINFO_HELP_JIT_REVERSE_PINVOKE_EXIT
-						;; size=100 bbWeight=1 PerfScore 25.50
+						;; size=52 bbWeight=1 PerfScore 22.50
 G_M52800_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 24(sp)
             ld             fp, 16(sp)
             addi           sp, sp, 32
             ret						;; size=16 bbWeight=1 PerfScore 7.50
+RWD00  	dq	00007E1A33AB9A74h
+RWD08  	dq	00007E19B463B31Ch
+RWD16  	dq	00007E1A33AB9BCCh
 
-; Total bytes of code 132, prolog size 16, PerfScore 42.00, instruction count 33, allocated bytes for code 132 (MethodHash=bc7731bf) for method System.ConsolePal:InvalidateTerminalSettings() (FullOpts)
+
+; Total bytes of code 84, prolog size 16, PerfScore 39.00, instruction count 18, allocated bytes for code 84 (MethodHash=bc7731bf) for method System.ConsolePal:InvalidateTerminalSettings() (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -65,7 +57,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 33 (0x00021) Actual length = 132 (0x000084)
+  Function Length   : 21 (0x00015) Actual length = 84 (0x000054)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-32 (-34.78%) : 11911.dasm - Microsoft.CodeAnalysis.SyntaxNode+ChildSyntaxListEnumeratorStack+<>c:<.cctor>b__12_0():Microsoft.CodeAnalysis.ChildSyntaxList+Enumerator[]:this (Tier0)
@@ -20,29 +20,24 @@ G_M43111_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             sd             a0, -8(fp)
 						;; size=20 bbWeight=1 PerfScore 13.00
 G_M43111_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            lui            a0, 0xD1FFAB1E
-            addiw          a0, a0, 0xD1FFAB1E
-            slli           a0, a0, 11
-            addi           a0, a0, 0xD1FFAB1E
-            slli           a0, a0, 5
-            addi           a0, a0, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a0, 0xD1FFAB1E(t6)
             addi           a1, zero, 0xD1FFAB1E
-            lui            a2, 0xD1FFAB1E
-            addiw          a2, a2, 0xD1FFAB1E
-            slli           a2, a2, 11
-            addi           a2, a2, 0xD1FFAB1E
-            slli           a2, a2, 5
-            addi           a2, a2, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a2, 0xD1FFAB1E(t6)
             jalr           a2		// CORINFO_HELP_NEWARR_1_VC
             ; gcrRegs +[a0]
-						;; size=56 bbWeight=1 PerfScore 9.50
+						;; size=24 bbWeight=1 PerfScore 7.50
 G_M43111_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 24(sp)
             ld             fp, 16(sp)
             addi           sp, sp, 32
             ret						;; size=16 bbWeight=1 PerfScore 7.50
+RWD00  	dq	0000768AA9A1BFB8h
+RWD08  	dq	0000768B238B2044h
 
-; Total bytes of code 92, prolog size 16, PerfScore 30.00, instruction count 23, allocated bytes for code 92 (MethodHash=94ff5798) for method Microsoft.CodeAnalysis.SyntaxNode+ChildSyntaxListEnumeratorStack+<>c:<.cctor>b__12_0():Microsoft.CodeAnalysis.ChildSyntaxList+Enumerator[]:this (Tier0)
+
+; Total bytes of code 60, prolog size 16, PerfScore 28.00, instruction count 13, allocated bytes for code 60 (MethodHash=94ff5798) for method Microsoft.CodeAnalysis.SyntaxNode+ChildSyntaxListEnumeratorStack+<>c:<.cctor>b__12_0():Microsoft.CodeAnalysis.ChildSyntaxList+Enumerator[]:this (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -53,7 +48,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 23 (0x00017) Actual length = 92 (0x00005c)
+  Function Length   : 15 (0x0000f) Actual length = 60 (0x00003c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 12720.dasm - System.Linq.Enumerable+EnumerableSorter`1[System.__Canon]:Sort(System.__Canon[],int):int[]:this (Tier0)
@@ -33,9 +33,9 @@ G_M50207_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             lw             a2, -20(fp)
             lui            a3, 0xD1FFAB1E
             addiw          a3, a3, 0xD1FFAB1E
-            slli           a3, a3, 11
+            slli           a3, a3, 12
             addi           a3, a3, 0xD1FFAB1E
-            slli           a3, a3, 5
+            slli           a3, a3, 4
             ld             a3, 0xD1FFAB1E(a3)
             jalr           a3		// <unknown method>
             ; gcrRegs -[a1]
+0 (0.00%) : 12704.dasm - Microsoft.CodeAnalysis.CSharp.VariablesDeclaredWalker:Free():this (Tier0)
@@ -24,9 +24,9 @@ G_M15256_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs +[a0]
             lui            a1, 0xD1FFAB1E
             addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
+            slli           a1, a1, 14
             addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
+            slli           a1, a1, 2
             ld             a1, 0xD1FFAB1E(a1)
             jalr           a1		// <unknown method>
             ; gcrRegs -[a0]
+0 (0.00%) : 12640.dasm - Microsoft.CodeAnalysis.DiagnosticBag:Add(Microsoft.CodeAnalysis.Diagnostic):this (Tier0)
@@ -28,9 +28,9 @@ G_M13912_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs +[a0]
             lui            a1, 0xD1FFAB1E
             addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
+            slli           a1, a1, 14
             addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
+            slli           a1, a1, 2
             ld             a1, 0xD1FFAB1E(a1)
             jalr           a1		// <unknown method>
             sd             a0, -24(fp)
@@ -39,9 +39,9 @@ G_M13912_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs +[a1]
             lui            a2, 0xD1FFAB1E
             addiw          a2, a2, 0xD1FFAB1E
-            slli           a2, a2, 11
+            slli           a2, a2, 14
             addi           a2, a2, 0xD1FFAB1E
-            slli           a2, a2, 5
+            slli           a2, a2, 2
             ld             a2, 0xD1FFAB1E(a2)
             lw             zero, 0xD1FFAB1E(a0)
             jalr           a2		// <unknown method>
Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| test.mch | 10,193 | 6,908 | 0 | 3,285 | -882,648 | +0 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| test.mch | 10,193 | 6,705 | 0 | 3,488 | -2.46% | 0.00% | -1.6590% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| test.mch | 12,734 | 10,221 | 2,513 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

Report generated after merging fuad1502@544cf0c to the local branch & diffing with that commit.

@risc-vv

risc-vv commented Mar 27, 2025

RISC-V Release-CLR-VF2: 9527 / 9547 (99.79%)
=======================
      passed: 9527
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9653
 TOTAL tests: 9653
   REAL time: 2h 12min 12s 31ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: db365a7ffc6c0119a5a6d0e7975c47be746b04bc
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9527 / 9547 (99.79%)
=======================
      passed: 9527
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9653
 TOTAL tests: 9653
   REAL time: 2h 47min 58s 142ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

RISC-V Release-FX-VF2: 627539 / 665359 (94.32%)
=======================
      passed: 627539
      failed: 695
     skipped: 1417
      killed: 37125
------------------------
  TOTAL libs: 258
 TOTAL tests: 666776
   REAL time: 2h 50min 6s 808ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

RISC-V Release-FX-QEMU: 622000 / 654961 (94.97%)
=======================
      passed: 622000
      failed: 892
     skipped: 1459
      killed: 32069
------------------------
  TOTAL libs: 258
 TOTAL tests: 656420
   REAL time: 2h 27min 52s 746ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Member

@jakobbotsch jakobbotsch left a comment

LGTM

Comment on lines +1288 to +1301
/* The following algorithm works based on the following equation:
* `imm = high32 + offset1` OR `imm = high32 - offset2`
*
* high32 will be loaded with `lui + addiw`, while offset
* will be loaded with `slli + addi` in 11-bits chunks
*
* First, determine at which position to partition imm into high32 and offset,
* so that it yields the least instruction.
* Where high32 = imm[y:x] and imm[63:y] are all zeroes or all ones.
*
* From the above equation, the value of offset1 & offset2 are:
* -> offset1 = imm[x-1:0]
* -> offset2 = ~(imm[x-1:0] - 1)
* The smaller offset should yield the least instruction. (is this correct?) */
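
The two decompositions named in this comment can be checked with a small standalone sketch. This is not the JIT's code: `Split`, `splitAdd`, `splitSub`, and `rebuild` are hypothetical names, the partition position `x` is assumed to be given rather than searched for, and the subtraction form assumes the low `x` bits are nonzero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of `imm = high + offset1` and `imm = high - offset2`.
// None of these names come from the JIT sources.
struct Split
{
    uint64_t high;     // part above bit x (its low x bits are zero)
    uint64_t offset;   // offset1 or offset2, always < 2^x
    bool     subtract; // true when imm = high - offset
};

// imm = high + offset1, where offset1 = imm[x-1:0].
Split splitAdd(uint64_t imm, unsigned x)
{
    assert(x > 0 && x < 64);
    const uint64_t mask = (uint64_t{1} << x) - 1;
    return {imm & ~mask, imm & mask, false};
}

// imm = high - offset2, where offset2 = ~(imm[x-1:0] - 1) truncated to x bits.
// Assumes imm[x-1:0] != 0; high is the next multiple of 2^x (modulo 2^64).
Split splitSub(uint64_t imm, unsigned x)
{
    assert(x > 0 && x < 64);
    const uint64_t mask = (uint64_t{1} << x) - 1;
    const uint64_t low  = imm & mask;
    assert(low != 0);
    const uint64_t offset2 = ~(low - 1) & mask; // equals 2^x - low
    const uint64_t high    = (imm & ~mask) + (uint64_t{1} << x);
    return {high, offset2, true};
}

// Recombines a split back into the original immediate.
uint64_t rebuild(const Split& s)
{
    return s.subtract ? s.high - s.offset : s.high + s.offset;
}
```

For `0xFFFFFFEFAFFFFFFF` with `x = 28`, `splitSub` yields `offset2 = 1` and a `high` equal to `-261` shifted left by 28, which lines up with the `addiw a0, zero, -261` / `slli a0, a0, 28` / `addi a0, a0, -1` sequence shown in the PR description.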
Member

This is not the preferred style of comments: https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/clr-jit-coding-conventions.md#711-comment-style

Feel free to include as part of a follow-up to avoid rerunning CI.

Contributor Author

Alright, thank you, I’ll create a follow up PR and make sure to review the coding conventions 👍

@jakobbotsch
Member

/ba-g Azurelinux 3 timeouts

@jakobbotsch jakobbotsch merged commit 4b2fe3c into dotnet:main Apr 2, 2025
109 of 111 checks passed
@BruceForstall
Member

Notes:

  1. It would be nice if the disassembly (in JitDisasm/JitDump) of the first instruction in the sequence displayed a comment with the full (hex, and possibly also decimal) value.
  2. If we need to generate relocations (opts.compReloc), does the code always generate a load from the data section with a reloc to that data? Or is there a fallback path to a fixed sequence of inline code that has a defined reloc?

@fuad1502
Contributor Author

fuad1502 commented Apr 3, 2025

@BruceForstall Thank you for the notes.

  1. Alright, I’ll create a follow up PR for it.
  2. Correct me if I’m wrong in any of my assumptions. I am relatively new to the codebase and still need to acquaint myself with the design. As far as I know, RISCV64 (like ARM64 & LoongArch64) puts the data section together with the code:

#if defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
// For arm64/LoongArch64, we're going to put the data in the code section. So make sure the code section has
// adequate alignment.

Therefore, we don’t need to generate relocations and can simply use PC-relative instructions (auipc + ld on RISCV64) to load constants from the data section. I looked at ARM64 implementation for loading data constants, and they also don’t seem to generate relocations.

However, referring to the following:

* Without splitting, the read-only data section is adjacent to the function's instruction section on ARM64. When
splitting, the data section is adjacent to the hot section; from the hot section, we can load constants with a single
`ldr` instruction. However, this is not possible from the cold section: Because it is arbitrarily far away, the target
address cannot be determined relative to the PC. Instead, the JIT emits a `IF_LARGELDC` pseudoinstruction with a
few different possibilities:

If we’re in the cold region, the data section (located in the hot code region) might be arbitrarily far away, whereas auipc + ld can only reach +-2GB from the instruction. Therefore, on RISCV64 we currently load the absolute address into a register and load from it when we’re in a cold region. This would of course cause problems if the code is supposed to be relocatable.
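
The +-2GB reach follows from how an `auipc` + `ld` pair encodes its target: a 20-bit upper immediate plus a sign-extended 12-bit displacement. A minimal sketch of that split with a range check follows; `PcRelParts` and `trySplitPcRel` are hypothetical names used only for illustration and are not part of the JIT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a JIT API: splits the byte offset from the auipc
// instruction to the target into the two immediates of an auipc + ld pair.
struct PcRelParts
{
    int32_t hi20; // auipc immediate (contributes hi20 << 12 to the address)
    int32_t lo12; // signed 12-bit displacement of the ld
};

// Returns false when the offset cannot be reached by a single auipc + ld pair.
bool trySplitPcRel(int64_t offset, PcRelParts* out)
{
    // Because lo12 is sign-extended, the reachable window on RV64 is
    // [-2^31 - 2^11, 2^31 - 2^11 - 1], i.e. roughly +-2GB.
    const int64_t minOffset = -(int64_t{1} << 31) - 2048;
    const int64_t maxOffset = (int64_t{1} << 31) - 2048 - 1;
    if (offset < minOffset || offset > maxOffset)
    {
        return false;
    }

    // Sign-extend the low 12 bits; the remainder is then an exact multiple of 4096.
    const int64_t lo12 = ((offset & 0xFFF) ^ 0x800) - 0x800;
    const int64_t hi20 = (offset - lo12) / 4096;
    assert(hi20 >= -(1 << 19) && hi20 < (1 << 19));

    out->hi20 = static_cast<int32_t>(hi20);
    out->lo12 = static_cast<int32_t>(lo12);
    return true;
}
```

An offset of `0x1800`, for example, splits into `hi20 = 2` and `lo12 = -2048`, because the displacement is signed. An offset of exactly `2^31` is rejected, which is the situation the cold-region discussion is concerned with.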

Then I realized that ARM64 actually generates either ldr or adrp + ldr where adrp + ldr is used when we’re in the cold region. However, adrp + ldr only reaches +-4GB from the instruction, not to an “arbitrary” location.

To address the particular problem when the code is supposed to be relocatable & we’re in a cold region, I would need answers to the following:

  1. Is there an absolute maximum code size? Is it target dependent?
  2. Is the relative position between hot and cold region fixed? E.g., does using PC-relative addressing between the regions lead to position-independent code?
  3. Is there an absolute maximum distance between hot and cold region? Is it target dependent?
  4. Is hot cold splitting even implemented for RISC-V? What about fake splitting?

I’m still reading the codebase to get the answers, but if you have any information that you can share about this, or you already know some of the answers, please do let me know, I would really appreciate it 😄 And sorry if opening this PR with my currently minimal knowledge of the .NET JIT is causing more trouble than it helps; I’ll try to learn more!

@jakobbotsch
Member

jakobbotsch commented Apr 3, 2025

I was under the impression that you are generating relocations when you said that you addressed the issue above: #113250 (comment)
As mentioned above we need relocations for these because AOT compilation handles the RO data block specially (maybe not?)

I looked at ARM64 implementation for loading data constants, and they also don’t seem to generate relocations.

Hmm, it's very possible the AOT compilers never move this data around. If the other backends are also not recording relocations it does not seem like a problem.

@jakobbotsch
Member

Some more clarity on top of what @fuad1502 wrote above comes from here:

#if defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
// For arm64/LoongArch64/RISCV64, we want to allocate JIT data always adjacent to code similar to what native
// compiler does.
// This way allows us to use a single `ldr` to access such data like float constant/jmp table.
// For LoongArch64 using `pcaddi + ld` to access such data.
UNATIVE_OFFSET roDataAlignmentDelta = 0;
if (args->roDataSize > 0)
{
roDataAlignmentDelta = AlignmentPad(args->hotCodeSize, roDataSectionAlignment);
}
const UNATIVE_OFFSET roDataOffset = args->hotCodeSize + roDataAlignmentDelta;
args->hotCodeSize = roDataOffset + args->roDataSize;
args->roDataSize = 0;
#endif // defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
info.compCompHnd->allocMem(args);

So essentially, for these backends we allocate no data section at all, we just allocate a larger hot code section.
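
The size bookkeeping in that snippet can be traced with a small standalone sketch. `alignmentPad` here is only a stand-in whose exact behavior is an assumption (round up to a power-of-two alignment), and `AllocSizes` and `foldRoDataIntoHotCode` are hypothetical names that mirror the fields used above rather than real JIT types.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the JIT's AlignmentPad; its exact signature is an assumption.
// Returns the padding bytes needed to round `size` up to `alignment`.
uint32_t alignmentPad(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
    return (alignment - (size & (alignment - 1))) & (alignment - 1);
}

// Hypothetical mirror of the two size fields used in the snippet above.
struct AllocSizes
{
    uint32_t hotCodeSize;
    uint32_t roDataSize;
};

// Folds the read-only data into the hot code allocation and returns the
// offset at which the data starts inside that allocation.
uint32_t foldRoDataIntoHotCode(AllocSizes& sizes, uint32_t roDataAlignment)
{
    uint32_t pad = 0;
    if (sizes.roDataSize > 0)
    {
        pad = alignmentPad(sizes.hotCodeSize, roDataAlignment);
    }
    const uint32_t roDataOffset = sizes.hotCodeSize + pad;
    sizes.hotCodeSize = roDataOffset + sizes.roDataSize;
    sizes.roDataSize  = 0;
    return roDataOffset;
}
```

With illustrative numbers (60 bytes of code, two 8-byte constants, 8-byte alignment), the data starts at offset 64 and the hot section grows to 80 bytes, so the constants stay at a fixed, small distance from the code that loads them.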

@BruceForstall
Member

@fuad1502 Thanks for the analysis. You are correct: for arm64/loongarch64/riscv64, where the read-only data is appended to the hot code section, no relocations are required if you load it via pc-relative addressing.

As for your questions:

  1. The max code size is about 2GB, due to maximum branches normally being +/-2GB. But I don't know if we enforce any maximums. We may hit a failure point if branches are too big, or something similar. Branches on x64/arm64 (and loongarch64/riscv64?) can go through "branch islands" to extend the branch range.
  2. The relative position between hot/cold is not fixed at JIT time and requires relocs.
  3. I don't know if there is a maximum distance between hot and cold region. I suspect no, but if so, it would be about 2GB.
  4. Hot/cold splitting is most likely not (yet?) implemented for RISC-V. It's only implemented for R2R/NAOT compilation modes. "Fake" splitting is a testing mode; you could turn it on and see if RISC-V implements it, but it is not high priority, even for R2R/NAOT.

@fuad1502
Contributor Author

fuad1502 commented Apr 4, 2025

@BruceForstall Thanks for the answers! So in conclusion, for your second note in the original comment, I only need to address the particular case where I currently load from an absolute address when loading a constant from the cold section, despite the relocation requirement. Since it seems that we can safely use +-2GB as the maximum distance between cold code and constant data in hot code, I’ll create a follow-up PR that always uses PC-relative addressing for loading constants, but generates relocs when loading from the cold section. I’ll make sure to add an assertion that checks the validity of the distance assumption.

Does this sound about right?

@BruceForstall
Member

Does this sound about right?

That all sounds right. You could also have an assert that there is no hot/cold splitting at all (which I presume there isn't yet), if you want to defer this until later.

Labels
arch-riscv Related to the RISC-V architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
7 participants