[RISC-V] Optimize loading 64 bit constant with new algorithm implementation and using `emitDataConst` #113250

fuad1502 · 2025-03-07T08:27:12Z

This PR implements a new algorithm that reduces the number of instructions generated for loading constants into registers. Additionally, when the number of instructions still exceeds 5, the constant is instead loaded from the data section using emitDataConst.

See how clang loads 64-bit constants on RISC-V with godbolt.

With the following C# function:

[MethodImpl(MethodImplOptions.NoInlining)]
public ulong Fun() {
	return 0xFFFFFFEFAFFFFFFF;
	// return 0x3FFFFFFFFFFFFFFE;
	// return 0xABCDABCDABCDABCD;
}

Before patch:

; Load FFFFFFEFAFFFFFFF
lui            a0, -524288
addiw          a0, a0, -9
slli           a0, a0, 11
addi           a0, a0, 1727
slli           a0, a0, 11
addi           a0, a0, 2047
slli           a0, a0, 11
addi           a0, a0, 2047

; Load 3FFFFFFFFFFFFFFE
lui            a0, -524288
addiw          a0, a0, -1
slli           a0, a0, 11
addi           a0, a0, 2047
slli           a0, a0, 11
addi           a0, a0, 2047
slli           a0, a0, 9
addi           a0, a0, 510

; Load ABCDABCDABCDABCD
lui            a0, 351853
addiw          a0, a0, 1510
slli           a0, a0, 11
addi           a0, a0, 1711
slli           a0, a0, 11
addi           a0, a0, 437
slli           a0, a0, 11
addi           a0, a0, 973

After patch:

; Load FFFFFFEFAFFFFFFF
addiw          a0, zero, -261
slli           a0, a0, 28
addi           a0, a0, -1 ; Example of utilizing subtraction

; Load 3FFFFFFFFFFFFFFE
addiw          a0, zero, -8
srli           a0, a0, 2 ; Example of SRLI utilization

; Load ABCDABCDABCDABCD
G_M16745_IG02:  ;; offset=0x0010
            auipc          t6, 0
            ld             a0, 24(t6)
                                                ;; size=8 bbWeight=1 PerfScore 0.00
G_M16745_IG03:  ;; offset=0x0018
            ld             ra, 8(sp)
            ld             fp, 0(sp)
            addi           sp, sp, 16
            ret                                         ;; size=16 bbWeight=1 PerfScore 0.00
RWD00   dq      ABCDABCDABCDABCDh ; Example of utilizing data section
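
The listings above can be sanity-checked with a small model of the five instructions involved. This is a minimal sketch, not part of the JIT: the `run` and `sext` helpers and the tuple encoding of the instructions are hypothetical names made up for illustration, and only a single destination register is modelled.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def run(program):
    """Execute (op, imm) pairs on one 64-bit register that starts at zero."""
    reg = 0
    for op, imm in program:
        if op == "lui":      # imm << 12, sign-extended from 32 to 64 bits
            reg = sext(imm << 12, 32)
        elif op == "addi":   # 64-bit add of a 12-bit signed immediate
            reg = reg + sext(imm, 12)
        elif op == "addiw":  # 32-bit add, result sign-extended to 64 bits
            reg = sext(reg + sext(imm, 12), 32)
        elif op == "slli":   # logical shift left
            reg = reg << imm
        elif op == "srli":   # logical shift right on the unsigned view
            reg = (reg & MASK64) >> imm
        else:
            raise ValueError(op)
        reg &= MASK64
    return reg

# Sequences copied from the listings above. "addiw a0, zero, imm" is modelled
# as addiw on a register that starts at zero.
BEFORE = {
    0xFFFFFFEFAFFFFFFF: [("lui", -524288), ("addiw", -9), ("slli", 11), ("addi", 1727),
                         ("slli", 11), ("addi", 2047), ("slli", 11), ("addi", 2047)],
    0x3FFFFFFFFFFFFFFE: [("lui", -524288), ("addiw", -1), ("slli", 11), ("addi", 2047),
                         ("slli", 11), ("addi", 2047), ("slli", 9), ("addi", 510)],
    0xABCDABCDABCDABCD: [("lui", 351853), ("addiw", 1510), ("slli", 11), ("addi", 1711),
                         ("slli", 11), ("addi", 437), ("slli", 11), ("addi", 973)],
}
AFTER = {
    0xFFFFFFEFAFFFFFFF: [("addiw", -261), ("slli", 28), ("addi", -1)],
    0x3FFFFFFFFFFFFFFE: [("addiw", -8), ("srli", 2)],
}

for expected, program in list(BEFORE.items()) + list(AFTER.items()):
    assert run(program) == expected, hex(expected)
    print(f"{len(program)} instructions -> {run(program):#018x}")
```

The third constant has no short sequence in the "after" listing, which is why it falls back to the `auipc` + `ld` pair reading from the data section.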

Note: @tomeksowi pointed out that there is one additional optimization that GCC is able to do but clang cannot (see the godbolt link above), which uses a temporary register to exploit instruction-level parallelism. However, I won't be covering that optimization in this PR.

Part of #84834, cc @dotnet/samsung

risc-vv · 2025-03-07T08:31:15Z

RISC-V Release-CLR-VF2: 9465 / 9541 (99.20%)

=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 8min 55s 782ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 1e0135bfa363ed925aa72482226b24259b934918
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9465 / 9541 (99.20%)

=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 45min 34s 169ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 1e0135bfa363ed925aa72482226b24259b934918
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 630930 / 658679 (95.79%)

=======================
      passed: 630930
      failed: 316
     skipped: 1453
      killed: 27433
------------------------
  TOTAL libs: 258
 TOTAL tests: 660132
   REAL time: 2h 27min 14s 537ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 1e0135bfa363ed925aa72482226b24259b934918
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 436825 / 464641 (94.01%)

=======================
      passed: 436825
      failed: 141
     skipped: 1480
      killed: 27675
------------------------
  TOTAL libs: 258
 TOTAL tests: 466121
   REAL time: 2h 53min 22s 694ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 1e0135bfa363ed925aa72482226b24259b934918
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

am11 · 2025-03-07T12:15:26Z

See how clang loads 64-bit constants on RISC-V with godbolt

With -O2, it gets more interesting https://godbolt.org/z/7PojfWjba.

risc-vv · 2025-03-09T07:11:07Z

eb1c702 is being scheduled for building and testing

GIT: eb1c702046f014631564f5d8afd658555bcbc3c6
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis

Details

Release-build FAILED

buildinfo.json
Compilation failed during core build

…er instructions

risc-vv · 2025-03-09T07:31:10Z

RISC-V Release-CLR-VF2: 9464 / 9541 (99.19%)

=======================
      passed: 9464
      failed: 60
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 8min 4s 684ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 3b5223488d79a4e72a3f0a6ad67e327e745f1042
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9464 / 9541 (99.19%)

=======================
      passed: 9464
      failed: 60
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 45min 34s 658ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 3b5223488d79a4e72a3f0a6ad67e327e745f1042
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 429741 / 457637 (93.90%)

=======================
      passed: 429741
      failed: 147
     skipped: 1478
      killed: 27749
------------------------
  TOTAL libs: 258
 TOTAL tests: 459115
   REAL time: 2h 50min 20s 28ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 3b5223488d79a4e72a3f0a6ad67e327e745f1042
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 659033 / 696638 (94.60%)

=======================
      passed: 659033
      failed: 321
     skipped: 1453
      killed: 37284
------------------------
  TOTAL libs: 258
 TOTAL tests: 698091
   REAL time: 2h 26min 56s 959ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 3b5223488d79a4e72a3f0a6ad67e327e745f1042
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

fuad1502 · 2025-03-09T12:15:57Z

Hi @tomeksowi, sorry for bothering you on the weekend. I was wondering: is the risc-vv test paused on weekends, or did my test simply hang? It has been almost 4 hours since it was scheduled for build + test, yet no test result has shown up so far 😅

filipnavara · 2025-03-09T12:20:13Z

It has been almost 4 hours since it was scheduled for build + test, yet no test result has shown up so far 😅

It often takes several hours, especially with the previous builds in a queue. Also, I think this is more in @sirntar's turf.

fuad1502 · 2025-03-09T12:21:34Z

It often takes several hours, especially with the previous builds in a queue. Also, I think this is more in @sirntar's turf.

Alright, thanks for the info! 👍

tomeksowi · 2025-03-09T12:31:05Z

Hi @tomeksowi, sorry for bothering you on the weekend. I was wondering: is the risc-vv test paused on weekends, or did my test simply hang? It has been almost 4 hours since it was scheduled for build + test, yet no test result has shown up so far 😅

There was a maintenance power-off in the SRPOL office over the weekend; maybe something didn't restart properly. I'm away, so I don't have access. We'll check on Monday.

am11 · 2025-03-10T13:36:51Z

#113250 (comment) is updated. I noticed that it sometimes takes a while (a few hours to days), but eventually it updates the comment.

filipnavara · 2025-03-10T13:40:18Z

One of the failures could be relevant:

[1.760s] JIT.Directed.ConstantFolding.value_numbering_checked_arithmetic_with_constants_ro.value_numbering_checked_arithmetic_with_constants_ro
    value_numbering_checked_arithmetic_with_constants_ro.sh
    [exitcode_101]: 
    Unknown exit code 101.
    03
   at Xunit.Assert.Equal[T](T expected, T actual, IEqualityComparer`1 comparer) in /_/src/Microsoft.DotNet.XUnitAssert/src/EqualityAsserts.cs:line 174
   at Xunit.Assert.Equal[T](T expected, T actual) in /_/src/Microsoft.DotNet.XUnitAssert/src/EqualityAsserts.cs:line 96
   at __GeneratedMainWrapper.Main()

fuad1502 · 2025-03-10T13:43:45Z

One of the failures could be relevant:

[1.760s] JIT.Directed.ConstantFolding.value_numbering_checked_arithmetic_with_constants_ro.value_numbering_checked_arithmetic_with_constants_ro
    value_numbering_checked_arithmetic_with_constants_ro.sh
    [exitcode_101]: 
    Unknown exit code 101.
    03
   at Xunit.Assert.Equal[T](T expected, T actual, IEqualityComparer`1 comparer) in /_/src/Microsoft.DotNet.XUnitAssert/src/EqualityAsserts.cs:line 174
   at Xunit.Assert.Equal[T](T expected, T actual) in /_/src/Microsoft.DotNet.XUnitAssert/src/EqualityAsserts.cs:line 96
   at __GeneratedMainWrapper.Main()

Yup, it caught an edge case I didn't handle properly, working on it now 👍

filipnavara · 2025-03-10T13:45:42Z

Yup, it caught an edge case I didn't handle properly, working on it now 👍

Out of curiosity, is it something like the sign bit not propagated correctly because you omit one of the instructions? (I didn't look at the code, so just wildly guessing.)

fuad1502 · 2025-03-10T14:33:21Z

Out of curiosity, is it something like the sign bit not propagated correctly because you omit one of the instructions? (I didn't look at the code, so just wildly guessing.)

Yes. In some cases I add 1 to the lui operand. I had already made sure that the original operand has the expected sign bit to be extended, but for one particular operand value (0x7FFFFFFF) the sign bit changes, which causes an unintended sign extension.

I've figured out a fix, but I realized something else: clang cleverly uses the srli instruction for this particular value, which my algorithm doesn't utilize at all, so it ends up with 1 additional instruction.
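
The sign-extension hazard described above can be reproduced outside the JIT. The sketch below is an illustration of the general `lui` + `addi`/`addiw` carry problem on RV64, not the PR's actual code or fix; `split_hi_lo`, `lui`, `addi`, and `addiw` are hypothetical helper names. Because the 12-bit immediate is sign-extended, a low part of 0x800 or more forces the `lui` operand to be incremented, and for 0x7FFFFFFF that increment flips bit 19 of the operand.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def split_hi_lo(value32):
    """Split a 32-bit value so that (hi20 << 12) + lo12 == value32 (mod 2**32).

    lo12 is the signed 12-bit immediate, so hi20 absorbs a +1 carry whenever
    the low 12 bits are 0x800 or larger.
    """
    lo12 = sext(value32, 12)
    hi20 = ((value32 - lo12) >> 12) & 0xFFFFF
    return hi20, lo12

def lui(hi20):
    """RV64 lui: place hi20 in bits 31:12, then sign-extend from bit 31."""
    return sext(hi20 << 12, 32) & MASK64

def addi(reg, imm):
    """64-bit add of a signed immediate."""
    return (reg + imm) & MASK64

def addiw(reg, imm):
    """32-bit add whose result is sign-extended to 64 bits."""
    return sext(reg + imm, 32) & MASK64

hi20, lo12 = split_hi_lo(0x7FFFFFFF)
print(hex(hi20), lo12)                 # the +1 turned 0x7FFFF into 0x80000
print(hex(lui(hi20)))                  # upper 32 bits are now all ones
print(hex(addi(lui(hi20), lo12)))      # 64-bit add keeps the unwanted sign bits
print(hex(addiw(lui(hi20), lo12)))     # 32-bit add wraps back to 0x7fffffff
```

In this model the unwanted upper bits survive when the follow-up add is the 64-bit `addi`, while `addiw` wraps the result back into the positive 32-bit range.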

src/coreclr/jit/emitriscv64.cpp

tomeksowi · 2025-03-10T16:44:09Z

; Load 0xABCDABCDABCDABCD
IN0006: 000018      ld             ra, 8(sp)
IN0007: 00001C      ld             fp, 0(sp)
IN0008: 000020      addi           sp, sp, 16
IN0009: 000024      ret                                         ;; size=16 bbWeight=1 PerfScore 0.00
Emitting data sections: 8 total bytes

RWD00   dq      ABCDABCDABCDABCDh
  section   0, size  8, RWD 0:  cd ab cd ab cd ab cd ab

I don't know if you're going to cover sequences with a temporary register in this PR but in this case it could detect that the immediate bits can be split into addable halves, in this case:

lui temp, 0xABCDA
addi temp, temp, 0xBCD
slli dest, temp, 32
add dest, dest, temp

If so, and you have microbenchmarks at hand, it would be worth checking whether having > 5 instructions in the general case of the above would still be faster than loading from memory, thanks to a more parallelizable workload, e.g. for 0x12345'678'98765'432:

lui temp, 0x12345
lui dest, 0x98765
addi temp, temp, 0x678
addi dest, dest, 0x432
slli temp, temp, 32
add dest, dest, temp

EDIT: GCC does it so probably it is faster.

BTW, in the asm examples I forgot to incorporate addi's sign bit into the lui bits so it's a bit more complicated. Still doable.
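
The sign-bit adjustment mentioned above can be sketched as follows. This is a rough model under stated assumptions, not GCC's or the JIT's implementation: `plan_halves`, `materialize32`, and `load_with_temp` are hypothetical names, and each half is assumed to be built with `lui` + `addiw` so that it ends up sign-extended from 32 bits. Because the lower half is sign-extended before the final `add`, the upper half must be increased by one whenever bit 31 of the lower half is set.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def materialize32(value32):
    """Model lui + addiw: returns value32 sign-extended to a 64-bit register."""
    lo12 = sext(value32, 12)
    hi20 = ((value32 - lo12) >> 12) & 0xFFFFF
    reg = sext(hi20 << 12, 32)             # lui  reg, hi20
    return sext(reg + lo12, 32) & MASK64   # addiw reg, reg, lo12

def plan_halves(value64):
    """Pick 32-bit halves with (upper << 32) + sext32(lower) == value64 (mod 2**64)."""
    lower = value64 & 0xFFFFFFFF
    # Subtracting the sign-extended lower half moves its borrow into the upper half.
    upper = ((value64 - sext(lower, 32)) >> 32) & 0xFFFFFFFF
    return upper, lower

def load_with_temp(value64):
    upper, lower = plan_halves(value64)
    temp = materialize32(upper)        # lui temp / addiw temp
    dest = materialize32(lower)        # lui dest / addiw dest (independent of temp)
    temp = (temp << 32) & MASK64       # slli temp, temp, 32
    return (dest + temp) & MASK64      # add  dest, dest, temp

for value in (0x1234567898765432, 0xABCDABCDABCDABCD, 0x7FFFFFFF, MASK64):
    upper, lower = plan_halves(value)
    assert load_with_temp(value) == value
    print(f"{value:#018x}: upper={upper:#010x} lower={lower:#010x}")
```

One consequence of this model: for 0xABCDABCDABCDABCD the adjusted halves are 0xABCDABCE and 0xABCDABCD, so they are no longer identical, and the four-instruction form that reuses a single temporary would need an extra correction.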

fuad1502 · 2025-03-11T00:42:50Z

I don't know if you're going to cover sequences with a temporary register in this PR but in this case it could detect that the immediate bits can be split into addable halves, in this case:

@tomeksowi nice suggestion! But I think I'll leave that to another PR. We can add that path later to replace cases where it would generate emitDataConst. What do you think?

…eros

risc-vv · 2025-03-11T03:26:11Z

RISC-V Release-CLR-VF2: 9465 / 9541 (99.20%)

=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 5min 55s 583ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: e124349509ac3acf346453dfc8f16aff68212ecb
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 429341 / 467750 (91.79%)

=======================
      passed: 429341
      failed: 135
     skipped: 1478
      killed: 38274
------------------------
  TOTAL libs: 258
 TOTAL tests: 469228
   REAL time: 2h 43min 40s 215ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: e124349509ac3acf346453dfc8f16aff68212ecb
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 660039 / 687745 (95.97%)

=======================
      passed: 660039
      failed: 319
     skipped: 1429
      killed: 27387
------------------------
  TOTAL libs: 258
 TOTAL tests: 689174
   REAL time: 2h 24min 58s 596ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: e124349509ac3acf346453dfc8f16aff68212ecb
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

tomeksowi · 2025-03-11T07:03:41Z

@tomeksowi nice suggestion! But I think I'll leave that to another PR. We can add that path later to replace cases where it would generate emitDataConst. What do you think?

Fine with me.

risc-vv · 2025-03-12T03:36:02Z

RISC-V Release-CLR-VF2: 9465 / 9541 (99.20%)

=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 8min 3s 784ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 3998bd37e1f3a27b345a08873a2620d3748364ac
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9465 / 9541 (99.20%)

=======================
      passed: 9465
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9647
 TOTAL tests: 9647
   REAL time: 2h 45min 52s 600ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 3998bd37e1f3a27b345a08873a2620d3748364ac
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 687315 / 712141 (96.51%)

=======================
      passed: 687315
      failed: 672
     skipped: 1481
      killed: 24154
------------------------
  TOTAL libs: 258
 TOTAL tests: 713622
   REAL time: 2h 44min 23s 742ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 3998bd37e1f3a27b345a08873a2620d3748364ac
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 630219 / 658569 (95.70%)

=======================
      passed: 630219
      failed: 861
     skipped: 1455
      killed: 27489
------------------------
  TOTAL libs: 258
 TOTAL tests: 660024
   REAL time: 2h 25min 22s 679ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 3998bd37e1f3a27b345a08873a2620d3748364ac
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/pointer-synthesis
CONFIG: Release
LIB_CONFIG: Release

risc-vv · 2025-03-21T14:51:13Z

RISC-V Release-CLR-VF2: 9524 / 9544 (99.79%)

=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 6min 3s 631ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: db65e0a4ac05bcee54d374f0c6ef81b8e3969e92
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9524 / 9544 (99.79%)

=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 46min 28s 673ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: db65e0a4ac05bcee54d374f0c6ef81b8e3969e92
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 436721 / 465185 (93.88%)

=======================
      passed: 436721
      failed: 682
     skipped: 1513
      killed: 27782
------------------------
  TOTAL libs: 258
 TOTAL tests: 466698
   REAL time: 2h 57min 50s 481ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: db65e0a4ac05bcee54d374f0c6ef81b8e3969e92
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

src/coreclr/jit/emitriscv64.cpp

…ting prolog

risc-vv · 2025-03-25T03:31:11Z

RISC-V Release-CLR-VF2: 9524 / 9544 (99.79%)

=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 13min 52s 452ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: b66f7be633a7ebed00773b3fe2f1e935b0259f79
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9524 / 9544 (99.79%)

=======================
      passed: 9524
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9650
 TOTAL tests: 9650
   REAL time: 2h 47min 40s 47ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: b66f7be633a7ebed00773b3fe2f1e935b0259f79
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 641743 / 665283 (96.46%)

=======================
      passed: 641743
      failed: 889
     skipped: 1705
      killed: 22651
------------------------
  TOTAL libs: 258
 TOTAL tests: 666988
   REAL time: 2h 32min 19s 661ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: b66f7be633a7ebed00773b3fe2f1e935b0259f79
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 435375 / 470663 (92.50%)

=======================
      passed: 435375
      failed: 664
     skipped: 1538
      killed: 34624
------------------------
  TOTAL libs: 258
 TOTAL tests: 472201
   REAL time: 2h 52min 39s 816ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: b66f7be633a7ebed00773b3fe2f1e935b0259f79
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

fuad1502 · 2025-03-25T03:59:10Z

Diffs are based on 12,734 contexts (10,221 MinOpts, 2,513 FullOpts).

Overall (-882,648 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
test.mch	6,984,916	-882,648	-1.62%

MinOpts (-684,524 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
test.mch	5,347,656	-684,524	-1.52%

FullOpts (-198,124 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
test.mch	1,637,260	-198,124	-2.06%

Example diffs

test.mch

-48 (-38.71%) : 198.dasm - System.ConsolePal:InvalidateCachedCursorPosition() (Tier0)

@@ -19,36 +19,27 @@ G_M58234_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M58234_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, zero, 0xD1FFAB1E
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             sw             a0, 0xD1FFAB1E(a1)
-            lui            a0, 0xD1FFAB1E
-            addiw          a0, a0, 0xD1FFAB1E
-            slli           a0, a0, 11
-            addi           a0, a0, 0xD1FFAB1E
-            slli           a0, a0, 5
-            addi           a0, a0, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a0, 0xD1FFAB1E(t6)
             lw             a0, 0xD1FFAB1E(a0)
             addiw          a0, a0, 0xD1FFAB1E
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             sw             a0, 0xD1FFAB1E(a1)
-						;; size=92 bbWeight=1 PerfScore 20.00
+						;; size=44 bbWeight=1 PerfScore 17.00
 G_M58234_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
+RWD00  	dq	00007E19B463B30Ch
+RWD08  	dq	00007E19B463B308h
 
-; Total bytes of code 124, prolog size 16, PerfScore 36.50, instruction count 31, allocated bytes for code 124 (MethodHash=d4221c85) for method System.ConsolePal:InvalidateCachedCursorPosition() (Tier0)
+
+; Total bytes of code 76, prolog size 16, PerfScore 33.50, instruction count 16, allocated bytes for code 76 (MethodHash=d4221c85) for method System.ConsolePal:InvalidateCachedCursorPosition() (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -59,7 +50,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 31 (0x0001f) Actual length = 124 (0x00007c)
+  Function Length   : 19 (0x00013) Actual length = 76 (0x00004c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

-48 (-36.36%) : 667.dasm - System.ConsolePal:InvalidateTerminalSettings() (FullOpts)

@@ -23,38 +23,30 @@ G_M52800_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M52800_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addi           a0, fp, -16
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             jalr           a1		// CORINFO_HELP_JIT_REVERSE_PINVOKE_ENTER
             addi           a0, zero, 0xD1FFAB1E
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             fence          3, 3
             sw             a0, 0xD1FFAB1E(a1)
             addi           a0, fp, -16
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
-            addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
-            addi           a1, a1, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a1, 0xD1FFAB1E(t6)
             jalr           a1		// CORINFO_HELP_JIT_REVERSE_PINVOKE_EXIT
-						;; size=100 bbWeight=1 PerfScore 25.50
+						;; size=52 bbWeight=1 PerfScore 22.50
 G_M52800_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 24(sp)
             ld             fp, 16(sp)
             addi           sp, sp, 32
             ret						;; size=16 bbWeight=1 PerfScore 7.50
+RWD00  	dq	00007E1A33AB9A74h
+RWD08  	dq	00007E19B463B31Ch
+RWD16  	dq	00007E1A33AB9BCCh
 
-; Total bytes of code 132, prolog size 16, PerfScore 42.00, instruction count 33, allocated bytes for code 132 (MethodHash=bc7731bf) for method System.ConsolePal:InvalidateTerminalSettings() (FullOpts)
+
+; Total bytes of code 84, prolog size 16, PerfScore 39.00, instruction count 18, allocated bytes for code 84 (MethodHash=bc7731bf) for method System.ConsolePal:InvalidateTerminalSettings() (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -65,7 +57,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 33 (0x00021) Actual length = 132 (0x000084)
+  Function Length   : 21 (0x00015) Actual length = 84 (0x000054)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

-32 (-34.78%) : 11911.dasm - Microsoft.CodeAnalysis.SyntaxNode+ChildSyntaxListEnumeratorStack+<>c:<.cctor>b__12_0():Microsoft.CodeAnalysis.ChildSyntaxList+Enumerator[]:this (Tier0)

@@ -20,29 +20,24 @@ G_M43111_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             sd             a0, -8(fp)
 						;; size=20 bbWeight=1 PerfScore 13.00
 G_M43111_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            lui            a0, 0xD1FFAB1E
-            addiw          a0, a0, 0xD1FFAB1E
-            slli           a0, a0, 11
-            addi           a0, a0, 0xD1FFAB1E
-            slli           a0, a0, 5
-            addi           a0, a0, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a0, 0xD1FFAB1E(t6)
             addi           a1, zero, 0xD1FFAB1E
-            lui            a2, 0xD1FFAB1E
-            addiw          a2, a2, 0xD1FFAB1E
-            slli           a2, a2, 11
-            addi           a2, a2, 0xD1FFAB1E
-            slli           a2, a2, 5
-            addi           a2, a2, 0xD1FFAB1E
+            auipc          t6, 0xD1FFAB1E
+            ld             a2, 0xD1FFAB1E(t6)
             jalr           a2		// CORINFO_HELP_NEWARR_1_VC
             ; gcrRegs +[a0]
-						;; size=56 bbWeight=1 PerfScore 9.50
+						;; size=24 bbWeight=1 PerfScore 7.50
 G_M43111_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 24(sp)
             ld             fp, 16(sp)
             addi           sp, sp, 32
             ret						;; size=16 bbWeight=1 PerfScore 7.50
+RWD00  	dq	0000768AA9A1BFB8h
+RWD08  	dq	0000768B238B2044h
 
-; Total bytes of code 92, prolog size 16, PerfScore 30.00, instruction count 23, allocated bytes for code 92 (MethodHash=94ff5798) for method Microsoft.CodeAnalysis.SyntaxNode+ChildSyntaxListEnumeratorStack+<>c:<.cctor>b__12_0():Microsoft.CodeAnalysis.ChildSyntaxList+Enumerator[]:this (Tier0)
+
+; Total bytes of code 60, prolog size 16, PerfScore 28.00, instruction count 13, allocated bytes for code 60 (MethodHash=94ff5798) for method Microsoft.CodeAnalysis.SyntaxNode+ChildSyntaxListEnumeratorStack+<>c:<.cctor>b__12_0():Microsoft.CodeAnalysis.ChildSyntaxList+Enumerator[]:this (Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -53,7 +48,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 23 (0x00017) Actual length = 92 (0x00005c)
+  Function Length   : 15 (0x0000f) Actual length = 60 (0x00003c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

+0 (0.00%) : 12720.dasm - System.Linq.Enumerable+EnumerableSorter`1[System.__Canon]:Sort(System.__Canon[],int):int[]:this (Tier0)

@@ -33,9 +33,9 @@ G_M50207_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             lw             a2, -20(fp)
             lui            a3, 0xD1FFAB1E
             addiw          a3, a3, 0xD1FFAB1E
-            slli           a3, a3, 11
+            slli           a3, a3, 12
             addi           a3, a3, 0xD1FFAB1E
-            slli           a3, a3, 5
+            slli           a3, a3, 4
             ld             a3, 0xD1FFAB1E(a3)
             jalr           a3		// <unknown method>
             ; gcrRegs -[a1]

+0 (0.00%) : 12704.dasm - Microsoft.CodeAnalysis.CSharp.VariablesDeclaredWalker:Free():this (Tier0)

@@ -24,9 +24,9 @@ G_M15256_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs +[a0]
             lui            a1, 0xD1FFAB1E
             addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
+            slli           a1, a1, 14
             addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
+            slli           a1, a1, 2
             ld             a1, 0xD1FFAB1E(a1)
             jalr           a1		// <unknown method>
             ; gcrRegs -[a0]

+0 (0.00%) : 12640.dasm - Microsoft.CodeAnalysis.DiagnosticBag:Add(Microsoft.CodeAnalysis.Diagnostic):this (Tier0)

@@ -28,9 +28,9 @@ G_M13912_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs +[a0]
             lui            a1, 0xD1FFAB1E
             addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 11
+            slli           a1, a1, 14
             addi           a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 5
+            slli           a1, a1, 2
             ld             a1, 0xD1FFAB1E(a1)
             jalr           a1		// <unknown method>
             sd             a0, -24(fp)
@@ -39,9 +39,9 @@ G_M13912_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs +[a1]
             lui            a2, 0xD1FFAB1E
             addiw          a2, a2, 0xD1FFAB1E
-            slli           a2, a2, 11
+            slli           a2, a2, 14
             addi           a2, a2, 0xD1FFAB1E
-            slli           a2, a2, 5
+            slli           a2, a2, 2
             ld             a2, 0xD1FFAB1E(a2)
             lw             zero, 0xD1FFAB1E(a0)
             jalr           a2		// <unknown method>

Details

Size improvements/regressions per collection

Collection	Contexts with diffs	Improvements	Regressions	Same size	Improvements (bytes)	Regressions (bytes)
test.mch	10,193	6,908	0	3,285	-882,648	+0

PerfScore improvements/regressions per collection

Collection	Contexts with diffs	Improvements	Regressions	Same PerfScore	Improvements (PerfScore)	Regressions (PerfScore)	PerfScore Overall in FullOpts
test.mch	10,193	6,705	0	3,488	-2.46%	0.00%	-1.6590%

Context information

Collection	Diffed contexts	MinOpts	FullOpts	Missed, base	Missed, diff
test.mch	12,734	10,221	2,513	0 (0.00%)	0 (0.00%)

jit-analyze output

Report generated after merging fuad1502@544cf0c to the local branch & diffing with that commit.

src/coreclr/jit/emitriscv64.cpp

risc-vv · 2025-03-27T06:26:08Z

RISC-V Release-CLR-VF2: 9527 / 9547 (99.79%)

=======================
      passed: 9527
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9653
 TOTAL tests: 9653
   REAL time: 2h 12min 12s 31ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: db365a7ffc6c0119a5a6d0e7975c47be746b04bc
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9527 / 9547 (99.79%)

=======================
      passed: 9527
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9653
 TOTAL tests: 9653
   REAL time: 2h 47min 58s 142ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: db365a7ffc6c0119a5a6d0e7975c47be746b04bc
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 627539 / 665359 (94.32%)

=======================
      passed: 627539
      failed: 695
     skipped: 1417
      killed: 37125
------------------------
  TOTAL libs: 258
 TOTAL tests: 666776
   REAL time: 2h 50min 6s 808ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: db365a7ffc6c0119a5a6d0e7975c47be746b04bc
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 622000 / 654961 (94.97%)

=======================
      passed: 622000
      failed: 892
     skipped: 1459
      killed: 32069
------------------------
  TOTAL libs: 258
 TOTAL tests: 656420
   REAL time: 2h 27min 52s 746ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: db365a7ffc6c0119a5a6d0e7975c47be746b04bc
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

jakobbotsch

LGTM

jakobbotsch · 2025-04-02T14:03:44Z

src/coreclr/jit/emitriscv64.cpp

+    /* The following algorithm works based on the following equation:
+     * `imm = high32 + offset1` OR `imm = high32 - offset2`
+     *
+     * high32 will be loaded with `lui + addiw`, while offset
+     * will be loaded with `slli + addi` in 11-bits chunks
+     *
+     * First, determine at which position to partition imm into high32 and offset,
+     * so that it yields the least instruction.
+     * Where high32 = imm[y:x] and imm[63:y] are all zeroes or all ones.
+     *
+     * From the above equation, the value of offset1 & offset2 are:
+     * -> offset1 = imm[x-1:0]
+     * -> offset2 = ~(imm[x-1:0] - 1)
+     * The smaller offset should yield the least instruction. (is this correct?) */


This is not the preferred style of comments: https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/clr-jit-coding-conventions.md#711-comment-style

Feel free to include as part of a follow-up to avoid rerunning CI.

Alright, thank you, I’ll create a follow up PR and make sure to review the coding conventions 👍
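
The partition described in the quoted comment can be checked with a small standalone sketch. It is not the code from emitriscv64.cpp: `ImmSplit`, `SplitAt`, and `Rebuild` are hypothetical names, and the sketch only verifies the two identities (`imm = upper + offset1` and `imm = upper - offset2`) for a given split position x. It does not pick x or count instructions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration types; none of these names come from the JIT.
struct ImmSplit
{
    int64_t  upperAdd; // upper part such that imm == (upperAdd << x) + offset1
    uint64_t offset1;  // imm[x-1:0]
    int64_t  upperSub; // upper part such that imm == (upperSub << x) - offset2
    uint64_t offset2;  // ~(imm[x-1:0] - 1), truncated to x bits
};

// Split imm at bit position x into an upper part and the two candidate offsets.
ImmSplit SplitAt(uint64_t imm, unsigned x)
{
    assert(x > 0 && x < 64);
    const uint64_t mask = (uint64_t{1} << x) - 1;
    const uint64_t low  = imm & mask;

    // Sign-extend the remaining (64 - x) upper bits.
    const uint64_t upperBits = imm >> x;
    const uint64_t signBit   = uint64_t{1} << (63 - x);

    ImmSplit s;
    s.offset1  = low;
    s.offset2  = ~(low - 1) & mask;
    s.upperAdd = static_cast<int64_t>((upperBits ^ signBit) - signBit);
    // Subtracting offset2 requires the upper part to be one larger,
    // unless the low bits are zero and there is nothing to subtract.
    s.upperSub = s.upperAdd + (low != 0 ? 1 : 0);
    return s;
}

// Recombine an upper part and an offset, using wrapping 64-bit arithmetic.
uint64_t Rebuild(int64_t upper, unsigned x, uint64_t offset, bool subtract)
{
    const uint64_t shifted = static_cast<uint64_t>(upper) << x;
    return subtract ? shifted - offset : shifted + offset;
}
```

For 0xFFFFFFEFAFFFFFFF split at x = 28, this gives offset2 = 1 and an upper part of -261, which matches the `addiw a0, zero, -261` / `slli a0, a0, 28` / `addi a0, a0, -1` sequence shown in the PR description.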

jakobbotsch · 2025-04-02T14:04:28Z

/ba-g Azurelinux 3 timeouts

BruceForstall · 2025-04-02T18:48:17Z

Notes:

1. It would be nice if the disassembly (in JitDisasm/JitDump) of the first instruction in the sequence displayed a comment with the full (hex, and possibly also decimal) value.
2. If we need to generate relocations (opts.compReloc), does the code always generate a load from the data section with a reloc to that data? Or is there a fallback path to a fixed sequence of inline code that has a defined reloc?

fuad1502 · 2025-04-03T10:52:14Z

@BruceForstall Thank you for the notes.

1. Alright, I’ll create a follow-up PR for it.
2. Correct me if I’m wrong in any of my assumptions. I am relatively new to the codebase and still need to acquaint myself with the design. As far as I know, RISCV64 (like ARM64 and LoongArch64) puts the data section together with the code:

runtime/src/coreclr/jit/emit.cpp

Lines 6804 to 6806 in 4631ece

    #if defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
        // For arm64/LoongArch64, we're going to put the data in the code section. So make sure the code section has
        // adequate alignment.

Therefore, we don’t need to generate relocations and simply use PC relative instructions (auipc + ld in RISCV64) to load constants from the data section. I looked at ARM64 implementation for loading data constants, and they also don’t seem to generate relocations.
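
As an illustration of the `auipc + ld` addressing mentioned above, here is a minimal sketch of how a PC-relative byte offset splits into the 20-bit `auipc` immediate and the signed 12-bit `ld` displacement. The names `PcRelSplit`, `FitsAuipcRange`, and `SplitPcRelOffset` are hypothetical and not taken from the JIT; the sketch assumes RV64, where the `auipc` immediate is shifted left by 12 and sign-extended.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type; not a JIT structure.
struct PcRelSplit
{
    int32_t hi20; // immediate for auipc (added to PC after << 12)
    int32_t lo12; // signed displacement for the following ld
};

// True if offset can be expressed as (hi20 << 12) + lo12 with a signed
// 20-bit hi20 and a signed 12-bit lo12, i.e. roughly +-2GB around the PC.
bool FitsAuipcRange(int64_t offset)
{
    const int64_t minOffset = -(int64_t{1} << 31) - 2048;
    const int64_t maxOffset = (int64_t{1} << 31) - 2049;
    return offset >= minOffset && offset <= maxOffset;
}

PcRelSplit SplitPcRelOffset(int64_t offset)
{
    assert(FitsAuipcRange(offset));
    // Sign-extend the low 12 bits; the ld displacement is signed, so the
    // upper part has to compensate when bit 11 is set.
    const int64_t lo = static_cast<int64_t>((static_cast<uint64_t>(offset) & 0xFFF) ^ 0x800) - 0x800;
    const int64_t hi = (offset - lo) / 4096; // exact: offset - lo is a multiple of 4096
    return PcRelSplit{static_cast<int32_t>(hi), static_cast<int32_t>(lo)};
}
```

An offset of 24 splits into hi20 = 0 and lo12 = 24, the same shape as the `auipc t6, 0` / `ld a0, 24(t6)` pair in the PR description.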

However, referring to the following:

runtime/docs/design/coreclr/jit/hot-cold-splitting.md

Lines 82 to 86 in 4631ece

    * Without splitting, the read-only data section is adjacent to the function's instruction section on ARM64. When
    splitting, the data section is adjacent to the hot section; from the hot section, we can load constants with a single
    `ldr` instruction. However, this is not possible from the cold section: Because it is arbitrarily far away, the target
    address cannot be determined relative to the PC. Instead, the JIT emits a `IF_LARGELDC` pseudoinstruction with a
    few different possibilities:

If we’re in the cold region, the data section (located in the hot code region) might be arbitrarily far away, whereas auipc + ld can only reach ±2GB from the instruction. Therefore, RISCV64 currently loads the absolute address into a register and loads from it when we’re in a cold region. This would of course cause problems if the code is supposed to be relocatable.

Then I realized that ARM64 actually generates either ldr or adrp + ldr, where adrp + ldr is used when we’re in the cold region. However, adrp + ldr only reaches ±4GB from the instruction, not an “arbitrary” location.

To address the particular case where the code is supposed to be relocatable and we’re in a cold region, I need answers to the following:

1. Is there an absolute maximum code size? Is it target dependent?
2. Is the relative position between the hot and cold regions fixed? E.g., does using PC-relative addressing between the regions lead to position-independent code?
3. Is there an absolute maximum distance between the hot and cold regions? Is it target dependent?
4. Is hot/cold splitting even implemented for RISC-V? What about fake splitting?

I’m still reading the codebase to get the answers, but if you have any information you can share about this, or you already know some of the answers, please do let me know; I would really appreciate it 😄 And sorry if opening this PR with my currently minimal knowledge of the .NET JIT is causing more trouble than it helps. I’ll try to learn more!

jakobbotsch · 2025-04-03T11:42:59Z

I was under the impression that you are generating relocations when you said that you addressed the issue above: #113250 (comment)
~~As mentioned above we need relocations for these because AOT compilation handles the RO data block specially~~ (maybe not?)

> I looked at ARM64 implementation for loading data constants, and they also don’t seem to generate relocations.

Hmm, it's very possible the AOT compilers never move this data around. If the other backends are also not recording relocations it does not seem like a problem.

jakobbotsch · 2025-04-03T11:58:08Z

Some more clarity on top of what @fuad1502 wrote above comes from here:

runtime/src/coreclr/jit/ee_il_dll.cpp

Lines 1143 to 1162 in 1587221

    #if defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
        // For arm64/LoongArch64/RISCV64, we want to allocate JIT data always adjacent to code similar to what native
        // compiler does.
        // This way allows us to use a single `ldr` to access such data like float constant/jmp table.
        // For LoongArch64 using `pcaddi + ld` to access such data.
        UNATIVE_OFFSET roDataAlignmentDelta = 0;
        if (args->roDataSize > 0)
        {
            roDataAlignmentDelta = AlignmentPad(args->hotCodeSize, roDataSectionAlignment);
        }
        const UNATIVE_OFFSET roDataOffset = args->hotCodeSize + roDataAlignmentDelta;
        args->hotCodeSize                 = roDataOffset + args->roDataSize;
        args->roDataSize                  = 0;
    #endif // defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
        info.compCompHnd->allocMem(args);

So essentially, for these backends we allocate no data section at all, we just allocate a larger hot code section.
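
To make the size arithmetic concrete, here is a rough standalone sketch of the same folding step. It is only an approximation under stated assumptions: `AllocSizes`, `AlignmentPadSketch`, and `FoldRoDataIntoHotCode` are hypothetical stand-ins written for this example, not the JIT's `AllocMemArgs` or `AlignmentPad`, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the allocation request; not the JIT's type.
struct AllocSizes
{
    uint32_t hotCodeSize;
    uint32_t roDataSize;
};

// Bytes needed to round size up to the next multiple of alignment
// (assumed to be a power of two). Stand-in for the JIT helper.
uint32_t AlignmentPadSketch(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (alignment - (size & (alignment - 1))) & (alignment - 1);
}

// Append the read-only data to the hot code allocation and return the
// offset, relative to the start of the hot code, where the data begins.
uint32_t FoldRoDataIntoHotCode(AllocSizes& args, uint32_t roDataAlignment)
{
    uint32_t pad = 0;
    if (args.roDataSize > 0)
    {
        pad = AlignmentPadSketch(args.hotCodeSize, roDataAlignment);
    }
    const uint32_t roDataOffset = args.hotCodeSize + pad;
    args.hotCodeSize = roDataOffset + args.roDataSize;
    args.roDataSize  = 0; // no separate data section is requested
    return roDataOffset;
}
```

With 20 bytes of hot code, 8 bytes of read-only data, and 8-byte alignment, the data lands at offset 24 and the single allocation grows to 32 bytes.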

BruceForstall · 2025-04-03T21:59:23Z

@fuad1502 Thanks for the analysis. You are correct: for arm64/loongarch64/riscv64, where the read-only data is appended to the hot code section, no relocations are required if you load it via pc-relative addressing.

As for your questions:

1. The max code size is about 2GB, because branches normally reach at most +/-2GB. But I don't know if we enforce any maximums. We may hit a failure point if branches are too big, or something similar. Branches on x64/arm64 (and loongarch64/riscv64?) can go through "branch islands" to extend the branch range.
2. The relative position between hot/cold is not fixed at JIT time and requires relocs.
3. I don't know if there is a maximum distance between the hot and cold regions. I suspect not, but if there is, it would be about 2GB.
4. Hot/cold splitting is most likely not (yet?) implemented for RISC-V. It's only implemented for R2R/NAOT compilation modes. "Fake" splitting is a testing mode; you could turn it on and see if RISC-V implements it, but it is not high priority, even for R2R/NAOT.

fuad1502 · 2025-04-04T14:31:48Z

@BruceForstall Thanks for the answers! So, in conclusion, for the second note in your original comment, I only need to address the case where I currently load from an absolute address when loading a constant from the cold section, despite the relocation requirement. Since it seems we can safely assume ±2GB as the maximum distance between cold code and the constant data in hot code, I’ll create a follow-up PR that always uses PC-relative addressing for loading constants, but generates relocs when loading from the cold section. I’ll make sure to add an assertion that checks the distance assumption.

Does this sound about right?

BruceForstall · 2025-04-08T00:24:41Z

> Does this sound about right?

That all sounds right. You could also have an assert that there is no hot/cold splitting at all (which I presume there isn't yet), if you want to defer this until later.

[RISC-V] Optimize loading 64 bit constant with emitDataConst

1e0135b

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 7, 2025

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 7, 2025

am11 added the arch-riscv Related to the RISC-V architecture label Mar 7, 2025

fuad1502 changed the title ~~[RISC-V] Optimize loading 64 bit constant with emitDataConst~~ [RISC-V] Optimize loading 64 bit constant with new algorithm implementation and using emitDataConst Mar 9, 2025

[RISC-V] Implement an algorithm for loading immediate that yields fewer instructions

3b52234

fuad1502 force-pushed the riscv-jit-opt/pointer-synthesis branch from eb1c702 to 3b52234 Compare March 9, 2025 07:28

am11 mentioned this pull request Mar 9, 2025

Fix CAS mustExpand assertions in checked build #113300

Closed

tomeksowi reviewed Mar 10, 2025

src/coreclr/jit/emitriscv64.cpp Outdated

src/coreclr/jit/emitriscv64.cpp Outdated

fuad1502 added 2 commits March 11, 2025 09:49

[RISC-V] Handle corner case where high32 is 0x7FFFFFFF

6dd2dda

[RISC-V] Optimize further: move chunk window for as many as leading zeros

e124349

[RISC-V] Optimize load immidiate with SRLI instruction

04da400

[RISC-V] Use pcrel for loading constant unless loading from cold section

db65e0a

tomeksowi approved these changes Mar 24, 2025

jakobbotsch reviewed Mar 24, 2025

src/coreclr/jit/emitriscv64.cpp Outdated

build-analysis bot mentioned this pull request Mar 24, 2025

Segmentation fault running System.Security.Cryptography.Tests #113785

Closed

[RISC-V] Make sure to not use emitDataConst + emitIns_R_C when generating prolog

b66f7be

This was referenced Mar 25, 2025

[QUIC & HTTP/3] Handshake Timeout on tests #104426

Closed

System.Net.Quic tests timeout #107761

Closed

jakobbotsch reviewed Mar 26, 2025

src/coreclr/jit/emitriscv64.cpp Outdated

[RISC-V] Don't generate emitDataConst + emitIns_R_C in epilog

db365a7

This was referenced Mar 27, 2025

System.TimeoutException : The operation has timed out. dotnet/dnceng#5279

Closed

System.Net.Requests test timeout #113883

Closed

BruceForstall self-requested a review April 1, 2025 22:46

jakobbotsch approved these changes Apr 2, 2025

jakobbotsch reviewed Apr 2, 2025

jakobbotsch merged commit 4b2fe3c into dotnet:main Apr 2, 2025
109 of 111 checks passed

fuad1502 mentioned this pull request Apr 10, 2025

[RISC-V] Print load immediate value in disassembly #114470

Merged

fuad1502 mentioned this pull request May 5, 2025

[RISC-V] Optimize emitLoadImmediate further #115298

Closed

8 tasks

github-actions bot locked and limited conversation to collaborators May 8, 2025
