-
Notifications
You must be signed in to change notification settings - Fork 5.2k
[RISC-V] Synthesizing PC-relative pointer constants #119203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
c2181d6 to
bd0e79c
Compare
|
@risc-vv /run |
RISC-V pull_request-CLR-QEMU: 9108 / 9139 (99.66%)report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V pull_request-CLR-VF2: 9108 / 9139 (99.66%)report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V pull_request-FX-QEMU: 0 / 0 (100.00%)report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V pull_request-FX-VF2: 0 / 61 (0.00%)report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
|
No regressions. Quite a bit of post-bringup cruft went out the window. Diffs are based on 104,367 contexts (2,270 MinOpts, 102,097 FullOpts). MISSED contexts: 2 (0.00%) Overall (-9,469,872 bytes)
MinOpts (-2,342,108 bytes)
FullOpts (-7,127,764 bytes)
Example diffslinux.riscv64.Checked.3.mch-48 (-41.38%) : 53568.dasm - Program:Main():int (FullOpts)@@ -20,35 +20,23 @@ G_M24375_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mv fp, sp
;; size=16 bbWeight=1 PerfScore 9.00
G_M24375_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- lui t6, 0xD1FFAB1E
- addiw t6, t6, 0xD1FFAB1E
- slli t6, t6, 13
- addi t6, t6, 0xD1FFAB1E
- slli t6, t6, 2
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
jalr a0 // <unknown method>
- lui t6, 0xD1FFAB1E
- addiw t6, t6, 0xD1FFAB1E
- slli t6, t6, 13
- addi t6, t6, 0xD1FFAB1E
- slli t6, t6, 2
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
jalr a0 // <unknown method>
- lui t6, 0xD1FFAB1E
- addiw t6, t6, 0xD1FFAB1E
- slli t6, t6, 13
- addi t6, t6, 0xD1FFAB1E
- slli t6, t6, 2
- ld a0, 0xD1FFAB1E(t6)
+ auipc a0, 0xD1FFAB1E
+ ld a0, 0xD1FFAB1E(a0)
jalr a0 // Test:Ret():int
- ;; size=84 bbWeight=1 PerfScore 30.00
+ ;; size=36 bbWeight=1 PerfScore 15.00
G_M24375_IG03: ; bbWeight=1, epilog, nogc, extend
ld ra, 8(sp)
ld fp, 0(sp)
addi sp, sp, 16
ret ;; size=16 bbWeight=1 PerfScore 7.50
-; Total bytes of code 116, prolog size 16, PerfScore 46.50, instruction count 17, allocated bytes for code 116 (MethodHash=d9b8a0c8) for method Program:Main():int (FullOpts)
+; Total bytes of code 68, prolog size 16, PerfScore 31.50, instruction count 14, allocated bytes for code 68 (MethodHash=d9b8a0c8) for method Program:Main():int (FullOpts)
; ============================================================
Unwind Info:
@@ -59,7 +47,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 58 (0x0003a) Actual length = 116 (0x000074)
+ Function Length : 34 (0x00022) Actual length = 68 (0x000044)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)-32 (-40.00%) : 54962.dasm - ExceptionExtensions:RethrowWithNoStackTraceLoss(System.Exception) (FullOpts)@@ -21,26 +21,18 @@ G_M64910_IG01: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=16 bbWeight=0 PerfScore 0.00
G_M64910_IG02: ; bbWeight=0, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
; gcrRegs +[a0]
- lui t6, 0xD1FFAB1E
- addiw t6, t6, 0xD1FFAB1E
- slli t6, t6, 12
- addi t6, t6, 0xD1FFAB1E
- slli t6, t6, 3
- ld a1, 0xD1FFAB1E(t6)
+ auipc a1, 0xD1FFAB1E
+ ld a1, 0xD1FFAB1E(a1)
jalr a1 // System.Runtime.ExceptionServices.ExceptionDispatchInfo:Capture(System.Exception):System.Runtime.ExceptionServices.ExceptionDispatchInfo
- lui t6, 0xD1FFAB1E
- addiw t6, t6, 0xD1FFAB1E
- slli t6, t6, 12
- addi t6, t6, 0xD1FFAB1E
- slli t6, t6, 3
- ld a1, 0xD1FFAB1E(t6)
+ auipc a1, 0xD1FFAB1E
+ ld a1, 0xD1FFAB1E(a1)
lw zero, 0xD1FFAB1E(a0)
jalr a1 // System.Runtime.ExceptionServices.ExceptionDispatchInfo:Throw():this
; gcrRegs -[a0]
ebreak
- ;; size=64 bbWeight=0 PerfScore 0.00
+ ;; size=32 bbWeight=0 PerfScore 0.00
-; Total bytes of code 80, prolog size 16, PerfScore 0.00, instruction count 12, allocated bytes for code 80 (MethodHash=95e60271) for method ExceptionExtensions:RethrowWithNoStackTraceLoss(System.Exception) (FullOpts)
+; Total bytes of code 48, prolog size 16, PerfScore 0.00, instruction count 10, allocated bytes for code 48 (MethodHash=95e60271) for method ExceptionExtensions:RethrowWithNoStackTraceLoss(System.Exception) (FullOpts)
; ============================================================
Unwind Info:
@@ -51,7 +43,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 40 (0x00028) Actual length = 80 (0x000050)
+ Function Length : 24 (0x00018) Actual length = 48 (0x000030)
---- Epilog scopes ----
No epilogs
---- Unwind codes -----40 (-38.46%) : 856.dasm - Interop:g__ParentDirectoryExists|7_0(System.String):bool (FullOpts)@@ -22,25 +22,15 @@ G_M36700_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=16 bbWeight=1 PerfScore 9.00
G_M36700_IG02: ; bbWeight=1, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
; gcrRegs +[a0]
- lui t6, 0xD1FFAB1E
- addiw t6, t6, 0xD1FFAB1E
- slli t6, t6, 13
- addi t6, t6, 0xD1FFAB1E
- slli t6, t6, 2
- ld a1, 0xD1FFAB1E(t6)
+ auipc a1, 0xD1FFAB1E
+ ld a1, 0xD1FFAB1E(a1)
jalr a1 // System.IO.PathInternal:TrimEndingDirectorySeparator(System.String):System.String
- lui t6, 0xD1FFAB1E
- addiw t6, t6, 0xD1FFAB1E
- slli t6, t6, 32
- srli t6, t6, 18
- ld a1, 0xD1FFAB1E(t6)
+ auipc a1, 0xD1FFAB1E
+ ld a1, 0xD1FFAB1E(a1)
jalr a1 // System.IO.Path:GetDirectoryName(System.String):System.String
- lui t6, 0xD1FFAB1E
- addiw t6, t6, 0xD1FFAB1E
- slli t6, t6, 32
- srli t6, t6, 18
- ld a1, 0xD1FFAB1E(t6)
- ;; size=72 bbWeight=1 PerfScore 25.00
+ auipc a1, 0xD1FFAB1E
+ ld a1, 0xD1FFAB1E(a1)
+ ;; size=32 bbWeight=1 PerfScore 12.00
G_M36700_IG03: ; bbWeight=1, epilog, nogc, extend
ld ra, 8(sp)
ld fp, 0(sp)
@@ -48,7 +38,7 @@ G_M36700_IG03: ; bbWeight=1, epilog, nogc, extend
jr a1 // System.IO.Directory:Exists(System.String):bool
;; size=16 bbWeight=1 PerfScore 7.50
-; Total bytes of code 104, prolog size 16, PerfScore 41.50, instruction count 16, allocated bytes for code 104 (MethodHash=027470a3) for method Interop:<GetExceptionForIoErrno>g__ParentDirectoryExists|7_0(System.String):bool (FullOpts)
+; Total bytes of code 64, prolog size 16, PerfScore 28.50, instruction count 13, allocated bytes for code 64 (MethodHash=027470a3) for method Interop:<GetExceptionForIoErrno>g__ParentDirectoryExists|7_0(System.String):bool (FullOpts)
; ============================================================
Unwind Info:
@@ -59,7 +49,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 52 (0x00034) Actual length = 104 (0x000068)
+ Function Length : 32 (0x00020) Actual length = 64 (0x000040)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)+0 (0.00%) : 104032.dasm - Microsoft.Diagnostics.Tracing.TraceEvent:get_FormattedMessage():System.String:this (FullOpts)@@ -22,6 +22,7 @@ G_M51530_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M51530_IG02: ; bbWeight=1, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
; gcrRegs +[a0]
mv a1, zero
+ ; gcrRegs +[a1]
ld a2, 0xD1FFAB1E(a0)
ld a2, 0xD1FFAB1E(a2)
ld a2, 0xD1FFAB1E(a2)+0 (0.00%) : 57024.dasm - UDivConst:U8_Div_3(ulong):ulong (FullOpts)No diffs found? +0 (0.00%) : 56976.dasm - SlowPathELTTests.SlowPathELTHelpers:MixedSseStructFunc():SlowPathELTTests.MixedSseStruct (FullOpts)No diffs found? DetailsSize improvements/regressions per collection
PerfScore improvements/regressions per collection
Context information
jit-analyze output |
@@ -22,6 +22,7 @@ G_M51530_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M51530_IG02: ; bbWeight=1, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byref
; gcrRegs +[a0]
mv a1, zero
+ ; gcrRegs +[a1]
ld a2, 0xD1FFAB1E(a0)
ld a2, 0xD1FFAB1E(a2)
ld a2, 0xD1FFAB1E(a2)Looks like a bonus fix. Notice how this code I removed also silently stripped GC flags. runtime/src/coreclr/jit/codegenriscv64.cpp Lines 902 to 905 in f1cbe0f
|
|
We will review .NET 11 PRs when we are done with .NET 10 last minute works and less busy. Please give us a few more weeks. |
|
@EgorBo Do we need a fix for arm64 version of the code @tomeksowi points at above? runtime/src/coreclr/jit/codegenarm64.cpp Lines 2098 to 2101 in 25e088f
For address 0 it is clearly not a problem -- would it be a problem for frozen objects? |
Note that it's generally impossible to know where the code will be located, so you can only do this heuristically. How do you recover when you get it wrong? For x64 we recover the situation by disabling rip-relative addressing globally the first time it happens, and then designing things so that it very rarely happens. Look for |
Do you mean it may strip the GC related flags for frozen objects? I think we don't report them to GC anyway, do we? |
I don't recover, the address of a handle with function address was always within range, but you're right, I'll try to use it, maybe communicating failure with |
|
For x64 there is a contract between the JIT and VM about when and how relative addressing is used: for data addresses PC-relative addressing can only be used based on If you are using PC-relative addressing for data addresses then I expect you need to build a similar system. If you are only using it for code addresses, then I expect you need to support some form of jump stubs. |
Yes, that's what I meant. I was not sure if they GC refs to frozen objects need to be reported or not. If not then it seems ok. |
I think in fact we do report them today, but don't have to |
|
OK, I cleared assumptions about code being always allocated within 32-bit range of the helper function. I also added jump stubs while I'm here. Ready for review @jakobbotsch @dotnet/jit-contrib |
src/coreclr/vm/jitinterface.cpp
Outdated
|
|
||
| if (putTargetIntoAuipcCombo(target)) | ||
| break; // success | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this use the same structure as Amd64 and Arm64 cases?
src/coreclr/vm/jitinterface.cpp
Outdated
| INT32 lo12 = (offset << (64 - 12)) >> (64 - 12); | ||
| INT32 hi20 = INT32(offset - lo12); | ||
|
|
||
| if (INT64(hi20) + INT64(lo12) != offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to move this into a FitsIn.... helper method that's next to PutRiscV64AuipcItype, and keep PutRiscV64AuipcItype signature as is so that callers do not need to be aware of the encoding details and limitations.
src/coreclr/vm/jitinterface.cpp
Outdated
| enum | ||
| { | ||
| OpcodeJalr = 0x67, | ||
| OpcodeMask = 0x7F, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would look better if there is a separate reloc type for IP vs. data so that we do not need to disassemble the instructions here.
am11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For linux-musl-riscv64
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
| BinaryPrimitives.WriteInt64LittleEndian(relocationEntry.Slice(16), symbolicRelocation.Addend); | ||
| relocationStream.Write(relocationEntry); | ||
|
|
||
| // TODO: This is wrong, LO12 needs to point to a label (previous instruction with HI20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // TODO: This is wrong, LO12 needs to point to a label (previous instruction with HI20) | |
| // TODO-RISCV64: This is wrong, LO12 needs to point to a label (previous instruction with HI20) |
Up till this point in your PR, AOT smoke tests are passing on glibc and musl. If you have a repro, maybe would be nice to add a tests in one of the categories under src/tests/nativeaot/SmokeTests (or its own category) to make it clear what doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CALL_PLT relocation is meant for auipc+jalr. Emitting it always (also for loads and addi) could have worked by mistake, these instructions have the same I-type layout as jalr. This won't be the case for stores as the S-type format is slightly different.
I came up with a relocation pattern that matches the ELF spec better:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#pc-relative-symbol-addresses
In the future we may support hoisting the auipc for optimizations similar to #121739 but currently I'm pushing to get PC-relative ops widespread as being very frequent they bring significant savings over synthesizing absolute addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But testing sounds a good idea, I'll try to add once I figure out how to build; it complains about objcopy in my env not recognizing the format (true) but ./src/tests/build.sh -cross -rebuild -nativeaot ... -p:ObjCopyName=/usr/bin/llvm-objcopy-18 doesn't have an effect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, my workflow isn't propagating the exit code from qemu VM to host when test fails (I will see if it's possible at all since I thought it's handled by the if [ "$fail" -eq 1 ]; then exit 1; fi condition in workflow), but your branch is actually failing an AOT smoke test. I've checked it doesn't show up on main.
main: https://github.com/am11/CrossRepoCITesting/actions/runs/19540902736/job/55946632081 (git HEAD is at: 56ce9bd28dfd46b3b99d64825cea94b7bacbf4f3)
pr: https://github.com/am11/CrossRepoCITesting/actions/runs/19540949235/job/55946805017 (git HEAD is at: c2ab64490f307bf93e28bd734e649e958888b17d)
In run tests in qemu section, it shows:
--------------------------------
Running Test:
Caught Unexpected exception:System.NullReferenceException: Object reference not set to an instance of an object.
at CoreFXTestLibrary.Internal.Runner.RunTestMethod(TestInfo) + 0x414
at CoreFXTestLibrary.Internal.Runner.RunTest(TestInfo) + 0x48
---- Test FAILED ---------------
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
at CoreFXTestLibrary.Internal.Runner.RunTests(TestInfo[], String[]) + 0x80
at EntryPointMain.Main(String[] args) + 0x2a78
Aborted (core dumped)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But testing sounds a good idea, I'll try to add once I figure out how to build; it complains about
objcopyin my env not recognizing the format (true) but./src/tests/build.sh -cross -rebuild -nativeaot ... -p:ObjCopyName=/usr/bin/llvm-objcopy-18doesn't have an effect
You can use llvm-objcopy from the toolchain. The container I am using is from dotnet prereqs repo, same one is used in CI here. They have all the build prerequisites installed (including llvm-objcopy; all tools in llvm toolchain are multi-target, so any host/target arch combo works).
RISC-V is ill-equipped to synthesize large >32-bit constants like absolute pointers, it results in a 6-8 instruction sequence. If an address can be synthesized as PC-relative (2 instructions:
auipc + ld/addi), do so. It also simplifies codegen.Part of #84834, cc @dotnet/samsung