Arm-64: Add initial support for PAC-RET #110472

Open
wants to merge 78 commits into
base: main

SwapnilGaikwad
Contributor

@SwapnilGaikwad SwapnilGaikwad commented Dec 6, 2024

This PR introduces initial support for Pointer Authentication (PAC) on Arm64. PAC is a hardware security feature designed to mitigate Return-Oriented Programming (ROP) attacks by cryptographically signing return addresses. The signed return address is stored on the stack and later authenticated before returning from a function, ensuring control flow returns to the intended caller.

More details on PAC and its role in software security can be found (here).

Enabling PAC involves inserting additional instructions into both the function prolog (for signing) and epilog (for authentication). This results in increased code size. For example, we observe a 1.8% increase in code size across System*.dll assemblies compiled using crossgen2.

The added instructions also introduce some runtime overhead. In our benchmark of Orchard CMS 9.0, we observe a 1.3% performance regression, which falls within the noise range (standard deviation: ~1.3%).
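For readers new to PAC, the sign-in-prolog and authenticate-in-epilog flow described above can be modelled in portable C++. This is only a toy sketch: real hardware uses a dedicated cipher and a key held in system registers, and every name here (`ToyMac`, `SignReturnAddress`, `AuthenticateReturnAddress`), the 48-bit address width, and the hash constants are assumptions made for illustration, not code from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only. Assumption: 48-bit virtual addresses, so the top 16 bits
// of a 64-bit pointer are free to carry the authentication code.
constexpr int      kVaBits   = 48;
constexpr uint64_t kAddrMask = (uint64_t{1} << kVaBits) - 1;

// Stand-in for the hardware MAC: mixes the address, a modifier (the stack
// pointer for paciasp/autiasp) and a secret key, and keeps 16 bits.
inline uint64_t ToyMac(uint64_t addr, uint64_t modifier, uint64_t key)
{
    uint64_t h = (addr ^ (modifier * 0x9E3779B97F4A7C15ull) ^ key) * 0xFF51AFD7ED558CCDull;
    h ^= h >> 33;
    return h >> kVaBits;
}

// Prolog analogue of `paciasp`: store the code in the unused upper bits.
inline uint64_t SignReturnAddress(uint64_t addr, uint64_t sp, uint64_t key)
{
    const uint64_t plain = addr & kAddrMask;
    return plain | (ToyMac(plain, sp, key) << kVaBits);
}

// Epilog analogue of `autiasp`: recompute the code and compare it with the
// stored one. On success writes the plain address to *out and returns true.
inline bool AuthenticateReturnAddress(uint64_t signedAddr, uint64_t sp, uint64_t key, uint64_t* out)
{
    assert(out != nullptr);
    const uint64_t plain = signedAddr & kAddrMask;
    if (SignReturnAddress(plain, sp, key) != signedAddr)
        return false;   // code mismatch: the saved value was modified
    *out = plain;
    return true;
}
```

In this model, a saved return address that is overwritten on the stack fails authentication unless the attacker also guesses the matching 16-bit code. Authenticating with a different stack pointer value fails for the same reason, apart from chance collisions.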

@kunalspathak @janvorli @a74nh

Contributes to #109457

@kunalspathak
Contributor

@SwapnilGaikwad - once you resolve the stack walking problem, can you also try something like this in a Foo() method?

paciasp
stp     fp, lr, [sp, #-0x10]!

mov x9, [sp] ;  overwrite lr value with random content
...
...
xpaclri  ; will fail and should give the call stack
...

@SwapnilGaikwad
Contributor Author

Hi @jkotas, I was wondering if you have any suggestions for investigating a segfault. The NativeAot version of the tests/GC/API/GC/GetTotalAllocatedBytes.cs test on Linux+arm64 hits a segfault after jumping to an address with a signed pointer. The signed address points to the RhpGcProbeHijack function in GcProbe.S. This function is used as the default hijack target during return address hijacking, so I expected it to have an unsigned address.
The stack trace after the segfault (listed below) suggests that an exception may have triggered unwinding. However, while debugging, it doesn't hit RhpCallFilterFunclet. Printing more info for debugging causes the test to pass :).

frame #0: 0x0000aaaaaad46a4c GetTotalAllocatedBytes`RhpGcProbeHijack at GcProbe.S:123
frame #1: 0x0000aaaaaad46a4c GetTotalAllocatedBytes`RhpCallFilterFunclet at ExceptionHandling.S:662

Any pointers to debug this? I'm using a debug build. Currently checking if any GC workflow jumps to this address.

@jkotas
Member

jkotas commented Jun 29, 2025

> Any pointers to debug this? I'm using a debug build.

I do not see this specific failure in the CI, but I see plenty of other failures. They are related to hijacking - we are trying to return to RhpGcProbeHijack, but the return address is not signed.

> This function is used as default hijack target during return address hijacking so expecting it to have unsigned address.

The hijack target is returned up during return address hijacking. It is expected that the hijack target is signed - it is expected to happen here https://github.com/dotnet/runtime/pull/110472/files#diff-ab37b443cf4fe3a293dfbfddb896222f93c6b3c3630a6b023bb2c27399a841bdR838

> Printing more info for debugging causes test to pass :).

Yes, crashes related to GC suspension tend to be timing dependent. You may want to create a more stressful repro to debug it: run the test that hits the problem on one thread, and run for (;;) { GC.Collect(); Thread.Sleep(1); } on a background thread.

@SwapnilGaikwad
Contributor Author

> The hijack target is returned up during return address hijacking. It is expected that the hijack target is signed - it is expected to happen here

Correct. However, we should jump to the hijack target only after authenticating, so the address we jump to shouldn't still be signed. You're right, it could be that we are detecting PAC incorrectly, which may lead to jumping to this address without authentication. I'll probably fix PAC detection using DWARF info first and then come back to this failure (hopefully it disappears :) ). I don't think I currently have the correct starting unwind code.

> Yes, crashes related to GC suspension tend to be timing dependent. You may want to create a more stressful repro to debug it - run the test routing the problem on one thread and run a for (;;) { GC.Collect(); Thread.Sleep(1); } in a loop on background thread.

Cool, I'll use this if the issue persists. Thanks for the pointers 👍

@risc-vv

risc-vv commented Jul 4, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

@@ -105,7 +109,7 @@ inline PCODE GetIP(T_CONTEXT* context)
 #elif defined(TARGET_ARM)
     return (PCODE)context->Pc;
 #elif defined(TARGET_ARM64)
-    return (PCODE)context->Pc;
+    return (PCODE) PacStripPtr((void *)context->Pc);
Contributor Author

I'll move these strip operations to the source while populating the context. These are temporarily added until we get all the CI issues fixed.
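As background for the strip operations mentioned above, here is a rough sketch of what stripping a PAC from a 64-bit pointer amounts to. It is not the PR's `PacStripPtr`, whose implementation is not shown in this thread. `StripPacBitsSketch` and `kAssumedVaBits` are hypothetical names, and the sketch assumes a 48-bit virtual address space with top-byte-ignore disabled. On real hardware the `xpaci`/`xpaclri` instructions do this using the configured address size.

```cpp
#include <cstdint>

// Assumption for illustration: 48-bit virtual addresses, no top-byte-ignore.
constexpr int kAssumedVaBits = 48;

// Hypothetical helper: replaces the PAC field with copies of bit 55, which
// selects the lower (user) or upper (kernel) half of the address space.
inline uint64_t StripPacBitsSketch(uint64_t ptr)
{
    const uint64_t addrMask  = (uint64_t{1} << kAssumedVaBits) - 1;
    const bool     upperHalf = ((ptr >> 55) & 1) != 0;
    return upperHalf ? (ptr | ~addrMask)   // upper half: extension bits are all ones
                     : (ptr & addrMask);   // lower half: extension bits are all zeros
}
```

Stripping is idempotent in this model, so applying it once where the context is populated and again in an accessor such as `GetIP` would be redundant but harmless.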

@AndyAyersMS
Member

Have we tried running the diagnostic tests on this yet?

@risc-vv

risc-vv commented Jul 8, 2025

RISC-V Release-CLR-QEMU: 9080 / 9110 (99.67%)
=======================
      passed: 9080
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9709
VIRTUAL time: 35h 9min 36s 335ms
   REAL time: 35min 56s 818ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 34401fb08d91f230ca12b3ff311cdf67fd54c858
CI: 78e142fd33020d1c98d51294d2e82d7c5be9fbf2
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@risc-vv

risc-vv commented Jul 11, 2025

RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
=======================
      passed: 9083
      failed: 2
     skipped: 597
      killed: 28
------------------------
 TOTAL tests: 9710
VIRTUAL time: 11h 18min 11s 352ms
   REAL time: 45min 50s 923ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 597
      killed: 28
------------------------
 TOTAL tests: 9709
VIRTUAL time: 37h 32min 14s 532ms
   REAL time: 38min 25s 818ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 0bc660ee660772ffd6445818cdae223c1345a9a4
CI: d6c9c1ab3a7411819463edc05ded301e89ba586a
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@amanasifkhalid
Member

> Have we tried running the diagnostic tests on this yet?

@steveisok is doing this now

@risc-vv

risc-vv commented Jul 29, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member

9 participants