Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable for PAC while compiling coreclr (not the jitted code) #108561

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

SwapnilGaikwad
Copy link
Contributor

Arm64 added a Pointer Authentication Code (PAC) extension in ARMv8.3. One of Its aims is to mitigate certain attacks that rely on corrupting the return address on the stack, such as return-oriented programming (ROP). Clang and GCC added support for PAC-RET that can be enabled using the compilation flag -mbranch-protection=pac-ret.
This patch enables PAC-RET for the libraries and binaries in CoreCLR compiled using clang on UNIX-based platforms.

It is important to note that the patch would enable PAC-RET for the JIT but not the code emitted from the JIT. Thus, the emitted code should remain unchanged by this change. However, adding PAC support leads to additional instructions in the compiled CoreCLR code for signing and authenticating return addresses and function pointers. It increases the size of compiled files slightly. For the libclrjit.so we noted an increase in size by 1.4%. Execution of additional instructions may also increase the compilation time. We saw compilation time increased by 1.6% (+/- 0.37%) while compiling all the methods in benchmarks.run_pgo.linux.arm64.checked.mch using superpmi on a Graviton 3 system.

Please find the details below. They include values when the Branch Target Identification (BTI) is enabled. As there are not BTI enabled machines available to test, not enabling it.

Execution Time:

benchmarks.run.linux.arm64.checked.mch
                                         Slowdown
Base         :  72023.25ms (+/- 0.37%)   (0.00%)
PAC          :  72782.88ms (+/- 0.24%)   (1.05%)
PAC+BTI      :  72781.52ms (+/- 0.34%)   (1.05%)

benchmarks.run_pgo.linux.arm64.checked.mch
                                         Slowdown
Base        :  277104.76ms (+/-0.35%)    (0.00%)
PAC         :  281714.87ms (+/-0.37%)    (1.66%)
PAC+BTI     :  282553.39ms (+/-0.38%)    (1.97%)

realworld.run.linux.arm64.checked.mch
                                        Slowdown
Base        :  56409.39ms (+/-0.30%)    (0.00%)
PAC         :  57162.73ms (+/-0.35%)    (1.34%)
PAC+BTI     :  57334.52ms (+/-0.41%)    (1.64%)

Setup

  • Used Release Build (bdcfb10eec930):
  • For PAC and PAC+BTI mode, compiled the jit and superpmi with the -mbranch-protection=pac-ret and -mbranch-protection=standard flags respectively. Then Dropped libclrjit.so* and superpmi (binary) in $CORE_ROOT
  • Used Graviton 3 that has following architecture features
CPU Flags:
fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
  • Disable parallel execution of superpmi by explicitly calling superpmi binary without -p flag.

File sizes:

-rwxr-xr-x 1 user user 2942776 Oct 1 13:58 base/libclrjit.so
-rwxr-xr-x 1 user user 2985064 Oct 1 10:07 pac/libclrjit.so
-rwxr-xr-x 1 user user 3008024 Oct 1 10:07 pac+bti/libclrjit.so

Execution methodology

  • Executed benchmarks.run.linux.arm64.checked.mch, benchmarks.run_pgo.linux.arm64.checked.mch and realworld.run.linux.arm64.checked.mch using superpmi.
  • Each function was compiled 5 times and executed (with repeatCount=5).
  • All 3 modes (baseline, pac, pac+bti), executed 10 times and average time was used as a representative value for that mode.

@kunalspathak @a74nh @dotnet/arm64-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 4, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 4, 2024
@SwapnilGaikwad
Copy link
Contributor Author

Following numbers are from an altra (N1) machine that doesn't support PAC or BTI. On such machines, PAC and BTI instructions are executed as nops but it leads to slightly increased file sizes.
We can note from the execution times that the impact on JITs throughput is difficult to distinguish from the noise.

Execution Time:

benchmarks.run.linux.arm64.checked.mch
                                  Slowdown
Base    : 75300.75ms (+/- 0.48%)   (0.00%)
PAC     : 75802.75ms (+/- 0.30%)   (0.67%)
PAC+BTI : 76038.32ms (+/- 0.21%)   (0.98%)

benchmarks.run_pgo.linux.arm64.checked.mch
                                  Slowdown
Base    : 314841.64ms (+/-0.37%)   (0.00%)
PAC     : 317086.04ms (+/-0.61%)   (0.71%)
PAC+BTI : 318262.13ms (+/-0.78%)   (1.09%)

realworld.run.linux.arm64.checked.mch
                                 Slowdown
Base    : 65200.13ms (+/-0.88%)   (0.00%)
PAC     : 65382.44ms (+/-1.04%)   (0.28%)
PAC+BTI : 65680.45ms (+/-0.87%)   (0.74%)

File sizes:

-rwxr-xr-x 1 user user  2863400 Oct  4 12:56 base/libclrjit.so
-rwxr-xr-x 1 user user  2873592 Oct  4 12:56 pac/libclrjit.so
-rwxr-xr-x 1 user user  2908808 Oct  4 12:56 pac+bti/libclrjit.so

@jkotas
Copy link
Member

jkotas commented Oct 5, 2024

Thank you for pushing on adoption of security mitigations in this repo!

.NET runtime has a lot of level code that can be affected by changes like this. For example, we use libunwind to unwind through native code that is going to be compiled with pointer authentication enabled after this change. The debugging support is typically where we find the long-tail of issues from changes like this one. cc @tommcdon

@jkotas jkotas added arch-arm64 os-linux Linux OS (any supported distro) labels Oct 5, 2024
@jkotas jkotas requested a review from a team October 5, 2024 01:18
@jkotas jkotas added area-Infrastructure-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 5, 2024
Copy link
Contributor

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member

jkotas commented Oct 5, 2024

the patch would enable PAC-RET for the JIT but not the code emitted from the JIT

The code produced by the JIT is more likely to be attacked since it is processing the input controlled by the attacker. It is rare for code in native runtime libraries to process input controller by the attacker. This change is general goodness, but it is not closing the most likely path of the ROP attacks in .NET.

@kunalspathak
Copy link
Member

the patch would enable PAC-RET for the JIT but not the code emitted from the JIT

The code produced by the JIT is more likely to be attacked since it is processing the input controlled by the attacker.
It is rare for code in native runtime libraries to process input controller by the attacker. This change is general goodness, but it is not closing the most likely path of the ROP attacks in .NET.

Yes, the original plan is to support it in JIT, but we started this work to get a sense of TP/size impact of PAC.

@a74nh
Copy link
Contributor

a74nh commented Oct 7, 2024

The code produced by the JIT is more likely to be attacked since it is processing the input controlled by the attacker. It is rare for code in native runtime libraries to process input controller by the attacker. This change is general goodness, but it is not closing the most likely path of the ROP attacks in .NET.

The native code is static so is known to the attacker ahead of time, so is a good place for the attacker to use to build a library of attack gadgets ahead of time. Whereas anything produced by the jit is harder to know in advance. But, as Kunal says, the plan would be to support in the Jit too.

Jit support should be fairly straightforward assuming the following:

  • CoreCLR does not move the location of program stacks. (as the return addresses on the stack will be encrypted using their location on the stack as the salt)
  • CoreCLR does not parse through stacks to rewrite the return addresses.

I'd recommend a rethink if either of those are the case as both of these are done by OpenJDK which made PAC-RET a lot more complicated and less secure.

For even more security, another feature that could be added is to encrypt the store of addresses that point to the location of jitted code. This would ensure that an attacker could not change them. That will be a self contained piece of work. I'm not aware of any other languaes which are doing that today.

The debugging support is typically where we find the long-tail of issues from changes like this one

Getting this PR in early will help find that long tail, rather than after the full PAC implementation.

.NET runtime has a lot of level code that can be affected by changes like this. For example, we use libunwind to unwind through native code that is going to be compiled with pointer authentication enabled after this change.

I see .NET has a copy of libunwind and llvm-libunwind. I'm not sure when either is used.

Looks like llvm-libunwind has support for PAC, eg:
src/native/external/llvm-libunwind/src/Registers.hpp line 2337:
uint32_t __pac; // Return Authentication Code (PAC)

But libunwind doesn't yet. A PR is here:
libunwind/libunwind#793
Which at worse case we could test and copy locally.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2024

CoreCLR does not parse through stacks to rewrite the return addresses.

CoreCLR does this.

@a74nh
Copy link
Contributor

a74nh commented Oct 7, 2024

CoreCLR does not parse through stacks to rewrite the return addresses.

CoreCLR does this.

Is that because of tiered compilation?

That will require the function that rewrites the stack to first confirm the old address on the stack is still valid, then encrypt the new return address before storing it on the stack. If the new return address is ever in memory before being stored on the stack then that's potentially adding an exploit a hacker could exploit to write their own values to the stack.
Hence you then may want to consider encrypting the new return address whenever it gets stored to any memory location.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2024

Is that because of tiered compilation?

It is because return address hijacking done as part thread suspension for GC: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/threading.md#hijacking

@a74nh
Copy link
Contributor

a74nh commented Oct 7, 2024

Is that because of tiered compilation?

It is because return address hijacking done as part thread suspension for GC: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/threading.md#hijacking

Ok, this is returning to a stub. So should be fine to have an encrypted pointer to that stub which itself is kept in a fixed location.

@janvorli
Copy link
Member

janvorli commented Oct 7, 2024

I see .NET has a copy of libunwind and llvm-libunwind. I'm not sure when either is used.

The llvm-libunwind is used by NativeAOT, the libunwind by coreclr except for macOS where llvm-libunwind is used too (since it is the default unwind library on macOS).

@janvorli
Copy link
Member

janvorli commented Oct 7, 2024

Ok, this is returning to a stub. So should be fine to have an encrypted pointer to that stub which itself is kept in a fixed location.

Just to make sure I understand how this would work. When we hijack a return address, we would need to modify both the LR and the register that stores the PAC, right?

@a74nh
Copy link
Contributor

a74nh commented Oct 7, 2024

Ok, this is returning to a stub. So should be fine to have an encrypted pointer to that stub which itself is kept in a fixed location.

Just to make sure I understand how this would work. When we hijack a return address, we would need to modify both the LR and the register that stores the PAC, right?

The address of the GC stub we would keep in a global variable, meaning all access to it would hardcoded with fixed addresses in the assembly. This prevents the hacker from making us look at different variable.

To rewrite the stack, first load the existing return address from the stack and un-encrypt it to make sure it's a valid value (faulting if not). Then load the stub value into a register, unencrypt it (faulting if invalid), encrypt it for the stack, then store to the stack.

The encrypt and unecrypt instruction just requires the value to encypt, name of the key to use (A or B), and the salt (address on the stack). There is no pac register as such (except for the secret keys)

A little more background:

The assumption here is that the attacker has gotten the ability the change writable memory in the process (possibly only the stack) and read executable memory. They do not have the ability to change readonly memory, change the control flow or change/access register contents. The goal of the attacker is to make the program execute arbitrary code. This can be done by editing the return addresses on the stack. When the program returns, it now jumps back to code the attacker wants to run. This in itself is not that useful as the attacker is limited to functions available and register contents. By looking through all executable memory they look for small groups of instructions directly proceeding a return instruction. These are "gadgets" which simply change a register or write a bit of memory. By chaining gadgets together using return addresses the attacker now can execute whatever they want. Tools exist to look at the executable code of a known program (or library) and build a library of gadgets (which is why I'm concerned about protection of CoreCLR code over jitted code).

PAC-RET works because the return address stays in a register LR (which the attacker cannot access) and only goes to memory when saved to the stack, which is encrypted before the store. When loading from the stack we unencrypt, and fault on an error. To modify the address, the hacker would need to know the secret per-process key. The hacker can't simply replace it with a different encrypted value as the location on the stack is used as a salt in the encryption, meaning every encrypted value is pinned to that location.

@janvorli
Copy link
Member

janvorli commented Oct 7, 2024

@a74nh thank you for the details. Based on reading PAC doc here, their example was using PAC to encrypt the LR using SP and store the result in another register and then pushed both the LR and that result register. That made me think that the LR is pushed unencrypted and then the computed key is used to validate LR before return.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2024

Tools exist to look at the executable code of a known program (or library) and build a library of gadgets (which is why I'm concerned about protection of CoreCLR code over jitted code).

The typical .NET app has a lot of AOT (R2R) compiled code. It is as predictable as unmanaged runtime code and its size is about an order magnitude bigger compared to size of the unmanaged runtime code.

@hoyosjs
Copy link
Member

hoyosjs commented Oct 7, 2024

This might also break Out of Process stack walking in the DAC, although I'm not sure since the only part of coreclr we'd care about is FCall

@a74nh
Copy link
Contributor

a74nh commented Oct 10, 2024

More things to think about:


It may be possible to remove the use of the GS cookie when PAC is enabled.


The jit code will need to be triggered via a boolean. (if (config.PACenabled) { emit some pac instructions... } etc). Given that this boolean is itself vulnerable to attack then eventually we should either:

  1. put the bool into readonly memory asap after startup
  2. or, the jit should always emit PAC instructions on ARM64. On non-PAC hardware these instructions will be NOPs. (On Linux distros that support PAC, all tools and libraries are all built with PAC enabled. On non-PAC hardware these run fine).

Before enabling this by default on PAC enabled boxes, we should ensure the CI has PAC hardware. And consider what needs to be CI tested with and without PAC.


For now enable on Linux only. Consider Windows, MacOS etc later.


BTI is a separate technology that is starting to appear in the latest ARM hardware. When enabled, at a program branch, if the instruction at the target is not one in the small BTI subset (including some older instructions) then the program segfaults. Once PAC is fully enabled then we should consider BTI in a following .NET release.

@a74nh
Copy link
Contributor

a74nh commented Oct 14, 2024

https://llsoftsec.github.io/llsoftsecbook/ - This has a good description of PAC-RET and other security issues.

Arm also has GCS, but I suspect it'll be a few years before it appears in any real hardware. This is similar to CET in X64. Both provide a shadow for storing return addresses. Advantage here is that the same code should be reusable for both technologies. But I suspect it'll be tricky to implement.

@janvorli
Copy link
Member

But I suspect it'll be tricky to implement.

We have already support for Intel CET for Windows and we are planning to add support for it on Linux as well. Hopefully, the GCS implementation would be similar enough to not to add any surprises.

@a74nh
Copy link
Contributor

a74nh commented Oct 15, 2024

#47309 - Support for Intel CET

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 area-Infrastructure-coreclr community-contribution Indicates that the PR has been added by a community member os-linux Linux OS (any supported distro)
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

7 participants