Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] Adding CET support #102680

Merged
merged 18 commits into from
Jul 2, 2024
Merged

[NativeAOT] Adding CET support #102680

merged 18 commits into from
Jul 2, 2024

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented May 25, 2024

Fixes: #101942

  • Added support for STATUS_RETURN_ADDRESS_HIJACK_ATTEMPT
  • Set /CETCOMPAT in executables by default, unless /CETCOMPAT:NO
  • Opt into EHCONT (if CFG is enabled)
  • Reconcile shadow stack with SP changes in RhpCallCatchFunclet

Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

src/coreclr/vm/excep.cpp Outdated Show resolved Hide resolved
@VSadov
Copy link
Member Author

VSadov commented May 25, 2024

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

// NOTE: When thunks are present, the thunk sequence may report a conservative GC reporting
// lower bound that must be applied when processing the managed frame.

ReturnAddressCategory category = CategorizeUnadjustedReturnAddress(m_ControlPC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how this is related to the CET stuff

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InternalInit is supposed to leave the iterator in managed code. Until this change the InternalInit that takes NATIVE_CONTEXT was always starting from a location that is already in managed code. (unlike the one that takes PInvokeTransitionFrame)

Now, when we receive hijack exception, we pop to the caller and perform suspension in the caller's context.
Typically managed method that is not a reverse PInvoke must only be called by another managed method, so that would require no changes here. There are rare cases though in NativeAOT when ordinary managed methods are called from asm thunks. So it becomes possible for this code to see a location in a thunk. That is handled in the same way as in the other InternalInit method. (i.e. via UnwindNonEHThunkSequence)

Perhaps I should unify this code into a helper to not duplicate it in both InternalInit methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've unified the adjustment for asm thunks

@VSadov
Copy link
Member Author

VSadov commented May 29, 2024

Turns out my CET-compatible machine already enables usermode shadow stacks for executables that are CET-compatible. (i.e. msedge.exe, conhost.exe, ...). No extra steps necessary. This probably came with some OS update.

If we pass /CETCOMPAT to linker, NativeAOT executables are also opted in. And tests are passing, while running in "Compatible modules only" mode. (same as msedge.exe and others).

The next step is to make sure we are not relying on some compat/fixup behavior in exception handling, and whether we need /guard:ehcont

@@ -99,6 +99,10 @@ The .NET Foundation licenses this file to you under the MIT license.
<LinkerArg Condition="'$(OutputType)' == 'WinExe' or '$(OutputType)' == 'Exe'" Include="/STACK:$(IlcDefaultStackSize)" />
<!-- Do not warn if someone declares UnmanagedCallersOnly with an entrypoint of 'DllGetClassObject' and similar -->
<LinkerArg Include="/IGNORE:4104" />
<!-- Opt into CETCOMPAT by default. -->
<LinkerArg Condition="'$(ShadowStacksCompatible)' != 'false'" Include="/CETCOMPAT" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call this property CETCompat. It is what it is called in .vcxproj files.

Copy link
Member Author

@VSadov VSadov May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we may want make it x64-only, just in case it starts doing something for x86.
According to docs /CETCOMPAT is accepted for x86 and currently does nothing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want make it x64-only,

Sounds good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I will start doing something for arm64 eventually.)

Copy link
Member Author

@VSadov VSadov May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also /CETCOMPAT seems not accepted on arm64. Not yet, I guess.

@VSadov
Copy link
Member Author

VSadov commented May 29, 2024

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented May 29, 2024

/guard:ehcont is a bit tricky. Once that is set, things like following start failing RIP validation in NtContinue/RtlRestoreContext .

    if (translateToManagedException)
    {
        pExPtrs->ContextRecord->SetIp(PCODEToPINSTR((PCODE)&RhpThrowHwEx));
        pExPtrs->ContextRecord->SetArg0Reg(faultCode);
        pExPtrs->ContextRecord->SetArg1Reg(faultingIP);

        return EXCEPTION_CONTINUE_EXECUTION;
    }

I think .GEHCONT might be a solution to this.

@VSadov
Copy link
Member Author

VSadov commented May 29, 2024

The other interesting part is that /guard:ehcont requires CFG which we do not enable by default, as there are reasons.
One possibility is to enable /guard:ehcont only when CGF is already enabled.

For now I will assume that we will be enabling /guard:ehcont one way or another, together with CFG. We need to make the scenario work before settling under which conditions it is enabled.

@VSadov VSadov force-pushed the aotCET01 branch 3 times, most recently from 73f484b to 051c2d7 Compare May 29, 2024 23:22
@VSadov
Copy link
Member Author

VSadov commented May 29, 2024

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -786,7 +786,7 @@ bool GcInfoDecoder::EnumerateLiveSlots(

UINT32 normStart = lastNormStop + normStartDelta;
UINT32 normStop = normStart + normStopDelta;
if(normBreakOffset >= normStart && normBreakOffset < normStop)
if(normBreakOffset >= normStart && normBreakOffset <= normStop)
Copy link
Member Author

@VSadov VSadov Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an ugly hack to workaround a rare case where a call to a hijacked method is immediately followed by a NoGC region thus the caller IP is outside of interruptible code.
It is always safe to pop to the caller of a regular managed method and start stackwalk at the call site, but when we do that as a part of hijack and the above condition is present, it triggers asserts if caller is fully-interruptible because we end up trying a stackwalk in a NoGC region.

A proper fix for this condition is in PR: #103540

@VSadov
Copy link
Member Author

VSadov commented Jun 17, 2024

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Jun 19, 2024

I think the technical part of this PR is complete - we can enable CET for NativeAOT. What remains is to settle on enablement policy and how to test this.

What I can suggest for enablement:

  • The basic opt-in to CET could happen unconditionally.
    This is the part where we mark binary as CET compatible and if OS enables CET, we use different APIs for hijacking.
    This is the state that mitigates typical ROP attacks, but things like exception handling could work as before and rely on OS fixups (we would not be relying on that).
    In this mode the actual binary is roughly the same and the perf should be roughly the same.

It may be possible that some environments will be configured to treat this as "not enough CET" and require EHCONT as well.

  • The next level of opt-in is opting into EHCONT
    The inconvenient part here is that it will also require opting into CFG, which has known costs in terms of size and throughput.
    I can suggest to opt into EHCONT automatically if user has already opted into CFG.

Basically:

  • If user explicitly opts out of CET (as in /CETCOMPAT:NO), then nothing is enabled. Binary is not marked for CET
  • By default the binary is marked as CET compatible, but not opted into EHCONT
  • If CFG is enabled, we also opt into EHCONT.

I am basically optimizing for minimum number of knobs and assume that CET without EHCONT could be a good default.

Any other ideas?
@jkotas @janvorli

Re: https://techcommunity.microsoft.com/t5/windows-os-platform-blog/developer-guidance-for-hardware-enforced-stack-protection/ba-p/2163340

@VSadov
Copy link
Member Author

VSadov commented Jun 19, 2024

For the testing part - it looks like we have a bunch of tests that are incompatible with CFG.
A PR without any of these changes, but with forcing CFG (#103215) demonstrates that Libraries tests mostly run, but smoke tests mostly fail.

We need to do something about that - either make tests compatible with CFG (where possible) or conditionally disable some tests/scenarios if can't made compatible.
Perhaps have a leg in runtime-nativeaot-outerloop that enables CFG?

@MichalStrehovsky

@jkotas
Copy link
Member

jkotas commented Jun 19, 2024

I am basically optimizing for minimum number of knobs and assume that CET without EHCONT could be a good default.

Sounds good to me.

@jkotas
Copy link
Member

jkotas commented Jun 19, 2024

For the testing part - it looks like we have a bunch of tests that are incompatible with CFG.

At least some of these failures look like product bugs to me. We want to fix them - CFG is a supported config for naot. Have you looked into where the problems are?

Perhaps have a leg in runtime-nativeaot-outerloop that enables CFG?

I agree that we can use more outer loops tests with CFG enabled given these results.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@VSadov
Copy link
Member Author

VSadov commented Jul 2, 2024

Thanks!!!

@VSadov VSadov merged commit c8e4f2c into dotnet:main Jul 2, 2024
89 checks passed
@VSadov VSadov deleted the aotCET01 branch July 2, 2024 19:47
@github-actions github-actions bot locked and limited conversation to collaborators Aug 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NativeAOT] Enable CET support
5 participants