Skip to content

Conversation

janvorli
Copy link
Member

There were some GC holes discovered caused by the fact that GC can be triggered during 2nd pass of EH in-between calls to finally handlers and catch handler. After considering options, moving the 2nd pass to native code seems the most reasonable solution.

There were some GC holes discovered caused by the fact that GC can be
triggered during 2nd pass of EH in-between calls to finally handlers and
catch handler. After considering options, moving the 2nd pass to native
code seems the most reasonable solution.
@janvorli janvorli added this to the 10.0.0 milestone Sep 18, 2025
@janvorli janvorli self-assigned this Sep 18, 2025
@Copilot Copilot AI review requested due to automatic review settings September 18, 2025 20:04
@janvorli janvorli added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) area-ExceptionHandling-coreclr labels Sep 18, 2025
@janvorli
Copy link
Member Author

The current state is that it almost works. A catch handler is correctly invoked, but then something crashes.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR moves the CoreCLR exception handling second pass from managed to native code to fix GC holes that can occur when GC is triggered between finally handlers and catch handlers during the second pass of exception handling.

  • Splits existing RhThrowEx/RhThrowHwEx methods into separate first-pass handler finding methods and second-pass execution
  • Moves the second pass execution logic (DispatchExPass2) from managed to native code
  • Removes QCALL exports for CallCatchFunclet and CallFinallyFunclet as they're now called directly from native code

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/coreclr/vm/qcallentrypoints.cpp Removes QCALL exports for CallCatchFunclet and CallFinallyFunclet
src/coreclr/vm/metasig.h Updates method signatures to include handlingFrameSP and pCatchHandler out parameters
src/coreclr/vm/exinfo.h Adds ContainsCodeOffset helper method to RhEHClause
src/coreclr/vm/exceptionhandlingqcalls.h Removes QCALL declarations for catch/finally funclets
src/coreclr/vm/exceptionhandling.h Adds declaration for new native DispatchExPass2 function
src/coreclr/vm/exceptionhandling.cpp Major refactoring: converts QCALL functions to native, adds new DispatchExPass2 implementation
src/coreclr/vm/excep.cpp Updates managed fault handling to use new method signatures
src/coreclr/vm/corelib.h Updates method name mappings for exception handling entry points
src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.cs Splits exception handling into separate first/second pass methods

@janvorli
Copy link
Member Author

Also, please note that the collided unwind detection is not working yet, as I cannot rely on the pinvokes from the managed EH code anymore. I am working on a fix.
With the contract and EH clauses enumeration fixes based on David's feedback, many EH tests now pass until I hit one with collided unwind. I am reusing the interpreter test ran without interpreter for now.

@janvorli
Copy link
Member Author

And one more thing not done yet is rethrow.

* Reflect PR feedback
* Implement rethrow
* Implement new way of collided unwind detection now that the
  CallCatchFunclet is not called via pinvoke
* Remove forced reporting of EH code from stack frame iterator, as we
  now cannot have that code on the stack during 2nd pass
@jkotas jkotas added the arch-wasm WebAssembly architecture label Sep 19, 2025
Copy link
Contributor

Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara
See info in area-owners.md if you want to be subscribed.

@janvorli
Copy link
Member Author

I am considering reducing the number of changes in the exceptionhandling.cs by a slightly different approach - keeping only the methods we had and adding an extra argument "firstPassOnly" to them. Then if that argument is set to true, in DispatchEx store the handlingFrameSP/PC, _pReversePInvokePropagationCallback/Context and the pCatchHandler into the ExInfo and return;
The native side would then take these from the ExInfo. On NativeAOT, this argument would be ignored.

The current state is that, with some additional changes that I am going to commit after I fix the offsets in StackFrameIterator for some targets, it passes all diagnostics tests and coreclr tests (except ForeignThreadExceptionTest where it ends up messing GC mode when calling managed callback from native code catch)

I also need to add proper contracts here and there.

@davidwrighton
Copy link
Member

I would add some correlation comments so that we know that C++ function X is supposed to be the same as C# function Y. But otherwise, this actually looks good to me (assuming it works of course)

@janvorli
Copy link
Member Author

@jkotas I've added some contracts. I also had to change contract on the Debugger::FirstChanceManagedExceptionCatcherFound from GC_TRIGGERS_FROM_GETJITINFO to GC_NOTRIGGER. I cannot find how the GC can get triggered by the GetJitInfo (and it has GC_NOTRIGGER contract), so I hope I haven't missed anything.

@janvorli
Copy link
Member Author

There is one issue left that I am investigating - Linux arm64 failures in several libraries tests.

@janvorli janvorli changed the title [WIP] Move coreclr EH second pass to native code Move coreclr EH second pass to native code Sep 22, 2025
@janvorli janvorli removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Sep 22, 2025
@janvorli
Copy link
Member Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@janvorli
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

This is done to ensure that no GC is allowed between the scanned stack
range is extended and a funclet for the current frame is called.
@janvorli
Copy link
Member Author

@jkotas, @davidwrighton the current state of this PR is final from my point of view. Can you please review it?


#if TARGET_ARM64
[FieldOffset(AsmOffsets.OFFSETOF__ExInfo__m_handlingFramePC)]
internal volatile byte* _handlingFramePC;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a comment somewhere that explains what's special about Arm64 that it needs the handlingFramePC ?


extern "C" CLR_BOOL QCALLTYPE EHEnumNext(EH_CLAUSE_ENUMERATOR* pEHEnum, RhEHClause* pEHClause)
// The doNotCalculateCatchType option makes the function skip calculation of the catch type. It is used in
// the 2nd path of EH to avoid possible GC stemming from a call to ResolveEHClause.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: path -> pass?

MethodDesc *pMD = exInfo->m_frameIter.m_crawl.GetFunction();
// Profiler, debugger and ETW events
TADDR spForDebugger = GetSpForDiagnosticReporting(pvRegDisplay);
exInfo->MakeCallbacksRelatedToHandler(true, pThread, pMD, &exInfo->m_CurrentClause, (DWORD_PTR)pHandlerIP, spForDebugger);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how your are protecting the callbacks here against triggering GCs. AFAIK BEGINFORBIDGC and ENDFORBIDGC are debug only macros. They have no effect in retail builds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callbacks that are called here don't trigger GC, there is a subset of profiler callbacks that are documented here https://github.com/dotnet/runtime/blob/fc1dee66fac8ae088d01d07b4a2faaeef15a234a/docs/design/coreclr/botr/profiling.md#gc-safe-callouts
that runtime calls without switching to preemptive mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants