[NativeAOT] Initialize COM before initializing FLS slot. #113194

VSadov · 2025-03-06T06:59:53Z

Order FLS slot initialization after ensuring that COM is initialized. This is to ensure that deinitialization of COM happen before detachment of a thread - as COM deinitialization may want to run managed code.

VSadov · 2025-03-06T17:02:31Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2025-03-06T17:02:50Z

Azure Pipelines successfully started running 1 pipeline(s).

jkotas · 2025-03-10T22:44:35Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2025-03-10T22:44:49Z

Azure Pipelines successfully started running 1 pipeline(s).

jkotas · 2025-03-10T22:51:50Z

src/coreclr/nativeaot/Runtime/threadstore.cpp

-    // The thread was already initialized, so it is already attached
+    if (pAttachingThread->IsDetached())
+    {
+        ASSERT_UNCONDITIONALLY("The thread has been detached already");


Can we make the error message to be identical to the CoreCLR one? https://github.com/dotnet/runtime/pull/112809/files#diff-63935b2e5a324d25feff725f83200eaa716c69fdef7529d2bd3cfa35620105aeR1776

Can we make the error message to be identical to the CoreCLR one?

Yes, we can do that.

Technically, a NativeAOT thread does not have a "destroyed" state. There is "detached" when we can't run managed code any more and then there is "thread is gone, because native TLS is gone". It seems we do not need a state in between and things get simpler without it.

For the purpose of error message the difference is too subtle though.

VSadov · 2025-03-10T23:37:16Z

I'll make this "Ready for review" because the change is basically complete.

What is left is to construct a repro that fails with no fix and does not fail after. Unlike with Core we do not have an existing scenario.
I have some instructions on how a repro could be constructed, but it is not completely straightforward. (COM+threading,..)

VSadov · 2025-03-11T02:48:09Z

I have confirmed that the problematic scenario (running managed code on detached thread) is reachable on baseline NativeAOT SDK and that this fix prevents it.

Here is my repro reduced to only what is necessary:

using System.Collections;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

namespace ConsoleApp57
{
    internal class Program
    {
        [DllImport("oleaut32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        private static extern int SetErrorInfo(uint dwReserved, IntPtr perrinfo);

        static void Main(string[] args)
        {
            Thread thr1 = new Thread(Test);
            thr1.Start();
            System.Console.ReadLine();
        }

        private static void Test()
        {
            var cw = new ComWrappersImpl();
            IntPtr iUnknown = cw.GetOrCreateComInterfaceForObject(new object(), CreateComInterfaceFlags.None);

            SetErrorInfo(0, iUnknown);

            System.Console.WriteLine(Marshal.Release(iUnknown));
        }
    }
}


public class ComWrappersImpl : ComWrappers
{
    public const string IID_IErrorInfo = "1CF2B120-547D-101B-8E65-08002B2BD119";

    protected unsafe override ComInterfaceEntry* ComputeVtables(object obj, CreateComInterfaceFlags flags, out int count)
    {
        IntPtr fpQueryInterface = default;
        IntPtr fpAddRef = default;
        IntPtr fpRelease = default;
        ComWrappers.GetIUnknownImpl(out fpQueryInterface, out fpAddRef, out fpRelease);

        var vtblRaw = (IntPtr*)RuntimeHelpers.AllocateTypeAssociatedMemory(typeof(ComWrappersImpl), IntPtr.Size * 3);
        vtblRaw[0] = fpQueryInterface;
        vtblRaw[1] = fpAddRef;
        vtblRaw[2] = fpRelease;

        var entryRaw = (ComInterfaceEntry*)RuntimeHelpers.AllocateTypeAssociatedMemory(typeof(ComWrappersImpl), sizeof(ComInterfaceEntry));
        entryRaw->IID = new Guid(IID_IErrorInfo);
        entryRaw->Vtable = (IntPtr)vtblRaw;

        count = 1;
        return entryRaw;
    }

    protected override object CreateObject(IntPtr externalComObject, CreateObjectFlags flag)
        => throw new NotImplementedException();

    protected override void ReleaseObjects(IEnumerable objects)
        => throw new NotImplementedException();
}

If I run this compiled with baseline SDK, COM FLS callback fires after ours and runs managed S_P_CoreLib_System_Runtime_InteropServices_ComWrappers__IUnknown_Release on a detached thread. That typically does not cause problems by itself, since the default Release just decrements a counter and does not touch GC. I can imagine worse outcomes if GC is touched - like if something is allocated.

In this PR we are introducing a FailFast to catch such scenario and deterministically fail.

If I run same repro with the current PR, but with the call to CoInitialize removed, I get (as expected) a FailFast crash:

[0x0]   ntdll!NtRaiseException+0x14   0xcc5cff058   0x7ffaa2944182   
[0x1]   KERNELBASE!RaiseFailFastException+0x152   0xcc5cff060   0x7ff7446e8c1c   
[0x2]   ConsoleApp57!ThreadStore::AttachCurrentThread+0x36   (Inline Function)   (Inline Function)   
[0x3]   ConsoleApp57!ThreadStore::AttachCurrentThread+0x3c   0xcc5cff640   0x7ff7446e4e26   
[0x4]   ConsoleApp57!Thread::ReversePInvokeAttachOrTrapThread+0x85   (Inline Function)   (Inline Function)   
[0x5]   ConsoleApp57!RhpReversePInvokeAttachOrTrapThread2+0x96   0xcc5cff670   0x7ff7446b1b52   
[0x6]   ConsoleApp57!S_P_CoreLib_System_Runtime_InteropServices_ComWrappers__IUnknown_Release+0x12   0xcc5cff6a0   0x7ffaa3b4ae08   
[0x7]   combase!Microsoft::WRL::ComPtr<IUnknown>::InternalRelease+0x1e3   (Inline Function)   (Inline Function)   
[0x8]   combase!Microsoft::WRL::ComPtr<IUnknown>::{dtor}+0x1e3   (Inline Function)   (Inline Function)   
[0x9]   combase!TlsClearErrorInfo+0x22b   (Inline Function)   (Inline Function)   
[0xa]   combase!CoSetErrorInfo+0x244   (Inline Function)   (Inline Function)   
[0xb]   combase!wCoUninitialize+0x280   0xcc5cff6e0   0x7ffaa3b4b764   
[0xc]   combase!CoUninitialize+0x104   0xcc5cff750   0x7ffaa3b7a833   
[0xd]   combase!DoThreadSpecificCleanup+0x6b   0xcc5cff840   0x7ffaa3b7a769   
[0xe]   combase!FlsThreadCleanupCallback+0x29   0xcc5cff870   0x7ffaa4baf3a5   
[0xf]   ntdll!RtlpFlsDataCleanup+0x121   0xcc5cff8a0   0x7ffaa4b67498   
[0x10]   ntdll!LdrShutdownThread+0x48   0xcc5cff8e0   0x7ffaa4b9ec6e   
[0x11]   ntdll!RtlExitUserThread+0x3e   0xcc5cff9e0   0x7ffaa4a5737d   
[0x12]   KERNEL32!BaseThreadInitThunk+0x1d   0xcc5cffa20   0x7ffaa4b9cc91   
[0x13]   ntdll!RtlUserThreadStart+0x21   0xcc5cffa50   0x0

And if I run the repro sample with complete fix, I see no crash.

With the complete fix COM calls the Release while the thread is still attached and has no problems with running managed code.

VSadov · 2025-03-11T03:00:19Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2025-03-11T03:00:45Z

Azure Pipelines successfully started running 1 pipeline(s).

jkotas

LGTM. Thank you!

jkotas · 2025-03-11T05:02:25Z

src/coreclr/nativeaot/Runtime/windows/PalRedhawkMinWin.cpp

    {
-        // The current fiber is the home fiber of a thread, so the thread is shutting down
-        RuntimeThreadShutdown(lpFlsData);
+        ASSERT_UNCONDITIONALLY("Could not allocate an FLS slot.");


Would it be better to return false from the runtime init method when this happens, like before?

Would it be better to return false from the runtime init method when this happens?

We cannot handle a failure to initialize FLS slot, so we have to bail one way or another.
But, yes, we could handle this failure the same way as an OOM on init path - no failfast, just return from InitializeGC with a failure/false.

jkotas · 2025-03-11T19:13:36Z

src/coreclr/nativeaot/Runtime/FinalizerHelpers.cpp

 uint32_t WINAPI FinalizerStart(void* pContext)
 {
+#ifdef TARGET_WINDOWS
+    g_FlsInitSucceeded = PalInitComAndFlsSlot();


Do we need to exit the finalizer thread when this fails? Otherwise, we call ThreadStore::AttachCurrentThread below that is going to call FlsSetValue on uninitialized g_flsIndex and things will just go bad from there.

jkotas · 2025-03-11T19:14:54Z

src/coreclr/nativeaot/Runtime/windows/PalRedhawkMinWin.cpp

+
+    // Making finalizer thread MTA early ensures that COM is initialized before we initialize our thread
+    // termination callback.
+    CoInitializeEx(NULL, COINIT_MULTITHREADED);


We can check for bad HRESULT from CoInitializeEx as well.

Can COM be somehow unavailable to the app? Like turned off for the platform or the app?

I think in such case we'd want to ignore a failure and continue, since we do not actually require that init succeeded, just want to be sure it succeeds early enough.

COM is part of Windows. Windows without COM is not a thing.

The current COM init logic propagates the errors from CoInitialize:

runtime/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Thread.NativeAot.Windows.cs

Lines 320 to 335 in 8849480

int hr = Interop.Ole32.CoInitializeEx(IntPtr.Zero,

(state == ApartmentState.STA) ? Interop.Ole32.COINIT_APARTMENTTHREADED

: Interop.Ole32.COINIT_MULTITHREADED);

#endif

if (hr < 0)

{

// RPC_E_CHANGED_MODE indicates this thread has been already initialized with a different

// concurrency model. We stay away and let whoever else initialized the COM to be in control.

if (hr == HResults.RPC_E_CHANGED_MODE)

return;

// CoInitializeEx returns E_NOTIMPL on Windows Nano Server for STA

if (hr == HResults.E_NOTIMPL)

throw new PlatformNotSupportedException();

throw new OutOfMemoryException();

Makes sense. I think forbidding COM (or in particular COINIT_MULTITHREADED mode) might be too breaking, so it will always work.

src/coreclr/nativeaot/Runtime/windows/PalRedhawkMinWin.cpp

VSadov · 2025-03-11T23:23:39Z

Thanks!

VSadov added 4 commits March 5, 2025 22:55

simplify detaching

faf3bf7

minor cleanups

3cb8c97

init COM + FLS in finalizer thread.

8bd7fde

remove managed finalizer+COM stuff

2618c73

VSadov added os-windows area-NativeAOT-coreclr labels Mar 6, 2025

dotnet-policy-service bot assigned VSadov Mar 6, 2025

Merge branch 'main' into AotCom

1a45e29

jkotas reviewed Mar 10, 2025

View reviewed changes

jkotas requested a review from kouvel March 10, 2025 22:52

VSadov marked this pull request as ready for review March 10, 2025 23:30

VSadov requested a review from MichalStrehovsky as a code owner March 10, 2025 23:30

Unify assert message with CoreCLR

7382fd6

jkotas approved these changes Mar 11, 2025

View reviewed changes

jkotas reviewed Mar 11, 2025

View reviewed changes

Treat failure to allocate FLS slot similar to OOM.

8849480

jkotas reviewed Mar 11, 2025

View reviewed changes

Exit from finalizer thread if FLS or COM failed to initialize.

7119c37

jkotas reviewed Mar 11, 2025

View reviewed changes

src/coreclr/nativeaot/Runtime/windows/PalRedhawkMinWin.cpp Show resolved Hide resolved

jkotas reviewed Mar 11, 2025

View reviewed changes

src/coreclr/nativeaot/Runtime/windows/PalRedhawkMinWin.cpp Outdated Show resolved Hide resolved

Apply suggestions from code review

40d96d1

jkotas approved these changes Mar 11, 2025

View reviewed changes

VSadov merged commit 3da53bf into dotnet:main Mar 11, 2025
93 checks passed

VSadov deleted the AotCom branch March 11, 2025 23:23

VSadov mentioned this pull request Apr 3, 2025

Deadlock in loader lock (CoreCLR) from background GC in XAML apps #114182

Closed

github-actions bot locked and limited conversation to collaborators Apr 11, 2025

	int hr = Interop.Ole32.CoInitializeEx(IntPtr.Zero,
	(state == ApartmentState.STA) ? Interop.Ole32.COINIT_APARTMENTTHREADED
	: Interop.Ole32.COINIT_MULTITHREADED);
	#endif
	if (hr < 0)
	{
	// RPC_E_CHANGED_MODE indicates this thread has been already initialized with a different
	// concurrency model. We stay away and let whoever else initialized the COM to be in control.
	if (hr == HResults.RPC_E_CHANGED_MODE)
	return;

	// CoInitializeEx returns E_NOTIMPL on Windows Nano Server for STA
	if (hr == HResults.E_NOTIMPL)
	throw new PlatformNotSupportedException();

	throw new OutOfMemoryException();

[NativeAOT] Initialize COM before initializing FLS slot. #113194

[NativeAOT] Initialize COM before initializing FLS slot. #113194

Uh oh!

Conversation

VSadov commented Mar 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

VSadov commented Mar 6, 2025

Uh oh!

azure-pipelines bot commented Mar 6, 2025

Uh oh!

jkotas commented Mar 10, 2025

Uh oh!

azure-pipelines bot commented Mar 10, 2025

Uh oh!

jkotas Mar 10, 2025

Choose a reason for hiding this comment

Uh oh!

VSadov Mar 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

VSadov commented Mar 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

VSadov commented Mar 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

VSadov commented Mar 11, 2025

Uh oh!

azure-pipelines bot commented Mar 11, 2025

Uh oh!

jkotas left a comment

Choose a reason for hiding this comment

Uh oh!

jkotas Mar 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

VSadov Mar 11, 2025

Choose a reason for hiding this comment

Uh oh!

jkotas Mar 11, 2025

Choose a reason for hiding this comment

Uh oh!

jkotas Mar 11, 2025

Choose a reason for hiding this comment

Uh oh!

VSadov Mar 11, 2025

Choose a reason for hiding this comment

Uh oh!

jkotas Mar 11, 2025

Choose a reason for hiding this comment

Uh oh!

VSadov Mar 11, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

VSadov commented Mar 11, 2025

Uh oh!

Uh oh!

Uh oh!

VSadov commented Mar 6, 2025 •

edited

Loading

VSadov Mar 10, 2025 •

edited

Loading

VSadov commented Mar 10, 2025 •

edited

Loading

VSadov commented Mar 11, 2025 •

edited

Loading

jkotas Mar 11, 2025 •

edited

Loading