Skip to content

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Mar 6, 2025

Fixes: #113040

Order FLS slot initialization after ensuring that COM is initialized. This is to ensure that deinitialization of COM happen before detachment of a thread - as COM deinitialization may want to run managed code.

@VSadov
Copy link
Member Author

VSadov commented Mar 6, 2025

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jkotas
Copy link
Member

jkotas commented Mar 10, 2025

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

// The thread was already initialized, so it is already attached
if (pAttachingThread->IsDetached())
{
ASSERT_UNCONDITIONALLY("The thread has been detached already");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@VSadov VSadov Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the error message to be identical to the CoreCLR one?

Yes, we can do that.

Technically, a NativeAOT thread does not have a "destroyed" state. There is "detached" when we can't run managed code any more and then there is "thread is gone, because native TLS is gone". It seems we do not need a state in between and things get simpler without it.

For the purpose of error message the difference is too subtle though.

@jkotas jkotas requested a review from kouvel March 10, 2025 22:52
@VSadov VSadov marked this pull request as ready for review March 10, 2025 23:30
@VSadov
Copy link
Member Author

VSadov commented Mar 10, 2025

I'll make this "Ready for review" because the change is basically complete.

What is left is to construct a repro that fails with no fix and does not fail after. Unlike with Core we do not have an existing scenario.
I have some instructions on how a repro could be constructed, but it is not completely straightforward. (COM+threading,..)

@VSadov
Copy link
Member Author

VSadov commented Mar 11, 2025

I have confirmed that the problematic scenario (running managed code on detached thread) is reachable on baseline NativeAOT SDK and that this fix prevents it.

Here is my repro reduced to only what is necessary:

using System.Collections;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

namespace ConsoleApp57
{
    internal class Program
    {
        [DllImport("oleaut32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        private static extern int SetErrorInfo(uint dwReserved, IntPtr perrinfo);

        static void Main(string[] args)
        {
            Thread thr1 = new Thread(Test);
            thr1.Start();
            System.Console.ReadLine();
        }

        private static void Test()
        {
            var cw = new ComWrappersImpl();
            IntPtr iUnknown = cw.GetOrCreateComInterfaceForObject(new object(), CreateComInterfaceFlags.None);

            SetErrorInfo(0, iUnknown);

            System.Console.WriteLine(Marshal.Release(iUnknown));
        }
    }
}


public class ComWrappersImpl : ComWrappers
{
    public const string IID_IErrorInfo = "1CF2B120-547D-101B-8E65-08002B2BD119";

    protected unsafe override ComInterfaceEntry* ComputeVtables(object obj, CreateComInterfaceFlags flags, out int count)
    {
        IntPtr fpQueryInterface = default;
        IntPtr fpAddRef = default;
        IntPtr fpRelease = default;
        ComWrappers.GetIUnknownImpl(out fpQueryInterface, out fpAddRef, out fpRelease);

        var vtblRaw = (IntPtr*)RuntimeHelpers.AllocateTypeAssociatedMemory(typeof(ComWrappersImpl), IntPtr.Size * 3);
        vtblRaw[0] = fpQueryInterface;
        vtblRaw[1] = fpAddRef;
        vtblRaw[2] = fpRelease;

        var entryRaw = (ComInterfaceEntry*)RuntimeHelpers.AllocateTypeAssociatedMemory(typeof(ComWrappersImpl), sizeof(ComInterfaceEntry));
        entryRaw->IID = new Guid(IID_IErrorInfo);
        entryRaw->Vtable = (IntPtr)vtblRaw;

        count = 1;
        return entryRaw;
    }

    protected override object CreateObject(IntPtr externalComObject, CreateObjectFlags flag)
        => throw new NotImplementedException();

    protected override void ReleaseObjects(IEnumerable objects)
        => throw new NotImplementedException();
}

If I run this compiled with baseline SDK, COM FLS callback fires after ours and runs managed S_P_CoreLib_System_Runtime_InteropServices_ComWrappers__IUnknown_Release on a detached thread. That typically does not cause problems by itself, since the default Release just decrements a counter and does not touch GC. I can imagine worse outcomes if GC is touched - like if something is allocated.

In this PR we are introducing a FailFast to catch such scenario and deterministically fail.

If I run same repro with the current PR, but with the call to CoInitialize removed, I get (as expected) a FailFast crash:

[0x0]   ntdll!NtRaiseException+0x14   0xcc5cff058   0x7ffaa2944182   
[0x1]   KERNELBASE!RaiseFailFastException+0x152   0xcc5cff060   0x7ff7446e8c1c   
[0x2]   ConsoleApp57!ThreadStore::AttachCurrentThread+0x36   (Inline Function)   (Inline Function)   
[0x3]   ConsoleApp57!ThreadStore::AttachCurrentThread+0x3c   0xcc5cff640   0x7ff7446e4e26   
[0x4]   ConsoleApp57!Thread::ReversePInvokeAttachOrTrapThread+0x85   (Inline Function)   (Inline Function)   
[0x5]   ConsoleApp57!RhpReversePInvokeAttachOrTrapThread2+0x96   0xcc5cff670   0x7ff7446b1b52   
[0x6]   ConsoleApp57!S_P_CoreLib_System_Runtime_InteropServices_ComWrappers__IUnknown_Release+0x12   0xcc5cff6a0   0x7ffaa3b4ae08   
[0x7]   combase!Microsoft::WRL::ComPtr<IUnknown>::InternalRelease+0x1e3   (Inline Function)   (Inline Function)   
[0x8]   combase!Microsoft::WRL::ComPtr<IUnknown>::{dtor}+0x1e3   (Inline Function)   (Inline Function)   
[0x9]   combase!TlsClearErrorInfo+0x22b   (Inline Function)   (Inline Function)   
[0xa]   combase!CoSetErrorInfo+0x244   (Inline Function)   (Inline Function)   
[0xb]   combase!wCoUninitialize+0x280   0xcc5cff6e0   0x7ffaa3b4b764   
[0xc]   combase!CoUninitialize+0x104   0xcc5cff750   0x7ffaa3b7a833   
[0xd]   combase!DoThreadSpecificCleanup+0x6b   0xcc5cff840   0x7ffaa3b7a769   
[0xe]   combase!FlsThreadCleanupCallback+0x29   0xcc5cff870   0x7ffaa4baf3a5   
[0xf]   ntdll!RtlpFlsDataCleanup+0x121   0xcc5cff8a0   0x7ffaa4b67498   
[0x10]   ntdll!LdrShutdownThread+0x48   0xcc5cff8e0   0x7ffaa4b9ec6e   
[0x11]   ntdll!RtlExitUserThread+0x3e   0xcc5cff9e0   0x7ffaa4a5737d   
[0x12]   KERNEL32!BaseThreadInitThunk+0x1d   0xcc5cffa20   0x7ffaa4b9cc91   
[0x13]   ntdll!RtlUserThreadStart+0x21   0xcc5cffa50   0x0   

And if I run the repro sample with complete fix, I see no crash.

With the complete fix COM calls the Release while the thread is still attached and has no problems with running managed code.

@VSadov
Copy link
Member Author

VSadov commented Mar 11, 2025

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

{
// The current fiber is the home fiber of a thread, so the thread is shutting down
RuntimeThreadShutdown(lpFlsData);
ASSERT_UNCONDITIONALLY("Could not allocate an FLS slot.");
Copy link
Member

@jkotas jkotas Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to return false from the runtime init method when this happens, like before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to return false from the runtime init method when this happens?

We cannot handle a failure to initialize FLS slot, so we have to bail one way or another.
But, yes, we could handle this failure the same way as an OOM on init path - no failfast, just return from InitializeGC with a failure/false.

uint32_t WINAPI FinalizerStart(void* pContext)
{
#ifdef TARGET_WINDOWS
g_FlsInitSucceeded = PalInitComAndFlsSlot();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to exit the finalizer thread when this fails? Otherwise, we call ThreadStore::AttachCurrentThread below that is going to call FlsSetValue on uninitialized g_flsIndex and things will just go bad from there.


// Making finalizer thread MTA early ensures that COM is initialized before we initialize our thread
// termination callback.
CoInitializeEx(NULL, COINIT_MULTITHREADED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can check for bad HRESULT from CoInitializeEx as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can COM be somehow unavailable to the app? Like turned off for the platform or the app?

I think in such case we'd want to ignore a failure and continue, since we do not actually require that init succeeded, just want to be sure it succeeds early enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COM is part of Windows. Windows without COM is not a thing.

The current COM init logic propagates the errors from CoInitialize:

int hr = Interop.Ole32.CoInitializeEx(IntPtr.Zero,
(state == ApartmentState.STA) ? Interop.Ole32.COINIT_APARTMENTTHREADED
: Interop.Ole32.COINIT_MULTITHREADED);
#endif
if (hr < 0)
{
// RPC_E_CHANGED_MODE indicates this thread has been already initialized with a different
// concurrency model. We stay away and let whoever else initialized the COM to be in control.
if (hr == HResults.RPC_E_CHANGED_MODE)
return;
// CoInitializeEx returns E_NOTIMPL on Windows Nano Server for STA
if (hr == HResults.E_NOTIMPL)
throw new PlatformNotSupportedException();
throw new OutOfMemoryException();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think forbidding COM (or in particular COINIT_MULTITHREADED mode) might be too breaking, so it will always work.

@VSadov
Copy link
Member Author

VSadov commented Mar 11, 2025

Thanks!

@VSadov VSadov merged commit 3da53bf into dotnet:main Mar 11, 2025
93 checks passed
@VSadov VSadov deleted the AotCom branch March 11, 2025 23:23
@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Native AOT] Initialize COM eagerly
2 participants