Skip to content

Share implementation of ComWrappers between CoreCLR and NativeAOT #113907

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

jkoritzinsky
Copy link
Member

Share the managed implementation of ComWrappers from NativeAOT with CoreCLR.

Remove all parts of CoreCLR's native implementation that don't need to run during GC and provide a manually-managed implementation of the logic that must run during GC.

Move portions of ComWrappers.NativeAot.cs that aren't run during GC into ComWrappers.Common.cs

Convert CoreCLR to use the managed ComWrappers implementation wherever possible (everywhere except for during GC)

Move wrapper ID to be a CoreCLR-only concept (as it's only needed for diagnostics support)
@jkoritzinsky
Copy link
Member Author

@AustinWise I was thinking over this last night and came to the same conclusion you did.

I think the object->ManagedObjectWrapperHolder mapping needs to either not be bound to the lifetime of the ComWrappers object or the ComWrappers object must be kept alive as long as the mapping is. I'll investigate possible solutions.

…use to enable the diagnostics stack to pull all wrapper information from outside the syncblock so we can get out of the syncblock entirely in the future)
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 46 out of 48 changed files in this pull request and generated no comments.

Files not reviewed (2)
  • src/coreclr/System.Private.CoreLib/System.Private.CoreLib.csproj: Language not supported
  • src/coreclr/nativeaot/System.Private.CoreLib/src/System.Private.CoreLib.csproj: Language not supported

@jkoritzinsky
Copy link
Member Author

@jkotas any more feedback?

@AaronRobinsonMSFT
Copy link
Member

I don't think micro benchmarks are going to be all that helpful so running against our own tests, that do a lot of validating wrapper clean-up, would help indicate the general impact.

Was the above done or are you looking for more feedback before doing that?

@jkoritzinsky
Copy link
Member Author

I wanted to make sure I had all feedback before I start running benchmarks.

I'll start on the benchmarking.

@jkoritzinsky
Copy link
Member Author

I did some local benchmarking with the ComWrappers/API test suite, as requested. I timed 10 runs of that test suite with the implementation in this PR and the implementation in main, both Release builds (10 runs to warm the processor cache, 10 runs for the measurements below) and got the following results:

  • PR: 2.5039942 seconds
  • Main: 2.5145364 seconds

@AustinWise
Copy link
Contributor

Here is what the references between different managed object wrapper objects now are. I added the new static conditional weak table on ComWrappers:

graph TD
WO[wrapped object];
MOW["ManagedObjectWrapper (unmanaged struct)"];
MOWH[ManagedObjectWrapperHolder];
Releaser[ManagedObjectWrapperReleaser];
WO -->|ComWrappers static conditional weak table| MOWH;
WO -->|ComWrappers instance conditional weak table| MOWH;
MOWH --> WO;
MOWH --> MOW;
MOW -->|ref-counted handle| MOWH;
Releaser --> MOW;
MOWH --> Releaser;
WO -->|SyncBlock| MOW;
Loading

Adding the static ConditionalWeakTable fixes the problem of the pointers in the SyncBlock becoming dangling. But it can cause a memory leak where there was not one before. Before this change, when a ComWrappers instance was collected, all of its associated managed object wrappers became eligible for collection (assuming all refcounts were zero). After this change, the static ConditionalWeakTable keeps every ManagedObjectWrapperHolder ever made for an object alive until the wrapped object is collected.

This program should illustrate the memory leak:

reproduction
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.Marshalling;

unsafe class Program
{
    static Foo sRootedFoo = new Foo();

    static void Main()
    {
        Console.WriteLine($"I'm pid {Environment.ProcessId}");

        while (true)
        {
            for (int i = 0; i < 1000; i++)
            {
                AllocateAndFreeComWrappers();
            }
            for (int i = 0; i < 5; i++)
            {
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void AllocateAndFreeComWrappers()
    {
        ComWrappers cw = new StrategyBasedComWrappers();
        nint ptr = cw.GetOrCreateComInterfaceForObject(sRootedFoo, CreateComInterfaceFlags.None);
        Marshal.Release(ptr);
    }
}

[GeneratedComInterface]
[Guid("3faca0d2-e7f1-4e9c-82a6-404fd6e0aab8")]
internal partial interface IFoo
{
    void Method(int i);
}

[GeneratedComClass]
internal partial class Foo : IFoo
{
    public void Method(int i)
    {
    }
}

When run on .NET 9 NativeAOT, memory usage stays flat. On .NET 9 CoreCLR, it already leaks memory. So maybe this regression is not a big deal since it only effects NativeAOT?

@AaronRobinsonMSFT
Copy link
Member

On .NET 9 CoreCLR, it already leaks memory.

This is an intentional memory leak due to incorrect usage of ComWrappers. There is a strict mapping between the ComWrappers that created the wrapper and the wrapper itself. I don't think attempting to solve this is needed and I would argue the current CoreCLR implementation has less overhead than the native AOT solution with the additional ConditionalWeakTable.

@jkoritzinsky Thanks for trakcing the performance impact between the two implementations. Did you see the peak memory usage between the two?

@jkoritzinsky
Copy link
Member Author

I'll go check the peak memory usage and report back.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memory usage

I would like to know memory cost of minimal ComWrapper RCW and CCW before and after the change. (e.g. create a 100,000 of these, store them in array, and check the memory delta)

using TryInvokeICustomQueryInterfaceResult = InteropLibImports::TryInvokeICustomQueryInterfaceResult;

namespace InteropLib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in interoplibabi.h?

My understanding is that interoplibabi.h is meant to contain definitions of all data layouts that code under src\coreclr\interop needs to agree on with code somewhere else. In any case, interoplibabi.h can use a comment at the top about what's expected to be in it.

bool ComWrappersNative::HasManagedObjectComWrapper(_In_ OBJECTREF object, _Out_ bool* isRooted)
namespace
{
MethodTable* pManagedObjectWrapperHolderMT = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MethodTable* pManagedObjectWrapperHolderMT = NULL;
MethodTable* s_pManagedObjectWrapperHolderMT = NULL;

@@ -145,7 +145,7 @@ void Interop::OnAfterGCScanRoots(_In_ bool isConcurrent)
CONTRACTL_END;

#ifdef FEATURE_COMWRAPPERS
ComWrappersNative::AfterRefCountedHandleCallbacks();
ComWrappersNative::OnGCAfterMarkPhase();
Copy link
Member

@jkotas jkotas Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MarkPhase is a thing in the GC. This callback does not run after GC Mark Phase - it runs much earlier, so this rename is not helping with clarity. (I know that the native AOT uses the same confusing name.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change the name to match the containing method (OnAfterGCScanRoots)

@jkoritzinsky
Copy link
Member Author

I compared the peak working set for the unit tests and got the following results:

PR: 28860 KB
Main: 28920 KB

I also compared a program that just creates 10000 RCWs that wrap 10000 CCWs of 10000 objects and got the following results:

PR: 164812 KB
Main: 127544 KB

I'll investigate what causes the increase (if it's just the table I added for diagnostics or something else) and see if I can mitigate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants