Rare race condition in EventSource dispose/finalizer #55441

Open
@josalem

Description

There is a rare race that can result in a use-after-free on the native end of EventPipeEventProvider and potentially the EtwEventProvider.

#region IDisposable Members
/// <summary>
/// Disposes of an EventSource.
/// </summary>
public void Dispose()
{
    this.Dispose(true);
    GC.SuppressFinalize(this);
}
/// <summary>
/// Disposes of an EventSource.
/// </summary>
/// <remarks>
/// Called from Dispose() with disposing=true, and from the finalizer (~EventSource) with disposing=false.
/// Guidelines:
/// 1. We may be called more than once: do nothing after the first call.
/// 2. Avoid throwing exceptions if disposing is false, i.e. if we're being finalized.
/// </remarks>
/// <param name="disposing">True if called from Dispose(), false if called from the finalizer.</param>
protected virtual void Dispose(bool disposing)
{
    if (!IsSupported)
    {
        return;
    }
    if (disposing)
    {
#if FEATURE_MANAGED_ETW
        // Send the manifest one more time to ensure circular buffers have a chance to get to this information
        // even in scenarios with a high volume of ETW events.
        if (m_eventSourceEnabled)
        {
            try
            {
                SendManifest(m_rawManifest);
            }
            catch { } // If it fails, simply give up.
            m_eventSourceEnabled = false;
        }
        if (m_etwProvider != null)
        {
            m_etwProvider.Dispose();
            m_etwProvider = null!;
        }
#endif
#if FEATURE_PERFTRACING
        if (m_eventPipeProvider != null)
        {
            m_eventPipeProvider.Dispose();
            m_eventPipeProvider = null!;
        }
#endif
    }
    m_eventSourceEnabled = false;
    m_eventSourceDisposed = true;
}
/// <summary>
/// Finalizer for EventSource
/// </summary>
~EventSource()
{
    this.Dispose(false);
}
#endregion

If one thread (A) is calling EventSource.Dispose() while another thread (B) is in the process of writing an event, the following sequence is possible:

A: (1)Dispose -> (3)m_eventSourceEnabled = false -> (4)m_eventPipeProvider.Dispose() -> (6)m_eventPipeProvider = null
B: (2)if (IsEnabled) -> (5)use m_eventPipeProvider
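
The interleaving above can be replayed in a toy model. This is only a single-threaded sketch that runs steps (1) through (6) in the listed order; FakeProvider, FakeSource, and RaceReplay are hypothetical stand-ins and not the real EventSource or EventPipeEventProvider types, and "freed" native state is modeled by a boolean flag rather than real memory.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for a provider that owns native state.
sealed class FakeProvider : IDisposable
{
    public bool NativeFreed { get; private set; }

    // Models EventUnregister deleting the native structures.
    public void Dispose() => NativeFreed = true;

    // Returns false when the "native" state has already been freed,
    // which models touching freed memory in the real code.
    public bool Use() => !NativeFreed;
}

// Hypothetical stand-in for the source that holds the provider field.
sealed class FakeSource
{
    public volatile bool Enabled = true;
    public FakeProvider? Provider = new FakeProvider();
}

static class RaceReplay
{
    // Replays steps (1)-(6) in the order given above.
    // Returns true only if thread B's use happened on a live provider.
    public static bool WriterSawLiveProvider()
    {
        var source = new FakeSource();

        // (1) A enters Dispose.
        bool enabledSeenByB = source.Enabled;       // (2) B: if (IsEnabled)
        source.Enabled = false;                     // (3) A: m_eventSourceEnabled = false
        FakeProvider captured = source.Provider!;   // B reads the field before A nulls it
        source.Provider!.Dispose();                 // (4) A: m_eventPipeProvider.Dispose()
        bool ok = enabledSeenByB && captured.Use(); // (5) B: use m_eventPipeProvider
        source.Provider = null;                     // (6) A: m_eventPipeProvider = null
        return ok;
    }

    public static void Main()
    {
        Console.WriteLine(WriterSawLiveProvider()); // prints False
    }
}
```

B passes its enabled check at step (2), yet by step (5) the provider it captured has already been disposed, so the check offers no protection.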

EventPipeEventProvider.Dispose() calls EventPipeEventProvider.EventUnregister(). This deletes the underlying native structures (the only time EventPipeProvider::m_pEventList is set to nullptr). The managed code does not unset the m_provHandle member, so anyone who obtained a reference to this managed object before it was disposed would hold a pointer to freed memory. The managed provider has been marked as disabled; however, not all code paths check that value. Specifically:

// Define an EventPipeEvent handle.
unsafe IntPtr IEventProvider.DefineEventHandle(uint eventID, string eventName, long keywords, uint eventVersion, uint level,
    byte *pMetadata, uint metadataLength)
{
    IntPtr eventHandlePtr = EventPipeInternal.DefineEvent(m_provHandle, eventID, keywords, eventVersion, level, pMetadata, metadataLength);
    return eventHandlePtr;
}

which is where we hit the access violation (AV) in this case. We get here from TraceLoggingEventSource.WriteImpl(), which ends up in NameInfo.GetOrCreateEventHandle(), which calls DefineEventHandle on the provider:

public IntPtr GetOrCreateEventHandle(EventProvider provider, TraceLoggingEventHandleTable eventHandleTable, EventDescriptor descriptor, TraceLoggingEventTypes eventTypes)
{
    IntPtr eventHandle;
    if ((eventHandle = eventHandleTable[descriptor.EventId]) == IntPtr.Zero)
    {
        lock (eventHandleTable)
        {
            if ((eventHandle = eventHandleTable[descriptor.EventId]) == IntPtr.Zero)
            {
                byte[]? metadata = EventPipeMetadataGenerator.Instance.GenerateEventMetadata(
                    descriptor.EventId,
                    name,
                    (EventKeywords)descriptor.Keywords,
                    (EventLevel)descriptor.Level,
                    descriptor.Version,
                    (EventOpcode)descriptor.Opcode,
                    eventTypes);
                uint metadataLength = (metadata != null) ? (uint)metadata.Length : 0;
                unsafe
                {
                    fixed (byte* pMetadataBlob = metadata)
                    {
                        // Define the event.
                        eventHandle = provider.m_eventProvider.DefineEventHandle(
                            (uint)descriptor.EventId,
                            name,
                            descriptor.Keywords,
                            descriptor.Version,
                            descriptor.Level,
                            pMetadataBlob,
                            metadataLength);
                    }
                }
                // Cache the event handle.
                eventHandleTable.SetEventHandle(descriptor.EventId, eventHandle);
            }
        }
    }
    return eventHandle;
}
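
To illustrate the stale m_provHandle problem described above, here is a minimal sketch of one way a handle could be guarded so that it is cleared before the native side is released and every use checks it first. This is not a proposed fix for the runtime: GuardedProvider, TryUse, and the release callback are all hypothetical names, and the callback merely stands in for the native unregister call.

```csharp
#nullable enable
using System;

// Hypothetical wrapper; not the actual EventPipeEventProvider.
sealed class GuardedProvider : IDisposable
{
    private readonly object _gate = new object();
    private IntPtr _handle;                     // plays the role of m_provHandle
    private readonly Action<IntPtr> _release;   // stands in for the native unregister call

    public GuardedProvider(IntPtr handle, Action<IntPtr> release)
    {
        _handle = handle;
        _release = release;
    }

    // Runs 'use' only while the handle is still live.
    // Returns false once Dispose has cleared the handle.
    public bool TryUse(Action<IntPtr> use)
    {
        lock (_gate)
        {
            if (_handle == IntPtr.Zero)
            {
                return false;
            }
            use(_handle);
            return true;
        }
    }

    public void Dispose()
    {
        lock (_gate)
        {
            if (_handle == IntPtr.Zero)
            {
                return; // Already disposed: do nothing on repeat calls.
            }
            IntPtr handle = _handle;
            _handle = IntPtr.Zero; // Clear first so no later caller can read a stale value.
            _release(handle);
        }
    }
}
```

Because the check and the use happen under the same lock as the release, a writer either completes before the handle is freed or sees IntPtr.Zero and backs off. The trade-off is a lock acquisition on the write path, which may or may not be acceptable for a hot logging path.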

I'm still pinning down the exact sequencing of events in the hopes I can create a deterministic repro.

I believe this is the cause of the failures in dotnet/coreclr#28179 and #55240.

CC @tommcdon @noahfalk @dotnet/dotnet-diag
