Skip to content

Conversation

@subbucse
Copy link

Illustration of fix in FileSystemWatcher to prevent SOH fragmentation from memory leak of pinned 8K byte[] buffers

  • Illustration with comments of race condition between garbage collection of FileSystemWatcher instance and directory changes callback with errorCode = ERROR_OPERATION_ABORTED on IOCP thread, when FileSystemWatcher is disposed before releasing its last reference.
  • When GC wins the race, we are leaving the door open to leaking the pinned 8k byte[] on SOH which over time will cause heavy heap fragmentation preventing heap compaction and affecting allocation and GC performance.
  • Placeholder for fix with cleanup code when we detect that the WeakRef in the AsyncState in the IO callback has been released.
  • Suggestion to move pinned buffer allocation to POH at time of alloc instead of alloc on SOH and pinning soon after, so that fragmentation is avoided even when FileSystemWatcher is kept alive.
  • Will attach a sample program to produce the memory leak and also if needed (to microsoft internal maintainers) I can produce a process dump with full heaps from our production service which shows the leak.

… from memory leak of pinned 8K byte[] buffers
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jun 12, 2025
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@subbucse subbucse marked this pull request as draft June 12, 2025 23:58
…atcher in the PR as FileSystemWatcherHeapLeak.csproj

- Please run with strictest input parameters as disposeProbability = 100, maxGeneration = 2, compacting = true, and heap will start growing from leaked 8K byte[] buffers eventually reaching ~2 GB on my devbox over 3-4 hours.
- If you relax the parameters, say disposeProbability < 100, maxGeneration = 0/1, compacting = false, you will only *increase* the rate at which heap leak shows up
* 2. Removing the need to pin the buffer using GCHandle later on when provisioning / Packing the ThreadPoolBoundHandleOverlapped,
* which is done when allocating a new PreAllocatedOverlapped below.
*/
byte[] buffer = AllocateBuffer(pinned: true);
Copy link
Author

@subbucse subbucse Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub You mentioned in our teams chat that there was explicit intent to avoid pinning on POH because the pinning is supposed to be temporary. Did you mean that you expect the pinned buffer to be unpinned and given up for GC after the IO callback comes in from every async overlapped call to ReadDirectoryChangesW ?

If so, I understand not using POH, because the pinning is highly transient, but that's not what's happening now. As you see below, we only call state.ThreadPoolBinding.FreeNativeOverlapped(overlappedPointer) which will internally call AsyncState.PreAllocatedOverlapped.Release(), and NOT AsyncState.PreAllocatedOverlapped.Dispose() which is what ultimately unpins and frees up the pinned 8K byte[] buffer.

So if the intent is to keep the buffer pinned until end of lifetime of FSW, then shouldn't we consider using POH since that timeframe could be long?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean that you expect the pinned buffer to be unpinned and given up for GC after the IO callback comes in from every async overlapped call to ReadDirectoryChangesW ?

Not necessarily. My statement was about why we avoid the Pinned Object Heap, because things on that heap are never moved and thus creating allocations there that won't effectively be there for the remainder of the process can lead to significant fragmentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the intent is to keep the buffer pinned until end of lifetime of FSW, then shouldn't we consider using POH since that timeframe could be long?

There can be a significant difference between the lifetime of an FSW instance and the lifetime of the process. It's totally feasible to create transient FSWs.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 14, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.IO community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants