
Inconsistent .NET GC process memory usage reporting #49817

Closed

Description

TL;DR

GC / dotnet-gcdump say the process uses 400-500MB. Task Manager says 2.0-2.5GB. The "missing" >1GB of memory is clearly filled with garbage managed strings. Whisky Tango Foxtrot?

Context

We have a C# service that reads the metadata of ~150K Parquet files using ParquetSharp and creates an in-memory index of all this data. The reading is multithreaded and spread across 14/28 cores (physical/logical), with the data coming from a file share. Since there is a lot of data repetition, I have spent some time analysing and improving the general memory usage (mostly by sharing immutable instances of arrays instead of maintaining separate copies when the data turns out to be identical, as sketched below).
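
For illustration, a minimal sketch of that array-sharing idea, assuming string[] values (ArrayInterner is a hypothetical name; the real code covers more types and handles nulls):

    using System;
    using System.Collections.Generic;

    // Hypothetical helper: returns a canonical shared instance for any
    // array whose contents have been seen before.
    sealed class ArrayInterner
    {
        private readonly Dictionary<string[], string[]> _cache =
            new Dictionary<string[], string[]>(new ContentComparer());

        public string[] Intern(string[] array)
        {
            if (_cache.TryGetValue(array, out var shared))
                return shared;        // identical content: reuse the shared copy
            _cache.Add(array, array); // first occurrence becomes the canonical copy
            return array;
        }

        // Compares arrays by content rather than by reference.
        private sealed class ContentComparer : IEqualityComparer<string[]>
        {
            public bool Equals(string[] x, string[] y) => x.AsSpan().SequenceEqual(y);

            public int GetHashCode(string[] array)
            {
                var hash = new HashCode();
                foreach (var s in array) hash.Add(s);
                return hash.ToHashCode();
            }
        }
    }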

This is an ASP.NET Core 3.1 application, running as x64 on Windows 10 via JetBrains Rider (as I'm developing and testing it).

All memory measurements and analysis are done after calling:

    // Ask the GC to also compact the large object heap (LOH) on the
    // next blocking collection, then force that collection.
    GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
    GC.Collect();

The Issue

The reason I'm raising this ticket is that I've never been able to reconcile the memory usage reported by Task Manager (Details tab) with the GC memory stats (see the snippet after this list):

  • GC.GetTotalMemory(forceFullCollection: true) returns ~400MB.
  • GC.GetGCMemoryInfo().HeapSizeBytes also returns ~400MB (usually slightly larger than the previous value).
  • dotnet-gcdump + eeheap gives a GC Heap Size of ~500MB (spread across 28 heaps, which seems to match the number of logical cores).
  • Task Manager reports about 2.0-2.5GB for the dotnet process.
  • process.PagedMemorySize64 reports around 2GB.
  • dotnet-dump (not gcdump) creates a ~2.5GB file.
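
For reference, a minimal sketch of how the managed-side and process-side values above can be read from standard APIs (PrintMemoryStats is just an illustrative name):

    using System;
    using System.Diagnostics;

    static void PrintMemoryStats()
    {
        // Managed heap as seen by the GC (after a forced full collection).
        Console.WriteLine($"GC.GetTotalMemory: {GC.GetTotalMemory(forceFullCollection: true):N0}");
        Console.WriteLine($"HeapSizeBytes:     {GC.GetGCMemoryInfo().HeapSizeBytes:N0}");

        // Process-wide numbers, closer to what Task Manager shows.
        using var process = Process.GetCurrentProcess();
        Console.WriteLine($"PagedMemorySize64: {process.PagedMemorySize64:N0}");
        Console.WriteLine($"WorkingSet64:      {process.WorkingSet64:N0}");
    }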

Investigation

At first I suspected a memory leak in the native components of ParquetSharp, but couldn't find anything there. Switching to non-server (workstation) GC also reduces the process size to ~1.5GB (I reverted back to server GC afterwards).
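
For anyone reproducing the server/workstation comparison, the GC mode can be toggled via the standard MSBuild property in the project file (a runtimeconfig.json setting works too; this is just one way to do it):

    <PropertyGroup>
      <!-- false = workstation GC; ASP.NET Core defaults to server GC. -->
      <ServerGarbageCollection>false</ServerGarbageCollection>
    </PropertyGroup>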

In desperation, I opened the 2.5GB memory dump produced by dotnet-dump in Rider and viewed it as a text file. A good chunk of the file turned out to be C# strings. In fact, they were clearly garbage strings from string.Split() operations, as our custom Parquet metadata contains a lot of semicolon-separated lists.

Doing the string splitting and parsing using ReadOnlySpan views, to avoid creating loads of temporary strings, reduced the total process memory usage from 2.0-2.5GB to 1.0GB. The values returned by GC.GetTotalMemory() and GC.GetGCMemoryInfo().HeapSizeBytes remain virtually unchanged.
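
As an illustration, a minimal sketch of the span-based approach, assuming a simple semicolon-separated list of integers (our real metadata format is more involved):

    using System;
    using System.Collections.Generic;

    static List<int> ParseSemicolonSeparated(string raw)
    {
        var values = new List<int>();
        ReadOnlySpan<char> remaining = raw.AsSpan();
        while (!remaining.IsEmpty)
        {
            int separator = remaining.IndexOf(';');
            ReadOnlySpan<char> token = separator >= 0 ? remaining.Slice(0, separator) : remaining;

            // int.Parse(ReadOnlySpan<char>) parses in place: unlike string.Split(),
            // no temporary string is allocated per token.
            values.Add(int.Parse(token));

            remaining = separator >= 0 ? remaining.Slice(separator + 1) : ReadOnlySpan<char>.Empty;
        }
        return values;
    }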

At this point I think I can conclude that:

  • The missing memory was clearly populated by managed objects.
  • The GC APIs and tools I used did not report that memory.
  • There is still ~0.5GB of unaccounted-for memory.

Issue and Questions

This leaves me with a lot of questions, the top three being:

  • Am I fundamentally misunderstanding GC memory usage reporting and tools like dotnet-gcdump?
  • If so, how do I measure and inspect the "missing" memory using standard .NET APIs and tools?
  • If not, do we have a memory leak in .NET Core GC?

TODO

I'll see if I can reproduce this behaviour in a small demo application. It's worth testing on .NET 5.0 as well.

