Process memory exhaustion under region-based GC heap mode #103582

Open
@baal2000

Description

We upgraded a production environment consisting of a pool of large-memory-footprint processes running in Docker containers, one container per Ubuntu 20.04 VM host, with hosts ranging from ~30 GB up to 1 TB of RAM. The upgrade was from .NET 7 with the segment-based libclrgc.so GC heap (used due to #86183) to a default .NET 8 configuration. After the upgrade, some of the processes experienced sporadic out-of-memory crashes.

Analysis

Before the upgrade, the typical pattern was a cycle: the process hit the available memory limit, a deep gen 2 GC ran, memory rose to the limit again, another gen 2 GC ran, and so on.
image
Under .NET 8 (standard region-based GC) the pattern changed to an almost straight line ending in out of memory:
image
Reverting to .NET 7 with the segment-based libclrgc.so GC heap restored the previous pattern, and the process became stable.

The heap sizes looked similar under both scenarios:
image
After taking a full memory dump for both scenarios, the top .NET object usage by type also looked similar:
.NET 7 (segment-based GC heap)

  kilobytes |Object count|Type
  23,402,813| 272,323,647|Class1
  11,726,220|      63,078|Class2[]
   5,024,993|     117,549|Class1[]
   3,877,060|         173|$.ValueTuple<Class2,Class1>[]
   3,214,924|  22,861,688|Class3
     850,206|      13,871|$.Collections.Generic.HashSet+Entry<Class4>[]
     698,673|   1,527,108|$.Int32[]
     664,435|  28,349,262|Class5
     664,193|  11,920,171|$.String
     608,850|   3,247,205|Class6
....    
       9,165|     233,684|Free

  60,011,021| 453,466,459|TOTAL

.NET 8 (region-based GC heap)

  kilobytes |Object count|Type
  26,227,185| 305,189,069|Class1
  12,293,589|      57,454|Class2[]
   3,988,836|      80,069|Class1[]
   3,074,227|  21,861,170|Class3
   1,613,392|          91|$.ValueTuple<Class2,Class1>[]
   1,500,896|      12,276|$.Collections.Generic.HashSet+Entry<Class4>[]
     633,966|  27,049,234|Class5
     615,378|   3,282,018|Class6
     539,491|     758,663|$.Int32[]

      29,343|      23,047|Free

  57,974,870| 451,380,242|TOTAL
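
To put a number on "similar", the totals from the two dumps can be compared directly. This is only a small arithmetic sketch: the figures are copied from the TOTAL and Free rows above, and the dictionary keys and the `relative_diff` helper are names made up for this illustration.

```python
# Totals copied from the two dump summaries above (kilobytes and object counts).
dumps = {
    ".NET 7 (segments)": {"kb": 60_011_021, "objects": 453_466_459, "free_kb": 9_165},
    ".NET 8 (regions)": {"kb": 57_974_870, "objects": 451_380_242, "free_kb": 29_343},
}


def relative_diff(a, b):
    """Return the difference of b relative to a, as a percentage."""
    return (b - a) / a * 100.0


seg = dumps[".NET 7 (segments)"]
reg = dumps[".NET 8 (regions)"]

# How much smaller or larger the .NET 8 dump is than the .NET 7 one.
size_diff = relative_diff(seg["kb"], reg["kb"])
count_diff = relative_diff(seg["objects"], reg["objects"])

# Share of each dump that is reported as Free.
free_share = {name: d["free_kb"] / d["kb"] * 100.0 for name, d in dumps.items()}

print(f"managed size difference: {size_diff:+.2f}%")
print(f"object count difference: {count_diff:+.2f}%")
for name, share in free_share.items():
    print(f"{name}: Free is {share:.3f}% of the total")
```

Both dumps are within a few percent of each other in size and object count, and Free objects are a negligible share in both, so the object population shown here does not by itself explain the difference in process memory behavior.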

Scaling the machine up to 1.5+ times its original size eliminated the OOM crashes, but required 1.5+ times more CPU cores, which in turn cost 1.5+ times more in dollars for the more expensive cloud infrastructure.
image

To test the theory that the difference was due to the GC heap mode and not to the framework version change from .NET 7 to .NET 8, we tried switching from .NET 7 libclrgc.so -> .NET 8 default -> .NET 8 libclrgc.so -> .NET 8 DOTNET_GCDynamicAdaptationMode on another server that had experienced a similar issue. The test confirmed that the pattern depended only on the segment-based (libclrgc.so) vs. region-based heap. Switching to DATAS for the region-based heap did not affect the pattern.
image
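
The .NET 8 part of the test matrix above can be sketched as a set of environment overrides. This is a hypothetical harness, not the setup actually used in production: it assumes the standalone segment GC is selected with the `DOTNET_GCName` environment variable and that DATAS is enabled with `DOTNET_GCDynamicAdaptationMode=1`; the configuration names and the `build_env` helper are invented for illustration.

```python
import os

# Assumed GC settings per test configuration (environment variable overrides).
# An empty dict means the .NET 8 default, i.e. the region-based GC.
GC_CONFIGS = {
    "net8-default-regions": {},
    "net8-segments-libclrgc": {"DOTNET_GCName": "libclrgc.so"},
    "net8-regions-datas": {"DOTNET_GCDynamicAdaptationMode": "1"},
}

# Every GC variable this sketch manages, so stale values can be cleared.
MANAGED_KEYS = ("DOTNET_GCName", "DOTNET_GCDynamicAdaptationMode")


def build_env(config_name, base_env=None):
    """Return a process environment for the given GC configuration."""
    env = dict(os.environ if base_env is None else base_env)
    # Drop inherited GC settings so one test run cannot leak into the next.
    for key in MANAGED_KEYS:
        env.pop(key, None)
    env.update(GC_CONFIGS[config_name])
    return env


if __name__ == "__main__":
    for name in GC_CONFIGS:
        overrides = {k: v for k, v in build_env(name, {}).items()}
        print(name, overrides)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the service under test, so that each run differs only in its GC mode.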

Configuration

  • .NET 7 & 8
  • Ubuntu 20.04 x64

Regression?

Feels like one. It could be triggered when a process is already running close to its maximum available memory limit with no room to spare. The region-based GC heap might be optimizing its activity for other factors, for instance minimizing GC pauses or preserving its pools of memory regions, without realizing that there is a more urgent problem of insufficient memory at hand.

Note that this is the second critical issue, after #97316, that we have experienced with the region-based GC heap mode. It needs to be addressed by the team for the new GC mode to deliver on its promise of better performance.
