Description
After upgrading a production environment from .NET 7 with the segment-based libclrgc.so GC heap (used because of #86183) to a default .NET 8 configuration, some of the processes experienced sporadic out-of-memory crashes. The environment is a pool of processes with large memory footprints running in Docker containers, one container per Ubuntu 20.04 VM host, with hosts ranging from ~30 GB to 1 TB of RAM.
Analysis
A typical pattern before the upgrade was a cycle: the process reached the available memory limit, a deep gen 2 GC ran, memory rose to the limit again, another gen 2 GC ran, and so on.
Under .NET 8 (standard region-based GC) the pattern changed to an almost straight line ending in an out-of-memory crash.
Reverting to .NET 7 with the segment-based libclrgc.so GC heap restored the previous pattern, and the process became stable.
The heap sizes looked similar under both scenarios:
After taking a full memory dump for both scenarios, the top .NET object usage by type looked similar:
.NET 7 (segment-based GC heap)
kilobytes |Object count|Type
23,402,813| 272,323,647|Class1
11,726,220| 63,078|Class2[]
5,024,993| 117,549|Class1[]
3,877,060| 173|$.ValueTuple<Class2,Class1>[]
3,214,924| 22,861,688|Class3
850,206| 13,871|$.Collections.Generic.HashSet+Entry<Class4>[]
698,673| 1,527,108|$.Int32[]
664,435| 28,349,262|Class5
664,193| 11,920,171|$.String
608,850| 3,247,205|Class6
....
9,165| 233,684|Free
60,011,021| 453,466,459|TOTAL
.NET 8 (region-based GC heap)
kilobytes |Object count|Type
26,227,185| 305,189,069|Class1
12,293,589| 57,454|Class2[]
3,988,836| 80,069|Class1[]
3,074,227| 21,861,170|Class3
1,613,392| 91|$.ValueTuple<Class2,.Class1>[]
1,500,896| 12,276|$.Collections.Generic.HashSet+Entry<Class4>[]
633,966| 27,049,234|Class5
615,378| 3,282,018|Class6
539,491| 758,663|$.Int32[]
29,343| 23,047|Free
57,974,870| 451,380,242|TOTAL
Scaling the machine up to at least 1.5x its original size eliminated the out-of-memory crashes, but it required at least 1.5x more CPU cores, which in turn cost at least 1.5x as much for the more expensive cloud infrastructure.
To test the theory that the difference was due to the GC heap mode and not to the framework version change from .NET 7 to .NET 8, we tried switching from .NET 7 libclrgc.so -> .NET 8 default -> .NET 8 libclrgc.so -> .NET 8 DOTNET_GCDynamicAdaptationMode on another server that had experienced a similar issue. The test confirmed that the pattern depended only on the segment-based (libclrgc.so) vs. region-based heap. Switching to DATAS for the region-based heap did not affect the pattern.
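For anyone trying to reproduce the comparison, the configurations above can be selected with environment variables, roughly as sketched below. This is an assumption about the setup rather than the exact configuration used in this report: it relies on the documented `DOTNET_GCName` and `DOTNET_GCDynamicAdaptationMode` settings, and it assumes libclrgc.so sits in the runtime directory next to libcoreclr.so. The same settings can also be given in runtimeconfig.json (`System.GC.Name`).

```shell
# Sketch only: set these in the container environment before the process starts.

# Segment-based standalone GC (.NET 7 or .NET 8 with libclrgc.so).
export DOTNET_GCName=libclrgc.so

# .NET 8 default region-based GC: leave DOTNET_GCName unset.
# unset DOTNET_GCName

# .NET 8 region-based GC with DATAS enabled (use instead of DOTNET_GCName).
# export DOTNET_GCDynamicAdaptationMode=1
```

With Docker, the same variables can be passed with `-e`, for example `-e DOTNET_GCName=libclrgc.so`.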
Configuration
- .NET 7 & 8
- Ubuntu 20.04 x64
Regression?
Feels like one. It could be triggered when a process is already running close to its maximum available memory limit with no space to spare. The region-based GC heap might be optimizing its activity for other factors, for instance minimizing GC pauses or preserving its pools of memory regions, without recognizing that there is a more urgent problem of insufficient memory that needs to be dealt with first.
Note that this is the second critical issue, after #97316, that we have experienced with the region-based GC heap mode. Both need to be addressed by the team for the new GC mode to deliver on its promise of better performance.