Description
Recently my team has been working on reducing memory allocations to minimize GC pressure.
While we used various techniques to reuse/recycle objects, and ArrayPool.Shared to rent/return memory, we found periodic long pauses (~1s, stop-the-world) in Gen0/Gen1 GCs a few times per hour.
We used the following PerfView command to capture these pauses:
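For context, the rent/return pattern mentioned above looks roughly like the following. This is a minimal sketch, not code from the application: the `Checksum` helper and its payload are hypothetical, and only `ArrayPool<T>.Shared.Rent` and `Return` are the real API.

```csharp
using System;
using System.Buffers;

static class PooledBufferExample
{
    // Hypothetical helper: sums the bytes of a payload using a pooled scratch buffer
    // instead of allocating a new array on every call.
    static int Checksum(ReadOnlySpan<byte> payload)
    {
        // Rent may hand back an array longer than requested, so only the
        // first payload.Length bytes are treated as valid.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(payload.Length);
        try
        {
            payload.CopyTo(buffer);
            int sum = 0;
            for (int i = 0; i < payload.Length; i++)
            {
                sum += buffer[i];
            }
            return sum;
        }
        finally
        {
            // Always return the buffer, even if the work above throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }

    static void Main()
    {
        Console.WriteLine(Checksum(new byte[] { 1, 2, 3 }));
    }
}
```

Rented arrays that are never returned are not leaked, but they defeat the purpose of pooling, hence the `try`/`finally`.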
PerfView -NoGui -AcceptEULA -NoNGenRundown -BufferSizeMB:2048 -CircularMB:2048 -Zip:false -Merge:true -ClrEvents:GC+Stack -StopOnGCOverMSec:300 -KernelEvents:Memory,VirtualAlloc,Default -OnlyProviders:ClrPrivate:1:5,Clr:0x40000001:5 -NoRundown -DelayAfterTriggerSec:0 -CollectMultiple:1 -DataFile:XXX-2307051821-48272-gcsg -FocusProcess:48272 -Process:48272 collect
GC Stats showing 1s Gen1 GC pause
The Gen0 Frag% is obviously very high (98.98%) ...
When I look at the CPU Stacks view with a filter applied for the GC period, it looks like most of the CPU time is spent compacting?
Looking into the CPU stacks of the profile, we discovered that the following calls account for most of the delay during the GC:
- coreclr!SyncBlockCache::InsertCleanupSyncBlock
- coreclr!SVR::GCHeap::IsPromoted
- ntoskrnl!MiRemoveLowestPriorityStandbyPage
Using Goto Source, these lines turn out to be the most time-consuming ...
InsertCleanupSyncBlock
IsPromoted
Could you tell us what the root cause might be, and whether there is anything we can do to prevent such long app pauses (1N, a Gen1 non-concurrent GC)? Or how to solve the fragmentation problem, if that is the root cause?
Thanks a lot~
dickens
Configuration
- .NET 7
- App running in GC Server mode, with GC affinity set to 0:0-62
- Windows Server 2012 x64, 128 cores, 2 NUMA nodes
- Cloud server on AWS
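
For anyone trying to reproduce this, the GC settings above can be expressed in `runtimeconfig.json` roughly as follows. This is a sketch using the documented setting names; whether the app sets them through this file or through environment variables is an assumption.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.HeapAffinitizeRanges": "0:0-62"
    }
  }
}
```

The equivalent environment variables would be `DOTNET_gcServer=1` and `DOTNET_GCHeapAffinitizeRanges=0:0-62`.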