1sec Long Pause (stop-the-world) on Gen1 GC

### Description

Recently my team has been working on reducing memory allocation to minimize GC pressure.
While applying various techniques to reuse/recycle objects, such as renting and returning buffers via ArrayPool.Shared, we found periodic long stop-the-world pauses (~1s) in Gen0/Gen1 GCs a few times per hour.
We used the following PerfView command to capture them:
```
PerfView -NoGui -AcceptEULA -NoNGenRundown -BufferSizeMB:2048 -CircularMB:2048 -Zip:false -Merge:true -ClrEvents:GC+Stack -StopOnGCOverMSec:300 -KernelEvents:Memory,VirtualAlloc,Default -OnlyProviders:ClrPrivate:1:5,Clr:0x40000001:5 -NoRundown
-DelayAfterTriggerSec:0 -CollectMultiple:1 -DataFile:XXX-2307051821-48272-gcsg -FocusProcess:48272 -Process:48272 collect
```

GC Stats showing 1s Gen1 GC pause

![Screenshot 2023-07-05 224639](https://github.com/dotnet/runtime/assets/23349139/d032ba0b-f51c-4562-8917-b82275eb0bc0)
[pic1]

The Gen0 Frag% is obviously very high (98.98%).
When I look at the CPU stacks filtered to the GC period, it looks like most of the CPU time is spent on compaction?

![Screenshot 2023-07-06 010013](https://github.com/dotnet/runtime/assets/23349139/519ac691-f5a4-46f1-b096-ddf3206a9750)

Looking into the CPU stacks in the profile, we found that these calls during the GC account for most of the delay:
1. coreclr!SyncBlockCache::InsertCleanupSyncBlock
2. coreclr!SVR::GCHeap::IsPromoted
3. ntoskrnl!MiRemoveLowestPriorityStandbyPage

![Screenshot 2023-07-05 225019](https://github.com/dotnet/runtime/assets/23349139/3423eaed-d7b7-43ca-9a24-bbfca7463d4a)
[pic2]

Using Goto Source, these are the most time-consuming lines:
InsertCleanupSyncBlock

![Screenshot 2023-07-05 231740](https://github.com/dotnet/runtime/assets/23349139/9de7cbc8-eddc-4ad7-b437-47eb0491cf2d)
[pic3]

IsPromoted

![Screenshot 2023-07-05 231840](https://github.com/dotnet/runtime/assets/23349139/4dead6e8-5ff6-4997-b1ed-7d97e431208d)
[pic4]

Could you tell us what the root cause might be, and whether there is anything we can do to prevent such long app pauses (1N, i.e. Gen1 non-concurrent GC)? Or, if fragmentation is the root cause, how can we solve the fragmentation problem?

Thanks a lot~
dickens

### Configuration

- .NET 7
- App running in GC Server mode, with GC affinity set to 0:0-62
- Windows Server 2012 x64, 128 cores, 2 NUMA nodes
- Cloud server on AWS
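
As a sketch, the GC settings listed above correspond to a `runtimeconfig.json` fragment like the one below. This assumes the settings are applied through that file rather than through environment variables (for example `DOTNET_gcServer` and `DOTNET_GCHeapAffinitizeRanges`); how they are actually applied is not stated here.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.HeapAffinitizeRanges": "0:0-62"
    }
  }
}
```

The `0:0-62` value uses the `group:range` form, which selects processors 0 through 62 in CPU group 0 on machines with more than 64 logical processors.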

1sec Long Pause (stop-the-world) on Gen1 GC #88426