Description
Opened on Dec 9, 2022
Hello,
The issue manifests, seemingly at random, on a number of Ubuntu 18.04 servers running our Kestrel-based .NET application.
No package updates that might have contributed to this behavior were installed on the affected servers recently.
Kestrel stops responding to requests - all of them time out.
$ curl -vvv -k https://server.com/alive
Trying 1.2.3.4...
TCP_NODELAY set
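When checking whether a server is in this state, a probe with explicit timeouts gives a clearer signal than an open-ended curl call. The following is a minimal sketch and not part of our actual tooling: `probe_alive` is a hypothetical helper name, the timeout defaults (5 s to connect, 10 s overall) are assumptions, and the URL is the placeholder used above. It relies on curl's documented exit code 28 for a timed-out operation.

```shell
# Hypothetical helper: classify one request as ok / timeout / error.
# CONNECT_TIMEOUT and MAX_TIME are assumed defaults; tune them per service.
probe_alive() {
  local url=$1 rc=0
  # -s: silent, -k: skip certificate verification (as in the manual check),
  # -o /dev/null: discard the body, only the exit status matters here.
  curl -sk -o /dev/null \
    --connect-timeout "${CONNECT_TIMEOUT:-5}" \
    --max-time "${MAX_TIME:-10}" \
    "$url" || rc=$?
  case $rc in
    0)  echo "ok" ;;
    28) echo "timeout" ;;     # curl exit code 28: operation timed out
    *)  echo "error:$rc" ;;   # any other curl failure, e.g. connection refused
  esac
}
```

For example, `probe_alive https://server.com/alive` could be run from cron or a monitoring agent, and a `timeout` result could trigger collection of diagnostics before anyone restarts the service.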
The machine does not experience high load at the time of the incident. CPU, memory and IO usage are at normal or even low levels.
We captured .NET counters in the hope of finding signs of thread starvation, but we did not see anything out of the ordinary:
Status: Running
[System.Runtime]
    % Time in GC since last GC (%)                        0
    Allocation Rate (B / 1 sec)                           529,872
    CPU Usage (%)                                         3
    Exception Count (Count / 1 sec)                       0
    GC Committed Bytes (MB)                               1,511
    GC Fragmentation (%)                                  66.514
    GC Heap Size (MB)                                     527
    Gen 0 GC Count (Count / 1 sec)                        0
    Gen 0 Size (B)                                        65,608,328
    Gen 1 GC Count (Count / 1 sec)                        0
    Gen 1 Size (B)                                        16,600,328
    Gen 2 GC Count (Count / 1 sec)                        0
    Gen 2 Size (B)                                        1.0128e+09
    IL Bytes Jitted (B)                                   9,668,963
    LOH Size (B)                                          3.2404e+08
    Monitor Lock Contention Count (Count / 1 sec)         25
    Number of Active Timers                               526
    Number of Assemblies Loaded                           580
    Number of Methods Jitted                              114,669
    POH (Pinned Object Heap) Size (B)                     2,648,776
    ThreadPool Completed Work Item Count (Count / 1 sec)  701
    ThreadPool Queue Length                               0
    ThreadPool Thread Count                               20
    Time spent in JIT (ms / 1 sec)                        0
    Working Set (MB)                                      4,785
I attach stack traces taken with dotnet-stack.
Now the surprising part: the things that "unblock" it are:
- strace call: sudo strace -T -t -f -p $PID
- making a minidump with dotnet-dump
- restarting the service (not surprising)
After performing any of the above actions, Kestrel responds to requests again.
We would be grateful for any advice on where to look further for the root cause, or any additional diagnostic tips.
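Because attaching strace or taking a dump seems to change the state of the process, it may be worth recording what each thread is doing without attaching to it. The sketch below reads only the per-thread files under /proc (`comm`, `status`, `wchan`), which are standard on Linux. `thread_states` is a hypothetical helper name, and the `PROC_ROOT` override is an assumption added only to make the function easy to test. Depending on kernel settings, `wchan` may read as 0 or be unreadable without sufficient privileges, in which case `?` is printed.

```shell
# Hypothetical helper: print "tid<TAB>name<TAB>state<TAB>wchan" for every
# thread of a process, using only /proc reads (no ptrace attach).
thread_states() {
  local pid=$1 proc_root=${PROC_ROOT:-/proc}
  local task tid name state wchan
  for task in "$proc_root/$pid"/task/*; do
    [ -d "$task" ] || continue
    tid=${task##*/}
    # Thread name as set by the runtime.
    name=$(cat "$task/comm" 2>/dev/null)
    # One-letter scheduler state: R running, S sleeping, D uninterruptible, ...
    state=$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null)
    # Kernel symbol the thread is waiting in, when the kernel exposes it.
    wchan=$(cat "$task/wchan" 2>/dev/null)
    printf '%s\t%s\t%s\t%s\n' "$tid" "${name:-?}" "${state:-?}" "${wchan:-?}"
  done
}
```

Running `thread_states $PID` once while requests are timing out and again after the process has recovered, then comparing the two snapshots, might show which threads were parked and where, without the measurement itself waking them up.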
Reproduction Steps
Don't know yet
Expected behavior
Kestrel responds to requests.
Actual behavior
Requests time out.
Regression?
Not sure.
Known Workarounds
- strace call: sudo strace -T -t -f -p $PID
- making a minidump with dotnet-dump
- restarting the service (not surprising)
Configuration
Which version of .NET is the code running on?
.NET 6.0.11
OS: Ubuntu 18.04
Architecture: x64
Config-specific: don't know