GitHub - pwrdrvr/dotnet-async-nightmare: Example of super high CPU usage on extremely simple web server in DotNet 8.0, 9.0, etc.

Overview

This demo shows how async task completions moving between threads drive up CPU usage, and measures how much CPU that costs.

🔥 Key Finding: By limiting worker threads to 1 and disabling semaphore spinning, we achieved 7x more requests per CPU core compared to the default .NET configuration.

This problem exists in every async framework I tested, including those on the JVM and in Go, Rust, and .NET. Limiting each framework to 1 or 2 threads can significantly reduce CPU usage and increase RPS per CPU core. To test these other frameworks, use the web server examples in the-benchmarker/web-frameworks and experiment with the number of threads allowed to run async tasks.
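Most runtimes expose an environment variable or flag for this kind of experiment. The helper below is a hypothetical sketch (with_threads is not part of this repository) that assumes GOMAXPROCS for Go, TOKIO_WORKER_THREADS for Tokio's multi-threaded runtime, and the JVM flag -XX:ActiveProcessorCount passed through JAVA_TOOL_OPTIONS. These knobs are not equivalent: GOMAXPROCS caps how many OS threads run Go code at once, TOKIO_WORKER_THREADS is only honored when the app does not set a worker count in code, and ActiveProcessorCount only helps frameworks that size their pools from the reported CPU count.

```shell
#!/bin/sh
# Hypothetical helper: run a command with its async runtime limited to N threads.
# Usage: with_threads <go|tokio|jvm> <N> <command> [args...]
with_threads() {
  runtime=$1
  n=$2
  shift 2
  case "$runtime" in
    # Go: max number of OS threads executing Go code simultaneously
    go)    GOMAXPROCS=$n "$@" ;;
    # Tokio: worker count for the multi-threaded runtime (if not set in code)
    tokio) TOKIO_WORKER_THREADS=$n "$@" ;;
    # JVM: report N CPUs, which frameworks may use to size their thread pools
    jvm)   JAVA_TOOL_OPTIONS="-XX:ActiveProcessorCount=$n" "$@" ;;
    *)     echo "unknown runtime: $runtime" >&2; return 1 ;;
  esac
}
```

For example, with_threads tokio 1 oha -c 20 -z 60s http://localhost:5001/user/1234 runs the same single-threaded oha load test shown in the Testing section.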

Node.js does not suffer from this because each JavaScript runtime instance has a single worker thread, so there is no context switching between threads and no task stealing.

The question is: why are we using async task completions at all if it's causing 7x more CPU usage per RPS compared to a single thread?
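Requests per CPU core, the metric used throughout this README, is throughput divided by CPU usage, where 100% CPU equals one fully busy core (so 600% means six cores). A minimal sketch of that arithmetic, using a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: requests/second per CPU core.
# $1 = requests/second, $2 = CPU usage in percent (100 = one full core)
rps_per_core() {
  awk -v rps="$1" -v cpu="$2" 'BEGIN { printf "%.0f\n", rps / (cpu / 100) }'
}

# Low end of the 1-thread, no-spinning run from the manual test results
rps_per_core 120000 114   # prints 105263
```

Comparing two configurations is then a ratio of these per-core figures; roughly 105k versus 22k is where the 4.8x in the manual test results comes from.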

Performance Visualization

The above chart shows requests/second/CPU core for different thread pool configurations. Higher is better.

You can generate these charts with real benchmarks on your system by running:

# Install prerequisites (if needed)
# npm install -g node@20
# cargo install oha

# Get help and see all available options
./run.sh --help

# Run benchmarks and generate visualizations
./run.sh

# Or test with sample data without running benchmarks (much faster)
./run.sh --test

# Clean up any lingering processes if needed
./run.sh --clean

After running the benchmarks:

View the interactive charts in your browser
Take screenshots of the charts and save them to the screenshots directory
The full results are available in docs/index.html and benchmark-data.json

Key Findings Visualized

The charts clearly show:

CPU Efficiency: Single worker thread configuration delivers 7x more requests per CPU core
Thread Overhead: Default ThreadPool settings waste significant CPU on context switching
Semaphore Spinning: Disabling ThreadPool semaphore spinning alone cuts CPU usage by 50%

See charts for detailed interactive results after running benchmarks.

Links

Initial report of high CPU usage in a reverse proxy: pwrdrvr/lambda-dispatch#43

Open issue in .NET since 2022, with a partial workaround: dotnet/runtime#72153 (comment)

Original source for aspnet-minimal-api: aspnet-minimal-api

Original source for thread pool control function: Program.cs

Details of a .NET reverse proxy using 700% CPU to send 17k RPS to a Node.js Express server that itself uses less than 100% CPU: pwrdrvr/lambda-dispatch#109

.NET Runtime Source Code References

The specific cause of the issue can be traced to these parts of the .NET Runtime source code:

PortableThreadPool.WorkerThread - Location where Unfair semaphore spin limit can override the default: View source
PortableThreadPool.WorkerThread - Location where SemaphoreSpinCountDefaultBaseline is set to 70, but set to 4 * 70 = 280 on ARM platforms: View source
LowLevelLifoSemaphore - Location where spinCount appears to be doubled again on Unix platforms (potentially boosting this to 560 on Mac ARM?): View source
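The arithmetic described in these references can be summarized as follows. This is a hypothetical model of the README's reading of the runtime source, not code taken from the runtime; the Unix doubling in particular is unconfirmed, as the note above is itself hedged. It also ignores any override set through DOTNET_ThreadPool_UnfairSemaphoreSpinLimit.

```shell
#!/bin/sh
# Hypothetical model of the default unfair semaphore spin limit, as described
# in the references above. Not runtime code; Unix doubling is unconfirmed.
# Usage: default_spin_limit <arch> <os>   e.g. "$(uname -m)" "$(uname -s)"
default_spin_limit() {
  arch=$1
  os=$2
  spin=70   # SemaphoreSpinCountDefaultBaseline
  case "$arch" in
    arm*|aarch64) spin=$((spin * 4)) ;;          # 4 * 70 = 280 on ARM
  esac
  case "$os" in
    Linux|Darwin|FreeBSD) spin=$((spin * 2)) ;;  # possibly doubled again on Unix
  esac
  echo "$spin"
}
```

Under these assumptions, an Apple Silicon Mac (arm64, Darwin) would land at 560, matching the question raised above.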

Running Locally

Install DotNet 8.0 SDK

.NET 8.0 SDK

Build

dotnet restore
dotnet build -c Release

Automated Test Script

Note: nvm is used here to get Node.js 20; nvm use reads the version from the repository's .nvmrc. Any other way of getting Node.js 20, or another compatible version, works as well.

Mac / Linux

nvm use

./run.sh

Windows

nvm use

node run-benchmarks.js
node generate-charts.js

Manually Testing


Run

# Base case - Uses 600-670% CPU to deliver 130k-140k RPS
# ~22k RPS per CPU core
# This is as fast as Node.js with express.js per CPU core
dotnet run -c Release --project src/web/web.csproj

# Disabling Semaphore spinning - Uses 266-320% CPU to deliver 115k-130k RPS
# ~43k RPS per CPU core
DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 dotnet run -c Release --project src/web/web.csproj

# Note on `oha` CPU usage
# `oha` (Rust) uses 200-300% CPU to generate the above requests with the default Tokio async runtime config

# Limiting `oha` to 1 thread (TOKIO_WORKER_THREADS=1) - dotnet uses 700% CPU to deliver 100k RPS
# 14k RPS per CPU core
dotnet run -c Release --project src/web/web.csproj

# Limiting `oha` to 1 thread and disabling Semaphore spinning - Uses 300% CPU to deliver 90k RPS
# 30k RPS per CPU core
# `oha` uses 90% CPU
DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 dotnet run -c Release --project src/web/web.csproj

# Limiting `oha` to 2 threads and disabling Semaphore spinning - Uses 330% CPU to deliver 120k-140k RPS
# 43k RPS per CPU core
# `oha` uses 160% CPU
DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 dotnet run -c Release --project src/web/web.csproj

# Limiting dotnet to 1 thread, disabling Semaphore spinning, and limiting `oha` to 2 threads
# Uses 114% CPU to deliver 120k-130k RPS
# 105k RPS per CPU core
# 🔥 4.8x RPS per CPU core compared to base case 🔥
# `oha` uses 125% CPU
# Note: 1 thread can get a little wonky with the async completions
LAMBDA_DISPATCH_MaxWorkerThreads=1 DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 dotnet run -c Release --project src/web/web.csproj

# Limiting the IO completion port threads to 1 has no effect on CPU usage
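The CPU percentages above follow the same convention as top, where 100% is one fully busy core. One way to sample that for the server process on Linux is sketched below: read the process's user and system CPU time from /proc twice and divide the difference by the elapsed wall time. This is a hypothetical, Linux-only helper (cpu_percent is not part of this repository). Since dotnet run may start the app as a separate child process, make sure the PID you pass belongs to the server itself.

```shell
#!/bin/sh
# Hypothetical, Linux-only helper: average CPU usage of a process over an interval,
# where 100 = one fully busy core (same convention as top).
# Usage: cpu_percent <pid> [seconds]

# Print utime + stime (fields 14 and 15 of /proc/<pid>/stat) in clock ticks.
# The "pid (comm) " prefix is stripped first because comm may contain spaces,
# which shifts utime/stime to fields 12 and 13.
cpu_ticks() {
  sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

cpu_percent() {
  pid=$1
  interval=${2:-5}
  hz=$(getconf CLK_TCK)
  t0=$(cpu_ticks "$pid")
  sleep "$interval"
  t1=$(cpu_ticks "$pid")
  awk -v d="$((t1 - t0))" -v hz="$hz" -v s="$interval" \
    'BEGIN { printf "%.0f\n", 100 * d / (hz * s) }'
}
```

Run it in a second terminal while oha is generating load, for example cpu_percent 12345 10 with the server's PID in place of 12345.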

Testing

# Smoke Test
curl http://localhost:5001/user/1234

# Load Test
# With default settings, oha uses too many threads and is slower, with 2x to 3x more CPU usage than necessary
# Incidentally, this is the same problem that dotnet is having with async task completions / spin waits / work stealing
oha -c 20 -z 60s http://localhost:5001/user/1234

# Load test with 1 Tokio runtime thread - 20 concurrent sockets
TOKIO_WORKER_THREADS=1 oha -c 20 -z 60s http://localhost:5001/user/1234

# Load test with 1 Tokio runtime thread - 100 concurrent sockets
TOKIO_WORKER_THREADS=1 oha -c 100 -z 60s http://localhost:5001/user/1234
