Http Stress Status Report
What we've run so far:
OS | HTTP 1.1 | HTTP 2.0 | Notes |
---|---|---|---|
Windows | 4+ h (Mana) -> 2 errors | 12 h (JJ) -> 11 errors | |
Windows | 6 h (Miha) -> 8 errors | 6 h (Miha) -> 7 errors | |
Windows | 10 h (Miha) -> 53 errors | 7 h (Miha) -> 139 errors | 7 h = 1 + 6 h |
Linux | 17 h (JJ) -> 0 errors | 12 h (JJ) -> 608 errors | we should rerun, as this may be an environmental problem |
Linux | 12 h (Furtik) -> 0 errors | 12 h (Furtik) -> 104 errors | ran on released runtime (not on master) |
HTTP 2.0 Error Statistics
Error Type | Linux | Windows |
---|---|---|
Success | 135,727,755 | 105,273,552 |
Errors | 712 | 157 |
System.Threading.Tasks.TaskCanceledException: The operation was canceled. | 37 | 8 |
System.Threading.Tasks.TaskCanceledException: A task was canceled. | 13 | 4 |
System.IO.IOException: The response ended prematurely while waiting for the next frame from the server. | 464 | 18 |
System.Net.Sockets.SocketException (32): Broken pipe | 198 | 0 |
System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host. | 0 | 118 |
System.Net.Sockets.SocketException (10053): An established connection was aborted by the software in your host machine. | 0 | 9 |
HTTP 1.1 Error Statistics
Error Type | Linux | Windows |
---|---|---|
Success | 171,673,491 | 128,658,822 |
Errors | 0 | 61 |
System.Net.Sockets.SocketException (10061): No connection could be made because the target machine actively refused it. | 0 | 61 |
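To compare the two protocols and platforms on a common scale, the raw counts in the tables above can be converted to failures per million requests. The following is a minimal sketch, assuming the Success row does not include the failed requests (so the total is Success + Errors). `errors_per_million` is a hypothetical helper, not part of the stress app.

```shell
# Hypothetical helper: failures per million requests.
# Assumption: the "Success" count excludes the failed requests.
errors_per_million() {
  # $1 = successful requests, $2 = failed requests
  awk -v ok="$1" -v err="$2" 'BEGIN {
    total = ok + err
    if (total == 0) { print "n/a"; exit }
    printf "%.2f\n", err / total * 1000000
  }'
}

errors_per_million 135727755 712   # HTTP 2.0, Linux
errors_per_million 105273552 157   # HTTP 2.0, Windows
errors_per_million 171673491 0     # HTTP 1.1, Linux
errors_per_million 128658822 61    # HTTP 1.1, Windows
```

The counts are copied from the Success and Errors rows; commas are dropped so that `awk` parses them as numbers.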
What we need to run:
- More HTTP 1.1 Linux runs to confirm that we're clear. (easy, hi pri)
- More HTTP 2.0 Linux runs to confirm that we have all error types captured. (easy, hi pri)
- HTTP 2.0 tests against HTTPSys to eliminate/confirm server as the culprit. (mid, mid pri)
- Run the matrix against 3.1 and compare. (hard, mid pri)
Existing issues, root caused:
- HTTP 1.1 connect cancellation deadlock: Cancellation of Socket.ConnectAsync intermitently hangs #42198
- HTTP 2.0 sending RESET before END_STREAM: HTTP/2 stress server intermitently fails in duplex scenarios #42200
- HTTP 2.0 throwing `TaskCanceledException` in reaction to GO_AWAY: HTTP/2 stress test TaskCanceledException when client hasn't cancelled #42472
- NEW HTTP 2.0 infinite loop when connecting H2C to HTTP 1.0 localhost server: H2C connection to HTTP.sys on localhost causes infinite loop #42259
Discovered exceptions, not-investigated:
- HTTP 2.0 System.Net.Sockets.SocketException (32): Broken pipe. (Linux only)
- HTTP 2.0 System.IO.IOException: The response ended prematurely while waiting for the next frame from the server.
- HTTP 2.0 System.Threading.Tasks.TaskCanceledException: The operation was canceled. --> HTTP/2 stress test TaskCanceledException when client hasn't cancelled #42472
- HTTP 2.0 System.Threading.Tasks.TaskCanceledException: A task was canceled. --> might be covered by HTTP/2 stress test TaskCanceledException when client hasn't cancelled #42472
- HTTP 2.0 System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host. (Windows only)
- HTTP 2.0 System.Net.Sockets.SocketException (10053): An established connection was aborted by the software in your host machine. (Windows only)
- NEW HTTP 1.1 System.Net.Sockets.SocketException (10060): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (Windows only)
The discovered exceptions confirm what we've collected so far from the pipelines: #40388.
Distributable tasks by priority:
- More HTTP 1.1 Linux runs: http11run
- More HTTP 2.0 Linux runs: http20run
- Investigate Windows HTTP 1.1 errors: winErr3
- Investigate Windows HTTP 2.0 errors: winErr1, winErr2
- Provide fix for #42200
- Provide fix for #42198
- Help with HTTPSys client connection errors: httpSys: put on back-burner
- Run the tests against .NET Core 3.1: net31: put on back-burner
Tips and Tricks for investigations:
- Clean up docker containers and images after a product code change (`docker container prune && docker image prune -a`)
- Once the container is built, the `-b` switch can be omitted for subsequent re-runs (this skips the runtime build)
- Don't use containers for investigations: they're slow and the rebuild takes too long
  - To run the app against a locally built runtime, swap `artifacts/bin/testhost/net5.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/` with the globally installed runtime (`/usr/share/dotnet/shared/Microsoft.NETCore.App/your-latest-5.0-version`)
    - Make a backup of the global runtime!
    - Using testhost's `corerun` didn't work for me since the app depends on the ASP.NET Core SDK
    - If you change the product code, rebuild just `System.Net.Http` and copy `System.Net.Http.dll` to the global runtime again
  - Build `System.Net.Http/tests/StressTests/HttpStress`
  - Open 2 terminals and run:
    - server: `dotnet run -runMode server -aspnetlog`
      - `-aspnetlog`: console logging of server errors
      - `-serverUri https://localhost:5002`: bind to a different port (when running multiple tests in parallel)
    - client: `dotnet run -runMode client`
      - `-serverUri https://localhost:5002`: connect to a different port (when running multiple tests in parallel)
      - `-ops 1 2 3`: run only operations 1, 2 and 3 (GET, PUT Slow, etc.)
      - `-trace`: saves internal client/server traces in a log file; very verbose, usable only for very short runs
    - more options in source code: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Http/tests/StressTests/HttpStress/Program.cs#L37-L62
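The runtime swap described in the tips above can be scripted so that the backup step is never skipped. This is only a sketch under stated assumptions: the function names are hypothetical, "swap" is interpreted as copying the locally built files over the global runtime directory, and both directories are passed as arguments because the exact version folder differs per machine. Writing under `/usr/share/dotnet` will usually require root.

```shell
# Hypothetical helpers; none of these names exist in the repo.
# Assumption: "swapping" means overlaying the locally built files
# onto the globally installed runtime directory.

# Copy the global runtime aside once, so it can be restored later.
backup_runtime() {            # $1 = global runtime dir
  [ -d "${1%/}.bak" ] || cp -a "${1%/}" "${1%/}.bak"
}

# Overlay the locally built runtime onto the global one (backs up first).
install_local_runtime() {     # $1 = local testhost runtime dir, $2 = global runtime dir
  backup_runtime "$2" && cp -a "${1%/}"/. "${2%/}"/
}

# After rebuilding only System.Net.Http, refresh just that assembly.
update_http_dll() {           # $1 = path to rebuilt System.Net.Http.dll, $2 = global runtime dir
  cp "$1" "${2%/}/System.Net.Http.dll"
}

# Put the original global runtime back.
restore_runtime() {           # $1 = global runtime dir
  [ -d "${1%/}.bak" ] || return 1
  rm -rf "${1%/}" && mv "${1%/}.bak" "${1%/}"
}
```

A typical call would pass the testhost directory and the global runtime directory named in the tips, for example `install_local_runtime artifacts/bin/testhost/net5.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0 /usr/share/dotnet/shared/Microsoft.NETCore.App/your-latest-5.0-version`, with `your-latest-5.0-version` replaced by the folder actually present on the machine.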
If you have any improvements to the stress app or the containers, please create a PR rather than keeping them to yourself.
If you have more tips and tricks for running the tests, please share them.