Dispatch connection execution #12265


Merged 5 commits into master on Jul 18, 2019

Conversation

@davidfowl (Member)

  • Dispatch connection initialization to the thread pool to avoid executing the first read on the IO thread (longer explanation in the bug).

Fixes #10956
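
For context, here is a minimal sketch of the dispatch pattern, assuming a hypothetical `ConnectionDispatcher<TConnection>` and connection delegate (the names are illustrative, not Kestrel's actual internals):

```csharp
using System;
using System.Threading.Tasks;

// Minimal sketch of dispatching connection execution to the thread pool.
// ConnectionDispatcher and TConnection are hypothetical stand-ins; only
// the dispatch pattern itself is the point.
public sealed class ConnectionDispatcher<TConnection>
{
    private readonly Func<TConnection, Task> _connectionDelegate;

    public ConnectionDispatcher(Func<TConnection, Task> connectionDelegate)
        => _connectionDelegate = connectionDelegate;

    // Called by the transport for each accepted connection.
    public Task OnConnectionAsync(TConnection connection)
    {
        // Before: the delegate ran inline, so the first read (HTTP parsing,
        // middleware, user code) executed on the IO thread and held it up.
        // After: queue the work to the thread pool so the IO thread can get
        // back to servicing other connections immediately.
        return Task.Run(() => _connectionDelegate(connection));
    }
}
```

(`Task.Run` is the simplest way to express the dispatch; an allocation-conscious implementation could instead queue an `IThreadPoolWorkItem` via `ThreadPool.UnsafeQueueUserWorkItem(..., preferLocal: false)`, available since .NET Core 3.0.)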

Before:

RequestsPerSecond:           8,762
Max CPU (%):                 27
WorkingSet (MB):             152
Avg. Latency (ms):           18.5
Startup (ms):                305
First Request (ms):          56.04
Latency (ms):                0.61
Total Requests:              132,096
Duration (ms):               15,080
Socket Errors:               0
Bad Responses:               0
SDK:                         3.0.100-preview8-013280
Runtime:                     3.0.0-preview8-27914-06
ASP.NET Core:                3.0.0-preview8.19366.5

After:

RequestsPerSecond:           10,232
Max CPU (%):                 34
WorkingSet (MB):             152
Avg. Latency (ms):           15.84
Startup (ms):                355
First Request (ms):          86.5
Latency (ms):                0.56
Total Requests:              154,456
Duration (ms):               15,100
Socket Errors:               0
Bad Responses:               0
SDK:                         3.0.100-preview8-013280
Runtime:                     3.0.0-preview8-27914-06
ASP.NET Core:                3.0.0-preview8.19366.5

- Seeing if this helps performance of `Connection: close`-based connections
@davidfowl (Member, Author)

/azp run AspNetCore-ci

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@davidfowl (Member, Author)

Ran plaintext on Linux to make sure there was no regression as well.

Before:

All results:
| Description |       RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | First Request (ms) | Latency (ms) | Errors |
| ----------- | --------- | ------- | ----------- | ----------------- | ------------ | ------------------ | ------------ | ------ |
|           √ | 4,187,853 |     100 |         108 |              0.91 |     229.0775 |            47.2693 |        0.118 |      0 |
|           √ | 4,171,845 |      99 |         110 |              1.15 |      231.275 |            46.6587 |       0.1219 |      0 |
|           √ | 4,119,913 |      99 |         108 |              1.12 |      230.285 |            48.1724 |        0.126 |      0 |

RequestsPerSecond:           4,159,870
Max CPU (%):                 99
WorkingSet (MB):             109
Avg. Latency (ms):           1.06
Startup (ms):                230
First Request (ms):          47.37
Latency (ms):                0.12
Total Requests:              62,811,987
Duration (ms):               15,100
Socket Errors:               0
Bad Responses:               0
SDK:                         3.0.100-preview8-013289
Runtime:                     3.0.0-preview8-27914-06
ASP.NET Core:                3.0.0-preview8.19366.10

After:

All results:
| Description |       RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | First Request (ms) | Latency (ms) | Errors |
| ----------- | --------- | ------- | ----------- | ----------------- | ------------ | ------------------ | ------------ | ------ |
|           √ | 4,153,552 |      99 |         106 |              1.02 |     242.3501 |            87.4809 |       0.1417 |      0 |
|           √ | 4,174,667 |      99 |         102 |              1.08 |     243.9197 |            107.139 |       0.1604 |      0 |
|           √ | 4,136,465 |      99 |         106 |               1.1 |     242.5382 |            86.8701 |       0.1476 |      0 |

RequestsPerSecond:           4,154,895
Max CPU (%):                 99
WorkingSet (MB):             105
Avg. Latency (ms):           1.07
Startup (ms):                243
First Request (ms):          93.83
Latency (ms):                0.15
Total Requests:              62,737,229
Duration (ms):               15,100
Socket Errors:               0
Bad Responses:               0
SDK:                         3.0.100-preview8-013289
Runtime:                     3.0.0-preview8-27914-06
ASP.NET Core:                3.0.0-preview8.19366.10

@davidfowl (Member, Author)

First request looks worse, though.

@jkotalik (Contributor)

@davidfowl I feel like something else may have caused the first request regression. I'm running on master right now (without your change) and I see first request times around 86 ms.

@halter73 (Member)

I wouldn't be shocked if the first request regression was real. That said, getting the connection throughput back is probably worth it.

It would be interesting to compare the first-request latency before the bedrock refactor and after this dispatch change. I also wouldn't be surprised if we had higher latency prior to the refactor but didn't really notice, since we don't pay as much attention to first-request latency as we do to RPS.

@davidfowl (Member, Author)

> I wouldn't be shocked if the first request regression was real. That said, getting the connection throughput back is probably worth it.

Yep, I was thinking the same thing. I want to make sure the other Kestrel changes land before this one, so the regression doesn't end up in the same run as the other changes.

> It would be interesting to compare the first-request latency before the bedrock refactor and after this dispatch change. I also wouldn't be surprised if we had higher latency prior to the refactor but didn't really notice, since we don't pay as much attention to first-request latency as we do to RPS.

I spent last night looking at this and the bedrock refactor:

[image: benchmark chart]

The startup time during the bedrock refactor went from 19 ms to 29 ms.

[image: benchmark chart]

@davidfowl merged commit 7966f25 into master on Jul 18, 2019
@ghost deleted the davidfowl/dispatch-connection-execution branch on July 18, 2019 at 13:36
@NinoFloris

34% CPU utilization seems to leave a lot on the table for connections per second. 17k CPS, which is where it was before the regression, is already pretty far below what would be expected.

Even nginx claims to do 34k CPS on a single core or 250k+ on 16 cores. Erlang and Go easily do 100k+ as well.
https://www.nginx.com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/

/cc @tmds @benaadams

@jkotalik (Contributor)

@davidfowl I don't think the regression in startup corresponds with your Bedrock refactor. What's interesting is that before and after the regression there were no changes to either the ASP.NET Core runtime or the .NET runtime (aspnetcore was 3.0.0-preview6-19253-01 and dotnetcore was 3.0.0-preview6-27714-15). This suggests a machine issue for the time to first response. We don't have a similar regression on Linux; startup time has consistently dropped there.

@NinoFloris I agree that our connection-close scenario is an area to improve. Our Linux numbers are slightly better (around 30k CPS), but I believe those machines have more than one core.

@NinoFloris commented Jul 18, 2019

@jkotalik

> This suggests a machine issue for the time to first response.

Didn't it bounce back after that single 29 ms run as well?

> Our Linux numbers are slightly better (around 30k CPS), but I believe those machines have more than one core.

That's on physical hardware with a bazillion cores, right?

CPU utilization on Linux for 30k CPS looks to be ~10%.

@jkotalik (Contributor)

From what I remember, there was a driver issue on the Windows perf machines that hasn't been addressed. @sebastienros for more info?

@NinoFloris

This might be relevant for CPS and non-pipelined investigations too: https://github.com/eloraiby/fs-pacer/blob/master/fs-pacer.md. The author works for Samsung; he started out on CoreFX Sockets but had to switch to libuv, mainly because CPU utilization with Sockets was subpar.

@halter73 (Member) commented Jul 18, 2019

Thanks for the input, @NinoFloris. I hadn't seen the nginx CPS benchmarks before, but it seems we both decided to measure CPS performance using wrk with a "Connection: close" request header. There are some slight differences, such as us responding with "Hello World" while nginx responds with an empty body, but I don't think that should make much of a difference.

The more significant difference is that the nginx CPS benchmarks use only 50 concurrent connections versus our 256. 256 connections might not be optimal, since that number was mostly tuned for the TechEmpower pipelined plaintext scenario. Or maybe nginx is effectively handicapping itself and still winning.
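
(For reference, a wrk invocation for this kind of CPS test might look like `wrk -t 8 -c 256 -d 15s -H "Connection: close" http://server:5000/plaintext`; the thread count, duration, and URL here are illustrative, not our exact harness settings.)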

Either way, it looks like nginx used better hardware than we do for both the benchmark servers and the clients (e.g. 2x Xeon E5-2699 vs 1x Xeon E5-1650, 2x Intel XL710 40GbE NICs vs 1x Intel X540 10GbE, etc.). Though, as you mention, nginx outperforms even on a single core. We're looking at getting 40GbE NICs soon.

All that said, it's always nice to know what's possible, and I'm sure we're leaving per-connection performance on the table. So far we've definitely focused more of our optimization effort on improving RPS once a connection is already established.

@amcasey added the area-networking label and removed the area-runtime label on Jun 6, 2023