Reduce Pipeline contention redux #36992


Closed
wants to merge 5 commits from the pipeline-contention-redux branch

Conversation

@benaadams (Member) commented May 25, 2020

@adamsitnik (Member)

@benaadams it gives a lot of socket errors:

RequestsPerSecond:           14,813
Max CPU (%):                 97
WorkingSet (MB):             385
Avg. Latency (ms):           12.67
Startup (ms):                298
First Request (ms):          59.07
Latency (ms):                0.88
Total Requests:              223,667
Duration: (ms)               15,100
Socket Errors:               223,455
Bad Responses:               0
Build Time (ms):             4,001
Published Size (KB):         102,140
SDK:                         5.0.100-preview.6.20272.2
Runtime:                     5.0.0-preview.6.20272.2
ASP.NET Core:                5.0.0-preview.5.20255.6

I've uploaded the trace file to traces/36992

@benaadams (Member Author)

The Kestrel sockets one might be busted :-/

@benaadams (Member Author)

Could you try it without the Kestrel socket change?

@benaadams (Member Author)

Yeah, Flush invalidates the memory; need to rethink the Kestrel one.
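
For reference, this is the PipeWriter contract in play (a minimal sketch, assuming a System.IO.Pipelines PipeWriter): memory handed out by GetMemory is only valid until the next Advance/Flush, so a segment cached across a FlushAsync can't be written to afterwards.

```csharp
using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

static class PipeWriterContractSketch
{
    // Sketch only: shows why a Memory<byte> cached across a flush can't be reused.
    static async Task WriteTwiceAsync(PipeWriter writer)
    {
        Memory<byte> memory = writer.GetMemory(sizeHint: 512);
        memory.Span[0] = (byte)'a';
        writer.Advance(1);

        await writer.FlushAsync();

        // The old 'memory' must not be touched after the flush; ask for a fresh segment.
        memory = writer.GetMemory(sizeHint: 512);
        memory.Span[0] = (byte)'b';
        writer.Advance(1);
        await writer.FlushAsync();
    }
}
```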

@benaadams (Member Author) commented May 26, 2020

With updated Kestrel change dotnet/aspnetcore#22225:

Alas, it's a race to avoid the lock, but hopefully it will win more often.
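
This is not the actual Pipe code, just a minimal sketch of the shape of that race: read a cheap flag first and only take the lock when something looks like it needs waking. The flag can change between the read and the lock, so the waiting side re-checks for data before it parks; the writing side simply tries to win the fast path more often. All names below are invented for illustration.

```csharp
using System.Threading;

class LockAvoidanceSketch
{
    private readonly object _sync = new object();
    private int _unconsumedBytes; // bytes written but not yet consumed (consumption omitted)
    private int _readerWaiting;   // 1 while the reader is parked (or about to park)

    // Writer side: only take the lock when a reader might need waking.
    // Interlocked.Add is a full fence, so if the reader declared itself waiting
    // before it observed zero bytes, this Volatile.Read will see the flag set.
    public void OnDataWritten(int bytes)
    {
        Interlocked.Add(ref _unconsumedBytes, bytes);

        if (Volatile.Read(ref _readerWaiting) == 0)
        {
            return; // common case: nobody is parked, so the lock is skipped entirely
        }

        lock (_sync)
        {
            if (_readerWaiting == 1)
            {
                _readerWaiting = 0;
                Monitor.Pulse(_sync); // wake the parked reader
            }
        }
    }

    // Reader side: declare the intent to wait (full fence), then re-check for data
    // before actually parking, so a write racing with this method is never lost.
    public void WaitForData()
    {
        lock (_sync)
        {
            while (true)
            {
                Interlocked.Exchange(ref _readerWaiting, 1);
                if (Volatile.Read(ref _unconsumedBytes) > 0)
                {
                    _readerWaiting = 0;
                    return;
                }
                Monitor.Wait(_sync);
            }
        }
    }
}
```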

@benaadams (Member Author)

Updated the dlls (above), moved the checks earlier to try to win the data race, and added lots of comments.

@benaadams force-pushed the pipeline-contention-redux branch from 30f5e50 to d6b9038 on May 27, 2020 at 11:50
@adamsitnik (Member)

@benaadams the results:

| Machine | Benchmark | RPS before | RPS after | locks/s before | locks/s after |
|---|---|---:|---:|---:|---:|
| Citrine 28 cores | PlaintextPlatform | 9,368,072 | 9,299,033 | 367 | 303 |
| | JsonPlatform | 1,195,092 | 1,203,339 | 28 | 12 |
| | FortunesPlatform | 318,144 | 314,811 | 41 | 41 |
| | Fortunes Batching | 416,590 | 415,700 | 754 | 625 |
| Perf 12 cores | PlaintextPlatform | 5,942,494 | 5,675,971 | 128 | 57 |
| | JsonPlatform | 594,832 | 588,407 | 10 | 9 |
| | FortunesPlatform | 130,080 | 130,236 | 79 | 56 |
| ARM 32 cores | PlaintextPlatform | 6,017,733 | 5,878,071 | 349 | 238 |
| | JsonPlatform | 543,213 | 522,777 | 170 | 94 |
| | FortunesPlatform | 94,598 | 95,610 | 17 | 20 |
| Mono 56 cores | PlaintextPlatform | 6,952,624 | 6,943,621 | 6,035 | 1,527 |
| | JsonPlatform | 1,110,368 | 1,184,403 | 284 | 382 |
| AMD 46 cores | PlaintextPlatform | 7,453,850 | 6,908,422 | 482 | 126 |
| | JsonPlatform | 693,225 | 683,765 | 64 | 30 |
| | FortunesPlatform | 265,688 | 261,117 | 21 | 15 |

trace file uploaded to traces/36992

@benaadams (Member Author)

Those results are really strange: when the locks go down it does worse, and when they go up it does better. I wonder what bottleneck it hits when it goes faster; I will look at the traces.

@adamsitnik (Member)

I've used PerfView to diff the trace files with and without your changes (it's a hidden feature of PerfView):

Base is before, Test is after, and everything is filtered by time to after the warmup:

image

It looks like it's now spending more time in FlushAsync? This was for JSON; I am going to do the same for Plaintext now.

@adamsitnik (Member)

@benaadams Plaintext:

image

@benaadams (Member Author)

Hmm... Plaintext (which pipelines 16 requests) might be suffering from the Memory reservation being made too short, as I shortened it to 1024 bytes in dotnet/aspnetcore@9d98835.

For Json that should be more than enough for the single request/response.
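
An illustrative sketch (not the Kestrel code) of why a small reservation matters for the pipelined plaintext case: when the batch of responses outgrows the reserved segment, the writer has to go back to GetMemory mid-batch, and each extra call is another trip through the pipe's synchronization. The sizes below are made-up numbers.

```csharp
using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

static class SizeHintSketch
{
    // Sketch of a writer that reserves a fixed-size segment up front and has to
    // request more memory whenever the batch outgrows it. With a 1024-byte
    // reservation and 16 pipelined responses, the batch can easily exceed the
    // first segment, so the extra GetMemory calls come back.
    static async Task WriteBatchAsync(PipeWriter writer, byte[] response, int pipelinedCount)
    {
        int reservation = Math.Max(1024, response.Length);
        Memory<byte> memory = writer.GetMemory(reservation);
        int used = 0;

        for (int i = 0; i < pipelinedCount; i++)
        {
            if (memory.Length - used < response.Length)
            {
                writer.Advance(used);                    // commit what fits so far
                memory = writer.GetMemory(reservation);  // the extra call in question
                used = 0;
            }

            response.CopyTo(memory.Slice(used));
            used += response.Length;
        }

        writer.Advance(used);
        await writer.FlushAsync();
    }
}
```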

@benaadams (Member Author)

OK, for Json it might be the Memory block being returned after the callback is triggered, causing contention on the MemoryPool; will tweak.
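
A minimal sketch of the ordering in question, with MemoryPool<byte> standing in for the pipe's pool and invented method names; this is not the actual Pipe internals, just the idea that returning the block before scheduling the reader keeps the two threads from hitting the pool at the same time.

```csharp
using System;
using System.Buffers;
using System.Threading;

static class FlushOrderingSketch
{
    // Order A: schedule the reader continuation first and return the block afterwards.
    // If the reader immediately rents a new block on another thread, both threads hit
    // the MemoryPool at the same moment and can contend on it.
    static void CompleteFlushReaderFirst(IMemoryOwner<byte> block, Action readerCallback)
    {
        ThreadPool.UnsafeQueueUserWorkItem(cb => cb(), readerCallback, preferLocal: false);
        block.Dispose(); // returns the block to the pool while the reader may already be renting
    }

    // Order B: return the block first, then schedule the reader continuation,
    // so the pool work is finished before the reader can start renting.
    static void CompleteFlushPoolFirst(IMemoryOwner<byte> block, Action readerCallback)
    {
        block.Dispose();
        ThreadPool.UnsafeQueueUserWorkItem(cb => cb(), readerCallback, preferLocal: false);
    }
}
```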

@benaadams (Member Author)

New dlls in light of the PerfView comparisons.

@adamsitnik (Member)

The RPS has dropped for Plaintext to 9,070k, but the trace I have only shows me which things got better (I need to get more traces):

Compared to previous commits to this PR:

image

Compared to master:

image

@adamsitnik (Member)

JSON:

New new dlls vs new dlls (almost no changes):

image

@benaadams (Member Author)

So I'm fairly at a loss to explain this: all the trace data/metrics improve, but the performance degrades.

The main issue I can find is the one highlighted by @adamsitnik in #36992 (comment); it's related to FlushAsync, specifically the call in SocketConnection.ProcessReceives.

Even though overall lock acquire time falls, its lock acquire time jumps, which would throttle the throughput of the input data, adding latency back to offset all the gains:

Before:

image

After:

image

Not sure why yet...
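
For context, a socket-transport receive loop has roughly this shape (an illustrative sketch, not the actual Kestrel SocketConnection.ProcessReceives source): every received chunk ends with a FlushAsync on the input pipe, so any extra time spent inside that flush, such as waiting on the pipe's sync lock, directly throttles how fast input reaches the application.

```csharp
using System;
using System.IO.Pipelines;
using System.Net.Sockets;
using System.Threading.Tasks;

static class ReceiveLoopSketch
{
    static async Task ProcessReceivesAsync(Socket socket, PipeWriter input)
    {
        while (true)
        {
            Memory<byte> buffer = input.GetMemory(sizeHint: 512);
            int bytesReceived = await socket.ReceiveAsync(buffer, SocketFlags.None);
            if (bytesReceived == 0)
            {
                break; // remote side closed the connection
            }

            input.Advance(bytesReceived);

            FlushResult result = await input.FlushAsync(); // the call under discussion
            if (result.IsCompleted || result.IsCanceled)
            {
                break;
            }
        }

        await input.CompleteAsync();
    }
}
```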

@benaadams (Member Author)

Added FlushAsync(bool isMoreData, CancellationToken cancellationToken = default); it would need API review, but we can test the effect:
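
A hypothetical caller, assuming the proposed overload existed on the PipeWriter used by the transport (it does not exist in the shipped API; the exact semantics would be up to review): the flag hints that another flush is coming right behind this one, so the pipe could defer the expensive parts of the flush, such as waking the reader and returning memory, until the last flush of the batch.

```csharp
using System.IO.Pipelines;
using System.Net.Sockets;
using System.Threading.Tasks;

static class IsMoreDataFlushSketch
{
    // Hypothetical: PipeWriter has no FlushAsync(bool) overload today; this assumes
    // the overload proposed in this PR. How it is used here is a guess.
    static async Task<FlushResult> FlushReceivedAsync(Socket socket, PipeWriter input, int bytesReceived)
    {
        input.Advance(bytesReceived);

        // If the socket already has more bytes queued, hint that another flush is
        // imminent so this one can stay on its cheap path.
        bool isMoreData = socket.Available > 0;
        return await input.FlushAsync(isMoreData);
    }
}
```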

@benaadams (Member Author) commented May 28, 2020

Cuts calls to .GetMemory() to 1.8% of the previous count:

image

@adamsitnik (Member)

image

The plaintext result from "before" is 8M while it should be more than 9M; most probably there was some other work executing on the server or client machine.
I am going to re-run them with BenchmarkDriver2, which promises to avoid these problems, and share new results.

@adamsitnik (Member)

@benaadams I've run the benchmarks two more times for you, here are the results:

image

image

I'll now try to run the JSON benchmark with bombardier and collect traces that would allow for an apples-to-apples comparison (run x requests, not run for y seconds).

@benaadams (Member Author)

I'd be interested in a trace of the Mono Json (after) with those high locks; I assume it's run on .NET Core despite the name?

@benaadams (Member Author)

Also the plaintext, with the crazy high locks

@adamsitnik (Member)

> I'd be interested in a trace of the Mono Json (after) with those high locks; I assume it's run on .NET Core despite the name?

Yes, Mono is just the machine name.

@adamsitnik (Member)

@sebastienros something is wrong with the mono machine. It was working fine all day for me and now it fails without a clear reason. Do you have any idea why?

PS C:\Projects\aspnet_benchmarks\src\BenchmarksDriver> dotnet run -- --display-output --server $mono --client $monoload --connections 512 --jobs ..\BenchmarksApps\Kestrel\PlatformBenchmarks\benchmarks.plaintext.json --scenario PlaintextPlatform --sdk 5.0.100-preview.6.20272.2 --runtime  5.0.0-preview.6.20272.2 --aspnetcoreversion 5.0.0-preview.5.20255.6  --collect-counters --framework netcoreapp5.0 --collect-trace
[05:26:11.967] WARNING: '--self-contained' has been set implicitly as custom local files are used.
[05:26:11.982] Using worker Wrk
[05:26:12.055] Running session '7469a407e9dd4831908b3382a0cf74f8' with description ''
[05:26:12.058] Starting scenario PlaintextPlatform on benchmark server...
[05:26:12.203] Fetching job: $mono/jobs/8
[05:26:13.523] Job has been selected by the server ...
[05:26:14.193] Job is now building ...
[05:26:35.807] Job failed on benchmark server, stopping...
Microsoft (R) Build Engine version 16.7.0-preview-20270-03+bee129d1b for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  Restored /tmp/benchmarks-agent/benchmarks-server-6/rrcz5cy5.ybt/Benchmarks/src/BenchmarksApps/Kestrel/PlatformBenchmarks/PlatformBenchmarks.csproj (in 757 ms).
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  PlatformBenchmarks -> /tmp/benchmarks-agent/benchmarks-server-6/rrcz5cy5.ybt/Benchmarks/src/BenchmarksApps/Kestrel/PlatformBenchmarks/bin/Release/netcoreapp5.0/linux-x64/PlatformBenchmarks.dll
  PlatformBenchmarks -> /tmp/benchmarks-agent/benchmarks-server-6/rrcz5cy5.ybt/Benchmarks/src/BenchmarksApps/Kestrel/PlatformBenchmarks/published/


Command dotnet publish PlatformBenchmarks.csproj -c Release -o /tmp/benchmarks-agent/benchmarks-server-6/rrcz5cy5.ybt/Benchmarks/src/BenchmarksApps/Kestrel/PlatformBenchmarks/published /p:BenchmarksAspNetCoreVersion=5.0.0-preview.5.20255.6 /p:MicrosoftAspNetCoreAllPackageVersion=5.0.0-preview.5.20255.6 /p:MicrosoftAspNetCoreAppPackageVersion=5.0.0-preview.5.20255.6 /p:BenchmarksNETStandardImplicitPackageVersion=5.0.0-preview.5.20255.6 /p:BenchmarksNETCoreAppImplicitPackageVersion=5.0.0-preview.5.20255.6 /p:BenchmarksRuntimeFrameworkVersion=5.0.0-preview.6.20272.2 /p:BenchmarksTargetFramework=netcoreapp5.0 /p:MicrosoftNETCoreAppPackageVersion=5.0.0-preview.6.20272.2 /p:MicrosoftWindowsDesktopAppPackageVersion=5.0.0-preview.7.20302.1 /p:NETCoreAppMaximumVersion=99.9 /p:MicrosoftNETCoreApp50PackageVersion=5.0.0-preview.6.20272.2 /p:GenerateErrorForMissingTargetingPacks=false /p:MicrosoftNETPlatformLibrary=Microsoft.NETCore.App /p:RestoreNoCache=true --framework netcoreapp5.0 --self-contained -r linux-x64  returned exit code 0

[05:36:34.163] Deleting scenario 'PlaintextPlatform' on benchmark server...

@sebastienros (Member) commented Jun 2, 2020

http://$mono/jobs/8/output

[STDERR] Unhandled exception. System.IO.IOException: Failed to bind to address http://192.168.1.1:8080: address already in use.

Looks like the port is still in use. Maybe it didn't get cleaned up correctly.

@fanyang-mono Do you think there could be other docker containers still running? In my scripts I forcibly remove any container starting with benchmarks_ every night, just in case one went rogue.

@fanyang-mono (Member)

> http://$mono/jobs/8/output
>
> [STDERR] Unhandled exception. System.IO.IOException: Failed to bind to address http://192.168.1.1:8080: address already in use.
>
> Looks like the port is still in use. Maybe it didn't get cleaned up correctly.
>
> @fanyang-mono Do you think there could be other docker containers still running? In my scripts I forcibly remove any container starting with benchmarks_ every night, just in case one went rogue.

I restarted the server and it is working now.

@adamsitnik (Member)

@fanyang-mono @sebastienros thank you!

@benaadams I've captured and uploaded the traces:

image

@benaadams (Member Author) commented Jun 3, 2020

Json "after" on the Mono machine is spending 15% of its time in ThreadPool Dequeue:

image

Bunch of spinning also

image

Might be a NUMA issue? (or a lots-of-cores issue)

FYI @stephentoub @kouvel

@benaadams (Member Author)

Yeah, a large amount of the Mono machine's time is spent in the synchronisation primitives.

image

@benaadams (Member Author)

@adamsitnik maybe the Citrine traces would be better; the Mono traces are just showing things I can't do much about (with this PR), alas.

@adamsitnik (Member)

> Might be a NUMA issue?

Most probably these threads are waiting for more work.

> maybe the Citrine traces would be better

Here you go :D

image

@benaadams (Member Author)

Added API suggestion #37472

@benaadams (Member Author)

Revisit post 5.0.

@benaadams closed this Aug 2, 2020
@ghost locked as resolved and limited conversation to collaborators Dec 9, 2020