Auto-tuning DotNetty batching + removing scheduler from batching system #4685

Merged
merged 6 commits into from
Dec 28, 2020

Conversation


@Aaronontheweb Aaronontheweb commented Dec 22, 2020

Ported the FlushConsolidationHandler from Netty and used it to auto-tune batching inside Akka.Remote without having to set explicit thresholds and without having to rely on the IScheduler.

This accomplishes:

  1. Lower idle CPU consumption than even #4678 ("Move DotNetty batching scheduling off of DotNetty STEE and onto HashedWheelTimer")
  2. Significantly lower latencies on systems that don't write heavily
  3. No need for users to manually tune their threshold settings in akka.remote.dot-netty.tcp.batching - this is now handled automatically by the FlushConsolidationHandler.

close #4636
close #4563

We still have some more work to do optimizing the DedicatedThreadPool to scale down automatically, but this eliminates most of the CPU noise coming from DotNetty.


Aaronontheweb commented Dec 22, 2020

How It Works

The FlushConsolidationHandler works by batching flushes together, rather than writes. In this patch we've changed the TcpAssociationHandle to always call IChannel.WriteAndFlushAsync, which creates a 1:1 correlation between writes and flushes.

The algorithm is designed to capture flushes that occur in rapid succession and group them together in order to lower the total number of system calls to the socket, which improves average throughput and decreases CPU utilization.

The flushes are batched together when:

  1. The socket is currently performing a read or
  2. The number of pending flushes is less than the configured limit (DefaultExplicitFlushAfterFlushes unless overridden), which defaults to 30 in Akka.NET.

The flushes are allowed to pass and write out to the socket when:

  1. The socket is writing but not reading (immediate write);
  2. The total number of flushes is equal to DefaultExplicitFlushAfterFlushes; or
  3. A flush has been scheduled onto the EventLoop without being cancelled.

In the third case, we don't use a time-based delay to flush the socket - instead the "flush" call is simply added to the same event queue where all of the read and write events are. It works identically to an actor's mailbox. All of the writes queued up prior to that event get flushed together.
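The rules above, for a channel that is not currently reading, can be sketched as a small self-contained C# model. This is a hedged illustration and not the handler from this patch: FlushConsolidationModel, WriteAndFlush, RunEventLoop and SocketFlushes are hypothetical names, the Queue<Action> merely stands in for the EventLoop, and cancellation is simplified so that a stale scheduled flush simply becomes a no-op.

```csharp
using System;
using System.Collections.Generic;

// Toy model of flush consolidation when no read is in progress.
// NOT the DotNetty/Akka.NET implementation; every name here is hypothetical.
public sealed class FlushConsolidationModel
{
    private readonly int _explicitFlushAfterFlushes;
    // Stand-in for the channel's event queue (assumption: single-threaded, FIFO).
    private readonly Queue<Action> _eventLoop = new Queue<Action>();
    private int _flushPendingCount;
    private bool _flushScheduled;

    // Counts simulated flushes that actually reach the socket.
    public int SocketFlushes { get; private set; }

    public FlushConsolidationModel(int explicitFlushAfterFlushes = 30)
    {
        _explicitFlushAfterFlushes = explicitFlushAfterFlushes;
    }

    // One call per write, mirroring the 1:1 write/flush relationship.
    public void WriteAndFlush()
    {
        if (++_flushPendingCount == _explicitFlushAfterFlushes)
        {
            // Limit reached: flush immediately.
            FlushNow();
        }
        else if (!_flushScheduled)
        {
            // No timer: the flush is queued behind the events already pending.
            _flushScheduled = true;
            _eventLoop.Enqueue(ScheduledFlush);
        }
    }

    // Drains the simulated event queue.
    public void RunEventLoop()
    {
        while (_eventLoop.Count > 0)
        {
            _eventLoop.Dequeue()();
        }
    }

    private void ScheduledFlush()
    {
        _flushScheduled = false;
        // Simplification: instead of cancelling, a stale task does nothing.
        if (_flushPendingCount > 0)
        {
            FlushNow();
        }
    }

    private void FlushNow()
    {
        _flushPendingCount = 0;
        SocketFlushes++;
    }
}
```

Under these assumptions, 65 back-to-back WriteAndFlush() calls with a limit of 30, followed by one RunEventLoop() call, produce three simulated socket flushes (two from reaching the limit, one from the queued flush) rather than 65.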

@IgorFedchenko IgorFedchenko left a comment

LGTM - just left a minor note and a question about flushing while a read is in progress

@@ -2,7 +2,7 @@
<PropertyGroup>
<Copyright>Copyright © 2013-2020 Akka.NET Team</Copyright>
<Authors>Akka.NET Team</Authors>
-<VersionPrefix>1.4.13</VersionPrefix>
+<VersionPrefix>1.4.14</VersionPrefix>

Contributor

I guess this should not be changed in this PR; did build.cmd just update this file when it was run locally?

Member Author

That's correct

// we only need to flush if we reach the explicitFlushAfterFlushes limit.
if (++_flushPendingCount == ExplicitFlushAfterFlushes)
{
    FlushNow(context);

Contributor

So when reading is complete we will flush in ChannelReadComplete anyway, but here we also flush right away if too many flushes are pending? Are we able to flush while a read is in progress? Is this safe?

Member Author

If a read is sitting in the channel pipeline right now, we assume that the application is reading some data and might, presumably, use it to produce a response right away. We're trying to batch those writes together into as few flushes as possible; when the read "completes" (we've finished reading all of the data currently inside the buffer), it's safe to flush any writes that are currently pending. This helps reduce latency in lower-traffic systems AND increases throughput in higher-traffic systems.
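The read-path behavior discussed in this thread can be sketched with a minimal C# model. It is an illustration under stated assumptions rather than the code under review: ReadPathFlushModel and its members are hypothetical names, and flushes issued outside a read are passed straight through here to keep the example focused on the read case.

```csharp
using System;

// Toy model of flush handling while a read is in progress.
// NOT the handler under review; all names are hypothetical.
public sealed class ReadPathFlushModel
{
    private readonly int _explicitFlushAfterFlushes;
    private int _flushPendingCount;
    private bool _readInProgress;

    // Counts simulated flushes that actually reach the socket.
    public int SocketFlushes { get; private set; }

    public ReadPathFlushModel(int explicitFlushAfterFlushes = 30)
    {
        _explicitFlushAfterFlushes = explicitFlushAfterFlushes;
    }

    // Inbound data is being processed by the pipeline.
    public void ChannelRead()
    {
        _readInProgress = true;
    }

    public void Flush()
    {
        if (!_readInProgress)
        {
            // Simplification: no consolidation outside of reads in this model.
            FlushNow();
            return;
        }

        // During a read we only flush once the pending count hits the limit.
        if (++_flushPendingCount == _explicitFlushAfterFlushes)
        {
            FlushNow();
        }
    }

    // The buffer has been fully read: release anything still held back.
    public void ChannelReadComplete()
    {
        _readInProgress = false;
        if (_flushPendingCount > 0)
        {
            FlushNow();
        }
    }

    private void FlushNow()
    {
        _flushPendingCount = 0;
        SocketFlushes++;
    }
}
```

In this model, three Flush() calls made between ChannelRead() and ChannelReadComplete() reach the socket as a single flush, while 35 calls during one read produce two: one when the limit of 30 is reached and one when the read completes.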

@Aaronontheweb Aaronontheweb merged commit b8e74e0 into akkadotnet:dev Dec 28, 2020
@Aaronontheweb Aaronontheweb deleted the feature/FlushConsolidator branch December 28, 2020 18:33
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this pull request Dec 30, 2020
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this pull request Dec 30, 2020
Aaronontheweb added a commit that referenced this pull request Dec 30, 2020
* stubbing out performance documentation per #4685

* close #4685
Successfully merging this pull request may close these issues.

High Idle CPU in DotNetty Akka.Remote - exhaustion of TCP buffer after updating from 1.3.8 to 1.4.6
2 participants