
Move DotNetty batching scheduling off of DotNetty STEE and onto HashedWheelTimer #4678

Merged

Conversation

Aaronontheweb (Member):

work for #4563 and #4636

Aaronontheweb (Member, Author):

Raw throughput numbers on this branch were the same as dev, so there's no real impact on end-to-end performance.

However, there is a significant change in idle CPU numbers!

Some important notes:

  • clusterwebcrawler_webcrawler.lighthouse2_1 isn't in the cluster, although it is running Akka.Remote
  • clusterwebcrawler_webcrawler.lighthouse_1 IS in the cluster but IS NOT using batching
  • all other services inside the cluster are using batching

Before

CONTAINER ID   NAME                                            CPU %    MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O   PIDS
84ccf601e906   clusterwebcrawler_webcrawler.lighthouse2_1      0.55%    35.52MiB / 50.17GiB   0.07%   1.2kB / 0B        0B / 0B     21
37fd1e313cdb   clusterwebcrawler_webcrawler.web_1              9.98%    79.53MiB / 50.17GiB   0.15%   98.8kB / 97kB     0B / 0B     56
57d530c02a47   clusterwebcrawler_webcrawler.trackerservice_1   10.96%   48.97MiB / 50.17GiB   0.10%   99.9kB / 99kB     0B / 0B     39
5737cbaed752   clusterwebcrawler_webcrawler.crawlservice_1     11.31%   44.49MiB / 50.17GiB   0.09%   97.3kB / 97.4kB   0B / 0B     39
66dd2a7fe096   clusterwebcrawler_webcrawler.lighthouse_1       4.36%    45.39MiB / 50.17GiB   0.09%   106kB / 109kB     0B / 0B     27

After

CONTAINER ID   NAME                                            CPU %    MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O   PIDS
b332f021fbe1   clusterwebcrawler_webcrawler.lighthouse2_1      0.52%    35.91MiB / 50.17GiB   0.07%   1.27kB / 0B       0B / 0B     21
0093f3510df3   clusterwebcrawler_webcrawler.crawlservice_1     5.64%    52.1MiB / 50.17GiB    0.10%   201kB / 202kB     0B / 0B     40
d738c3391a89   clusterwebcrawler_webcrawler.trackerservice_1   4.96%    48.66MiB / 50.17GiB   0.09%   201kB / 201kB     0B / 0B     39
0392e8f7835d   clusterwebcrawler_webcrawler.web_1              5.29%    85.66MiB / 50.17GiB   0.17%   211kB / 204kB     0B / 0B     57
fd492a3b6ff5   clusterwebcrawler_webcrawler.lighthouse_1       4.42%    46.12MiB / 50.17GiB   0.09%   216kB / 221kB     0B / 0B     23

Looks like a ~50% drop in idle CPU consumption to me: the batching nodes fell from roughly 10-11% CPU to 5-5.6%, which puts them in the same ballpark as the nodes that aren't using batching.

    () =>
    {
        // want to fire this event through the top of the pipeline
        context.Channel.Pipeline.FireUserEventTriggered(FlushCmd.Instance);
Aaronontheweb (Member, Author):

I wanted to use an event for this because it will allow me to do some sampling around how often the FlushCmd has to be invoked separately from other flushes. But that may end up being a distinction without a difference, depending on how I write the sampler.
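
For illustration only, a sampler along these lines could count the timer-driven FlushCmd events separately from ordinary flushes. The handler name and counter here are hypothetical; only FlushCmd and DotNetty's ChannelHandlerAdapter.UserEventTriggered override come from the code under review:

    using System.Threading;
    using DotNetty.Transport.Channels;

    // Hypothetical sampling handler (not part of this PR): counts how often
    // the scheduled FlushCmd arrives via the pipeline's user-event path.
    public sealed class FlushSamplingHandler : ChannelHandlerAdapter
    {
        private long _scheduledFlushes;

        public long ScheduledFlushes => Interlocked.Read(ref _scheduledFlushes);

        public override void UserEventTriggered(IChannelHandlerContext context, object evt)
        {
            if (evt is FlushCmd) // the timer-driven flush signal introduced by this PR
                Interlocked.Increment(ref _scheduledFlushes);
            context.FireUserEventTriggered(evt); // keep the event flowing to later handlers
        }
    }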

@to11mtm (Member) left a comment:

One question and one nit, but it looks good otherwise.

    public readonly BatchWriterSettings Settings;
    public readonly IScheduler Scheduler;
    private ICancelable _flushSchedule;

    internal bool CanSchedule { get; private set; } = true;
to11mtm (Member):

It looks like we don't use CanSchedule anywhere; we are only setting it. Should it be removed?

Aaronontheweb (Member, Author):

I'll remove it

Comment on lines +178 to +184
    _flushSchedule = Scheduler.Advanced.ScheduleRepeatedlyCancelable(Settings.FlushInterval,
        Settings.FlushInterval,
        () =>
        {
            // want to fire this event through the top of the pipeline
            context.Channel.Pipeline.FireUserEventTriggered(FlushCmd.Instance);
        });
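
As an aside: the ICancelable stored in _flushSchedule suggests the repeating task is meant to be torn down along with the channel. The exact teardown isn't shown in this hunk, so the following is only a sketch of what it could look like:

    // Assumed teardown (not visible in this hunk): cancel the repeating
    // flush task when the channel goes inactive, so the timer stops firing
    // into a dead pipeline.
    public override void ChannelInactive(IChannelHandlerContext context)
    {
        _flushSchedule?.Cancel();
        base.ChannelInactive(context);
    }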
to11mtm (Member):

So this is one heck of an edge case... but do we want to have any sort of warning if the flush interval is, for some unknown reason, less than the tick duration on the HWT?

Aaronontheweb (Member, Author):

No, we don't, but it shouldn't be an issue: if the interval is shorter than a tick, the scheduler will just put the task in the next available bucket.
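
A minimal sketch of that rounding behavior, under assumed illustrative values (the 100ms tick in the example is not necessarily the library's default, and DelayToTicks is a made-up helper, not Akka.NET API):

    using System;

    // Sketch of hashed-wheel rounding: a delay shorter than one tick is
    // rounded up to a full tick, so a sub-tick flush interval simply fires
    // on the next wheel turn instead of firing early or failing.
    long DelayToTicks(TimeSpan delay, TimeSpan tickDuration)
    {
        var ticks = (delay.Ticks + tickDuration.Ticks - 1) / tickDuration.Ticks; // ceiling division
        return Math.Max(1, ticks); // never land in the current bucket
    }

    // A 10ms flush interval against an assumed 100ms tick still lands one tick out:
    Console.WriteLine(DelayToTicks(TimeSpan.FromMilliseconds(10), TimeSpan.FromMilliseconds(100))); // prints 1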

@@ -114,72 +121,66 @@ public override Task WriteAsync(IChannelHandlerContext context, object message)
     * across the network.
     */
    var write = base.WriteAsync(context, message);
    if (Settings.EnableBatching)
Aaronontheweb (Member, Author):

Removed the Settings.EnableBatching checks - this stage never runs when batching is disabled.
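
The reasoning, as an illustrative sketch (the handler and pipeline names below are assumptions, not the exact Akka.Remote wiring): when batching is disabled, the batching stage is never added to the channel pipeline, so a per-write check inside the handler can never observe EnableBatching == false.

    // Illustrative wiring, not the actual Akka.Remote code: the batching
    // handler is only installed when batching is enabled, which makes any
    // per-write EnableBatching checks inside the handler dead code.
    if (settings.EnableBatching)
    {
        pipeline.AddLast("batch-writer", new BatchWriter(settings, scheduler));
    }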

@Aaronontheweb merged commit 91e9c50 into akkadotnet:dev on Dec 18, 2020
@Aaronontheweb deleted the fix/DotNetty-scheduler-performance branch on December 18, 2020 at 19:43