
Internalize reconnection mechanics and introduce connection pooling #492


Merged · 2 commits · Sep 20, 2017

Conversation

@jchambers (Owner) commented Jul 3, 2017

After several false starts, I'm pleased to share this work-in-progress attempt at channel pooling. In the past, the model was that a single ApnsClient would have a single connection to the APNs server. In this pull request, each client now has an ApnsChannelPool which manages access to one or more connections. Connections are created on demand, which eliminates the notion of explicitly connecting (or reconnecting) an ApnsClient.
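
To make the new model concrete, here's a rough sketch of what calling code might look like with pooling in place. The builder methods shown (setClientCredentials, setConcurrentConnections) are illustrative assumptions, not necessarily the final API:

    import java.io.File;

    import com.relayrides.pushy.apns.ApnsClient;
    import com.relayrides.pushy.apns.ApnsClientBuilder;

    class PooledClientSketch {

        // Builder/method names are illustrative assumptions, not the final API.
        ApnsClient buildPooledClient() throws Exception {
            return new ApnsClientBuilder()
                    .setClientCredentials(new File("/path/to/certificate.p12"), "p12-password")
                    .setConcurrentConnections(4) // upper bound on pooled connections
                    .build();
        }
    }

After build(), there is no connect() or reconnect() to call: the first sendNotification() borrows a connection from the ApnsChannelPool, creating one on demand if none is idle.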

Because the model is changing significantly, this pull request introduces many breaking API changes. I'm very interested in your feedback.

Again, I emphasize that this is a work in progress, but I think it's fairly representative of the shape of things to come. I'm also hopeful that it will resolve the host of reconnection issues folks have been reporting lately.

TODO:

  • Restore/update metrics
  • Restore exponential backoff when opening new connections (see the sketch after this list)
  • Add factory/pool tests
  • Document everything
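
On the backoff item above, here's a minimal sketch of the kind of exponential backoff I have in mind, assuming a Netty Bootstrap whose remote address is already configured; this is illustrative, not the actual implementation:

    import java.util.concurrent.TimeUnit;

    import io.netty.bootstrap.Bootstrap;
    import io.netty.channel.ChannelFutureListener;

    class BackoffSketch {

        private static final long MAX_DELAY_MILLIS = 60_000;

        // Retry a failed connection attempt after an exponentially increasing
        // delay, scheduled on the channel's own event loop.
        void connectWithBackoff(final Bootstrap bootstrap, final long delayMillis) {
            bootstrap.connect().addListener((ChannelFutureListener) future -> {
                if (!future.isSuccess()) {
                    final long nextDelay = Math.min(delayMillis * 2, MAX_DELAY_MILLIS);

                    future.channel().eventLoop().schedule(
                            () -> connectWithBackoff(bootstrap, nextDelay),
                            delayMillis, TimeUnit.MILLISECONDS);
                }
            });
        }
    }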

@@ -52,6 +52,9 @@
@Param({"10000"})
public int notificationCount;

@Param({"1", "4", "8"})
public int concurrentConnections;
@jchambers (Owner, Author) commented Jul 3, 2017

For the curious, early benchmarks show things to be maaaaaaaaybe slightly faster (it's pretty much in the noise) than they were before channel pooling with a single connection. Unsurprisingly, a single client's throughput goes up significantly when using multiple threads/connections.
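
For context on how the benchmark exercises those parameters: JMH re-runs the measurement for each @Param value, so a single benchmark method yields 1-, 4-, and 8-connection results. A compilable sketch (everything except the two @Param fields from the diff is hypothetical):

    import java.util.concurrent.CountDownLatch;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Benchmark)
    public class ApnsClientBenchmarkSketch {

        @Param({"10000"})
        public int notificationCount;

        @Param({"1", "4", "8"})
        public int concurrentConnections;

        @Benchmark
        public void sendNotifications() throws InterruptedException {
            final CountDownLatch latch = new CountDownLatch(this.notificationCount);

            for (int i = 0; i < this.notificationCount; i++) {
                // sendAsync is a hypothetical stand-in for the client's real
                // sendNotification call; completion counts down the latch.
                this.sendAsync(latch::countDown);
            }

            latch.await();
        }

        private void sendAsync(final Runnable onComplete) {
            onComplete.run(); // stub; the real benchmark sends over the network
        }
    }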

@clslrns commented Jul 21, 2017

This one would be very useful. I managed to get only around 11K notifications/s from a single connection with an 8-thread event loop group on a machine with 8 cores, and neither the network nor the CPU was fully loaded. Switching to 8 single-threaded ApnsClients improved throughput to 35–44K notifications/s.
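
The workaround described above amounts to sharding traffic across several single-threaded clients, roughly like this sketch (setEventLoopGroup is assumed to exist on the builder; credential setup is elided):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicLong;

    import com.relayrides.pushy.apns.ApnsClient;
    import com.relayrides.pushy.apns.ApnsClientBuilder;

    import io.netty.channel.nio.NioEventLoopGroup;

    class ShardedClientsSketch {

        private final List<ApnsClient> clients = new ArrayList<>();
        private final AtomicLong counter = new AtomicLong();

        ShardedClientsSketch(final int clientCount) throws Exception {
            for (int i = 0; i < clientCount; i++) {
                this.clients.add(new ApnsClientBuilder()
                        .setEventLoopGroup(new NioEventLoopGroup(1)) // one thread per client
                        .build());
            }
        }

        // Round-robin dispatch across the shards; the channel pool in this
        // pull request makes this kind of manual sharding unnecessary.
        ApnsClient nextClient() {
            return this.clients.get((int) (this.counter.getAndIncrement() % this.clients.size()));
        }
    }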

@jchambers (Owner, Author)

Aside from the part where I think the Travis CI team is trolling me by constantly fixing and then un-fixing a weird hostname bug for OpenJDK 7, I think this is ready to roll. I'll sleep on it before I merge it, but anticipate a release some time tomorrow.

@jchambers jchambers force-pushed the channel_pool branch 2 times, most recently from f444a70 to 1dc2d7e Compare September 10, 2017 15:50
@jchambers (Owner, Author)

For those of you waiting patiently for this change (and release): given the significance of the changes, we want to observe it in production for a little while on our end before we unleash it upon the wider world. I'm hoping for a release at the end of next week.

@jchambers (Owner, Author)

Our work in #515 prompted some testing with very high levels of leak detection, which in turn increased load and slowed everything down. That, in turn, revealed some sporadic test failures when sending many notifications, which hints at a race condition. We're tracking it down, but it's something we'll need to resolve before the next release.

@jchambers (Owner, Author)

Here's what I've figured out so far:

  • The "send many notifications" tests are sometimes failing because a "send" future fails because the stream was closed before the server sent a reply.
  • The stream appears to have been closed because the underlying channel was closed.
  • The underlying channel appears to have been closed because we tried to acquire it from the idle pool, but then noticed that it wasn't active.

The last part is really puzzling to me, and will be the focus of the investigation from here.
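
For reference, the suspect acquire path looks roughly like this; idleChannels and createChannel are hypothetical stand-ins for the real ApnsChannelPool internals:

    import java.util.ArrayDeque;
    import java.util.Queue;

    import io.netty.channel.Channel;
    import io.netty.util.concurrent.Promise;

    class ChannelPoolSketch {

        private final Queue<Channel> idleChannels = new ArrayDeque<>();

        void acquire(final Promise<Channel> promise) {
            final Channel channel = this.idleChannels.poll();

            if (channel == null) {
                // No idle channel; open a new connection on demand.
                this.createChannel(promise);
            } else if (!channel.isActive()) {
                // The case observed in the failing tests: the channel went
                // inactive while sitting in the pool. Discard it and retry.
                channel.close();
                this.acquire(promise);
            } else {
                promise.setSuccess(channel);
            }
        }

        private void createChannel(final Promise<Channel> promise) {
            // Connection bootstrapping elided for brevity.
            promise.setFailure(new UnsupportedOperationException("sketch only"));
        }
    }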

@jchambers (Owner, Author)

In all cases of failure, the client sends a GOAWAY frame to the server, and the last stream created by the client is 143. This is shaping up to be either a spectacularly dumb or a really interesting bug.

@jchambers (Owner, Author) commented Sep 18, 2017

Found it. We're attaching response promises to streams asynchronously in the client handler, and in some cases, we can get a reply from the server before that promise is in place, which causes a NullPointerException. This was likely also a problem in 0.10, but we just didn't notice it until now.
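
In sketch form, the race and the shape of the fix look like this; the handler layout, the responsePromises map, and the method names are illustrative, not the actual client handler code (and everything here is assumed to run on the channel's event loop):

    import java.util.HashMap;
    import java.util.Map;

    import io.netty.channel.ChannelHandlerContext;
    import io.netty.util.concurrent.Promise;

    class ResponsePromiseSketch<R> {

        private final Map<Integer, Promise<R>> responsePromises = new HashMap<>();
        private int nextStreamId = 1;

        void writeNotification(final ChannelHandlerContext context,
                final Object notification, final Promise<R> responsePromise) {

            final int streamId = this.nextStreamId;
            this.nextStreamId += 2; // client-initiated HTTP/2 streams have odd IDs

            // The racy version deferred this put() to a separately-scheduled
            // task, which could run after the server's reply had already been
            // read. Registering the promise before the write closes the window.
            this.responsePromises.put(streamId, responsePromise);

            context.writeAndFlush(notification);
        }

        void handleResponse(final int streamId, final R response) {
            // With the promise registered up front, this lookup can no longer
            // return null when the server replies quickly.
            this.responsePromises.remove(streamId).trySuccess(response);
        }
    }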

@jchambers (Owner, Author)

Everything is fixed in ed1d764.
