Rewrite client connection management #798

Conversation
Motivation:

The client connection management code is pretty bad. It's hard to follow and test reliably. There were some thread-safety issues, and it was also possible to give a call a multiplexer from a future channel which will fail but may succeed after retrying the connection. It also lacked the ability to close an idle channel (i.e. one with no open HTTP streams).

Modifications:

- Rewrite the client connection management logic: this is handled by `ConnectionManager` and `ClientConnectivityHandler`.
- Channels will idle after 5 minutes of having no open streams (configurability will come in a later PR).
- Connectivity state changes are async'd onto a dispatch queue (a configurable queue will come later).

Result:

- Connection management code is much easier to follow and can be tested with an `EmbeddedChannel`.
- Fewer bugs.
- We no longer hold a lock when calling out to the connectivity delegate.
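To make the last two bullets concrete, here is a minimal sketch (hypothetical shape and names, not the actual `ConnectionManager`) of how state changes can be hopped onto a dispatch queue before the delegate is invoked, so that no lock is held while user code runs:

```swift
import Dispatch

enum ConnectivityState {
  case idle, connecting, ready, transientFailure, shutdown
}

protocol ConnectivityStateDelegate: AnyObject {
  func connectivityStateDidChange(from oldState: ConnectivityState, to newState: ConnectivityState)
}

final class ConnectivityMonitor {
  // Hard-coded for now; the description notes a configurable queue comes later.
  private let queue = DispatchQueue(label: "io.grpc.connectivity")
  weak var delegate: ConnectivityStateDelegate?
  private var state: ConnectivityState = .idle

  // In the real code this would only ever be called from the event loop,
  // which is why `state` needs no lock.
  func transition(to newState: ConnectivityState) {
    let oldState = self.state
    self.state = newState
    // Async the callout: no locks held while the delegate runs.
    self.queue.async {
      self.delegate?.connectivityStateDidChange(from: oldState, to: newState)
    }
  }
}
```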
Force-pushed from 4eb954c to db13a14.
Generally looks really good, just a few small notes in the diff.
```swift
// if we cancelled too late.
case .transientFailure(let state):
  // Stop the creation of a new channel, if we can.
  state.scheduled.cancel()
```
Given that this `cancel` may fail, do we need to handle the possibility the connection attempt goes ahead anyway?
Good question: I think the comment may actually be wrong here; I don't think the cancel can fail. The scheduled task is `startConnecting()`: if that gets run, the actual connection attempt will start after the transition from `.transientFailure` to `.connecting` (because we submit the connect onto the EventLoop).
I mean, `cancel` can always fail. The question is whether we tolerate its failure. I think the answer now is yes, but we should say what we expect to happen here.
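As a standalone illustration of this race (not the PR's code): `cancel()` on a NIO `Scheduled` task is best-effort, so the task (standing in for `startConnecting()` here) can still run if the deadline beats the cancel, and the state machine has to tolerate that.

```swift
import NIO

let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
let loop = group.next()

let scheduled = loop.scheduleTask(in: .milliseconds(50)) {
  // Stands in for `startConnecting()`: may still run if `cancel()` loses the race.
  print("connect attempt started anyway")
}

// If this runs before the deadline, the task is dropped and its futureResult
// fails with EventLoopError.cancelled; if it runs too late, the task has
// already executed and the cancel is a no-op.
scheduled.cancel()

try group.syncShutdownGracefully()
```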
Looks good from a cursory review. Feel free to merge once you and @Lukasa are confident about this; no need to wait for my approval.
Reviewed 14 of 16 files at r1, 1 of 1 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @glbrntt and @Lukasa)
Sources/GRPC/ClientConnectivityHandler.swift, line 36 at r3 (raw file):
```swift
case ready // We called close on the channel.
```
super-nit: surround "close" with backticks?
Sources/GRPC/ClientConnectivityHandler.swift, line 48 at r3 (raw file):
```swift
if event is NIOHTTP2StreamCreatedEvent {
  // We have a stream: don't go idle
  self.scheduledIdle?.cancel()
```
also set this to nil afterwards?
Sources/GRPC/ClientConnectivityHandler.swift, line 49 at r3 (raw file):
```swift
  // We have a stream: don't go idle
  self.scheduledIdle?.cancel()
  self.activeStreams += 1
```
I assume these operations do not need to be thread-safe or atomic?
Sources/GRPC/ClientConnectivityHandler.swift, line 53 at r3 (raw file):
```swift
  self.activeStreams -= 1
  // No active streams: go idle soon.
  if self.activeStreams == 0 {
```
I wonder whether cancelling/scheduling could be done as a side effect on the `activeStreams` setter instead?
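A standalone sketch of what that suggestion might look like (hypothetical shape and names; the real handler does this inline): a `didSet` observer owns the cancel/schedule bookkeeping, so the event-handling code only ever increments or decrements the counter.

```swift
import NIO

final class StreamCountIdleTracker {
  private let eventLoop: EventLoop
  private let idleTimeout: TimeAmount
  private let onIdle: () -> Void
  private var scheduledIdle: Scheduled<Void>?

  private(set) var activeStreams = 0 {
    didSet {
      if self.activeStreams == 0 {
        // Last stream closed: go idle soon.
        self.scheduledIdle = self.eventLoop.scheduleTask(in: self.idleTimeout, self.onIdle)
      } else if oldValue == 0 {
        // First stream opened: don't go idle (and drop the task, per the
        // "set this to nil afterwards?" note above).
        self.scheduledIdle?.cancel()
        self.scheduledIdle = nil
      }
    }
  }

  init(eventLoop: EventLoop, idleTimeout: TimeAmount, onIdle: @escaping () -> Void) {
    self.eventLoop = eventLoop
    self.idleTimeout = idleTimeout
    self.onIdle = onIdle
  }

  func streamCreated() { self.activeStreams += 1 }
  func streamClosed() { self.activeStreams -= 1 }
}
```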
Sources/GRPC/ClientConnectivityHandler.swift, line 90 at r3 (raw file):
```swift
if frame.streamID == .rootStream {
  switch (self.state, frame.payload) {
  // We only care about SETTINGS before we're `.ready`
```
s/before we're/as long as we are not yet/?
Sources/GRPC/ClientConnectivityHandler.swift, line 123 at r3 (raw file):
```swift
}
self.scheduledIdle = context.eventLoop.scheduleTask(in: self.idleTimeout) {
```
Should any previous scheduled idle be cancelled, or can we 100% guarantee that that won't be necessary?
Sources/GRPC/ClientConnectivityHandler.swift, line 124 at r3 (raw file):
```swift
self.scheduledIdle = context.eventLoop.scheduleTask(in: self.idleTimeout) {
  self.idle(context: context)
```
Should we reference `self` weakly here?
Sources/GRPC/ConnectionManager.swift, line 152 at r3 (raw file):
```swift
// Create a new id; it'll be used for the *next* channel we create.
self.channelNumber &+= 1
```
I assume HTTP2 allows for channel numbers >= 2^63?
Sources/GRPC/ConnectionManager.swift, line 262 at r3 (raw file):
```swift
// We don't have a channel and we don't want one, easy!
case .idle:
  shutdown = ShutdownState(closeFuture: self.eventLoop.makeSucceededFuture(()))
```
Sorry, but what is `closeFuture` for?
No: we're only ever accessing/modifying from the event loop.
We can if we hold on to the…
It won't be necessary: the…
No, this should be fine: NIO will drop the callback as soon as it's executed/cancelled.
This isn't related to HTTP/2 streams; it's just a count of separate HTTP/2 connections that have existed.
Why do we hold on to it? In case…
Reviewable status: 15 of 16 files reviewed, 6 unresolved discussions (waiting on @glbrntt, @Lukasa, and @MrMage)
Sources/GRPC/ClientConnectivityHandler.swift, line 90 at r3 (raw file):
Previously, MrMage (Daniel Alm) wrote…
s/before we're/as long as we are not yet/?
I'd prefer an "as long as"; IMHO that's clearer (unless that's actually wrong).
Sources/GRPC/ConnectionManager.swift, line 152 at r3 (raw file):
Previously, glbrntt (George Barnett) wrote…
This isn't related to HTTP/2 streams; it's just a count of separate HTTP/2 connections that have existed.
Alright, but are we okay with "overflows"?
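For reference, `&+=` is Swift's overflow-wrapping addition, so the counter wraps rather than trapping; a quick check (assuming the counter is an `Int`):

```swift
var channelNumber = Int.max
channelNumber &+= 1              // wraps instead of trapping
print(channelNumber == Int.min)  // true; it takes 2^63 increments to get here
```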
Sources/GRPC/ConnectionManager.swift, line 262 at r3 (raw file):
Previously, glbrntt (George Barnett) wrote…
Why do we hold on to it? In case `shutdown()` is called twice it should return the same value.
Sorry, I don't quite understand what the future means exactly.
I think "as long as we're not
Yes: it's only used to disambiguate between different
The |
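As context for the `closeFuture` question, a minimal sketch (hypothetical names, not the actual `ShutdownState`) of the behaviour described above, where a second `shutdown()` call returns the same future as the first:

```swift
import NIO

final class Manager {
  private let eventLoop: EventLoop
  private var shutdownFuture: EventLoopFuture<Void>?

  init(eventLoop: EventLoop) {
    self.eventLoop = eventLoop
  }

  func shutdown() -> EventLoopFuture<Void> {
    if let alreadyShuttingDown = self.shutdownFuture {
      // shutdown() called twice: return the same value both times.
      return alreadyShuttingDown
    }
    // Mirrors the quoted `.idle` case: no channel to close, so the close
    // future is already succeeded.
    let future = self.eventLoop.makeSucceededFuture(())
    self.shutdownFuture = future
    return future
  }
}
```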
LGTM from my side, but I would still appreciate a slightly different wording for that one comment.
Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @glbrntt and @Lukasa)
Sources/GRPC/ClientConnectivityHandler.swift, line 90 at r3 (raw file):
Previously, glbrntt (George Barnett) wrote…
I think "as long as we're not
.ready
" suggestsclosed
is also okay (which it isn't).
How about this? "// We only care about SETTINGS as long as we are in state `.notReady`." IMHO that has the least possible ambiguity. At least the original comment had confused me a bit.
Fine by me; I've taken it verbatim.
One minor note.
Motivation: Connections should be dropped to save resources if there are no RPCs for an idle timeout. This was added for clients in grpc#798; this adds similar functionality to the server.
Modifications:
- Add a `GRPCServerIdleHandler` to drop connections after 5 minutes of no RPCs.
Note: timeout configuration will be added for client and server in a follow-up PR.
Result:
- Servers will drop connections if the client isn't doing anything.
- Resolves grpc#706.

Motivation: Connections should be dropped to save resources if there are no RPCs for an idle timeout. This was added for clients in #798; this adds similar functionality to the server.
Modifications:
- Rename `ClientConnectivityHandler` to `GRPCIdleHandler`.
- Add a 'mode' to the idle handler.
Note: timeout configuration will be added for client and server in a follow-up PR.
Result:
- Servers will drop connections if the client isn't doing anything.
- Resolves #706.
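From those commit messages alone, the renamed handler might look something like this sketch (names, events, and defaults assumed; not the actual implementation): one handler shared by client and server, parameterised by a 'mode', closing the connection after the idle timeout once no streams are open. Only the shared idle-timeout plumbing is shown; mode-specific behaviour is elided.

```swift
import NIO
import NIOHTTP2

final class GRPCIdleHandler: ChannelInboundHandler {
  typealias InboundIn = HTTP2Frame

  enum Mode {
    case client
    case server
  }

  private let mode: Mode
  private let idleTimeout: TimeAmount
  private var activeStreams = 0
  private var scheduledIdle: Scheduled<Void>?

  init(mode: Mode, idleTimeout: TimeAmount = .minutes(5)) {
    self.mode = mode
    self.idleTimeout = idleTimeout
  }

  func userInboundEventTriggered(context: ChannelHandlerContext, event: Any) {
    if event is NIOHTTP2StreamCreatedEvent {
      // A stream opened: don't go idle.
      self.activeStreams += 1
      self.scheduledIdle?.cancel()
      self.scheduledIdle = nil
    } else if event is StreamClosedEvent {
      self.activeStreams -= 1
      if self.activeStreams == 0 {
        // No open streams: close the connection after the idle timeout.
        self.scheduledIdle = context.eventLoop.scheduleTask(in: self.idleTimeout) {
          context.close(promise: nil)
        }
      }
    }
    context.fireUserInboundEventTriggered(event)
  }

  func handlerRemoved(context: ChannelHandlerContext) {
    self.scheduledIdle?.cancel()
    self.scheduledIdle = nil
  }
}
```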