
Rewrite client connection management #798


Merged
merged 6 commits into from
May 21, 2020

Conversation

glbrntt
Collaborator

@glbrntt glbrntt commented May 15, 2020

Motivation:

The client connection management code is pretty bad. It's hard to follow
and test reliably. There were some thread safety issues, and it was also
possible to give a call a multiplexer from a channel future which would
fail but might succeed after retrying the connection. It also lacked the
ability to close an idle channel (i.e. one with no open HTTP/2 streams).

Modifications:

  • Rewrite the client connection management logic: this is handled by
    ConnectionManager and ClientConnectivityHandler.
  • Channels will idle after 5 minutes of having no open streams
    (configurability will come in a later PR)
  • Connectivity state changes are async'd onto a dispatch queue
    (configurable queue will come later)

Result:

  • Connection management code is much easier to follow and can be tested
    with an EmbeddedChannel
  • Fewer bugs
  • We no longer hold a lock when calling out to the connectivity delegate
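
The rewrite is centred on making the connectivity transitions explicit and testable. As a rough illustration (not the actual `ConnectionManager` API — the type, transitions, and method names here are invented for the sketch), the legal moves between gRPC-style connectivity states can be captured in a small value-type state machine:

```swift
// A much-simplified sketch of a connectivity state machine of the kind the
// rewrite introduces. State names mirror gRPC connectivity states; the
// transitions and API are illustrative only.
enum ConnectivityState {
    case idle, connecting, ready, transientFailure, shutdown
}

struct ConnectionStateMachine {
    private(set) var state: ConnectivityState = .idle

    /// Attempts a transition; returns `false` if it is not a legal move.
    mutating func transition(to next: ConnectivityState) -> Bool {
        switch (self.state, next) {
        case (.idle, .connecting),              // a call was started
             (.connecting, .ready),             // connect + SETTINGS succeeded
             (.connecting, .transientFailure),  // connect failed, will back off
             (.transientFailure, .connecting),  // backoff expired, retrying
             (.ready, .idle),                   // no open streams for the idle timeout
             (.ready, .transientFailure),       // live connection dropped
             (_, .shutdown):                    // shutdown is always reachable
            self.state = next
            return true
        default:
            return false
        }
    }
}
```

A pure value type like this is exactly what makes testing with an `EmbeddedChannel` (or no channel at all) straightforward: every transition can be asserted synchronously.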


@glbrntt
Collaborator Author

glbrntt commented May 15, 2020

Resolves #756 and another issue mentioned in #783. Resolves the client half of #706.

@glbrntt glbrntt requested a review from Lukasa May 15, 2020 12:15
@glbrntt glbrntt force-pushed the gb-connectivity-manager branch from 4eb954c to db13a14 Compare May 15, 2020 12:27
Collaborator

@Lukasa Lukasa left a comment


Generally looks really good, just a few small notes in the diff.

// if we cancelled too late.
case .transientFailure(let state):
// Stop the creation of a new channel, if we can.
state.scheduled.cancel()
Collaborator


Given that this cancel may fail, do we need to handle the possibility the connection attempt goes ahead anyway?

Collaborator Author


Good question: I think the comment may actually be wrong here; I don't think the cancel can fail. The scheduled task is `startConnecting()`; if that gets run, the actual connection attempt will start after the transition from `.transientFailure` to `.connecting` (because we submit the connect onto the `EventLoop`).

Collaborator


I mean, cancel can always fail. The question is whether we tolerate its failure. I think the answer now is yes, but we should say what we expect to happen here.
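
The pattern being agreed on here — tolerate a cancel that loses the race by making the task re-check state before doing work — can be sketched like this (illustrative names, not the PR's actual code; NIO's `Scheduled.cancel()` similarly gives no guarantee the task hasn't already started):

```swift
// Sketch of tolerating a cancel() that races with execution. Even if the
// scheduled retry fires after shutdown, it checks state first and does
// nothing, so a late cancellation is harmless.
enum RetryState { case transientFailure, connecting, shutdown }

final class RetryTimer {
    private(set) var state: RetryState = .transientFailure

    /// The body of the scheduled task (`startConnecting()` in the PR).
    func fire() {
        // Only proceed if we are still waiting to retry; if shutdown() won
        // the race, this is a no-op.
        guard self.state == .transientFailure else { return }
        self.state = .connecting
    }

    /// Best-effort cancellation: flips the state so a late `fire()` is inert.
    func shutdown() {
        self.state = .shutdown
    }
}
```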

@MrMage MrMage self-requested a review May 15, 2020 13:19
Collaborator

@MrMage MrMage left a comment


Looks good from a cursory review. Feel free to merge once you and @Lukasa are confident about this; no need to wait for my approval.

Reviewed 14 of 16 files at r1, 1 of 1 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @glbrntt and @Lukasa)


Sources/GRPC/ClientConnectivityHandler.swift, line 36 at r3 (raw file):

    case ready

    // We called close on the channel.

super-nit: surround "close" with backticks?


Sources/GRPC/ClientConnectivityHandler.swift, line 48 at r3 (raw file):

    if event is NIOHTTP2StreamCreatedEvent {
      // We have a stream: don't go idle
      self.scheduledIdle?.cancel()

also set this to nil afterwards?


Sources/GRPC/ClientConnectivityHandler.swift, line 49 at r3 (raw file):

      // We have a stream: don't go idle
      self.scheduledIdle?.cancel()
      self.activeStreams += 1

I assume these operations do not need to be thread-safe or atomic?


Sources/GRPC/ClientConnectivityHandler.swift, line 53 at r3 (raw file):

      self.activeStreams -= 1
      // No active streams: go idle soon.
      if self.activeStreams == 0 {

I wonder whether cancelling/scheduling could be done as a side effect on the activeStreams setter instead?


Sources/GRPC/ClientConnectivityHandler.swift, line 90 at r3 (raw file):

    if frame.streamID == .rootStream {
      switch (self.state, frame.payload) {
      // We only care about SETTINGS before we're `.ready`

s/before we're/as long as we are not yet/?


Sources/GRPC/ClientConnectivityHandler.swift, line 123 at r3 (raw file):

    }

    self.scheduledIdle = context.eventLoop.scheduleTask(in: self.idleTimeout) {

Should any previous scheduled idle be cancelled, or can we 100% guarantee that that won't be necessary?


Sources/GRPC/ClientConnectivityHandler.swift, line 124 at r3 (raw file):

    self.scheduledIdle = context.eventLoop.scheduleTask(in: self.idleTimeout) {
      self.idle(context: context)

should we reference self weakly here?


Sources/GRPC/ConnectionManager.swift, line 152 at r3 (raw file):

        // Create a new id; it'll be used for the *next* channel we create.
        self.channelNumber &+= 1

I assume HTTP2 allows for channel numbers >= 2^63?


Sources/GRPC/ConnectionManager.swift, line 262 at r3 (raw file):

      // We don't have a channel and we don't want one, easy!
      case .idle:
        shutdown = ShutdownState(closeFuture: self.eventLoop.makeSucceededFuture(()))

Sorry, but what is closeFuture for?

@glbrntt
Collaborator Author

glbrntt commented May 18, 2020

Sources/GRPC/ClientConnectivityHandler.swift, line 49 at r3 (raw file):

      // We have a stream: don't go idle
      self.scheduledIdle?.cancel()
      self.activeStreams += 1

I assume these operations do not need to be thread-safe or atomic?

No: we're only ever accessing/modifying from the event loop.


Sources/GRPC/ClientConnectivityHandler.swift, line 53 at r3 (raw file):

      self.activeStreams -= 1
      // No active streams: go idle soon.
      if self.activeStreams == 0 {

I wonder whether cancelling/scheduling could be done as a side effect on the activeStreams setter instead?

We can if we hold on to the context as well; I'm not sure it actually makes things any better though.
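
The setter-side-effect alternative MrMage suggests could look roughly like this (the closure names are illustrative stand-ins; as noted above, doing this for real means holding on to the `ChannelHandlerContext` so the scheduling can target the event loop):

```swift
// Sketch of driving idle scheduling from a didSet observer on activeStreams
// instead of at each call site. Only the 0 -> 1 and 1 -> 0 edges matter.
final class StreamCounter {
    private let scheduleIdleTimeout: () -> Void
    private let cancelIdleTimeout: () -> Void

    init(schedule: @escaping () -> Void, cancel: @escaping () -> Void) {
        self.scheduleIdleTimeout = schedule
        self.cancelIdleTimeout = cancel
    }

    var activeStreams: Int = 0 {
        didSet {
            if oldValue == 0 && self.activeStreams > 0 {
                self.cancelIdleTimeout()    // first stream opened: stay alive
            } else if oldValue > 0 && self.activeStreams == 0 {
                self.scheduleIdleTimeout()  // last stream closed: idle soon
            }
        }
    }
}
```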


Sources/GRPC/ClientConnectivityHandler.swift, line 123 at r3 (raw file):

    }

    self.scheduledIdle = context.eventLoop.scheduleTask(in: self.idleTimeout) {

Should any previous scheduled idle be cancelled, or can we 100% guarantee that that won't be necessary?

It won't be necessary: the scheduleIdleTimeout from channelRead can only happen once and this happens before the channel is accessible to the application (i.e. it's not possible to create a stream before it's scheduled). The other scheduleIdleTimeout is hopefully clear as it is.


Sources/GRPC/ClientConnectivityHandler.swift, line 124 at r3 (raw file):

    self.scheduledIdle = context.eventLoop.scheduleTask(in: self.idleTimeout) {
      self.idle(context: context)

should we reference self weakly here?

No, this should be fine: NIO will drop the callback as soon as it has been executed or cancelled.


Sources/GRPC/ConnectionManager.swift, line 152 at r3 (raw file):

        // Create a new id; it'll be used for the *next* channel we create.
        self.channelNumber &+= 1

I assume HTTP2 allows for channel numbers >= 2^63?

This isn't related to HTTP/2 streams; it's just a count of separate HTTP/2 connections that have existed.


Sources/GRPC/ConnectionManager.swift, line 262 at r3 (raw file):

      // We don't have a channel and we don't want one, easy!
      case .idle:
        shutdown = ShutdownState(closeFuture: self.eventLoop.makeSucceededFuture(()))

Sorry, but what is closeFuture for?

Why do we hold on to it? So that if shutdown() is called twice, both calls return the same value.

Collaborator

@MrMage MrMage left a comment


Reviewable status: 15 of 16 files reviewed, 6 unresolved discussions (waiting on @glbrntt, @Lukasa, and @MrMage)


Sources/GRPC/ClientConnectivityHandler.swift, line 90 at r3 (raw file):

Previously, MrMage (Daniel Alm) wrote…

s/before we're/as long as we are not yet/?

I'd prefer an "as long as", IMHO that's clearer (unless that's actually wrong)


Sources/GRPC/ConnectionManager.swift, line 152 at r3 (raw file):

Previously, glbrntt (George Barnett) wrote…

This isn't related to HTTP/2 streams; it's just a count of separate HTTP/2 connections that have existed.


Alright, but are we okay with "overflows"?


Sources/GRPC/ConnectionManager.swift, line 262 at r3 (raw file):

Previously, glbrntt (George Barnett) wrote…

Why do we hold on to it? In case shutdown() is called twice it should return the same value.

Sorry, I don't quite understand what the future means exactly.

@glbrntt
Collaborator Author

glbrntt commented May 18, 2020

Reviewable status: 15 of 16 files reviewed, 6 unresolved discussions (waiting on @glbrntt, @Lukasa, and @MrMage)

Sources/GRPC/ClientConnectivityHandler.swift, line 90 at r3 (raw file):

Previously, MrMage (Daniel Alm) wrote…
I'd prefer an "as long as", IMHO that's clearer (unless that's actually wrong)

I think "as long as we're not .ready" suggests closed is also okay (which it isn't).

Sources/GRPC/ConnectionManager.swift, line 152 at r3 (raw file):

Previously, glbrntt (George Barnett) wrote…
Alright, but are we okay with "overflows"?

Yes: it's only used to disambiguate between different Channels in logging, and the likelihood of cycling through enough connections to reach an overflow here is practically zero anyway.
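
For context on the overflow question: Swift's plain `+=` traps on integer overflow, while the `&+=` overflow operator used here wraps around, so even in the pathological case the counter cannot crash the process (the counter's exact integer type in the PR is not shown here; an unsigned 64-bit one is assumed for the sketch):

```swift
// Swift's overflow operators wrap instead of trapping: `&+= 1` at the
// type's maximum cycles back to 0 rather than crashing, which is harmless
// for a counter that only disambiguates log lines.
var channelNumber = UInt64.max
channelNumber &+= 1
// channelNumber is now 0; a plain `+= 1` would have trapped here.
```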

Sources/GRPC/ConnectionManager.swift, line 262 at r3 (raw file):

Previously, glbrntt (George Barnett) wrote…
Sorry, I don't quite understand what the future means exactly.

The Channel will hold a promise (not accessible to the user) which gets fulfilled when it is eventually closed: this is the futureResult for that promise. (I.e. the same future returned by channel.close())
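
The caching behaviour described above can be sketched generically: the first `shutdown()` call computes the result (in the real code, a `ShutdownState` wrapping the channel's close future, an `EventLoopFuture`) and every later call returns the same cached value. Everything below is illustrative, not the actual `ConnectionManager` code:

```swift
// Generic sketch of idempotent shutdown: compute once, return the cached
// result on every subsequent call.
final class ShutdownOnce<Result> {
    private var cached: Result?
    private let makeResult: () -> Result

    init(_ makeResult: @escaping () -> Result) {
        self.makeResult = makeResult
    }

    /// First call computes and caches the result; later calls return it.
    func shutdown() -> Result {
        if let existing = self.cached {
            return existing
        }
        let result = self.makeResult()
        self.cached = result
        return result
    }
}
```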

Collaborator

@MrMage MrMage left a comment


LGTM from my side, but I would still appreciate a slightly different wording for that one comment.

Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @glbrntt and @Lukasa)


Sources/GRPC/ClientConnectivityHandler.swift, line 90 at r3 (raw file):

Previously, glbrntt (George Barnett) wrote…

I think "as long as we're not .ready" suggests closed is also okay (which it isn't).

How about this? "// We only care about SETTINGS as long as we are in state .notReady." IMHO that has the least possible ambiguity. At least the original comment had confused me a bit.

@glbrntt
Collaborator Author

glbrntt commented May 18, 2020

How about this? "// We only care about SETTINGS as long as we are in state .notReady." IMHO that has the least possible ambiguity. At least the original comment had confused me a bit.

Fine by me; I've taken it verbatim.

Collaborator

@Lukasa Lukasa left a comment


One minor note.

@glbrntt glbrntt requested a review from Lukasa May 21, 2020 09:09
@glbrntt glbrntt merged commit e6fa863 into grpc:master May 21, 2020
@glbrntt glbrntt added the `nio` and `🔨 semver/patch` (No public API change) labels May 21, 2020
@glbrntt glbrntt deleted the gb-connectivity-manager branch May 21, 2020 09:24
glbrntt added a commit to glbrntt/grpc-swift that referenced this pull request Jun 1, 2020
Motivation:

Connections should be dropped to save resources if there are no RPCs
for the duration of an idle timeout. This was added for clients in
grpc#798; this adds similar functionality to the server.

Modifications:

- Add a `GRPCServerIdleHandler` to drop connections after 5 minutes of
  no RPCs

Note: timeout configuration will be added for client and server in a
followup PR.

Result:

- Servers will drop connections if the client isn't doing anything.
- Resolves grpc#706
@glbrntt glbrntt mentioned this pull request Jun 1, 2020
glbrntt added a commit to glbrntt/grpc-swift that referenced this pull request Jun 1, 2020
Motivation:

Connections should be dropped to save resources if there are no RPCs
for the duration of an idle timeout. This was added for clients in
grpc#798; this adds similar functionality to the server.

Modifications:

- Rename a `ClientConnectivityHandler` to `GRPCIdleHandler`
- Add a 'mode' to the idle handler

Note: timeout configuration will be added for client and server in a
followup PR.

Result:

- Servers will drop connections if the client isn't doing anything.
- Resolves grpc#706
glbrntt added a commit that referenced this pull request Jun 3, 2020