GoAway are not handled properly by grpc client in reconnect.rs #2185

Open
@DimanNe

Bug Report

Version

v0.12.3

Platform

Linux

High-level problem

Even with retries implemented manually on the client side, the gRPC client appears to keep using the same underlying TCP connection, which is no longer accepting new HTTP/2 streams.

How it happens

The server decides to shut down and sends GoAway to the client. hyper-1.6.0/src/proto/h2/client.rs treats this as a graceful shutdown and resolves with Ready(Ok(Dispatched::Shutdown)):

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        loop {
            match ready!(self.h2_tx.poll_ready(cx)) {
                Ok(()) => (),
                Err(err) => {
                    self.ping.ensure_not_timed_out()?;
                    return if err.reason() == Some(::h2::Reason::NO_ERROR) {
                        trace!("connection gracefully shutdown");
                        Poll::Ready(Ok(Dispatched::Shutdown))

tonic-0.12.3/src/transport/channel/service/reconnect.rs then assumes that everything is fine, so its state machine does not initiate a reconnect:

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        let mut state;

        if self.error.is_some() {
            return Poll::Ready(Ok(()));
        }

        loop {
            match self.state {
                State::Idle => {
                    trace!("poll_ready; idle");
                    match self.mk_service.poll_ready(cx) { ... }
                    let fut = self.mk_service.make_service(self.target.clone());
                    self.state = State::Connecting(fut);
                    continue;
                }
                State::Connecting(ref mut f) => {
                    trace!("poll_ready; connecting");
                    match Pin::new(f).poll(cx) { ... }
                }
                State::Connected(ref mut inner) => {
                    trace!("poll_ready; connected");

                    self.has_been_connected = true;

                    match inner.poll_ready(cx) {
                        Poll::Ready(Ok(())) => {
                            trace!("poll_ready; ready");
                            return Poll::Ready(Ok(()));
                        }
                        Poll::Pending => {
                            trace!("poll_ready; not ready");
                            return Poll::Pending;
                        }
                        Poll::Ready(Err(_)) => {
                            trace!("poll_ready; error");
                            state = State::Idle;
                        }
                    }
                }
            }

            self.state = state;
        }

        self.state = state;
        Poll::Ready(Ok(()))
    }

A subsequent call to fn call(&mut self, request: Request) -> Self::Future then returns a future that resolves to:

Internal Error: Status { code: Internal, message: "h2 protocol error: http2 error", source: Some(tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }))) }

The client-side retry mechanism then issues the request again, and the whole cycle repeats on the same dead connection.
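For reference, a manual retry loop could in principle recognise this situation by walking the error's source chain down to the HTTP/2 layer. The sketch below is std-only and uses hypothetical stand-in types (GoAway, Wrapped, find_go_away) instead of the real tonic, hyper and h2 errors. In the real stack the innermost error would be the h2 error shown in the message above, and how to inspect it should be checked against the h2 documentation.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for the innermost HTTP/2 error in the chain.
#[derive(Debug)]
struct GoAway {
    /// true models a GoAway with reason NO_ERROR (graceful shutdown).
    graceful: bool,
}

impl fmt::Display for GoAway {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "GoAway(graceful = {})", self.graceful)
    }
}

impl Error for GoAway {}

/// Hypothetical wrapper modelling the outer layers (status, transport, hyper).
#[derive(Debug)]
struct Wrapped {
    msg: &'static str,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Walks the source chain and returns the first GoAway found, if any.
fn find_go_away<'a>(err: &'a (dyn Error + 'static)) -> Option<&'a GoAway> {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(go_away) = e.downcast_ref::<GoAway>() {
            return Some(go_away);
        }
        current = e.source();
    }
    None
}

fn main() {
    // Three layers, loosely mirroring Status -> transport error -> h2 error.
    let err = Wrapped {
        msg: "h2 protocol error: http2 error",
        source: Box::new(Wrapped {
            msg: "transport error",
            source: Box::new(GoAway { graceful: true }),
        }),
    };

    match find_go_away(&err) {
        Some(g) if g.graceful => println!("graceful GoAway: drop this channel and reconnect"),
        Some(_) => println!("GoAway with an error reason"),
        None => println!("unrelated error: {err}"),
    }
}
```

Detecting the error this way only tells the caller that the channel is unusable. The caller would still have to build a new channel itself, which is exactly the work reconnect.rs is expected to do.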

It looks like tonic does not know about Dispatched::Shutdown.
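The failure mode can be reproduced with a toy model of the state machine. This is a minimal std-only sketch, not tonic's code: Conn, Reconnect, drop_on_go_away and attempts_until_success are hypothetical names, and the poll-based API is flattened into plain function calls. It assumes, as described above, that the inner connection keeps reporting readiness after GoAway, and it shows one possible remedy (falling back to Idle when a call fails), not a proposed patch.

```rust
/// Toy connection; `draining` models "the server has sent GoAway".
struct Conn {
    id: u32,
    draining: bool,
}

impl Conn {
    /// Models the behaviour from the issue: still "ready" after GoAway.
    fn poll_ready(&self) -> Result<(), ()> {
        Ok(())
    }

    /// New streams are refused once the connection is draining.
    fn call(&self) -> Result<u32, &'static str> {
        if self.draining {
            Err("GoAway(NO_ERROR)")
        } else {
            Ok(self.id)
        }
    }
}

enum State {
    Idle,
    Connected(Conn),
}

struct Reconnect {
    state: State,
    next_id: u32,
    /// Hypothetical switch: drop the connection when a call fails.
    drop_on_go_away: bool,
}

impl Reconnect {
    fn new(drop_on_go_away: bool) -> Self {
        Reconnect { state: State::Idle, next_id: 0, drop_on_go_away }
    }

    /// Same shape as the excerpt above: only a poll_ready error leads back to Idle.
    fn poll_ready(&mut self) {
        loop {
            let ready = match &self.state {
                State::Idle => None,
                State::Connected(conn) => Some(conn.poll_ready()),
            };
            match ready {
                None => {
                    self.next_id += 1;
                    self.state = State::Connected(Conn { id: self.next_id, draining: false });
                }
                Some(Ok(())) => return,
                Some(Err(())) => self.state = State::Idle,
            }
        }
    }

    fn call(&mut self) -> Result<u32, &'static str> {
        self.poll_ready();
        let result = match &self.state {
            State::Connected(conn) => conn.call(),
            State::Idle => Err("not connected"),
        };
        if result.is_err() && self.drop_on_go_away {
            self.state = State::Idle; // forces a fresh connection on the next poll_ready
        }
        result
    }

    fn server_sends_go_away(&mut self) {
        if let State::Connected(conn) = &mut self.state {
            conn.draining = true;
        }
    }
}

/// Returns the retry attempt (1-based) that first succeeds after GoAway, if any.
fn attempts_until_success(drop_on_go_away: bool, max_retries: u32) -> Option<u32> {
    let mut client = Reconnect::new(drop_on_go_away);
    client.call().expect("first call uses a fresh connection");
    client.server_sends_go_away();
    (1..=max_retries).find(|_| client.call().is_ok())
}

fn main() {
    // Current behaviour: every retry reuses the draining connection.
    assert_eq!(attempts_until_success(false, 5), None);
    // With the fallback to Idle, the second retry gets a new connection.
    assert_eq!(attempts_until_success(true, 5), Some(2));
    println!("toy model behaves as described");
}
```

In this model the retry loop never recovers unless something moves the state back to Idle, which matches the observed behaviour: retries alone cannot help while the state machine keeps the draining connection.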
