Description
Bug Report
Version
v0.12.3
Platform
Linux
High-level problem
Even with retries implemented manually on the client side, the gRPC client appears to keep reusing the same underlying TCP connection, which no longer accepts new HTTP/2 streams.
How it happens
The server decides to shut down and sends GOAWAY to the client. hyper-1.6.0/src/proto/h2/client.rs interprets it as Ready(Ok):
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
    loop {
        match ready!(self.h2_tx.poll_ready(cx)) {
            Ok(()) => (),
            Err(err) => {
                self.ping.ensure_not_timed_out()?;
                return if err.reason() == Some(::h2::Reason::NO_ERROR) {
                    trace!("connection gracefully shutdown");
                    Poll::Ready(Ok(Dispatched::Shutdown))
tonic-0.12.3/src/transport/channel/service/reconnect.rs then treats the connection as healthy, so its state machine does not initiate a reconnect:
fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
    let mut state;
    if self.error.is_some() {
        return Poll::Ready(Ok(()));
    }
    loop {
        match self.state {
            State::Idle => {
                trace!("poll_ready; idle");
                match self.mk_service.poll_ready(cx) { ... }
                let fut = self.mk_service.make_service(self.target.clone());
                self.state = State::Connecting(fut);
                continue;
            }
            State::Connecting(ref mut f) => {
                trace!("poll_ready; connecting");
                match Pin::new(f).poll(cx) { ... }
            }
            State::Connected(ref mut inner) => {
                trace!("poll_ready; connected");
                self.has_been_connected = true;
                match inner.poll_ready(cx) {
                    Poll::Ready(Ok(())) => {
                        trace!("poll_ready; ready");
                        return Poll::Ready(Ok(()));
                    }
                    Poll::Pending => {
                        trace!("poll_ready; not ready");
                        return Poll::Pending;
                    }
                    Poll::Ready(Err(_)) => {
                        trace!("poll_ready; error");
                        state = State::Idle;
                    }
                }
            }
        }
        self.state = state;
    }
    self.state = state;
    Poll::Ready(Ok(()))
}
A subsequent call to fn call(&mut self, request: Request) -> Self::Future returns a future that resolves to:
Internal Error: Status { code: Internal, message: "h2 protocol error: http2 error", source: Some(tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }))) }
Everything then repeats, because the client-side retry mechanism keeps hitting the same connection.
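To make the loop easier to see, here is a minimal std-only model of the flow as I understand it. It is not tonic's or hyper's code: `Conn`, `Reconnect`, `CallError`, `server_goaway` and `retry` are hypothetical names, and the real implementation is async and far more involved. The only point is that readiness keeps succeeding while every call fails, so no new connection is ever created.

```rust
// Simplified model of the behavior described in this report.
// All names are hypothetical; this is NOT tonic's or hyper's code.

/// A connection that still exists but refuses new streams after GOAWAY.
#[derive(Debug, Default)]
struct Conn {
    goaway_received: bool,
}

#[derive(Debug, PartialEq)]
enum CallError {
    GoAway,
    NotReady,
}

#[derive(Debug)]
enum State {
    Idle,
    Connected(Conn),
}

#[derive(Debug)]
struct Reconnect {
    state: State,
    /// Number of times a new connection was created.
    connects: u32,
}

impl Reconnect {
    fn new() -> Self {
        Reconnect { state: State::Idle, connects: 0 }
    }

    /// Connects only from `Idle`. A `Connected` state that already saw
    /// GOAWAY is still reported as ready, mirroring the reported flow.
    fn poll_ready(&mut self) {
        if let State::Idle = self.state {
            self.connects += 1;
            self.state = State::Connected(Conn::default());
        }
    }

    /// Fails once GOAWAY was received, but never resets the state to `Idle`.
    fn call(&mut self) -> Result<(), CallError> {
        match &self.state {
            State::Connected(conn) if !conn.goaway_received => Ok(()),
            State::Connected(_) => Err(CallError::GoAway),
            State::Idle => Err(CallError::NotReady),
        }
    }

    /// Simulates the server sending GOAWAY(NO_ERROR).
    fn server_goaway(&mut self) {
        if let State::Connected(conn) = &mut self.state {
            conn.goaway_received = true;
        }
    }
}

/// Manual client-side retry loop; returns the number of failed attempts.
fn retry(svc: &mut Reconnect, max_attempts: u32) -> u32 {
    let mut failures = 0;
    for _ in 0..max_attempts {
        svc.poll_ready();
        match svc.call() {
            Ok(()) => break,
            Err(_) => failures += 1,
        }
    }
    failures
}
```

In this model, calling `server_goaway()` on a connected service and then `retry(&mut svc, 5)` fails all five attempts while `connects` stays at 1, which matches what I see with the real client.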
It looks like tonic does not know about Dispatched::Shutdown.
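For illustration only, this is roughly the behavior I would expect if the shutdown signal reached the reconnect layer. It is a std-only sketch under my own assumptions, not a proposed patch: the `shut_down` flag, `driver_handle` and the other names are hypothetical, and I am assuming the connection driver could publish its graceful shutdown somewhere the readiness check can observe it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical sketch; none of these names exist in tonic or hyper.

struct Conn {
    /// Assumed to be set by the connection driver when it finishes with
    /// a graceful shutdown (the `Dispatched::Shutdown` case in the report).
    shut_down: Arc<AtomicBool>,
}

enum State {
    Idle,
    Connected(Conn),
}

struct Reconnect {
    state: State,
    /// Number of times a new connection was created.
    connects: u32,
}

impl Reconnect {
    fn new() -> Self {
        Reconnect { state: State::Idle, connects: 0 }
    }

    /// Drops a connection whose driver reported shutdown, then reconnects.
    fn poll_ready(&mut self) {
        let stale = matches!(
            &self.state,
            State::Connected(c) if c.shut_down.load(Ordering::Acquire)
        );
        if stale {
            self.state = State::Idle;
        }
        if let State::Idle = self.state {
            self.connects += 1;
            self.state = State::Connected(Conn {
                shut_down: Arc::new(AtomicBool::new(false)),
            });
        }
    }

    /// Fails on a connection that is shutting down or not yet established.
    fn call(&mut self) -> Result<(), &'static str> {
        match &self.state {
            State::Connected(c) if !c.shut_down.load(Ordering::Acquire) => Ok(()),
            State::Connected(_) => Err("GOAWAY: connection is shutting down"),
            State::Idle => Err("not connected"),
        }
    }

    /// Handle the modeled connection driver would use to signal shutdown.
    fn driver_handle(&self) -> Option<Arc<AtomicBool>> {
        match &self.state {
            State::Connected(c) => Some(Arc::clone(&c.shut_down)),
            State::Idle => None,
        }
    }
}
```

With this variant, one call may still fail right after the flag is set, but the next `poll_ready` replaces the connection, so a client-side retry would succeed on its second attempt instead of looping on the dead connection.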