Description
What version of gRPC are you using?
1.64.0
What version of Go are you using (go version)?
go1.22.3
What operating system (Linux, Windows, …) and version?
Linux
What did you do?
After upgrading to gRPC 1.64, closing connections started to take a very long time, specifically around 15 minutes. This happens when the server side abruptly goes away and the TCP connection breaks on one end.
I captured a goroutine profile while the application was trying to close one such connection. What stands out are the following two stacks:
# 0xbd151a google.golang.org/grpc/internal/transport.(*http2Client).Close+0x2ba /go/pkg/mod/google.golang.org/grpc@v1.64.0/internal/transport/http2_client.go:1012
# 0xc21225 google.golang.org/grpc.(*addrConn).tearDown+0x325 /go/pkg/mod/google.golang.org/grpc@v1.64.0/clientconn.go:1590
# 0xc1e627 google.golang.org/grpc.(*ClientConn).Close+0x207 /go/pkg/mod/google.golang.org/grpc@v1.64.0/clientconn.go:1154
and
1 @ 0x43e48e 0x436f17 0x46bb65 0x4e53e7 0x4e87b6 0x4e87a3 0x589565 0x59b525 0xbe2876 0xbe275c 0xbc2570 0xbcbd5b 0x4716c1
# 0x46bb64 internal/poll.runtime_pollWait+0x84 /usr/local/go/src/runtime/netpoll.go:343
# 0x4e53e6 internal/poll.(*pollDesc).wait+0x26 /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
# 0x4e87b5 internal/poll.(*pollDesc).waitWrite+0x2d5 /usr/local/go/src/internal/poll/fd_poll_runtime.go:93
# 0x4e87a2 internal/poll.(*FD).Write+0x2c2 /usr/local/go/src/internal/poll/fd_unix.go:388
# 0x589564 net.(*netFD).Write+0x24 /usr/local/go/src/net/fd_posix.go:96
# 0x59b524 net.(*conn).Write+0x44 /usr/local/go/src/net/net.go:191
# 0xbe2875 google.golang.org/grpc/internal/transport.(*bufWriter).flushKeepBuffer+0x55 /go/pkg/mod/google.golang.org/grpc@v1.64.0/internal/transport/http_util.go:362
# 0xbe275b google.golang.org/grpc/internal/transport.(*bufWriter).Flush+0x1b /go/pkg/mod/google.golang.org/grpc@v1.64.0/internal/transport/http_util.go:345
# 0xbc256f google.golang.org/grpc/internal/transport.(*loopyWriter).run+0x6f /go/pkg/mod/google.golang.org/grpc@v1.64.0/internal/transport/controlbuf.go:592
# 0xbcbd5a google.golang.org/grpc/internal/transport.newHTTP2Client.func6+0xba /go/pkg/mod/google.golang.org/grpc@v1.64.0/internal/transport/http2_client.go:467
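I believe the (potential) bug was introduced on these lines, where the client tries to send a GOAWAY frame to the server before closing the connection. If the connection is half-closed, the call hangs for about 15 minutes. As far as I can tell, net.(*conn).Write has no timeout of its own unless a write deadline is set, so the 15 minutes is most likely the point at which the Linux kernel gives up retransmitting on the broken TCP connection.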
What did you expect to see?
I would expect there to be a more reasonable timeout for closing a connection, or perhaps a way for the client to configure that timeout.
What did you see instead?
Closing a connection takes around 15 minutes.