TCP Connection reset after a single lost keepalive packet #6250

Closed
@Lucaber

Description

What version of gRPC are you using?

1.54.0

What version of Go are you using (go version)?

go1.20.2 linux/amd64

What operating system (Linux, Windows, …) and version?

Linux 6.2

Bug Report

gRPC sets TCP_USER_TIMEOUT to 20 seconds (#2307 and #5219). Combined with Go's default TCP keepalive interval of just 15 seconds, this results in a TCP connection being reset after a single lost keepalive packet.

Lost packets on a TCP connection are normally retransmitted after a short time, well within the 20-second timeout. TCP keepalive packets, however, are not retransmitted (ACK segments that contain no data are not reliably transmitted by TCP). The timeout is therefore reached after just a single lost packet.

Normally, not retransmitting TCP keepalive packets is fine, because the connection is only reset after TCP_KEEPCNT (default 9) lost keepalive packets.

Test

tcpdump of a test gRPC connection (keepalives are disabled on the client to reduce the packet count; the same issue can be reproduced with default keepalives on both the server and the client):
[image: tcpdump capture of the test connection]

  • After packet 199 I dropped all traffic TO port 8090
sudo iptables -I INPUT -p tcp --dport 8090 -j DROP
  • Packet 200 gets received by the client and answered in packet 201
  • Packet 201 is only visible in the tcpdump but does not get received by the server
  • In packet 202 the server resets the connection, at the time the next keepalive would have been sent

Increasing the TCP_USER_TIMEOUT to 50 seconds results in the connection only being reset after 3 lost keepalive packets.

		grpc.KeepaliveParams(
			keepalive.ServerParameters{
				// The gRPC keepalive Timeout is also applied as TCP_USER_TIMEOUT.
				Timeout: 50 * time.Second,
			},
		),

Note: Here "keepalive" refers to the gRPC/HTTP/2 keepalive mechanism and its timeout configuration, not to TCP keepalives.
