net: TCP RST handling is unreliable on OSX #50254

Open
@marten-seemann

What version of Go are you using (go version)?

$ go version
go version go1.17.5 darwin/amd64

Does this issue reproduce with the latest release?

Yes, tested with go1.18beta1.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/marten/Library/Caches/go-build"
GOENV="/Users/marten/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/marten/src/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/marten/src/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17.5/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17.5/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.5"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/marten/src/go/src/github.com/libp2p/go-libp2p/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/q0/b5ynf00142l7bl9sp8y098zr0000gn/T/go-build1068436186=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I'm seeing occasional TCP connection timeouts, even though the other side has sent (and we have received) the TCP RST packet.

I managed to reproduce the failure using the following minimal working example. It reliably fails at least once when run 1000 times (go test -run TestLinger -count 1000 -v -failfast).

package linger // package name is arbitrary; the original snippet omitted the package clause

import (
	"fmt"
	"io"
	"math/rand"
	"net"
	"testing"
	"time"
)

func TestLinger(t *testing.T) {
	ln, err := net.ListenTCP("tcp4", nil)
	if err != nil {
		t.Fatal(err)
	}
	defer ln.Close()

	done := make(chan struct{})
	accepted := make(chan struct{})
	go func() {
		defer close(done)
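		conn, err := ln.Accept()
		if err != nil {
			t.Error(err)
			close(accepted) // unblock the main goroutine so the test fails instead of hanging
			return
		}
		close(accepted)
		// Blocks until the connection is closed or reset by the peer.
		io.ReadAll(conn)
	}()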

	conn, err := net.DialTCP("tcp4", nil, ln.Addr().(*net.TCPAddr))
	if err != nil {
		t.Fatal(err)
	}
	// This makes sure a TCP RST is sent when Close is called.
	if err := conn.SetLinger(0); err != nil {
		t.Fatal(err)
	}
	fmt.Printf("%s <-> %s\n", conn.LocalAddr(), conn.RemoteAddr())
	go func() {
		for {
			b := make([]byte, 1+rand.Intn(1000))
			_, err := conn.Write(b)
			if err != nil {
				return
			}
			time.Sleep(time.Duration(rand.Intn(1000)) * time.Microsecond)
		}
	}()

	<-accepted
	time.Sleep(time.Duration(rand.Intn(30)) * time.Millisecond)
	if err := conn.Close(); err != nil {
		t.Fatal(err)
	}

	select {
	case <-done:
	case <-time.After(10 * time.Second):
		t.Fatal("remote conn didn't close")
	}
}

What did you expect to see?

I expect the net.TCPConn to be reliably closed or reset when we receive the TCP RST packet.
This works reliably on Linux (incl. the code I posted above), but not on OSX.

What did you see instead?

Occasionally (the test case above fails in maybe 1 out of 100 runs), the TCP RST doesn't seem to have any effect on the connection at all. The connection stays open, and eventually (after a long time) runs into a connection timeout.

Here's a pcap of this transport:
tcprst.pcapng.gz, showing that the RST was actually sent (and received).

Labels: NeedsInvestigation, OS-Darwin, Unfortunate
