Description
What version of Go are you using (go version)?
$ go version
go version go1.17.5 darwin/amd64
Does this issue reproduce with the latest release?
Yes, tested with go1.18beta1.
What operating system and processor architecture are you using (go env)?
go env output:
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/marten/Library/Caches/go-build"
GOENV="/Users/marten/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/marten/src/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/marten/src/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17.5/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17.5/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.5"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/marten/src/go/src/github.com/libp2p/go-libp2p/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/q0/b5ynf00142l7bl9sp8y098zr0000gn/T/go-build1068436186=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
I'm seeing occasional TCP connection timeouts, even though the other side has sent (and we have received) the TCP RST packet.
I managed to reproduce the failure using the following minimal working example. It reliably fails at least once when run 1000 times (go test -run TestLinger -count 1000 -v -failfast).
package main // any package name works; save the file as *_test.go

import (
	"fmt"
	"io"
	"math/rand"
	"net"
	"testing"
	"time"
)

func TestLinger(t *testing.T) {
	ln, err := net.ListenTCP("tcp4", nil)
	if err != nil {
		t.Fatal(err)
	}
	defer ln.Close()

	done := make(chan struct{})
	accepted := make(chan struct{})
	// Server side: accept one connection and read until it ends.
	go func() {
		defer close(done)
		conn, err := ln.Accept()
		if err != nil {
			t.Error(err)
			return // conn is nil here; reading from it would panic
		}
		close(accepted)
		io.ReadAll(conn) // should return as soon as the RST is processed
	}()

	conn, err := net.DialTCP("tcp4", nil, ln.Addr().(*net.TCPAddr))
	if err != nil {
		t.Fatal(err)
	}
	// This makes sure a TCP RST is sent when Close is called.
	if err := conn.SetLinger(0); err != nil {
		t.Fatal(err)
	}
	fmt.Printf("%s <-> %s\n", conn.LocalAddr(), conn.RemoteAddr())

	// Client side: keep writing random-sized chunks until the write fails.
	go func() {
		for {
			b := make([]byte, 1+rand.Intn(1000))
			_, err := conn.Write(b)
			if err != nil {
				return
			}
			time.Sleep(time.Duration(rand.Intn(1000)) * time.Microsecond)
		}
	}()

	<-accepted
	time.Sleep(time.Duration(rand.Intn(30)) * time.Millisecond)
	if err := conn.Close(); err != nil {
		t.Fatal(err)
	}

	// The server-side read should end promptly once the RST arrives.
	select {
	case <-done:
	case <-time.After(10 * time.Second):
		t.Fatal("remote conn didn't close")
	}
}
What did you expect to see?
I expect the net.TCPConn to be reliably closed / reset when we receive the TCP RST packet.
This works reliably on Linux (incl. the code I posted above), but not on OSX.
What did you see instead?
Occasionally (the test case above fails in maybe 1 out of 100 runs), the TCP RST doesn't seem to have any effect on the connection at all. The connection stays open, and eventually (after a long time) runs into a connection timeout.
Here's a pcap of this transfer: tcprst.pcapng.gz. It shows that the RST was actually sent (and received).