Skip to content

net/http: Client returns errors on POST if keep-alive connection closes at unfortunate time #22158

Closed
@horgh

Description

@horgh

What version of Go are you using (go version)?

go version go1.9.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/vagrant/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build936671241=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

I performed an HTTP POST using an *http.Client that had keepalives enabled and there was an idle connection available for re-use.

Specifically, I encountered this by using an *http.Client with an http.Transport set to have an IdleConnTimeout higher than the server's. Even a very small duration above can cause it. From my testing, I believe the connection gets taken for re-use just as the server closes it.

This program demonstrates the problem: https://play.golang.org/p/GTj6FEFWk-

Note if we change the request to be a GET, we don't see a problem.

What did you expect to see?

The request succeed. In particular, I would expect the *http.Client, being a connection pool and supporting connection re-use, would not raise an error when it finds a dead connection, but use a different connection. This assumes there was confidence the request indeed did not go through.

What did you see instead?

The sample program repeatedly makes HTTP requests. Sometimes they succeed, sometimes they fail with:

Post http://localhost:9999/test: EOF

Or:

Post http://localhost:9999/test: read tcp 127.0.0.1:36589->127.0.0.1:9999: read: connection reset by peer

For example:

$ ./test-connection-reuse 
2017/10/05 20:22:39 request succeeded
2017/10/05 20:22:40 request succeeded
2017/10/05 20:22:41 request succeeded
2017/10/05 20:22:42 request failed: error performing request: Post http://localhost:9999/test: EOF
2017/10/05 20:22:43 request succeeded
2017/10/05 20:22:44 request succeeded
2017/10/05 20:22:45 request failed: error performing request: Post http://localhost:9999/test: EOF
2017/10/05 20:22:46 request succeeded
2017/10/05 20:22:47 request succeeded
2017/10/05 20:22:48 request failed: error performing request: Post http://localhost:9999/test: read tcp 127.0.0.1:36563->127.0.0.1:9999: read: connection reset by peer
2017/10/05 20:22:49 request succeeded
2017/10/05 20:22:50 request failed: error performing request: Post http://localhost:9999/test: read tcp 127.0.0.1:36589->127.0.0.1:9999: read: connection reset by peer
2017/10/05 20:22:51 request succeeded

Further information

I searched for issues before reporting this. I found #8946 which I believe to be about this exact issue, except its example was an HTTP GET request. I believe the fix for #8946 was done for #4677 in 5dd372b.

From reading #4677, I believe the reason HTTP GET works is we retry in that case. Retrying an HTTP POST is unsafe depending on whether the request reached the server. But in the situation I'm describing, it didn't (although I don't know if we could reliably tell). See here in transport.go (comments closely below this line appear to be talking about this).

I noticed in #4677 that some of the basis for the behaviour was based on what Chromium does. I found its relevant code that handles what I think is going on here, and I don't see it talking about particular HTTP verbs (though of course I may not be looking in the right spot).

For my application, I've worked around this by ensuring the client's IdleConnTimeout is lower than the server's equivalent. I also added a retry at that level. I think a retry alone would not be sufficient, as presumably the retry could hit this very condition again.

It may be that nothing can be done. In that case, it might be good to document that this is expected, as it surprised me a little.

Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions