-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/http: a "bad" https connection which uses Header.Set could potentially block all https requests more than 15 mins in production environments #33006
Comments
@agnivade We have the same problem, please have a look? |
after testing, the problem still exists in go1.12.6, do you have any good ideas? @carter2000 |
You could use this CL as a starting point if you wish to resolve your issue. |
This is a dupe of #32388.
|
I will look further as to why the read side isn't breaking the write side which I believe should have happened. |
The only solution to break out of this situation is to start setting decent timeouts/deadlines. Using the default transport, you are saying to have infinite timeouts. When the network is broken, the polling just retries the outstanding reads and writes because there is nothing to say that the connection is permanently broken. |
I agree that setting write deadline to terminate tcp connection. But only the Set[Read|Write]Deadline(time.Time) methods exposed by net.Conn with are effective.The customized http.Client and transport which have set timeouts is invalid.Now the question is whether http.Client or transport has a method equivalent to SetDeadline exposed by net.Conn. |
There is Client.Timeout, or setting the request context using WithDeadline/WithTimeout which is for the entire request. Setting deadlines on the connection is the wrong approach given the issue is with http2 and you are sharing a connection. |
@fraenkel I think setting write or read deadline on connection is not bad. Only by doing that calller can return error early, but not waiting for broken connection for a long time. |
According the goroutine stack, http1.x shares some common logic with http2, is it possible to not block http1.x request(https) when calling WriteHeaders() ? |
@salmon7 My best advice right now is to use separate transports. Its not like you are getting connection sharing when going to two different sites. |
@gzcaimufu Adjust the keepalive on the connection to detect this sooner. |
@ianlancetaylor This testcase highlights an issue with epoll that I do not know how to easily resolve other than layering timeouts. If I run the testcase although I used url=https://www.google.com, and then kill my network connection, the read and write sides go into a constant EGAIN loop. If I re-enable my network, I never see the connection re-established. I don't see any easily solutions. |
Are you saying that |
@ianlancetaylor sorry, the EAGAIN are unrelated and seem to be expected. The behavior is what I would expect when the network disappears. Eventually the network will block. The http2 stack has issues with its locking which affects the http side of things. Because the http side will create a new connection and fail, it can progress although its failing. If one set the max number of connections per host, the http side would also not make any progress. |
@fraenkel Thanks. I will try it. |
Thanks all.The stuck problem seems like to have been solved after i define an http.Client with a sensible timeout and specify a custom net.Transport and net.Dialer.The modified code is below. package main import ( "flag" "io" "io/ioutil" "log" "net" "net/http" _ "net/http/pprof" "strings" "sync/atomic" "time" ) var url = flag.String("url", "https://10.210.180.138:9908", "url") func main() { flag.Parse() go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }() var reqA, doneA int64 var reqB, doneB int64 client := &http.Client{ Transport: &http.Transport{ Dial: (&net.Dialer{ }).Dial, }, Timeout: 5 * time.Second, } for { for i := 0; i < 10; i++ { go func() { atomic.AddInt64(&reqA, 1) resp, err := http.Get("https://www.qiniu.com") if err == nil { io.Copy(ioutil.Discard, resp.Body) resp.Body.Close() } atomic.AddInt64(&doneA, 1) }() } for i := 0; i < 10; i++ { go func() { atomic.AddInt64(&reqB, 1) req, _ := http.NewRequest("GET", *url, nil) req.Header.Set("X-Qiniu", strings.Repeat("helloworld", 1024)) resp, err := client.Do(req) if err == nil { io.Copy(ioutil.Discard, resp.Body) resp.Body.Close() } else { log.Println(err) } atomic.AddInt64(&doneB, 1) }() } log.Printf("reqA:%d doneA:%d, reqB:%d doneB:%d\n", atomic.LoadInt64(&reqA), atomic.LoadInt64(&doneA), atomic.LoadInt64(&reqB), atomic.LoadInt64(&doneB)) time.Sleep(1e9) } } |
For those who follow the example of #33006 (comment), please note that while following that example, we started getting errors like client := &http.Client{
Transport: &http.Transport{
Dial: (&net.Dialer{
}).Dial,
},
Timeout: 5 * time.Second,
} while looking at #16012 (comment) We think that using transport := &http.Transport{
Proxy: http.ProxyFromEnvironment,
DialContext: (&net.Dialer{
Timeout: 30 * time.Second,
KeepAlive: 30 * time.Second,
DualStack: true,
}).DialContext,
ForceAttemptHTTP2: true,
MaxIdleConns: 100,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
}
client := &http.Client{
Transport: transport,
Timeout: 5 * time.Second,
} |
Recently we ran into what seems to be this problem with our go services and in our testing we noticed setting timeouts on the http client still causes all connections (even http/1 connections from the same http client) to be blocked after an http/2 connection goes bad. We were able to reproduce this by setting up a testing app, only one http client is created in the app and we've set the timeout to be 10 seconds on the http client. The app makes both an http/2 connection and an http/1 connection every 5 seconds, we then set up an iptables rule to block packets on the ephemeral port of the http/2 connection, we saw the app reports timeout on the http/2 connection as expected, the specific error was |
In this case, A and B use different transports, this just circumvents the problem, but does not really solve it . The problem should still exist. |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
step1: Run the demo below to simulate the situation described in the title
step2: after step1 is successful , disconnect 10.210.180.138 server's power or network
What did you expect to see?
doneA continue increase, doneB stop increase
What did you see instead?
as the output shown, from "2019/07/09 19:19:20", both doneA and doneB stop increase, all requests blocked:
goroutine stack is shown below:
this issue has attracted a lot of attention such as 28824、23559、32388, however unfortunately has not been solved until now.
The text was updated successfully, but these errors were encountered: