
net/http: a "bad" connection could potentially block all https requests #28824

Closed
@carter2000


In our production environment, we found that all outgoing HTTPS requests, to different destination addresses, blocked at the same time. The stall lasted about 14 minutes, while plain HTTP requests were not affected.

So I checked the net/http source code (https://github.com/golang/go/blob/ac7c0ee26dda18076d5f6c151d8f920b43340ae3/src/net/http/h2_bundle.go) and found the blocking logic, as shown below.

An HTTPS request goes through several stages: step1 get connection, step2 write headers, ...

step1, get connection
step1-1 call http2clientConnPool.GetClientConn to get a valid connection
step1-2 acquire http2clientConnPool.mu, line 738
step1-3 call http2ClientConn.CanTakeNewRequest, line 740, and http2ClientConn.mu is acquired, line 7175

7174 func (cc *http2ClientConn) CanTakeNewRequest() bool {
7175	cc.mu.Lock()
7176	defer cc.mu.Unlock()
7177	return cc.canTakeNewRequestLocked()
7178 }

 738	p.mu.Lock()
 739	for _, cc := range p.conns[addr] {
 740		if cc.CanTakeNewRequest() {
 741			p.mu.Unlock()
 742			return cc, nil
 743		}
 744	}
 745	if !dialOnMiss {
 746		p.mu.Unlock()
 747		return nil, http2ErrNoCachedConn
 748	}
 749	call := p.getStartDialLocked(addr)
 750	p.mu.Unlock()

step2, write headers
step2-1 acquire http2ClientConn.mu, line 7335
step2-2 call http2ClientConn.writeHeaders to write the request headers, line 7384
step2-3 http2ClientConn.writeHeaders calls bw.Flush, which can block
step2-4 release http2ClientConn.mu, line 7387

7335	cc.mu.Lock()
	......
7382	cc.wmu.Lock()
7383	endStream := !hasBody && !hasTrailers
7384	werr := cc.writeHeaders(cs.ID, endStream, int(cc.maxFrameSize), hdrs)
7385	cc.wmu.Unlock()
7386	http2traceWroteHeaders(cs.trace)
7387	cc.mu.Unlock()

If a request writes headers but blocks in step2-3 (while holding http2ClientConn.mu),
the next request to the same addr blocks in step1-3 (while holding http2clientConnPool.mu),
and from then on all following requests (even those to different destination addresses) block in step1-2 until the blocked writeHeaders call in step2-3 returns.
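The lock chain described above can be reproduced without any networking. The following is a minimal sketch, not the net/http code: `conn`, `pool`, `getClientConn`, `blockedWithin`, `demo` and the example addresses are hypothetical stand-ins, and a stuck bw.Flush is simulated by simply holding the connection mutex.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// conn is a hypothetical stand-in for http2ClientConn: one mutex guards it.
type conn struct {
	mu sync.Mutex
}

// canTakeNewRequest mirrors CanTakeNewRequest: it must acquire conn.mu.
func (c *conn) canTakeNewRequest() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return true
}

// pool is a hypothetical stand-in for http2clientConnPool.
type pool struct {
	mu    sync.Mutex
	conns map[string][]*conn
}

// getClientConn mirrors step1: pool.mu is held while each conn.mu is probed.
func (p *pool) getClientConn(addr string) *conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range p.conns[addr] {
		if c.canTakeNewRequest() {
			return c
		}
	}
	return nil
}

// blockedWithin reports whether a lookup for addr is still blocked after d.
func blockedWithin(p *pool, addr string, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		p.getClientConn(addr)
		close(done)
	}()
	select {
	case <-done:
		return false
	case <-time.After(d):
		return true
	}
}

// demo holds one connection's mutex (the simulated stuck write) and checks
// lookups for the same address, for another address, and after the release.
func demo() (sameAddr, otherAddr, afterRelease bool) {
	bad := &conn{}
	p := &pool{conns: map[string][]*conn{
		"a.example:443": {bad},
		"b.example:443": {&conn{}},
	}}
	bad.mu.Lock() // simulated step2-3: write stuck while conn.mu is held
	sameAddr = blockedWithin(p, "a.example:443", 100*time.Millisecond)
	otherAddr = blockedWithin(p, "b.example:443", 100*time.Millisecond)
	bad.mu.Unlock() // the stuck write finally returns
	afterRelease = blockedWithin(p, "b.example:443", time.Second)
	return
}

func main() {
	same, other, after := demo()
	fmt.Println("same addr blocked:", same)
	fmt.Println("other addr blocked:", other)
	fmt.Println("still blocked after release:", after)
}
```

In this sketch the lookup for the second address never touches the bad connection. It stalls only because the first lookup is waiting on the connection mutex while still holding the pool mutex.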

I checked the sysctl configuration; the total retransmission timeout is about 14 minutes (with TCP_RTO_MAX = 120 seconds):

net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
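As a rough check on that figure, the worst-case wait can be estimated by summing the retransmission timeouts under exponential backoff. This is a sketch under assumptions not stated in the report: the RTO starts at 200 ms (the Linux TCP_RTO_MIN default), doubles after each retransmission, and is capped at TCP_RTO_MAX. `worstCaseTimeout` is a hypothetical helper, and the real starting RTO depends on the measured round-trip time, so the observed stall can differ.

```go
package main

import (
	"fmt"
	"time"
)

// worstCaseTimeout sums the waits for `retries` retransmissions plus the
// final wait, assuming the RTO starts at `initial`, doubles each time, and
// is capped at `max`. This models the backoff only; it is not kernel code.
func worstCaseTimeout(initial, max time.Duration, retries int) time.Duration {
	var total time.Duration
	rto := initial
	for i := 0; i <= retries; i++ { // retries+1 waits in total
		total += rto
		rto *= 2
		if rto > max {
			rto = max
		}
	}
	return total
}

func main() {
	// tcp_retries2 = 15, assumed TCP_RTO_MIN = 200ms, TCP_RTO_MAX = 120s.
	fmt.Println(worstCaseTimeout(200*time.Millisecond, 120*time.Second, 15))
}
```

Under these assumptions the sum is 924.6 seconds, about 15 minutes, which is of the same order as the stall of about 14 minutes described above.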

Labels: FrozenDueToAge, NeedsInvestigation, WaitingForInfo