[HTTP Connection Pool] Lack of timeout on SSL connection establishment caused high number of pending connections in pool

### Description

Our service uses `HttpClient` to send requests to downstream services. We observed the following:
1. [expected] During a network outage on the infrastructure hosting the request destination, many requests failed or timed out.
2. [unexpected] After the network outage was resolved, the sender continued to see those timeouts and failures. They stopped only after the machine hosting the request sender was restarted.

We took a dump and, based on what we found, formed a hypothesis that explains the above. We would like the .NET team to check whether the hypothesis is reasonable.

**Observations from dump**
1. The `HttpConnectionPool` that serves the destination has 88 associated connections and all of them are pending, which implies that the connection establishments are hanging
![Image](https://github.com/user-attachments/assets/54055897-1596-4583-86eb-400873b2bdae)
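2. Counting the `AsyncTaskMethodBuilder` instances on the heap per method suggests that SSL connection establishment is the culprit, not TCP connection establishment: 95 state machines are in the SSL handshake methods, versus 7 in `ConnectToTcpHostAsync`.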

| Count | Total Size | Class Name |
| - | - | - |
| 95 | 19,760 | `System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.ValueTuple<System.IO.Stream, System.Net.TransportContext, System.Net.IPEndPoint>>+AsyncStateMachineBox<System.Net.Http.HttpConnectionPool+<ConnectAsync>d__103>` |
| 7 | 1,344 | `System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.IO.Stream>+AsyncStateMachineBox<System.Net.Http.HttpConnectionPool+<ConnectToTcpHostAsync>d__104>` |
| 95 | 14,440 | `System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Net.Security.SslStream>+AsyncStateMachineBox<System.Net.Http.ConnectHelper+<EstablishSslConnectionAsync>d__2>` |
| 95 | 19,760 | `System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>+AsyncStateMachineBox<System.Net.Security.SslStream+<ForceAuthenticationAsync>d__150<System.Net.Security.AsyncReadWriteAdapter>>` |
| 95 | 14,440 | `System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Int32>+AsyncStateMachineBox<System.Net.Security.SslStream+<ReceiveHandshakeFrameAsync>d__151<System.Net.Security.AsyncReadWriteAdapter>>` |

**Analysis**
Upon checking the code in `HttpConnectionPool`, it seems that under the default settings there is no effective cancellation for `ConnectToTcpHostAsync` and `EstablishSslConnectionAsync`: a cancellation token is passed, but its timeout is `InfiniteTimeSpan`. That is understandable for the TCP connection, since the OS enforces its own timeout, but I am not aware of any OS-level timeout for SSL connection establishment. With no OS-level or application-level timeout, SSL connection establishment can hang indefinitely.

**Hypothesis**
1. During the network outage, the `HttpConnectionPool` became contaminated with connections that hang in the SSL connection phase.
2. With `PooledConnectionLifetime` set in our application, healthy connections die off when their lifetime is up, so the pool holds fewer and fewer healthy connections. Pending connections do not seem to honor `PooledConnectionLifetime`.
3. Even after the network outage is resolved, the pending connections keep hanging and keep counting towards `_pendingHttp11ConnectionCount` in the connection pool. A high `_pendingHttp11ConnectionCount` makes it harder to start new connections (the connection pool starts a new connection only if the request queue length is larger than `_pendingHttp11ConnectionCount`).
4. The connection pool ended up with no working connections (as the dump showed) and many pending connections, which explains the failures and timeouts we saw.
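
To illustrate item 3, here is a toy model of the rule as I understand it. It is not the actual `HttpConnectionPool` code: `ShouldStartConnection` is a hypothetical helper, and the value 88 is simply the pending count from the dump, assumed never to drop.

```csharp
using System;

// Hypothetical helper mirroring the rule described in item 3.
// This is NOT the real pool logic, only a simplified model of it.
static bool ShouldStartConnection(int queuedRequests, int pendingConnections)
    => queuedRequests > pendingConnections;

// Pending connections seen in the dump, assumed to hang forever.
int pending = 88;

foreach (int queued in new[] { 1, 50, 88, 89 })
{
    bool start = ShouldStartConnection(queued, pending);
    Console.WriteLine($"queued={queued}, pending={pending}, start new connection: {start}");
}
```

Under this model, no new connection is started until more than 88 requests are queued at the same time, which would match the behavior we saw if requests time out before the queue grows that long.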

**Asks to .NET team**
1. Is the above hypothesis reasonable? (For example, is there indeed no OS-level timeout for SSL connection establishment, so that it could theoretically hang indefinitely?)
2. We are planning to set `ConnectTimeout` to a concrete value (e.g. 30 seconds). Are there any concerns or things to consider regarding that?
3. Why does this value default to an infinite time span? What considerations did the .NET team have?
4. I cannot share the dump per security protocol, but I am happy to run diagnostics on it if necessary.
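
For reference, the change we have in mind for ask 2 would look roughly like the sketch below. The 30-second and 5-minute values are illustrative, not our production settings. It also assumes, per the analysis above, that the connect timeout token covers the SSL handshake as well as the TCP connect.

```csharp
using System;
using System.Net.Http;

var handler = new SocketsHttpHandler
{
    // Defaults to Timeout.InfiniteTimeSpan. Assumption (from the analysis above):
    // this bounds both the TCP connect and the SSL handshake.
    ConnectTimeout = TimeSpan.FromSeconds(30),

    // Illustrative value only; our application already sets a connection lifetime.
    PooledConnectionLifetime = TimeSpan.FromMinutes(5),
};

using var client = new HttpClient(handler);
Console.WriteLine($"ConnectTimeout = {handler.ConnectTimeout}");
```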

### Reproduction Steps

No manual repro. As described above, our service sees timeouts and failures when sending HTTP requests, even after the network outage on the infrastructure hosting the request destination is resolved.

### Expected behavior

After the network outage impacting the destination is resolved, `HttpClient` should be able to send requests to the destination without restarting the sender application/machine.

### Actual behavior

`HttpClient` reports failures and timeouts when sending requests to the destination, even after the network outage impacting the destination is resolved. The issue is fixed only by restarting the application/machine.

### Regression?

n/a

### Known Workarounds

_No response_

### Configuration

n/a

### Other information

n/a
