ConnectionPool cleanup is eager and results in unnecessary connection churn and (potentially) port exhaustion #4354

Closed
nicktrav opened this issue Nov 1, 2018 · 8 comments

@nicktrav

nicktrav commented Nov 1, 2018

Currently, OkHttp reaps "idle" connections (i.e. connections not in use) if either of the following is true:

  • a connection has not been used in a period greater than the idle timeout, or
  • the number of idle connections is greater than some threshold
      if (longestIdleDurationNs >= this.keepAliveDurationNs
          || idleConnectionCount > this.maxIdleConnections) {
        // We've found a connection to evict. Remove it from the list, then close it below (outside
        // of the synchronized block).
        connections.remove(longestIdleConnection);
      } 

This policy favors keeping the number of idle connections at or below the maxIdleConnections value.

We've run into situations internally where multiple threads (roughly 60) use a single OkHttpClient instance to issue concurrent requests to an upstream, yet many of the connections (and their underlying TCP sockets) are closed very soon after their initial use. Over time the host can run out of ephemeral ports, since closed sockets sit in TIME_WAIT for a period during which their ports cannot be reused, and eventually connect() syscalls start to fail.

One workaround is to set the maxIdleConnections value high enough that connections can be reused in high request-rate environments. However, those connections will then not be closed if the request rate falls.
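For reference, here's a minimal sketch of that workaround (the class name and the pool size / keep-alive values are purely illustrative, not recommendations):

    import java.util.concurrent.TimeUnit;
    import okhttp3.ConnectionPool;
    import okhttp3.OkHttpClient;

    public final class LargePoolClient {
      public static OkHttpClient create() {
        // Illustrative values only: maxIdleConnections raised well above the
        // expected number of concurrent requests; keep-alive left at 5 minutes.
        ConnectionPool pool = new ConnectionPool(128, 5, TimeUnit.MINUTES);
        return new OkHttpClient.Builder()
            .connectionPool(pool)
            .build();
      }
    }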

Maybe this is by design? I'd posit that the principle of least surprise would have one assume that a connection is only closed if it has been idle for some extended period of time. Indeed that has caught a few of us by surprise.

An alternative we'd like to propose is that connections are closed only if an eligible connection exists (one idle past the keep-alive duration) and the maxIdleConnections value has been exceeded, i.e. change the OR to an AND in the code above. This would allow the connection pool to grow with the load during bursts, and to shrink when the connections are not being utilized.
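Concretely, the proposed check would look something like the following (a sketch against the snippet above, not a tested patch):

    if (longestIdleDurationNs >= this.keepAliveDurationNs
        && idleConnectionCount > this.maxIdleConnections) {
      // Proposed behavior: only evict when this connection has sat idle past the
      // keep-alive duration AND the pool is over its idle-connection limit.
      connections.remove(longestIdleConnection);
    }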

Here's a small reproducer we put together where we observe the port exhaustion issue:
https://github.com/nicktrav/okhttp/commits/nickt.time-wait-issue

This has also been mentioned here, but the comments didn't seem to point at a resolution:
https://stackoverflow.com/questions/41011287/why-okhttp-doesnt-reuse-its-connections

@nicktrav
Author

nicktrav commented Nov 1, 2018

cc: @rpeng

@swankjesse
Member

We really need to close connections that have exceeded the keep-alive duration. We evict these connections not to conserve resources but to prevent later failures caused by using stale connections.

I think the right fix here is to dramatically increase maxIdleConnections; perhaps to 1,000 or more. Closing non-stale connections is merely about conserving memory, and memory is cheap! Probably ~64 KiB per socket, depending on system buffer sizes.

And we should also change the default value in OkHttp. Perhaps a default of 64? That would be 4 MiB of memory if my socket math is right, which seems like the right order of magnitude.

@swankjesse
Member

Next steps:

  • Measure on Android and JVM how much memory an idle socket needs.
  • Set a target idle connection count based on that per-socket memory requirement. Probably between 1 and 8 MiB is the right range (see the sketch after this list).
  • Document this change and announce it loudly. Provide guidance for users on what they should use if not our new defaults.
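As a rough illustration of that sizing step (the 4 MiB budget and ~64 KiB per-socket cost below are the assumptions from this thread, not measurements):

    public final class PoolSizing {
      public static void main(String[] args) {
        // Back-of-envelope sketch: derive an idle-connection count from a memory
        // budget and a per-socket cost; real numbers would come from the
        // measurements described above.
        long memoryBudgetBytes = 4L * 1024 * 1024;   // assumed budget: 4 MiB
        long perSocketBytes = 64L * 1024;            // assumed cost: ~64 KiB per idle socket
        long targetIdleConnections = memoryBudgetBytes / perSocketBytes;
        System.out.println(targetIdleConnections);   // prints 64
      }
    }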

@swankjesse
Member

swankjesse commented Nov 2, 2018

One secondary factor: each socket has a cost on the server as well. Growing our pools will grow the memory used by servers!

@nicktrav
Author

nicktrav commented Nov 2, 2018

I definitely agree with you for the most part on your proposals!

One part I’d push back on a little is:

Closing non-stale connections is merely about conserving memory, and memory is cheap!

We're seeing that's not really the case for our clients, as a) it eats up all of the ephemeral ports on the server the client is running on, given we're opening tons of sockets and then closing them immediately (the ports can't be reused while they're in TIME_WAIT), and b) there's a non-trivial CPU cost for new sockets that have to negotiate TLS when using HTTPS.

Maybe the docs just need to be a little clearer? It took a while to work out why the sockets were being closed immediately, and that was a little surprising, given our experience with other servers that tend to close a connection only when idle connections > max value AND the connection has been idle for a while.

For now, at least, we can get around this on our end by bumping up the max idle connection count, as you suggest.

@rpeng

rpeng commented Nov 2, 2018

@swankjesse I think the defaults here are reasonable for mobile clients, i.e. I doubt an Android app would really require more than 5 simultaneous connections anywhere.

However, for server-side usage there are far fewer memory and battery-life constraints to worry about. It might be a good idea to have different sane defaults for separate use cases (i.e. client side vs. server side, similar to how Java has -server and -client modes).

In reality, every user is going to have a somewhat different use case, and would likely need to do some profiling to come up with reasonable values for idle pool size and keep-alive time.

@swankjesse swankjesse added this to the 3.12 milestone Nov 4, 2018
@swankjesse swankjesse added enhancement Feature not a bug documentation Documentation task labels Nov 4, 2018
@swankjesse
Member

Punting to the next release ’cause there are no code changes here.

@yschimke
Collaborator

Closing as effectively a dupe of #4530
