Description
The problem is pretty old and has been reported in Dispatch initially. After the investigation, it appears that it has to be fixed in Foundation. I am opening this issue for tracking and sharing the progress.
Background
URLSession utilizes cURL for socket management, including reading, writing, and basic protocol operations. cURL, in turn, requires external notification when there is pending data or an event on a socket. This is where DispatchSource
comes into play. The implementation ties the lifecycle of a DispatchSource
to the register/unregister requests from cURL.
The Issue
When cURL requests _MultiHandle
to unregister a socket, the _SocketSources.tearDown
function is implicitly called. This function invokes DispatchSource.cancel()
and immediately returns control to cURL. Right after that cURL closes the socket. The Dispatch documentation says
It is invalid to close a file descriptor or deallocate a mach port that is currently being tracked by a dispatch source object before the cancellation handler is invoked.
Details
Although this violation is common for all platforms, it seems that Windows is the most severely impacted. It is crucial not to close the socket too early, as doing so could result in Windows reusing the freed socket handle for a new connection. This, in turn, may lead to Dispatch handling the same socket simultaneously as both dead and alive. Also, Dispatch proactively checks for invalid sockets in critical locations (like this) and crashes immediately if an invalid socket handle is encountered. Therefore, it is highly important to prevent the socket from being closed until Dispatch has finished its operations.
The simplest crashing scenario is HTTP timeout (originally reported here). You will get crash immediately as the timeout handler removes _EasyHandle
from _MultiHandle
. Trying to mute the crash (by ignoring WSAENOTSOCK
error, as proposed here) reduces the crash rate, but leaves us open to much more cryptic crashes due to socket handle reuse.
Possible Solutions
Obviously, we have to wait for cancellation handlers, but it is a bit tricky to implement. URLSession is designed to process network events on a single queue. Cancellation handlers are also submitted on that queue. We can't just wait for a callback, because it would be an effective deadlock.
- We introduced a separate queue for DispatchSource events/handlers and it worked flawlessly. However, we acknowledge that this approach can be seen as more of a hacky workaround and contradicts the overall design of URLSession and
_MultiHandle
. - A more natural solution appears to be making the socket closure asynchronous. cURL offers the option to provide a custom socket close function instead of using the standard one. Within this function, we can add the socket handle to a designated list instead of immediately closing it. When the
DispatchSource
close handler is triggered, it can search the list for any "close pending" sockets and then proceed with the actual closure.
Additional Case
There is a real scenario when cURL still closes the socket internally (ignoring the custom close function) and passes us an already closed socket. It's unclear whether this is a bug in cURL or a valid case that may require a client-side workaround. We probably should solve that separately.
Next Steps
- Convenient test cases are needed. I have some test code for all known crash scenarios, but it is not portable enough. Would be ideal to implement tests based on Foundation Tests internal HTTP server.
- Once we have tests, we could experiment with "postponed close" solution (and other solutions if we have any)
- Deal with CURL already-closed-socket issue (perhaps, extracting this scenario as separate issue)