Skip to content

resurrected asyncio _SelectorTransport unregisters fds it doesn't own #130141

Open
@lunixbochs

Description

@lunixbochs

Bug report

Bug description:

Short story:

  1. Socket is closed in _SelectorTransport.__del__, so it doesn't own the fd anymore.
  2. Anything else in the process can create a new file descriptor at this point.
  3. If the transport is gc-resurrected to call close(), it can then remove the reader/writer from loop._selector here or here for a fd it doesn't control anymore, causing access to that fd to hang.

Long story:

  1. Introduce a GC cycle, such that an asyncio.selector_events._SelectorTransport and the object owning it are collected at the same time.
  2. The transport __del__ will be called, closing the socket, freeing its file descriptor, but not performing any other cleanup.
  3. The owning object, being a good citizen, resurrects the transport during gc to have close() called on it properly. In the case of this issue, an asyncio task was created to close it later. (I definitely don't love seeing asyncio.create_task in a __del__ method, but I expect that's not the only way to trigger this)
  4. At this point, anything in your asyncio runloop can open a file descriptor. In my case, it was asyncio.create_subprocess_exec calling os.pidfd_create() in PidfdChildWatcher.add_child_handler. If you're lucky (or running a lot of tasks at once), the new fd will be the same as the resurrected socket's fd.
  5. Now transport.close() can run any time later, which will call loop._selector.remove_child_watcher(fd), despite not really owning the fd anymore.
  6. Now the pidfd has been removed from loop._selector, so await proc.wait() will hang forever. Or whatever asyncio stream you were trying to use got removed from the selector and you won't be able to wait on it.

I definitely think the non-stdlib code in play here was unsound, but I don't think that should trigger such a cursed result in the stdlib, so I'd rather defend against this in asyncio.

I think this is the minimum patch to mitigate the issue, to prevent the remove_child_reader and remove_child_writer calls by breaking their pre-conditions. I'm not sure if it's worth trying to interact with the loop or selector from __del__ to clean up the fd. (Selecting on a missing fd is probably handled somewhere else anyway?)

--- a/Lib/asyncio/selector_events.py
+++ b/Lib/asyncio/selector_events.py
@@ -871,6 +871,8 @@ def close(self):
     def __del__(self, _warn=warnings.warn):
         if self._sock is not None:
             _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
+            self._closing = True
+            self._buffer.clear()
             self._sock.close()
             if self._server is not None:
                 self._server._detach(self)

Another consideration is that asyncio is using a cached self._sock_fd here without really knowing if the underlying socket was closed and fd was recycled or not. I think this isn't the first time the cached fd has caused a surprising behavior, see also #88968

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    stdlibPython modules in the Lib dirtopic-asynciotype-bugAn unexpected behavior, bug, or error

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions