Description
Bug report
With a ProcessPoolExecutor, after submitting and quickly canceling a future, a call to shutdown(wait=True)
would hang indefinitely.
This happens pretty much on all platforms and all recent Python versions.
Here is a minimal reproduction:
import concurrent.futures
ppe = concurrent.futures.ProcessPoolExecutor(1)
ppe.submit(int).result()
ppe.submit(int).cancel()
ppe.shutdown(wait=True)
The first submission gets the executor going and creates its internal queue_management_thread
.
The second submission appears to get that thread to loop, enter a wait state, and never receive a wakeup event.
Introducing a tiny sleep between the second submit and its cancel request makes the issue disappear. From my initial observation it looks like something in the way the queue_management_worker
internal loop is structured doesn't handle this edge case well.
Shutting down with wait=False
would return immediately as expected, but the queue_management_thread
would then die with an unhandled OSError: handle is closed
exception.
Environment
- Discovered on macOS-12.2.1 with cpython 3.8.5.
- Reproduced in Ubuntu and Windows (x64) as well, and in cpython versions 3.7 to 3.11.0-beta.3.
- Reproduced in pypy3.8 as well, but not consistently. Seen for example in Ubuntu with Python 3.8.13 (PyPy 7.3.9).
Additional info
When tested with pytest-timeout
under Ubuntu and cpython 3.8.13, these are the tracebacks at the moment of timing out:
_____________________________________ test _____________________________________
@pytest.mark.timeout(10)
def test():
ppe = concurrent.futures.ProcessPoolExecutor(1)
ppe.submit(int).result()
ppe.submit(int).cancel()
> ppe.shutdown(wait=True)
test_reproduce_python_bug.py:14:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py:686: in shutdown
self._queue_management_thread.join()
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1011: in join
self._wait_for_tstate_lock()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Thread(QueueManagerThread, started daemon 140003176535808)>
block = True, timeout = -1
def _wait_for_tstate_lock(self, block=True, timeout=-1):
# Issue #18808: wait for the thread state to be gone.
# At the end of the thread's life, after all knowledge of the thread
# is removed from C data structures, C code releases our _tstate_lock.
# This method passes its arguments to _tstate_lock.acquire().
# If the lock is acquired, the C code is done, and self._stop() is
# called. That sets ._is_stopped to True, and ._tstate_lock to None.
lock = self._tstate_lock
if lock is None: # already determined that the C code is done
assert self._is_stopped
> elif lock.acquire(block, timeout):
E Failed: Timeout >10.0s
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1027: Failed
----------------------------- Captured stderr call -----------------------------
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
~~~~~~~~~~~~~~~~~ Stack of QueueFeederThread (140003159754496) ~~~~~~~~~~~~~~~~~
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/queues.py", line 227, in _feed
nwait()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 302, in wait
waiter.acquire()
~~~~~~~~~~~~~~~~ Stack of QueueManagerThread (140003176535808) ~~~~~~~~~~~~~~~~~
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py", line 362, in _queue_management_worker
ready = mp.connection.wait(readers + worker_sentinels)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/connection.py", line 931, in wait
ready = selector.select(timeout)
File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
Tracebacks in PyPy are similar on the concurrent.futures.process
level. Tracebacks in Windows are different in the lower-level areas, but again similar on the concurrent.futures.process
level.
Linked PRs:
Linked PRs
Metadata
Metadata
Assignees
Projects
Status