Skip to content

Deadlock in pool_in_threads.py (test_multiprocessing_pool_circular_import) in free-threaded build #119369

Closed
@colesbury

Description

@colesbury

Bug report

Running pool_in_threads.py in a loop will occasionally lead to deadlock even with #118745 applied.

Here's a summary of the state I observed:

Thread 28

Thread 28 holds the GIL and is blocked on the QSBR shared mutex:

#6  0x000055989dd7d8a6 in _PySemaphore_PlatformWait (sema=sema@entry=0x7f8ba4ff8c60, timeout=timeout@entry=-1) at Python/parking_lot.c:142
#7  0x000055989dd7d9dc in _PySemaphore_Wait (sema=sema@entry=0x7f8ba4ff8c60, timeout=timeout@entry=-1, detach=detach@entry=1) at Python/parking_lot.c:213
#8  0x000055989dd7db65 in _PyParkingLot_Park (addr=addr@entry=0x55989e0db1c8 <_PyRuntime+136840>, expected=expected@entry=0x7f8ba4ff8cf7, size=size@entry=1, timeout_ns=timeout_ns@entry=-1, park_arg=park_arg@entry=0x7f8ba4ff8d00, detach=detach@entry=1) at Python/parking_lot.c:316
#9  0x000055989dd7494d in _PyMutex_LockTimed (m=m@entry=0x55989e0db1c8 <_PyRuntime+136840>, timeout=timeout@entry=-1, flags=flags@entry=_PY_LOCK_DETACH) at Python/lock.c:112
#10 0x000055989dd74a4e in _PyMutex_LockSlow (m=m@entry=0x55989e0db1c8 <_PyRuntime+136840>) at Python/lock.c:53
#11 0x000055989dd9c8e1 in PyMutex_Lock (m=0x55989e0db1c8 <_PyRuntime+136840>) at ./Include/internal/pycore_lock.h:75
#12 _Py_qsbr_unregister (tstate=tstate@entry=0x7f8bec00bad0) at Python/qsbr.c:239
#13 0x000055989dd92142 in tstate_delete_common (tstate=tstate@entry=0x7f8bec00bad0) at Python/pystate.c:1797
#14 0x000055989dd92831 in _PyThreadState_DeleteCurrent (tstate=tstate@entry=0x7f8bec00bad0) at Python/pystate.c:1845

It's blocked trying to lock the QSBR shared mutex:

PyMutex_Lock(&shared->mutex);

Normally, this would release the GIL while blocking, but we clear the thread state before calling tstate_delete_common:

cpython/Python/pystate.c

Lines 1844 to 1845 in 9fa206a

current_fast_clear(tstate->interp->runtime);
tstate_delete_common(tstate);

We only detach (and release the GIL) if we both have a thread state and it's currently attached:

if (tstate && _Py_atomic_load_int_relaxed(&tstate->state) ==
_Py_THREAD_ATTACHED) {
// Only detach if we are attached
PyEval_ReleaseThread(tstate);

Note that the PyThreadState in this case is actually attached, just not visible from _PyThreadState_GET().

Thread 4

Thread 4 holds the shared mutex and is blocked trying to acquire the GIL:

#6  take_gil (tstate=tstate@entry=0x55989eeb6e00) at Python/ceval_gil.c:331
#7  0x000055989dd4c93c in _PyEval_AcquireLock (tstate=tstate@entry=0x55989eeb6e00) at Python/ceval_gil.c:585
#8  0x000055989dd92ca3 in _PyThreadState_Attach (tstate=tstate@entry=0x55989eeb6e00) at Python/pystate.c:2071
#9  0x000055989dd4c9bb in PyEval_AcquireThread (tstate=tstate@entry=0x55989eeb6e00) at Python/ceval_gil.c:602
#10 0x000055989dd7d9eb in _PySemaphore_Wait (sema=sema@entry=0x7f8bf3ffe240, timeout=timeout@entry=-1, detach=detach@entry=1) at Python/parking_lot.c:215
#11 0x000055989dd7db65 in _PyParkingLot_Park (addr=addr@entry=0x55989e0c0108 <_PyRuntime+26056>, expected=expected@entry=0x7f8bf3ffe2c8, size=size@entry=8, timeout_ns=timeout_ns@entry=-1, park_arg=park_arg@entry=0x0, detach=detach@entry=1) at Python/parking_lot.c:316
#12 0x000055989dd747ce in rwmutex_set_parked_and_wait (rwmutex=rwmutex@entry=0x55989e0c0108 <_PyRuntime+26056>, bits=bits@entry=1) at Python/lock.c:386
#13 0x000055989dd74e38 in _PyRWMutex_RLock (rwmutex=0x55989e0c0108 <_PyRuntime+26056>) at Python/lock.c:404
#14 0x000055989dd9357a in stop_the_world (stw=stw@entry=0x55989e0db190 <_PyRuntime+136784>) at Python/pystate.c:2234
#15 0x000055989dd936d3 in _PyEval_StopTheWorld (interp=interp@entry=0x55989e0d8780 <_PyRuntime+126016>) at Python/pystate.c:2331
#16 0x000055989dd9c761 in _Py_qsbr_reserve (interp=interp@entry=0x55989e0d8780 <_PyRuntime+126016>) at Python/qsbr.c:201
#17 0x000055989dd908d4 in new_threadstate (interp=0x55989e0d8780 <_PyRuntime+126016>, whence=whence@entry=2) at Python/pystate.c:1543
#18 0x000055989dd92769 in _PyThreadState_New (interp=<optimized out>, whence=whence@entry=2) at Python/pystate.c:1624

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.13bugs and security fixes3.14bugs and security fixestopic-free-threadingtype-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions