Description
Required prerequisites
- Make sure you've read the documentation. Your issue may be addressed there.
- Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
- Consider asking first in the Gitter chat room or in a Discussion.
Problem description
I have observed intermittent crashes while the Python interpreter is exiting normally. They happen when threads created from C++ invoke Python code immediately before the exit. These threads "correctly" use gil_scoped_acquire while invoking the Python code.
After some debugging, I believe the root cause is the following:
- When Python exits, the Py_FinalizeEx function is called. It first does some work while the interpreter is still in a valid state. It then initiates the real shutdown by calling _PyRuntimeState_SetFinalizing:
  https://github.com/python/cpython/blob/09b4ad11f323f8702cde795e345b75e0fbb1a9a5/Python/pylifecycle.c#L1748
  At this point, the main thread calling Py_FinalizeEx holds the GIL, so no other thread can be making Python API calls. However, once _PyRuntimeState_SetFinalizing has been called, other threads no longer block waiting for the GIL. Instead, pthread_exit is called when they attempt to acquire the GIL.
- On Linux with GCC and glibc, pthread_exit appears to behave as if a C++ exception of type __forced_unwind were thrown.
- This means that, by default, the destructors of any C++ frames on the call stack will run. In particular, the destructors of any pybind11::object local variables will invoke Py_DECREF without holding the GIL, which can lead to a crash. If there are any noexcept functions on the call stack, std::terminate is called instead, and the program terminates with an error.
This issue is mentioned in this Python bug report as well:
https://bugs.python.org/issue42969
I'm not sure what happens on platforms other than Linux with glibc and GCC, but I suspect the behavior may be similar.
As I see it, this is really a bug in Python itself, but due to pybind11's use of C++ destructors to call Py_DECREF, the problem is much more apparent when using pybind11 (or any similar C++ RAII wrapper). When using pure C code, the thread will likely exit without doing any cleanup, which may be harmless in most cases.
Unfortunately, I don't see an easy workaround. We need to ensure that neither Py_DECREF nor any other Python API function is called from a destructor while Python is finalizing. There are two ways we could go about that:
a. I think it would work to check _Py_IsFinalizing() from the pybind11::object destructor and simply skip the Py_DECREF call in that case. Unfortunately, _Py_IsFinalizing is not inline, so the cost would not be negligible.
b. Alternatively, we could try to block the unwinding itself and just make the thread hang. To do this, we would wrap calls to Python API functions in a try { } catch (...) { sleep_forever(); } block. However, a very large number of Python C API functions can potentially lead to a call to pthread_exit. Besides the API calls that directly acquire the GIL, any API call that can run user Python code could also trigger pthread_exit, including any call to Py_DECREF. This is because that user Python code may release and then re-acquire the GIL.
I think (a) would definitely be the most practical option.
We would also need to document the fact that users should not invoke any Python API functions from destructors without checking _Py_IsFinalizing.
Reproducible example code
No response