
[BUG]: Python API calls in destructors are unsafe outside the main thread #3274

Closed
@jbms

Problem description

I observed random crashes during normal Python interpreter exit when threads created from C++ invoke Python code immediately before the exit (these threads "correctly" use gil_scoped_acquire while invoking the Python code).

After some debugging, I believe the root cause is the following:

  • When Python exits, the Py_FinalizeEx function is called. After doing some initial work with the Python interpreter still in a valid state, it initiates the real shutdown by calling _PyRuntimeState_SetFinalizing:
    https://github.com/python/cpython/blob/09b4ad11f323f8702cde795e345b75e0fbb1a9a5/Python/pylifecycle.c#L1748
    At this point, the main thread calling Py_FinalizeEx holds the GIL, so no other thread can be performing any Python API calls. However, once _PyRuntimeState_SetFinalizing is called, other threads are no longer blocked waiting for the GIL. Instead, when they attempt to acquire the GIL, pthread_exit is called.
  • On Linux with GCC and glibc, pthread_exit appears to behave as if a C++ exception of type __forced_unwind is thrown.
  • That means that by default, if there are any C++ frames in the call stack, their destructors will run. In particular, if there are any pybind11::object local variables, their destructors will invoke Py_DECREF without holding the GIL, which can lead to a crash. If there are any noexcept functions in the call stack, std::terminate will be called, which terminates the program with an error.
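
The unwinding behavior described in the bullets above can be checked without Python at all. The following is a minimal sketch, assuming Linux with glibc and GCC (or another compiler using the same unwinder). The names Guard, worker, and destructor_runs_on_pthread_exit are hypothetical and exist only for this illustration; Guard stands in for a pybind11::object local variable.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>

namespace {

// Set by the destructor so the caller can tell whether it ran.
std::atomic<bool> g_destructor_ran{false};

// Hypothetical stand-in for a pybind11::object local: an RAII type whose
// destructor would normally call Py_DECREF.
struct Guard {
    ~Guard() { g_destructor_ran.store(true); }
};

void *worker(void *) {
    Guard guard;
    // Stand-in for a GIL acquisition attempted while Python is finalizing.
    // On glibc this performs a forced unwind of the thread's stack.
    pthread_exit(nullptr);
}

}  // namespace

// Returns true if Guard's destructor ran while the thread was exiting.
bool destructor_runs_on_pthread_exit() {
    g_destructor_ran.store(false);
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, nullptr) != 0) {
        return false;
    }
    pthread_join(tid, nullptr);
    return g_destructor_ran.load();
}
```

On the platform assumed here, I would expect destructor_runs_on_pthread_exit() to return true. On other C libraries the thread may exit without running any destructors, in which case it would return false.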

This issue is mentioned in this Python bug report as well:
https://bugs.python.org/issue42969

I'm not sure what happens on platforms other than Linux with glibc and GCC, but I suspect the behavior may be similar.

As I see it, this is really a bug in Python itself, but due to pybind11's use of C++ destructors to call Py_DECREF, the problem is much more apparent when using pybind11 (or any similar C++ RAII wrapper). When using pure C code, the thread will likely exit without doing any cleanup, which may be harmless in most cases.

Unfortunately I don't see an easy workaround. We need to ensure Py_DECREF or any other Python APIs are not called from a destructor while Python is finalizing. There are two ways we could go about that:

a. Checking _Py_IsFinalizing() from the pybind11::object destructor (and skipping the call to Py_DECREF in that case) would work, I think. Unfortunately, _Py_IsFinalizing is not inline, so the cost would not be negligible.
b. Alternatively, we could attempt to block the unwinding itself, and just make the thread hang. We could wrap the call to Python API functions with a try { } catch (...) {sleep_forever();} block. However, there are a very large number of Python C API functions that can potentially lead to a call to pthread_exit. In addition to the API calls that directly acquire the GIL, any API call that can trigger user Python code (including any call to Py_DECREF), could also trigger pthread_exit, because that user Python code may release and then re-acquire the GIL.
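
To make option (b) concrete, here is a rough sketch of blocking the unwind, assuming Linux with glibc and GCC's C++ runtime, where catch (...) intercepts the forced unwind. The pthread_exit call stands in for a Python API call that fails to acquire the GIL during finalization. The names hanging_worker and catch_all_blocks_pthread_exit are hypothetical; sleep_forever is the helper named above, implemented here with pause(). The handler must never finish, because glibc aborts the process if the forced unwind is swallowed and the catch block is left normally.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

namespace {

// Set once the catch-all handler has intercepted the forced unwind.
std::atomic<bool> g_unwind_blocked{false};

// Parks the calling thread permanently; pause() may return on a signal,
// hence the loop.
[[noreturn]] void sleep_forever() {
    for (;;) {
        pause();
    }
}

void *hanging_worker(void *) {
    try {
        // Stand-in for a Python API call made while Python is finalizing.
        pthread_exit(nullptr);
    } catch (...) {
        // Never leave this handler: frames further up the stack are not
        // unwound, so their destructors do not run.
        g_unwind_blocked.store(true);
        sleep_forever();
    }
    return nullptr;
}

}  // namespace

// Returns true if the catch-all handler was entered within about 2 seconds.
bool catch_all_blocks_pthread_exit() {
    g_unwind_blocked.store(false);
    pthread_t tid;
    if (pthread_create(&tid, nullptr, hanging_worker, nullptr) != 0) {
        return false;
    }
    // The worker never finishes on the assumed platform, so it cannot be joined.
    pthread_detach(tid);
    for (int i = 0; i < 200 && !g_unwind_blocked.load(); ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return g_unwind_blocked.load();
}
```

The hung thread is simply discarded when the process exits. This only illustrates the mechanism for a single call site; as noted in (b), the hard part is that a very large number of Python API calls would need to be wrapped this way.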

I think (a) would definitely be the more practical option.
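
A minimal sketch of what (a) could look like follows. This is not pybind11's actual implementation. To keep it compilable without the Python headers, FakeObject, is_finalizing, and decref are hypothetical stand-ins for PyObject, _Py_IsFinalizing(), and Py_DECREF, and guarded_ref is a hypothetical stand-in for pybind11::object.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

// Hypothetical stand-in for PyObject: only a reference count.
struct FakeObject {
    int refcount = 1;
};

// Hypothetical stand-in for the interpreter's "finalizing" state.
std::atomic<bool> finalizing{false};

// Stand-in for _Py_IsFinalizing().
inline bool is_finalizing() { return finalizing.load(std::memory_order_acquire); }

// Stand-in for Py_DECREF.
inline void decref(FakeObject *obj) { --obj->refcount; }

// RAII owner of one reference, playing the role of pybind11::object.
class guarded_ref {
public:
    // Takes ownership of one existing reference to obj.
    explicit guarded_ref(FakeObject *obj) : ptr_(obj) {}

    guarded_ref(const guarded_ref &) = delete;
    guarded_ref &operator=(const guarded_ref &) = delete;

    guarded_ref(guarded_ref &&other) noexcept : ptr_(other.ptr_) {
        other.ptr_ = nullptr;
    }

    ~guarded_ref() {
        // Option (a): while the interpreter is finalizing, skip the decref
        // and deliberately leak the reference instead of touching Python state.
        if (ptr_ != nullptr && !is_finalizing()) {
            decref(ptr_);
        }
    }

private:
    FakeObject *ptr_;
};

}  // namespace sketch
```

In normal operation the destructor releases its reference as usual; once the finalizing flag is set it does nothing. The extra check on every destructor call is the non-negligible cost mentioned in (a).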

We would also need to document the fact that users should not invoke any Python API functions from destructors without checking _Py_IsFinalizing.

Reproducible example code

No response
