Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-104690: thread_run() checks for tstate dangling pointer #109056

Merged
merged 1 commit into from
Sep 8, 2023

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Sep 7, 2023

thread_run() of _threadmodule.c now calls
_PyThreadState_CheckConsistency() to check if tstate is a dangling pointer.

Rename ceval_gil.c is_tstate_valid() to
_PyThreadState_CheckConsistency() to reuse it in _threadmodule.c.

thread_run() of _threadmodule.c now calls
_PyThreadState_CheckConsistency() to check if tstate is a dangling
pointer.

Rename ceval_gil.c is_tstate_valid() to
_PyThreadState_CheckConsistency() to reuse it in _threadmodule.c.
@vstinner
Copy link
Member Author

vstinner commented Sep 7, 2023

This change provides a better error message rather than a blind crash. It should help debugging complicated crashes.

Example:

$ ./python bug.py 
python: Python/pystate.c:2904: _PyThreadState_CheckConsistency: Assertion `!_PyMem_IsPtrFreed(tstate->interp)' failed.
Abandon (core dumped)

$ gdb -args ./python bug.py 
(gdb) run
Starting program: /home/vstinner/python/main/python bug.py
[New Thread 0x7fffea2ff6c0 (LWP 1148671)]
[New Thread 0x7fffe9afe6c0 (LWP 1148672)]
python: Python/pystate.c:2904: _PyThreadState_CheckConsistency: Assertion `!_PyMem_IsPtrFreed(tstate->interp)' failed.

Thread 3 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe9afe6c0 (LWP 1148672)]
0x00007ffff7d7a844 in __pthread_kill_implementation () from /lib64/libc.so.6

(gdb) where
#0  0x00007ffff7d7a844 in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff7d29abe in raise () from /lib64/libc.so.6
#2  0x00007ffff7d1287f in abort () from /lib64/libc.so.6
#3  0x00007ffff7d1279b in __assert_fail_base.cold () from /lib64/libc.so.6
#4  0x00007ffff7d22147 in __assert_fail () from /lib64/libc.so.6
#5  0x00000000006e204a in _PyThreadState_CheckConsistency (tstate=0x7fffe4000e50) at Python/pystate.c:2904
#6  0x000000000078f006 in thread_run (boot_raw=0x7fffea4be1b0) at ./Modules/_threadmodule.c:1081
#7  0x00000000006ff043 in pythread_wrapper (arg=0xb86580) at Python/thread_pthread.h:233
#8  0x00007ffff7d78907 in start_thread () from /lib64/libc.so.6
#9  0x00007ffff7dfe870 in clone3 () from /lib64/libc.so.6

Patch to reintroduce the bug:

diff --git a/Modules/_threadmodule.c b/Modules/_threadmodule.c
index 49f34fcb9f..f5fc209e88 100644
--- a/Modules/_threadmodule.c
+++ b/Modules/_threadmodule.c
@@ -1158,11 +1161,13 @@ thread_PyThread_start_new_thread(PyObject *self, PyObject *fargs)
                         "thread is not supported for isolated subinterpreters");
         return NULL;
     }
+#if 0
     if (interp->finalizing) {
         PyErr_SetString(PyExc_RuntimeError,
                         "can't create new thread at interpreter shutdown");
         return NULL;
     }
+#endif

     struct bootstate *boot = PyMem_NEW(struct bootstate, 1);
     if (boot == NULL) {

Script bug.py:

import atexit
import threading

def t0():
    pass

def t1():
    threading.Thread(target=t0).start()

def f():
    threading.Thread(target=t1).start()

atexit.register(f)
exit()

@vstinner
Copy link
Member Author

vstinner commented Sep 7, 2023

I don't think that it's related, but in a recent job on Windows x86, the following assertion failed while running test_threading() in a test worker process:

Assertion failed: tstate_is_alive(tstate) && !tstate->_status.bound, file D:\\a\\cpython\\cpython\\Python\\pystate.c, line 245

Related code:

static void
bind_tstate(PyThreadState *tstate)
{
    assert(tstate != NULL);
    assert(tstate_is_alive(tstate) && !tstate->_status.bound); /// <=== HERE, line 245
    assert(!tstate->_status.unbound);  // just in case
    assert(!tstate->_status.bound_gilstate);
    assert(tstate != gilstate_tss_get(tstate->interp->runtime));
    assert(!tstate->_status.active);
    assert(tstate->thread_id == 0);
    assert(tstate->native_thread_id == 0);

I don't know if it would be worth it to repeat the _PyThreadState_CheckConsistency() check in bind_tstate(). I don't think so, since my change already calls it in thread_run().

See: #108987

cc @serhiy-storchaka

@vstinner
Copy link
Member Author

vstinner commented Sep 7, 2023

Oh! I managed to reproduce issue #108987 on Windows 32-bit with my PR and I get:

Assertion failed: !_PyMem_IsPtrFreed(tstate->interp), file C:\victor\python\main\Python\pystate.c, line 2911
Fatal Python error: Aborted

Thread 0x00000ef0 (most recent call first):
  <no Python frame>

So apparently, thread_run() is called after PyInterpreterState_Delete() has been called!

@vstinner vstinner merged commit f63d378 into python:main Sep 8, 2023
18 checks passed
@vstinner vstinner deleted the tstate_consistency branch September 8, 2023 09:50
@vstinner vstinner added needs backport to 3.11 only security fixes needs backport to 3.12 bug and security fixes labels Sep 8, 2023
@miss-islington
Copy link
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @vstinner, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker f63d37877ad166041489a968233b57540f8456e8 3.11

@miss-islington
Copy link
Contributor

Sorry, @vstinner, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker f63d37877ad166041489a968233b57540f8456e8 3.12

vstinner added a commit to vstinner/cpython that referenced this pull request Sep 8, 2023
…hon#109056)

thread_run() of _threadmodule.c now calls
_PyThreadState_CheckConsistency() to check if tstate is a dangling
pointer when Python is built in debug mode.

Rename ceval_gil.c is_tstate_valid() to
_PyThreadState_CheckConsistency() to reuse it in _threadmodule.c.

(cherry picked from commit f63d378)
@bedevere-bot
Copy link

GH-109133 is a backport of this pull request to the 3.12 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.12 bug and security fixes label Sep 8, 2023
@bedevere-bot
Copy link

GH-109134 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Sep 8, 2023
vstinner added a commit to vstinner/cpython that referenced this pull request Sep 8, 2023
…hon#109056)

thread_run() of _threadmodule.c now calls
_PyThreadState_CheckConsistency() to check if tstate is a dangling
pointer when Python is built in debug mode.

Rename ceval_gil.c is_tstate_valid() to
_PyThreadState_CheckConsistency() to reuse it in _threadmodule.c.

(cherry picked from commit f63d378)
vstinner added a commit that referenced this pull request Sep 8, 2023
…09056) (#109134)

gh-104690: thread_run() checks for tstate dangling pointer (#109056)

thread_run() of _threadmodule.c now calls
_PyThreadState_CheckConsistency() to check if tstate is a dangling
pointer when Python is built in debug mode.

Rename ceval_gil.c is_tstate_valid() to
_PyThreadState_CheckConsistency() to reuse it in _threadmodule.c.

(cherry picked from commit f63d378)
Yhg1s pushed a commit that referenced this pull request Oct 2, 2023
…09056) (#109133)

gh-104690: thread_run() checks for tstate dangling pointer (#109056)

thread_run() of _threadmodule.c now calls
_PyThreadState_CheckConsistency() to check if tstate is a dangling
pointer when Python is built in debug mode.

Rename ceval_gil.c is_tstate_valid() to
_PyThreadState_CheckConsistency() to reuse it in _threadmodule.c.

(cherry picked from commit f63d378)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants