Segmentation fault during the test__xxsubinterpreters test #100981

Open
galaxy4public opened this issue Jan 12, 2023 · 6 comments
Labels: OS-unsupported; pending (The issue will be closed if no feedback is provided); tests (Tests in the Lib/test dir); topic-subinterpreters; type-crash (A hard crash of the interpreter, possibly with a core dump)

Comments

@galaxy4public

galaxy4public commented Jan 12, 2023

Crash report

When running the test suite, one test intermittently fails with a segmentation fault. The test in question is test__xxsubinterpreters, and the crash seems to be GIL-related. I can consistently reproduce the crash in my environment, where I build Python inside a chroot (/proc, /sys, and /dev are "mount --bind"ed from the host system).

This is with Python 3.11.1, freshly built with GCC 12.2.0 using the following configure options:

CFLAGS='-pipe -O2 -ggdb -fomit-frame-pointer'
export CFLAGS
CXXFLAGS='-pipe -ggdb -O0'
export CXXFLAGS
FFLAGS='-pipe -ggdb -O0 '
export FFLAGS
FCFLAGS='-pipe -ggdb -O0 '
export FCFLAGS
LDFLAGS=
export LDFLAGS
/home/galaxy/rpm-work/BUILD/Python-3.11.1/configure \
       --enable-ipv6 \
       --with-computed-gotos \
       --with-dbmliborder=gdbm:ndbm:bdb \
       --with-system-expat \
       --without-system-ffi \
       --without-system-libmpdec \
       --enable-loadable-sqlite-extensions \
       --without-dtrace \
       --with-lto \
       --without-ensurepip \
       --with-pkg-config=yes \
       --without-static-libpython \
       --with-tzpath=/usr/share/zoneinfo \
       --with-openssl-rpath=no \
       --with-ssl-default-suites=openssl \
       --disable-optimizations \
       --with-pydebug

This is the test suite output when running just the test in question:

galaxy@apollo:~/rpm-work/BUILD/Python-3.11.1 $ LD_LIBRARY_PATH=/home/galaxy/rpm-work/BUILD/Python-3.11.1/build/debug build/debug/python -m test -j4 --slowest --timeout=1800 -W -F test__xxsubinterpreters
0:00:00 load avg: 0.05 Run tests in parallel using 4 child processes (timeout: 30 min, worker timeout: 35 min)
0:00:08 load avg: 0.65 [  1/1] test__xxsubinterpreters crashed (Exit code -11)
Fatal Python error: Segmentation fault

Thread 0x00007effcf9a2468 (most recent call first):
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/test__xxsubinterpreters.py", line 283 in clean_up_interpreters
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/test__xxsubinterpreters.py", line 299 in tearDown
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/case.py", line 584 in _callTearDown
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/case.py", line 626 in run
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/case.py", line 678 in __call__
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/suite.py", line 122 in run
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/suite.py", line 84 in __call__
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/suite.py", line 122 in run
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/suite.py", line 84 in __call__
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/suite.py", line 122 in run
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/suite.py", line 84 in __call__
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/unittest/runner.py", line 217 in run
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/support/__init__.py", line 1095 in _run_suite
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/support/__init__.py", line 1221 in run_unittest
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/libregrtest/runtest.py", line 276 in _test_module
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/libregrtest/runtest.py", line 312 in _runtest_inner2
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/libregrtest/runtest.py", line 355 in _runtest_inner
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/libregrtest/runtest.py", line 214 in _runtest
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/libregrtest/runtest.py", line 260 in runtest
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/libregrtest/runtest_mp.py", line 90 in run_tests_worker
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/libregrtest/main.py", line 722 in _main
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/libregrtest/main.py", line 701 in main
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/libregrtest/main.py", line 763 in main
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/regrtest.py", line 43 in _main
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/test/regrtest.py", line 47 in <module>
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/runpy.py", line 88 in _run_code
  File "/home/galaxy/rpm-work/BUILD/Python-3.11.1/Lib/runpy.py", line 198 in _run_module_as_main

Extension modules: _testcapi, _xxsubinterpreters (total: 2)
Kill <TestWorkerProcess #2 running test=test__xxsubinterpreters pid=1282561 time=8.8 sec> process group
Kill <TestWorkerProcess #3 running test=test__xxsubinterpreters pid=1282560 time=8.8 sec> process group
Kill <TestWorkerProcess #4 running test=test__xxsubinterpreters pid=1282564 time=8.8 sec> process group

== Tests result: FAILURE ==

10 slowest tests:

1 test failed:
    test__xxsubinterpreters

Total duration: 9.0 sec
Tests result: FAILURE
galaxy@apollo:~/rpm-work/BUILD/Python-3.11.1 $

Error messages

I acquired a core dump and under gdb it looks as follows:

Reading symbols from python...
[New LWP 1282606]
[New LWP 1282563]
[New LWP 1282568]
Core was generated by `/home/galaxy/rpm-work/BUILD/Python-3.11.1/build/debug/python -u -m test.regrtes'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007effcf93b6a5 in __syscall4 (n=14, a1=2, a2=139637136696304, a3=0, a4=8)
    at ./arch/x86_64/syscall_arch.h:38
38		__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2),
[Current thread is 1 (LWP 1282606)]
(gdb) info threads
  Id   Target Id         Frame
* 1    LWP 1282606       0x00007effcf93b6a5 in __syscall4 (n=14, a1=2, a2=139637136696304, a3=0, a4=8)
    at ./arch/x86_64/syscall_arch.h:38
  2    LWP 1282563       0x00007effcf545221 in validate_list (head=0x7effce211d10,
    flags=collecting_clear_unreachable_clear)
    at /home/galaxy/rpm-work/BUILD/Python-3.11.1/Modules/gcmodule.c:397
  3    LWP 1282568       __cp_end () at src/thread/x86_64/syscall_cp.s:29
(gdb) bt
#0  0x00007effcf93b6a5 in __syscall4 (n=14, a1=2, a2=139637136696304, a3=0, a4=8)
    at ./arch/x86_64/syscall_arch.h:38
#1  0x00007effcf93b74c in __restore_sigs (set=0x7effcdee23f0) at src/signal/block.c:43
#2  0x00007effcf93ba59 in raise (sig=11) at src/signal/raise.c:11
#3  0x00007effcf54a6fd in faulthandler_fatal_error (signum=11)
    at /home/galaxy/rpm-work/BUILD/Python-3.11.1/Modules/faulthandler.c:385
#4  <signal handler called>
#5  0x00007effcf47e945 in drop_gil (ceval=0x7effcf855060 <_PyRuntime+352>, ceval2=0x7effce07a1c0,
    tstate=0x7effce3ddaa0) at /home/galaxy/rpm-work/BUILD/Python-3.11.1/Python/ceval_gil.h:169
#6  0x00007effcf47f3a3 in _PyEval_ReleaseLock (tstate=0x7effce3ddaa0)
    at /home/galaxy/rpm-work/BUILD/Python-3.11.1/Python/ceval.c:448
#7  0x00007effcf50d4ef in _PyThreadState_DeleteCurrent (tstate=0x7effce3ddaa0)
    at /home/galaxy/rpm-work/BUILD/Python-3.11.1/Python/pystate.c:1126
#8  0x00007effcf5c95df in thread_run (boot_raw=0x7effcdf8e750)
    at /home/galaxy/rpm-work/BUILD/Python-3.11.1/Modules/_threadmodule.c:1098
#9  0x00007effcf52cee7 in pythread_wrapper (arg=0x7effce3dd380)
    at /home/galaxy/rpm-work/BUILD/Python-3.11.1/Python/thread_pthread.h:241
#10 0x00007effcf957277 in start (p=0x7effcdee2b00) at src/thread/pthread_create.c:207
#11 0x00007effcf95bffe in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
(gdb)
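
Reading the backtrace: frames #0 to #3 are the faulthandler signal handler re-raising the signal after it has printed the Python traceback, and frame #4 marks the signal delivery, so the faulting code is frame #5 (drop_gil, reached while the thread state is being deleted). The sketch below illustrates only that reporting mechanism, not the bug itself. It assumes a POSIX system, and the child process deliberately sends itself SIGSEGV with os.kill as a stand-in for a real memory fault; run_crashing_child is a name made up for this illustration.

```python
import subprocess
import sys

# Source of the child process: enable faulthandler, then deliver SIGSEGV
# to itself. This imitates a crash; it does not reproduce the GIL bug.
CHILD_SOURCE = r"""
import faulthandler, os, signal
faulthandler.enable()
os.kill(os.getpid(), signal.SIGSEGV)
"""


def run_crashing_child():
    """Run the child and return (returncode, captured stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr
```

On Linux the return code is the negative signal number, which is the same "Exit code -11" that regrtest prints, and the captured stderr contains the "Fatal Python error: Segmentation fault" line followed by the Python-level traceback.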

Your environment

  • CPython versions tested on: 3.11.1 (3.10 works: I cannot trigger the crash there following exactly the same steps)
  • Operating system and architecture: ALT Linux x86_64 (kernel 5.15.50, glibc 2.35.0.6) -> chroot with musl 1.2.3
  • The host machine is Intel(R) Core(TM) i3 CPU 530 @ 2.93GHz with 8GB of RAM
galaxy4public added the type-crash label on Jan 12, 2023
@galaxy4public
Author

galaxy4public commented Jan 12, 2023

BTW, I have to specify "-F" to rerun the test indefinitely; otherwise the crash is really hard to catch with a single invocation of the test. Also, "-j4" seems to accelerate the crash, since with "-j0" it runs quite a few successful passes before crashing:

galaxy@apollo:~/rpm-work/BUILD/Python-3.11.1 $ LD_LIBRARY_PATH=/home/galaxy/rpm-work/BUILD/Python-3.11.1/build/debug build/debug/python -m test -j0 --slowest --timeout=1800 -W -F test__xxsubinterpreters
0:00:00 load avg: 0.22 Run tests in parallel using 6 child processes (timeout: 30 min, worker timeout: 35 min)
0:00:15 load avg: 1.50 [  1] test__xxsubinterpreters passed
0:00:15 load avg: 1.50 [  2] test__xxsubinterpreters passed
0:00:15 load avg: 1.50 [  3] test__xxsubinterpreters passed
0:00:15 load avg: 1.50 [  4] test__xxsubinterpreters passed
0:00:16 load avg: 1.50 [  5] test__xxsubinterpreters passed
0:00:16 load avg: 1.50 [  6] test__xxsubinterpreters passed
0:00:31 load avg: 2.57 [  7] test__xxsubinterpreters passed
0:00:31 load avg: 2.57 [  8] test__xxsubinterpreters passed
0:00:31 load avg: 2.57 [  9] test__xxsubinterpreters passed
0:00:31 load avg: 2.57 [ 10] test__xxsubinterpreters passed
0:00:32 load avg: 2.57 [ 11] test__xxsubinterpreters passed
0:00:32 load avg: 2.57 [ 12] test__xxsubinterpreters passed
0:00:47 load avg: 3.33 [ 13] test__xxsubinterpreters passed
0:00:48 load avg: 3.33 [ 14] test__xxsubinterpreters passed
0:00:48 load avg: 3.33 [ 15] test__xxsubinterpreters passed
0:00:48 load avg: 3.33 [ 16] test__xxsubinterpreters passed
0:00:48 load avg: 3.33 [ 17] test__xxsubinterpreters passed
0:00:48 load avg: 3.33 [ 18] test__xxsubinterpreters passed
0:01:04 load avg: 3.92 [ 19] test__xxsubinterpreters passed
0:01:04 load avg: 3.92 [ 20] test__xxsubinterpreters passed
0:01:04 load avg: 3.92 [ 21] test__xxsubinterpreters passed
0:01:04 load avg: 3.92 [ 22] test__xxsubinterpreters passed
0:01:05 load avg: 4.09 [ 23] test__xxsubinterpreters passed
0:01:05 load avg: 4.09 [ 24] test__xxsubinterpreters passed
0:01:19 load avg: 4.38 [ 25] test__xxsubinterpreters passed
0:01:20 load avg: 4.51 [ 26] test__xxsubinterpreters passed
0:01:20 load avg: 4.51 [ 27] test__xxsubinterpreters passed
0:01:20 load avg: 4.51 [ 28] test__xxsubinterpreters passed
0:01:20 load avg: 4.51 [ 29] test__xxsubinterpreters passed
0:01:20 load avg: 4.51 [ 30] test__xxsubinterpreters passed
0:01:32 load avg: 5.04 [ 31/1] test__xxsubinterpreters crashed (Exit code -11)
Fatal Python error: Segmentation fault

I also noticed that the load average keeps climbing even though nothing else is occupying the machine (OK, this seems to be expected: the number should be close to the number of children).
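
On the worker count: the "-j0" run above reports 6 child processes. This appears to follow from regrtest replacing a non-positive -j value with the CPU count plus two extra workers, which would match a machine with 4 logical CPUs. Both the formula and the CPU count are assumptions here, since neither is stated in this report. A small sketch of that arithmetic, with default_worker_count as a hypothetical helper name:

```python
import os


def default_worker_count(cpu_count=None):
    """Mirror the assumed regrtest rule for -j0: logical CPUs plus two."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    return 2 + cpu_count


# With 4 logical CPUs (an assumption about the reporter's machine) this is 6.
workers = default_worker_count(4)
```

With that many workers each running the test in a loop, a load average that approaches the worker count is what one would expect.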

If I run the test with "-j1" (just one child process), it does not seem to crash, or I am not patient enough to wait for it:

galaxy@apollo:~/rpm-work/BUILD/Python-3.11.1 $ LD_LIBRARY_PATH=/home/galaxy/rpm-work/BUILD/Python-3.11.1/build/debug build/debug/python -m test -j1 --slowest --timeout=1800 -W -F test__xxsubinterpreters
0:00:00 load avg: 0.39 Run tests in parallel using 1 child processes (timeout: 30 min, worker timeout: 35 min)
0:00:06 load avg: 0.56 [  1] test__xxsubinterpreters passed
0:00:13 load avg: 0.59 [  2] test__xxsubinterpreters passed
0:00:19 load avg: 0.63 [  3] test__xxsubinterpreters passed
0:00:26 load avg: 0.68 [  4] test__xxsubinterpreters passed
0:00:32 load avg: 0.71 [  5] test__xxsubinterpreters passed
0:00:39 load avg: 0.65 [  6] test__xxsubinterpreters passed
0:00:46 load avg: 0.71 [  7] test__xxsubinterpreters passed
0:00:53 load avg: 0.97 [  8] test__xxsubinterpreters passed
0:01:00 load avg: 1.38 [  9] test__xxsubinterpreters passed
[skipped a lot of successful passes]
0:10:13 load avg: 1.02 [ 92] test__xxsubinterpreters passed
0:10:20 load avg: 1.09 [ 93] test__xxsubinterpreters passed
0:10:27 load avg: 1.09 [ 94] test__xxsubinterpreters passed
0:10:34 load avg: 1.08 [ 95] test__xxsubinterpreters passed
^C
Kill <TestWorkerProcess #1 running test=test__xxsubinterpreters pid=1285535 time=1.3 sec> process group

== Tests result: INTERRUPTED ==
Test suite interrupted by signal SIGINT.

95 tests OK.
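
The repeat-until-crash approach used in the runs above can also be sketched outside regrtest, which helps when checking a different build or a candidate fix. This is a minimal sketch assuming a POSIX system, where a negative return code means the process was killed by that signal; describe_exit and run_until_failure are hypothetical helper names, and the command in the trailing comment is only an example.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Return a readable description of a subprocess return code."""
    if returncode < 0:
        # On POSIX, a negative return code means "killed by signal N".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by {name} ({-returncode})"
    return f"exited with status {returncode}"


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly.

    Return (run_number, returncode) for the first failing run,
    or None if all max_runs runs succeed.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.run(cmd)
        if proc.returncode != 0:
            return run, proc.returncode
    return None


# Example use (not executed here):
#   result = run_until_failure(
#       [sys.executable, "-m", "test", "test__xxsubinterpreters"])
#   if result is not None:
#       print(f"run {result[0]}: {describe_exit(result[1])}")
```

Unlike "-j4 -F", this runs one process at a time, so, going by the "-j1" observation above, it may need many more iterations to hit the crash, if it hits it at all.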

AlexWaygood added the tests and topic-subinterpreters labels on Jan 12, 2023
@AlexWaygood
Member

Cc. @ericsnowcurrently

@Fidget-Spinner
Member

Seems similar to or at least related to #100711.

@ericsnowcurrently
Member

@galaxy4public, do you still get the failure with gh-101431 applied?

@galaxy4public
Author

@ericsnowcurrently, unfortunately, I don't know enough to backport the mentioned commit to the latest released version of Python (which is 3.11.2), and I cannot move to the bleeding edge, so it seems I cannot test the fix :(.

@ericsnowcurrently
Member

Can you verify if this is fixed now? This may have been a case of gh-104341.

ericsnowcurrently added the pending label on Jun 18, 2024
5 participants