Creating an ungodly amount of sub interpreters in a short amount of time causes memory debug assertions. #123134

Open
@bruxisma

Description

Bug report

Bug description:

Hello. While working on a small joke program, I found a possible memory corruption issue (it could also be a threading issue) when using the Python C API in a debug build to quickly create 463 sub interpreters, execute Python code in each, and then destroy them. Before I post the code sample and the debug output, note that I'm using a somewhat unusual build environment for a Windows developer:

  • clang++ 18.1.7 for x86_64-pc-windows-msvc
  • Visual Studio Build Tools 2022 17.11.0
  • CMake 3.30
  • Python 3.12.5
  • ninja 1.12.1

When running the code sample attached at the bottom of this post, I am unable to get the exact same output each time, though the traceback does fire in the same location. (I've not attached the traceback because of its size: it's about 10 MB of text for each thread.) Additionally, I sometimes have to run the executable several times before the error occurs. Lastly, release builds do not exhibit any thread crashes or issues, as the debug assertions never fire or execute.

The error output also seems to halt in some cases, either because of stack failure or because of some other issue that I was unable to determine and that seemed to be outside the scope of this report. Below is the entire error output from one execution where I was able to save it.

Debug memory block at address p=00000267272D6030: API 'p'
    16718206241729413120 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0xa0 *** OUCH
        at p-6: 0x00 *** OUCH
        at p-5: 0x00Hello, world! From Thread 31
Hello, world! From Thread 87
 *** OUCH
        at p-4: 0xfd
Hello, world! From Thread 43
Hello, world! From Thread 279
Generating thread state 314
        at p-3: 0xfd
        at p-2: 0xfd
Hello, world! From Thread 168
Generating thread state 315
        at p-1: 0xfd
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
Generating thread state 316
Generating thread state 317
    The 8 pad bytes at tail=E8030267272D6030 are

The output cut off after this, as the entire program crashed, taking my terminal with it 😅
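For readers unfamiliar with the "pad bytes" in the dump above: CPython's debug allocator surrounds each block with bytes set to 0xfd and reports any that have changed. The sketch below is a simplified illustration of that idea, not CPython's implementation. The helper names (`guarded_alloc`, `damaged_leading_pad`, `guarded_free`) are hypothetical, a 64-bit `size_t` is assumed, and the size field is stored in native byte order here, whereas CPython's documentation describes it as big-endian.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Same value the log above calls FORBIDDENBYTE.
constexpr unsigned char kForbiddenByte = 0xFD;
constexpr std::size_t kWord = sizeof(std::size_t);

// Hypothetical helper: allocate n usable bytes wrapped in guard bytes.
// Simplified layout, relative to the returned pointer p:
//   p[-2*kWord .. -kWord)  requested size
//   p[-kWord]              one-byte API id
//   p[-kWord+1 .. 0)       kWord-1 leading pad bytes (7 on 64-bit)
//   p[0 .. n)              user data
//   p[n .. n+kWord)        trailing pad bytes
unsigned char* guarded_alloc(std::size_t n, char api) {
  auto* raw = static_cast<unsigned char*>(std::malloc(3 * kWord + n));
  if (raw == nullptr) {
    return nullptr;
  }
  std::memcpy(raw, &n, kWord);
  raw[kWord] = static_cast<unsigned char>(api);
  std::memset(raw + kWord + 1, kForbiddenByte, kWord - 1);
  unsigned char* p = raw + 2 * kWord;
  std::memset(p + n, kForbiddenByte, kWord);
  return p;
}

// Count the leading pad bytes that no longer hold kForbiddenByte.
int damaged_leading_pad(const unsigned char* p) {
  int damaged = 0;
  for (std::size_t i = 1; i < kWord; ++i) {
    if (*(p - i) != kForbiddenByte) {
      ++damaged;
    }
  }
  return damaged;
}

// Release a block obtained from guarded_alloc.
void guarded_free(unsigned char* p) {
  if (p != nullptr) {
    std::free(p - 2 * kWord);
  }
}
```

Overwriting `p[-7]`, `p[-6]`, and `p[-5]` of a block from `guarded_alloc` makes `damaged_leading_pad` report three damaged bytes, which mirrors the three `*** OUCH` lines in the dump. As a cross-check of the log itself, the reported size 16718206241729413120 is 0xE803000000000000, and adding it to p=00000267272D6030 gives exactly the printed tail address E8030267272D6030, so the size field in the header looks overwritten along with the pad bytes.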

You'll find the MRE code below. I've also added a minimal version of the CMakeLists.txt file I used so anyone can recreate the build (any warnings or additional settings I have do not affect whether the error occurs). Based on what debugging I was able to do with WinDbg, the code appears to break inside of _PyObject_DebugDumpAddress.

Important

std::jthread calls .join() on destruction, so all threads auto-join once the std::vector goes out of scope.

Additionally, this code exhibits the same behavior regardless of whether state is declared thread_local or declared within the lambda passed to std::jthread.

main.cxx

#include <vector>
#include <thread>
#include <cstdlib>
#include <format>
#include <print>

#include <Python.h>

namespace {

static thread_local inline PyThreadState* state = nullptr;
static inline constexpr auto MAX_STATES = 463; 
static inline constexpr auto config = PyInterpreterConfig {
  .use_main_obmalloc = 0,
  .allow_fork = 0,
  .allow_exec = 0,
  .allow_threads = 0,
  .allow_daemon_threads = 0,
  .check_multi_interp_extensions = 1,
  .gil = PyInterpreterConfig_OWN_GIL,
};

} /* nameless namespace */

void execute () {
  std::vector<std::jthread> tasks { };
  tasks.reserve(MAX_STATES);
  for (auto count = 0zu; count < tasks.capacity(); count++) {
    std::println("Generating thread state {}", count);
    tasks.emplace_back([count] {
      if (auto status = Py_NewInterpreterFromConfig(&state, &config); PyStatus_IsError(status)) {
        std::println("Failed to initialize thread state {}", count);
        return;
      }
      auto text = std::format(R"(print("Hello, world! From Thread {}"))", count);
      auto globals = PyDict_New();
      auto code = Py_CompileString(text.data(), __FILE__, Py_eval_input);
      auto result = PyEval_EvalCode(code, globals, globals);
      Py_DecRef(result);
      Py_DecRef(code);
      Py_DecRef(globals);
      Py_EndInterpreter(state);
      state = nullptr;
    });
  }
}

int main() {
  PyConfig config {};
  PyConfig_InitIsolatedConfig(&config);
  if (auto status = Py_InitializeFromConfig(&config); PyStatus_IsError(status)) {
    std::println("Failed to initialize with isolated config: {}", status.err_msg);
    return EXIT_FAILURE;
  }
  PyConfig_Clear(&config);
  execute();
  Py_Finalize();
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.30)
project(463-interpreters LANGUAGES C CXX)

find_package(Python 3.12 REQUIRED COMPONENTS Development.Embed)

set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")

add_executable(${PROJECT_NAME})
target_sources(${PROJECT_NAME} PRIVATE main.cxx)
target_compile_features(${PROJECT_NAME} PRIVATE cxx_std_23)
target_precompile_headers(${PROJECT_NAME} PRIVATE <Python.h>)
target_link_libraries(${PROJECT_NAME} PRIVATE Python::Python)

Command to build + run

$ cmake -Bbuild -S. -G "Ninja"
$ cmake --build build && .\build\463-interpreters.exe

CPython versions tested on:

3.12

Operating systems tested on:

Windows
