Using internal tokenize module's TokenizerIter in multiple threads crashes #120317

Closed
@lysnikolaou

Description

Crash report

What happened?

Because the tokenizer is not thread-safe, using the same TokenizerIter in multiple threads under the free-threaded build leads to all kinds of unpredictable behavior. It sometimes succeeds, sometimes raises a SyntaxError even though the source is valid, and sometimes crashes with the following.

Example error backtrace
Fatal Python error: tok_backup: tok_backup: wrong character
Python runtime state: initialized

Current thread 0x0000000172e1b000 (most recent call first):
  File "/Users/lysnikolaou/repos/python/cpython/tmp/t1.py", line 9 in next_token
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/thread.py", line 58 in run
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/thread.py", line 92 in _worker
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 990 in run
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 1039 in _bootstrap_inner
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 1010 in _bootstrap

Thread 0x0000000171e0f000 (most recent call first):
  File "/Users/lysnikolaou/repos/python/cpython/tmp/t1.py", line 10 in next_token
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/thread.py", line 58 in run
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/thread.py", line 92 in _worker
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 990 in run
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 1039 in _bootstrap_inner
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 1010 in _bootstrap

Thread 0x0000000170e03000 (most recent call first):
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/_base.py", line 550 in set_exception
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/thread.py", line 60 in run
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/thread.py", line 92 in _worker
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 990 in run
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 1039 in _bootstrap_inner
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 1010 in _bootstrap

Thread 0x000000016fdf7000 (most recent call first):
  File "/Users/lysnikolaou/repos/python/cpython/tmp/t1.py", line 10 in next_token
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/thread.py", line 58 in run
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/thread.py", line 92 in Assertion failed: (tok->done != E_ERROR), function _syntaxerror__workerrange, file helpers.c, line 17.

  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 990 in run
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 1039 in _bootstrap_inner
  File "/Users/lysnikolaou/repos/python/cpython/Lib/threading.py", line 1010 in _bootstrap

Thread 0x000000016edeb000 (most recent call first):
  File "/Users/lysnikolaou/repos/python/cpython/tmp/t1.py", line 10 in next_token
  File "/Users/lysnikolaou/repos/python/cpython/Lib/concurrent/futures/thread.pyzsh: abort      ./python.exe tmp/t1.py

A minimal reproducer is the following:

import concurrent.futures
import io
import time
import tokenize

def next_token(it):
    while True:
        try:
            r = next(it)
            print(tokenize.TokenInfo._make(r))
            time.sleep(1)
        except StopIteration:
            return


for _ in range(20):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        source = io.StringIO("a = 'abc'\nprint(b)\nfor _ in a:  do_something()")
        it = tokenize._tokenize.TokenizerIter(source.readline, extra_tokens=False)
        threads = (executor.submit(next_token, it) for _ in range(5))
        for t in concurrent.futures.as_completed(threads):
            t.result()
        print("######################################################")
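Until the tokenizer itself is made safe to share, a caller-side workaround is to serialize access to the shared iterator. The following is only a minimal sketch of that idea, not the fix applied in CPython: `LockedIterator`, `drain`, and `tokenize_shared` are hypothetical names introduced for illustration, and the sketch uses the public `tokenize.generate_tokens()` API rather than the internal `TokenizerIter`.

```python
import concurrent.futures
import io
import threading
import tokenize

# Same source as the reproducer, with a trailing newline added.
SOURCE = "a = 'abc'\nprint(b)\nfor _ in a:  do_something()\n"


class LockedIterator:
    """Hypothetical helper: serialize next() calls on a shared iterator."""

    def __init__(self, iterator):
        self._iterator = iterator
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread at a time may advance the underlying iterator.
        with self._lock:
            return next(self._iterator)


def drain(it):
    # Each worker collects whatever tokens it manages to pull.
    return list(it)


def tokenize_shared(n_workers=5):
    # One tokenizer shared by all workers, guarded by the lock above.
    readline = io.StringIO(SOURCE).readline
    shared = LockedIterator(tokenize.generate_tokens(readline))
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(drain, shared) for _ in range(n_workers)]
        return [tok for f in futures for tok in f.result()]
```

With the lock in place, every token is handed to exactly one worker, although which worker receives which token is still nondeterministic. Giving each thread its own tokenizer avoids the contention entirely when the work can be split by source.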

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS

Output from running 'python -VV' on the command line:

Python 3.14.0a0 experimental free-threading build (heads/main:c3b6dbff2c8, Jun 10 2024, 14:33:07) [Clang 15.0.0 (clang-1500.3.9.4)]

Labels

3.13: bugs and security fixes
3.14: new features, bugs and security fixes
topic-free-threading
type-crash: A hard crash of the interpreter, possibly with a core dump
