
intern_static is not thread-safe with multiple interpreters #122291

Closed
@colesbury

Description


Bug report

Most static strings are interned during Python initialization in _PyUnicode_InitStaticStrings. However, the _Py_LATIN1_CHR characters (code points 0-255) are static but not interned. They may be interned later, while Python is running, for various reasons, including calls to sys.intern.
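
A toy Python model of this startup behavior may help. It is a sketch under stated assumptions, not CPython's code: the names Str, ToyRuntime, and intern_static_model are hypothetical, a plain dict stands in for _PyRuntime.cached_objects.interned_strings, and a small class stands in for string objects so that identity can be tracked explicitly with `is`.

```python
class Str:
    """Hypothetical stand-in for a string object; identity is compared with `is`."""
    def __init__(self, value, static=False):
        self.value = value
        self.static = static


class ToyRuntime:
    """Hypothetical model of startup interning; not CPython's actual code."""

    def __init__(self, static_names, latin1_chars):
        # Shared by every thread and interpreter
        # (models _PyRuntime.cached_objects.interned_strings).
        self.global_interned = {}
        # The static 1-character strings exist from startup
        # (models _Py_LATIN1_CHR; only a small subset here)...
        self.latin1 = {c: Str(c, static=True) for c in latin1_chars}
        # ...but only the other static strings are interned during init
        # (models _PyUnicode_InitStaticStrings).
        for name in static_names:
            self.global_interned[name] = Str(name, static=True)

    def intern_static_model(self, s):
        # The first interning of a static 1-character string inserts it into
        # the shared table long after initialization has finished.
        return self.global_interned.setdefault(s.value, s)


rt = ToyRuntime(static_names=["__name__", "__doc__"], latin1_chars="ab")
size_after_init = len(rt.global_interned)    # 2
rt.intern_static_model(rt.latin1["a"])       # models e.g. sys.intern("a")
size_later = len(rt.global_interned)         # 3: shared table mutated at run time
```

The point of the model is only that the shared table's size changes after initialization; in CPython that mutation happens on a C hash table with no lock around it.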

This isn't thread-safe: it modifies the hashtable _PyRuntime.cached_objects.interned_strings, which is shared across threads and interpreters, without any synchronization.

It can also break the interning identity invariant (all interned strings with the same value are the same object): a non-static, interned 1-character string can later be shadowed by the global interning of the static 1-character string.
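
The shadowing problem can be modeled with two tables: a process-wide table for static strings, and a per-interpreter table for everything else. This is a hypothetical sketch (Str, Interp, and global_interned are illustrative names), assuming, as the word "shadowed" suggests, that lookups consult the global table before the per-interpreter one.

```python
class Str:
    """Hypothetical stand-in for a string object; identity is compared with `is`."""
    def __init__(self, value, static=False):
        self.value = value
        self.static = static


global_interned = {}  # shared by all interpreters; consulted first


class Interp:
    def __init__(self):
        self.interned = {}  # per-interpreter table for non-static strings

    def intern(self, s):
        # Assumed lookup order: the global (static) table shadows
        # the per-interpreter table.
        if s.value in global_interned:
            return global_interned[s.value]
        if s.static:
            # Late, lazy interning of a static string into the global table.
            return global_interned.setdefault(s.value, s)
        return self.interned.setdefault(s.value, s)


interp = Interp()
heap_a = Str("a")                  # a non-static 1-character string
static_a = Str("a", static=True)   # the static singleton for "a"

first = interp.intern(heap_a)      # heap_a lands in the per-interpreter table
interp.intern(static_a)            # static "a" is interned globally later
second = interp.intern(Str("a"))   # now resolves to static_a, not heap_a

invariant_broken = first is not second  # two different "interned" objects for "a"
```

Code that interned "a" before the static string was interned now holds a different object than code that interns "a" afterwards.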

Suggestions

  • The _PyRuntime.cached_objects.interned_strings table should be immutable. We should not modify it after Py_Initialize() until shutdown (i.e., _PyUnicode_ClearInterned called from finalize_interp_types()).
  • The 1-character latin1 strings should be interned. This can be done either by explicitly interning them during startup or by handling 1-character strings specially in intern_common.
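
Both suggestions can be combined in a small model: intern every latin-1 1-character string during startup, then freeze the shared table so that later code can only read it. This is a sketch under assumptions, not the fix that landed in CPython: Python's types.MappingProxyType stands in for an immutable C hash table, and the names Str, FrozenRuntime, and intern are hypothetical.

```python
from types import MappingProxyType


class Str:
    """Hypothetical stand-in for a string object; identity is compared with `is`."""
    def __init__(self, value, static=False):
        self.value = value
        self.static = static


class FrozenRuntime:
    def __init__(self, static_names):
        # Static 1-character strings for code points 0-255 (models _Py_LATIN1_CHR).
        self.latin1 = {chr(i): Str(chr(i), static=True) for i in range(256)}
        table = {name: Str(name, static=True) for name in static_names}
        # Second suggestion: intern the 1-character strings eagerly at startup.
        table.update(self.latin1)
        # First suggestion: after initialization the shared table is read-only.
        self.global_interned = MappingProxyType(table)

    def intern(self, s, per_interp):
        # The frozen shared table is only read here; any new entries
        # go into the caller's per-interpreter dict instead.
        found = self.global_interned.get(s.value)
        if found is not None:
            return found
        return per_interp.setdefault(s.value, s)


rt = FrozenRuntime(["__name__"])
interp_table = {}
a1 = rt.intern(Str("a"), interp_table)         # resolves to the static "a"
a2 = rt.intern(rt.latin1["a"], interp_table)   # same object

try:
    rt.global_interned["new"] = Str("new")
    mutated = True
except TypeError:
    mutated = False  # mappingproxy rejects item assignment
```

Because the static "a" is in the shared table from the start, a non-static "a" can never be interned ahead of it, so the shadowing scenario cannot arise in this model.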

cc @encukou @ericsnowcurrently


Labels

type-bug: An unexpected behavior, bug, or error
