Closed
Bug report
Most static strings are interned during Python initialization in `_PyUnicode_InitStaticStrings`. However, the `_Py_LATIN1_CHR` characters (code points 0-255) are static but not interned. They may be interned later while Python is running. This can happen for various reasons, including calls to `sys.intern`.
This isn't thread-safe: it modifies the hashtable `_PyRuntime.cached_objects.interned_strings`, which is shared across threads and interpreters, without any synchronization.
It can also break the interning identity invariant: a non-static, interned 1-character string can later be shadowed by the global interning of the static 1-character string.
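The shadowing scenario can be illustrated with a toy model in pure Python. This is only a sketch under stated assumptions: `ToyStr`, `toy_intern`, `global_interned`, and `interp_interned` are hypothetical names, and the lookup order is a simplification, not the actual logic of `intern_common`.

```python
class ToyStr:
    """Stand-in for a str object: equal by value, distinct by identity."""

    def __init__(self, value, static=False):
        self.value = value
        self.static = static  # models a statically allocated string

    def __eq__(self, other):
        return isinstance(other, ToyStr) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


# Hypothetical tables for the model.
global_interned = {}  # models _PyRuntime.cached_objects.interned_strings
interp_interned = {}  # models one interpreter's own interned table


def toy_intern(s):
    """Return the canonical object for s under the toy lookup order."""
    if s in global_interned:      # the global table is consulted first
        return global_interned[s]
    if s.static:                  # static strings are added globally, lazily
        global_interned[s] = s
        return s
    return interp_interned.setdefault(s, s)


heap_a = ToyStr("a")                 # non-static 1-character string
static_a = ToyStr("a", static=True)  # static latin1 character, not yet interned

first = toy_intern(heap_a)        # lands in the per-interpreter table
toy_intern(static_a)              # later, the static string is interned globally
second = toy_intern(ToyStr("a"))  # now resolved through the global table

print(first == second, first is second)  # prints "True False"
```

In the model, two equal strings have both been returned as "the" interned object, so identity comparison between interned strings no longer matches equality.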
Suggestions
- The `_PyRuntime.cached_objects.interned_strings` hashtable should be immutable. We should not modify it after `Py_Initialize()` until shutdown (i.e., `_PyUnicode_ClearInterned` called from `finalize_interp_types()`).
- The 1-character latin1 strings should be interned. This can be done either by explicitly interning them during startup, or by handling 1-character strings specially in `intern_common`.
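Both suggestions can be sketched together in a toy Python model: the static latin1 characters are interned at startup and the global table is then frozen. All names here (`ToyStr`, `init_static_strings`, `toy_intern`) are hypothetical, and the read-only `MappingProxyType` view merely stands in for "not modified after `Py_Initialize()`"; it is not how CPython would implement this.

```python
from types import MappingProxyType


class ToyStr:
    """Stand-in for a str object: equal by value, distinct by identity."""

    def __init__(self, value, static=False):
        self.value = value
        self.static = static  # models a statically allocated string

    def __eq__(self, other):
        return isinstance(other, ToyStr) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def init_static_strings():
    """Startup step of the model: intern every latin1 character, then freeze."""
    table = {}
    for code_point in range(256):
        s = ToyStr(chr(code_point), static=True)
        table[s] = s
    return MappingProxyType(table)  # read-only view: no writes after init


global_interned = init_static_strings()  # models the shared global table
interp_interned = {}                     # models one interpreter's own table


def toy_intern(s):
    """Look up the frozen global table first, then the interpreter's table."""
    found = global_interned.get(s)  # lookup only, the table is never written
    if found is not None:
        return found
    return interp_interned.setdefault(s, s)


first = toy_intern(ToyStr("a"))
second = toy_intern(ToyStr("a"))
word = toy_intern(ToyStr("spam"))

print(first is second, first.static)  # prints "True True"
```

Because every 1-character latin1 string is already in the global table when the model starts, a non-static `"a"` can never be interned separately, and later interning only ever writes to the per-interpreter table.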
cc @encukou @ericsnowcurrently