Bug report
Leaving an f-string unclosed in the REPL or in a file leads to a use-after-free. The cause is how the f-string buffers are updated when the tokenizer buffer has to be reallocated to make more space; the bug was introduced in 1ef61cf. Here's an example (it only fails with AddressSanitizer enabled):
❯ ./python.exe
Python 3.12.0a7+ (heads/fix-updating-fstring-buffers-tok:0056701aa3, Apr 23 2023, 11:12:49) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> f"
...
=================================================================
==8991==ERROR: AddressSanitizer: heap-use-after-free on address 0x000104313550 at pc 0x0001006c6d98 bp 0x00016fa28e70 sp 0x00016fa28e68
READ of size 1 at 0x000104313550 thread T0
#0 0x1006c6d94 in unicode_decode_utf8 unicodeobject.c:4526
#1 0x1006cc0bc in PyUnicode_DecodeUTF8 unicodeobject.c:4431
#2 0x10050f1d8 in _syntaxerror_range tokenizer.c:1252
#3 0x10050da08 in syntaxerror tokenizer.c:1294
#4 0x100500bbc in _PyTokenizer_Get tokenizer.c:2639
#5 0x1003d6810 in _PyPegen_fill_token pegen.c:201
#6 0x100475738 in fstring_replacement_field_rule parser.c:15626
#7 0x1003eeb98 in fstring_rule parser.c:1334
#8 0x10044b1fc in strings_rule parser.c:15962
#9 0x10041b00c in atom_rule parser.c:14388
#10 0x100427ee4 in t_primary_rule parser.c:18429
#11 0x1004ef890 in single_subscript_attribute_target_rule parser.c:18319
#12 0x1004e75a8 in _tmp_13_rule parser.c:25515
#13 0x1004d8a34 in simple_stmt_rule parser.c:1730
#14 0x1003f4140 in simple_stmts_rule parser.c:1625
#15 0x1003e81f4 in _PyPegen_parse parser.c:41238
#16 0x1003da7b4 in _PyPegen_run_parser pegen.c:825
#17 0x1003dae14 in _PyPegen_run_parser_from_file_pointer pegen.c:897
#18 0x1004fdc6c in _PyParser_ASTFromFile peg_api.c:26
#19 0x1008f0c74 in PyRun_InteractiveOneObjectEx pythonrun.c:240
#20 0x1008efa04 in _PyRun_InteractiveLoopObject pythonrun.c:137
#21 0x1008ef6e8 in _PyRun_AnyFileObject pythonrun.c:72
#22 0x1008f0920 in PyRun_AnyFileExFlags pythonrun.c:104
#23 0x100943acc in Py_RunMain main.c:689
#24 0x1009448d4 in pymain_main main.c:719
#25 0x100944b74 in Py_BytesMain main.c:743
#26 0x1003d5b7c in main python.c:15
#27 0x18e8dff24 (<unknown module>)
0x000104313550 is located 16 bytes inside of 28-byte region [0x000104313540,0x00010431355c)
freed by thread T0 here:
#0 0x101c070ec in wrap_realloc+0x9c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x430ec) (BuildId: f0a7ac5c49bc3abc851181b6f92b308a32000000200000000100000000000b00)
#1 0x10065213c in _PyMem_RawRealloc obmalloc.c:64
#2 0x100653cd0 in _PyMem_DebugRawRealloc obmalloc.c:1957
#3 0x1006544e4 in _PyMem_DebugRealloc obmalloc.c:2045
#4 0x100654afc in PyMem_Realloc obmalloc.c:609
#5 0x10050e950 in tok_reserve_buf tokenizer.c:480
#6 0x10050c3c4 in tok_nextc tokenizer.c:1198
#7 0x100500490 in _PyTokenizer_Get tokenizer.c:2639
#8 0x1003d6810 in _PyPegen_fill_token pegen.c:201
#9 0x100475738 in fstring_replacement_field_rule parser.c:15626
#10 0x1003eeb98 in fstring_rule parser.c:1334
#11 0x10044b1fc in strings_rule parser.c:15962
#12 0x10041b00c in atom_rule parser.c:14388
#13 0x100427ee4 in t_primary_rule parser.c:18429
#14 0x1004ef890 in single_subscript_attribute_target_rule parser.c:18319
#15 0x1004e75a8 in _tmp_13_rule parser.c:25515
#16 0x1004d8a34 in simple_stmt_rule parser.c:1730
#17 0x1003f4140 in simple_stmts_rule parser.c:1625
#18 0x1003e81f4 in _PyPegen_parse parser.c:41238
#19 0x1003da7b4 in _PyPegen_run_parser pegen.c:825
#20 0x1003dae14 in _PyPegen_run_parser_from_file_pointer pegen.c:897
#21 0x1004fdc6c in _PyParser_ASTFromFile peg_api.c:26
#22 0x1008f0c74 in PyRun_InteractiveOneObjectEx pythonrun.c:240
#23 0x1008efa04 in _PyRun_InteractiveLoopObject pythonrun.c:137
#24 0x1008ef6e8 in _PyRun_AnyFileObject pythonrun.c:72
#25 0x1008f0920 in PyRun_AnyFileExFlags pythonrun.c:104
#26 0x100943acc in Py_RunMain main.c:689
#27 0x1009448d4 in pymain_main main.c:719
#28 0x100944b74 in Py_BytesMain main.c:743
#29 0x1003d5b7c in main python.c:15
previously allocated by thread T0 here:
#0 0x101c06e68 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x42e68) (BuildId: f0a7ac5c49bc3abc851181b6f92b308a32000000200000000100000000000b00)
#1 0x1006520f8 in _PyMem_RawMalloc obmalloc.c:42
#2 0x100654144 in _PyMem_DebugMalloc obmalloc.c:2022
#3 0x100654a04 in PyMem_Malloc obmalloc.c:587
#4 0x10050b620 in tok_nextc tokenizer.c:1198
#5 0x100504848 in tok_get_normal_mode tokenizer.c:1619
#6 0x1005006ac in _PyTokenizer_Get tokenizer.c:2639
#7 0x1003d6810 in _PyPegen_fill_token pegen.c:201
#8 0x1003e71f4 in _PyPegen_parse parser.c:41238
#9 0x1003da7b4 in _PyPegen_run_parser pegen.c:825
#10 0x1003dae14 in _PyPegen_run_parser_from_file_pointer pegen.c:897
#11 0x1004fdc6c in _PyParser_ASTFromFile peg_api.c:26
#12 0x1008f0c74 in PyRun_InteractiveOneObjectEx pythonrun.c:240
#13 0x1008efa04 in _PyRun_InteractiveLoopObject pythonrun.c:137
#14 0x1008ef6e8 in _PyRun_AnyFileObject pythonrun.c:72
#15 0x1008f0920 in PyRun_AnyFileExFlags pythonrun.c:104
#16 0x100943acc in Py_RunMain main.c:689
#17 0x1009448d4 in pymain_main main.c:719
#18 0x100944b74 in Py_BytesMain main.c:743
#19 0x1003d5b7c in main python.c:15
#20 0x18e8dff24 (<unknown module>)
SUMMARY: AddressSanitizer: heap-use-after-free unicodeobject.c:4526 in unicode_decode_utf8
Shadow bytes around the buggy address:
0x007020882650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x007020882660: fa fa fa fa fa fa fa fa fa fa fa fa 00 00 00 05
0x007020882670: fa fa 00 00 00 05 fa fa fd fd fd fd fa fa fd fd
0x007020882680: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
0x007020882690: fd fd fd fd fa fa 00 00 00 00 fa fa 00 00 00 00
=>0x0070208826a0: fa fa fd fd fd fd fa fa fd fd[fd]fd fa fa fd fd
0x0070208826b0: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
0x0070208826c0: fd fd fd fd fa fa 00 00 00 00 fa fa 00 00 00 06
0x0070208826d0: fa fa 00 00 00 03 fa fa 00 00 02 fa fa fa 00 00
0x0070208826e0: 00 00 fa fa 00 00 02 fa fa fa 00 00 02 fa fa fa
0x0070208826f0: 00 00 02 fa fa fa 00 00 02 fa fa fa 00 00 00 01
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==8991==ABORTING
zsh: abort ./python.exe
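The pattern in the trace above (memory allocated in tok_nextc, freed by the realloc in tok_reserve_buf, then read again while building the syntax error) is the classic case of a pointer into a buffer outliving a reallocation of that buffer. The following is a minimal standalone sketch of that hazard and one common way to avoid it, namely saving the position as an offset before the realloc and rebasing the pointer afterwards. It is not the CPython tokenizer code or the actual patch: `toy_tok`, `fstring_start`, and `toy_reserve` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tokenizer state; NOT CPython's struct tok_state. */
typedef struct {
    char *buf;            /* start of the input buffer */
    size_t size;          /* allocated size of buf in bytes */
    char *fstring_start;  /* points into buf (or NULL if no f-string is open) */
} toy_tok;

/*
 * Grow the buffer to new_size bytes and keep fstring_start valid.
 * Returns 0 on success, -1 on allocation failure (state is left unchanged).
 */
static int toy_reserve(toy_tok *tok, size_t new_size)
{
    assert(new_size >= tok->size);

    /* Record the position as an offset BEFORE realloc may move the block.
     * Keeping the raw pointer across the realloc is what produces a
     * heap-use-after-free when the block is moved. */
    int has_fstring = (tok->fstring_start != NULL);
    ptrdiff_t offset = has_fstring ? tok->fstring_start - tok->buf : 0;

    char *newbuf = realloc(tok->buf, new_size);
    if (newbuf == NULL) {
        return -1;
    }

    tok->buf = newbuf;
    tok->size = new_size;
    /* Rebase the saved offset onto the (possibly moved) buffer. */
    tok->fstring_start = has_fstring ? newbuf + offset : NULL;
    return 0;
}
```

With this shape, any code that later reads from `fstring_start` (for example to build an error message) sees memory owned by the current buffer, regardless of whether `realloc` moved the allocation.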