Closed
Description
Bug report
Bug description:
Code which contains line breaks is not round-trip invariant:
import tokenize, io
source_code = r"""
1 + \
2
"""
tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
x = tokenize.untokenize(tokens)
print(x)
# 1 +\
# 2
Notice that the space between +
and \
is now missing. The current tokenizer code simply inserts a backslash when it encounters two subsequent tokens with a differeing row offset:
Lines 179 to 182 in 9c2bb7d
I think this should be fixed. The docstring of tokenize.untokenize
says:
Round-trip invariant for full input:
Untokenized source will match input source exactly
To fix this, it will probably be necessary to inspect the raw line contents and count how much whitespace there is at the end of the line.
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux