The stdlib `tokenize` module does not properly roundtrip. This wrapper around the stdlib provides two additional tokens, `ESCAPED_NL` and `UNIMPORTANT_WS`, and a `Token` data type. Use `src_to_tokens` and `tokens_to_src` to roundtrip.
This library is useful if you're writing a refactoring tool based on Python tokenization.
Install with pip:

```bash
pip install tokenize-rt
```
### `tokenize_rt.Offset(line, utf8_byte_offset)`

A token offset, useful as a key when cross-referencing the `ast` and the tokenized source.
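A sketch of that cross-referencing, assuming you want the token at an `ast` node's position (CPython's `ast` reports utf-8 byte column offsets, matching `Offset`):

```python
import ast

from tokenize_rt import Offset, src_to_tokens

src = 'print("hello")\n'
node = ast.parse(src).body[0].value  # the ast.Call node
target = Offset(node.lineno, node.col_offset)

# Token.offset (described below) gives each token's Offset for comparison
tokens = [tok for tok in src_to_tokens(src) if tok.offset == target]
```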
### `tokenize_rt.Token(name, src, line, utf8_byte_offset)`

Construct a token (a short construction example follows the field list):

- `name`: one of the token names listed in `token.tok_name`, or `ESCAPED_NL`, or `UNIMPORTANT_WS`
- `src`: the token's source as text
- `line`: the line number that this token appears on
- `utf8_byte_offset`: the utf8 byte offset within the line at which this token appears
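A hypothetical construction (the position values are illustrative):

```python
from tokenize_rt import Token

# a formatting-only space at line 1, byte offset 5
ws = Token(name='UNIMPORTANT_WS', src=' ', line=1, utf8_byte_offset=5)
```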
### `tokenize_rt.Token.offset`

Retrieves an `Offset` for this token.
### `tokenize_rt.NON_CODING_TOKENS`

A `frozenset` containing tokens which may appear between others while not affecting control flow or code (see the sketch after this list):

- `COMMENT`
- `ESCAPED_NL`
- `NL`
- `UNIMPORTANT_WS`
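For instance, a sketch that filters them out to look at only the "real" code tokens:

```python
from tokenize_rt import NON_CODING_TOKENS, src_to_tokens

tokens = src_to_tokens('x = 1  # comment\n')
# drop the comment / whitespace noise, keep tokens that affect the code
coding = [tok for tok in tokens if tok.name not in NON_CODING_TOKENS]
```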
### `tokenize_rt.parse_string_literal`

Parse a string literal into its prefix and string content:

```python
>>> parse_string_literal('f"foo"')
('f', '"foo"')
```
### `tokenize_rt.reversed_enumerate`

Yields `(index, token)` pairs, useful for rewriting source.
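A sketch of the usual rewrite loop: iterating from the end means an edit never invalidates the indices still to be visited (the NUMBER replacement is purely illustrative):

```python
from tokenize_rt import reversed_enumerate, src_to_tokens, tokens_to_src

tokens = src_to_tokens('x = 5\n')
for i, token in reversed_enumerate(tokens):
    if token.name == 'NUMBER':
        # Token is a NamedTuple, so _replace builds a modified copy
        tokens[i] = token._replace(src='6')
assert tokens_to_src(tokens) == 'x = 6\n'
```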
### `tokenize_rt.rfind_string_parts(tokens, i)`

Find the indices of the string parts of a (joined) string literal:

- `i` should start at the end of the string literal
- returns `()` (an empty tuple) for things which are not string literals

```python
>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)
```
## Differences from `tokenize`

- `tokenize-rt` adds `ESCAPED_NL` for a backslash-escaped newline "token"
- `tokenize-rt` adds `UNIMPORTANT_WS` for whitespace (discarded in `tokenize`); both extra tokens are demonstrated after this list
- `tokenize-rt` normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see `Token('STRING', "f'foo'", ...)` even in python 2.
- `tokenize-rt` normalizes python 2 long literals (`4l` / `4L`) and octal literals (`0755`) in python 3 (for easier rewriting of python 2 code while running python 3).