Proc macros operate on tokens, including string/character/byte-string/byte literal tokens, which they can get from various sources.
- Source 1: Lexer.
  This is the most reliable source: the token is passed to a macro precisely as it was written in source code.
  `"C"` will be passed as `"C"`, but the same C in escaped form, `"\x43"`, will be passed as `"\x43"`.
  Proc macros can observe the difference because `ToString` (the only way to get the literal contents in the proc macro API) also prints the literal precisely.
- Source 2: Proc macro API.
  `Literal::string(s: &str)` will make you a string literal containing data `s`, approximately.
  The precise token (returned by `ToString`) will contain:
  - `escape_debug(s)` for string literals (`Literal::string`)
  - `escape_unicode(s)` for character literals (`Literal::character`)
  - `escape_default(s)` for byte string literals (`Literal::byte_string`)
- Source 3: Recovered from non-attribute AST.
  The AST goes through pretty-printing first and is then re-tokenized.
  The precise token (returned by `ToString`) will contain:
  - precise `s` for raw AST strings
  - `escape_debug(s)` for non-raw AST strings
  - `escape_default(s)` for AST characters, bytes and byte strings (both raw and non-raw)
- Source 4: Recovered from attribute AST.
  Just an ad-hoc recovery without pretty-printing.
  The precise token (returned by `ToString`) will contain:
  - precise `s` for raw AST strings
  - `escape_default(s)` for non-raw AST strings, AST characters, bytes and byte strings (both raw and non-raw)

EDIT: Also, doc comments go through `escape_debug` when converted to `#[doc = "content"]` tokens for proc macros.
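
To make the differences concrete, here is a minimal sketch that applies the three escaping flavors named above to the same input. It uses the standard library's `str::escape_debug`, `str::escape_unicode` and `str::escape_default` as stand-ins; whether the compiler calls exactly these methods for each token source is an assumption, and `escape_all` is a hypothetical helper written only for this illustration.

```rust
/// Hypothetical helper: applies the three `str` escaping methods to `s`
/// and returns the results as (escape_debug, escape_unicode, escape_default).
fn escape_all(s: &str) -> (String, String, String) {
    (
        s.escape_debug().to_string(),
        s.escape_unicode().to_string(),
        s.escape_default().to_string(),
    )
}

fn main() {
    // Both spellings denote the same string value; only the source token differs.
    assert_eq!("C", "\x43");

    // Print how each escaping flavor renders a few sample strings.
    for sample in ["C", "é", "\n"] {
        let (debug, unicode, default) = escape_all(sample);
        println!(
            "{:?}: escape_debug={} escape_unicode={} escape_default={}",
            sample, debug, unicode, default
        );
    }
}
```

For `"é"`, `escape_debug` leaves the character untouched while the other two produce `\u{e9}`, and `escape_unicode` rewrites even plain `C` as `\u{43}`. So the same data can reach a proc macro as visibly different tokens depending on which path produced it.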
It would be nice to
- Figure out what escaping we actually want (perhaps none?) and document the motivation behind the escaping choices.
- Get rid of the escaping differences between token sources, so that at least literals of the same kind are escaped identically.