Inconsistent literal escaping in proc macros #60495

Open

Description

Proc macros operate on tokens, including string/character/byte-string/byte literal tokens, which they can get from various sources.

  • Source 1: Lexer.
    This is the most reliable source: the token is passed to the macro exactly as it was written in the source code.
    "C" will be passed as "C", but the same C in escaped form "\x43" will be passed as "\x43".
    Proc macros can observe the difference because ToString (the only way to get the literal's contents in the proc macro API) also prints the literal exactly as written.
  • Source 2: Proc macro API.
    Literal::string(s: &str) creates a string literal containing the data s, but only approximately.
    The precise token (returned by ToString) will contain:
    • escape_debug(s) for string literals (Literal::string)
    • escape_unicode(s) for character literals (Literal::character)
    • escape_default(s) for byte string literals (Literal::byte_string)
  • Source 3: Recovered from non-attribute AST.
    The AST goes through pretty-printing first and is then re-tokenized.
    The precise token (returned by ToString) will contain:
    • precise s for raw AST strings
    • escape_debug(s) for non-raw AST strings
    • escape_default(s) for AST characters, bytes and byte strings (both raw and non-raw)
  • Source 4: Recovered from attribute AST.
    Just an ad-hoc recovery without pretty-printing.
    The precise token (returned by ToString) will contain:
    • precise s for raw AST strings
    • escape_default(s) for non-raw AST strings, AST characters, bytes and byte strings (both raw and non-raw)
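
To make the difference between the three escaping flavours concrete, here is a minimal sketch using the standard-library methods str::escape_debug, str::escape_default and str::escape_unicode. It assumes these methods behave like the compiler-internal escaping named above, which is an approximation. The helper name `escapes` is hypothetical and is not part of any proc macro API.

```rust
/// Hypothetical helper: applies each std escaping routine to the same input
/// and returns the results as (escape_debug, escape_default, escape_unicode).
fn escapes(s: &str) -> (String, String, String) {
    (
        // Escapes quotes, backslashes and control characters, but leaves
        // printable non-ASCII characters such as 'é' untouched.
        s.escape_debug().to_string(),
        // Like escape_debug, but also escapes everything outside printable
        // ASCII as \u{...}.
        s.escape_default().to_string(),
        // Escapes every character as \u{...}, including plain ASCII.
        s.escape_unicode().to_string(),
    )
}
```

With this helper, `escapes("C")` returns `C`, `C` and `\u{43}`, while `escapes("é")` returns `é`, `\u{e9}` and `\u{e9}`. The same data therefore produces up to three different token texts, depending on which source the literal came from.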

EDIT: Doc comments also go through escape_debug when they are converted to #[doc = "content"] tokens for proc macros.

It would be nice to

  • Figure out what escaping we actually want (perhaps none?) and document the motivation behind the escaping choices.
  • Get rid of the escaping differences between token sources, so that at least literals of the same kind are escaped identically.

Labels

  • A-frontend: Area: Compiler frontend (errors, parsing and HIR)
  • A-macros: Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..)
  • A-proc-macros: Area: Procedural macros
  • C-bug: Category: This is a bug.
  • T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
