Description
Proposal
Problem Statement
Currently, proc-macros that handle string literals receive the literal's source text as written, with escape sequences unprocessed and the surrounding quotes still attached. For example:
#[my_macro]
#[my_attr("\u{78} blabla")]
pub struct B;
In the my_attr proc-macro, the received value is "\u{78} blabla", including escape sequences and quotes, instead of the parsed equivalent ("x blabla"). This makes working with string literals cumbersome, as proc-macro authors need to reimplement unescape logic that already exists within the Rust compiler.
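To illustrate the kind of logic a proc-macro author has to write by hand today, here is a minimal sketch using only std. The function name unescape_str_literal is hypothetical and not part of proc_macro, syn, or rustc_lexer. It assumes the input is the source text of a plain (non-raw) string literal with its quotes, and it returns None for anything it does not recognise rather than producing diagnostics.

```rust
/// Hypothetical helper (not a real `proc_macro` API): turn the source text of
/// a plain, non-raw string literal, quotes included, into the value it denotes.
/// Returns `None` on any escape this sketch does not understand.
fn unescape_str_literal(src: &str) -> Option<String> {
    // Strip the surrounding double quotes.
    let inner = src.strip_prefix('"')?.strip_suffix('"')?;
    let mut out = String::with_capacity(inner.len());
    let mut chars = inner.chars().peekable();

    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next()? {
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            '0' => out.push('\0'),
            '\\' => out.push('\\'),
            '\'' => out.push('\''),
            '"' => out.push('"'),
            // \xNN: exactly two hex digits, at most 0x7F in a string literal.
            'x' => {
                let hi = chars.next()?.to_digit(16)?;
                let lo = chars.next()?.to_digit(16)?;
                let v = hi * 16 + lo;
                if v > 0x7F {
                    return None;
                }
                out.push(char::from_u32(v)?);
            }
            // \u{...}: one to six hex digits, underscores allowed after the first digit.
            'u' => {
                if chars.next()? != '{' {
                    return None;
                }
                let mut v: u32 = 0;
                let mut digits = 0;
                loop {
                    match chars.next()? {
                        '}' => break,
                        '_' if digits > 0 => {}
                        d => {
                            v = v * 16 + d.to_digit(16)?;
                            digits += 1;
                            if digits > 6 {
                                return None;
                            }
                        }
                    }
                }
                if digits == 0 {
                    return None;
                }
                // Rejects surrogates and values above U+10FFFF.
                out.push(char::from_u32(v)?);
            }
            // Line continuation: backslash-newline skips the whitespace that follows.
            '\n' => {
                while matches!(chars.peek().copied(), Some(' ' | '\t' | '\n' | '\r')) {
                    chars.next();
                }
            }
            _ => return None,
        }
    }
    Some(out)
}
```

With this sketch, calling unescape_str_literal on the text "\u{78} blabla" (quotes included) yields the value x blabla. Raw strings, byte strings, and C-strings would each need their own variant, which is part of why a shared implementation is attractive.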
Motivating Examples or Use Cases
- Simplifying the syn Library: Libraries like syn need to manually reimplement string literal unescaping. Having the unescape functionality available in the proc_macro crate would allow syn::LitStr::value() to use the standardized unescape function directly, leading to simplified and more reliable code.
- Consistency Across Tools: The Rust compiler already provides unescape functionality in rustc_lexer::unescape. Making this available publicly would ensure that tools and proc-macros handle escape sequences consistently.
- Reducing Code Duplication: Many proc-macro authors currently need to implement their own logic to handle escape sequences, resulting in duplicated code and potential inconsistencies. Exposing the compiler's unescape functionality would reduce redundancy.
Solution Sketch
- Expose Unescape Functionality in proc_macro Crate: The unescape functionality from rustc_lexer::unescape should be exposed in the proc_macro crate, making it accessible for use in proc-macros.
- Public API for Literal Processing: A new API can be added to the proc_macro crate that allows developers to parse and unescape string literals in an ergonomic and standardized way. This would significantly simplify the process of handling string literals in attributes and proc-macros.
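One possible shape for such an API is sketched below, purely as an assumption for discussion. The names unescape_body, EscapeError, and EscapeErrorKind are hypothetical and do not exist in proc_macro or rustc_lexer. The point of the sketch is that returning a byte offset along with the error would let a macro author point a diagnostic at the offending escape. Only single-character escapes are handled, to keep it short.

```rust
/// Why an escape sequence was rejected (hypothetical; not a real `proc_macro` type).
#[derive(Debug, PartialEq, Eq)]
pub enum EscapeErrorKind {
    /// A backslash was the last character of the literal body.
    LoneBackslash,
    /// The character after the backslash is not handled by this sketch.
    Unsupported(char),
}

/// Hypothetical error type carrying the position of the problem.
#[derive(Debug, PartialEq, Eq)]
pub struct EscapeError {
    /// Byte offset of the offending backslash within the literal body.
    pub offset: usize,
    pub kind: EscapeErrorKind,
}

/// Hypothetical entry point: unescape the body of a string literal, that is,
/// the text between the quotes. `\x`, `\u{...}` and line continuations are
/// omitted here for brevity and are reported as `Unsupported`.
pub fn unescape_body(body: &str) -> Result<String, EscapeError> {
    let mut out = String::with_capacity(body.len());
    let mut iter = body.char_indices();

    while let Some((offset, c)) = iter.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash must be followed by at least one more character.
        let Some((_, esc)) = iter.next() else {
            return Err(EscapeError { offset, kind: EscapeErrorKind::LoneBackslash });
        };
        out.push(match esc {
            'n' => '\n',
            'r' => '\r',
            't' => '\t',
            '0' => '\0',
            '\\' | '\'' | '"' => esc,
            other => {
                return Err(EscapeError { offset, kind: EscapeErrorKind::Unsupported(other) });
            }
        });
    }
    Ok(out)
}
```

A callback-based design that reports each character or error as it is decoded would avoid allocating a String; the Result-returning form is shown only because it is the simpler one to call from a macro.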
Alternatives
- Reimplement in Libraries: The current approach is for libraries like syn to reimplement the unescape logic. This is not ideal due to code duplication, maintenance burdens, and the potential for inconsistencies.
- External Crate: Instead of adding the unescape functionality to the proc_macro crate, another option would be to create an external crate. However, considering that this functionality is tied to parsing Rust literals, adding it to the standard library seems more suitable.
- Leave as Is: Another alternative is to continue requiring proc-macro authors to implement their own unescape logic. However, this is not desirable due to the associated complexity and inconsistency.
Additional Considerations
- Extend to All Literals: Extending this functionality to all literal types, such as C-strings, integers, and floats, so that each can be converted from its source text to its value, would improve consistency across literal types and make parsing easier for proc-macro authors working with diverse literals.
- Refactoring to Work Outside Compiler: The proc_macro crate is being refactored to work even when run outside of the compiler. Therefore, the unescape functionality should be implemented in a way that does not depend on the compiler being available. This means making the unescape logic sufficiently library-agnostic so it can be used independently of the compiler context.
- Library-First Approach: The unescape function can likely be developed in a library-agnostic way to avoid code duplication. This suggests an opportunity to make the function reusable and broadly available, without tight coupling to compiler internals.
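As a rough illustration of what extending this to integer literals could look like in a compiler-independent form, here is a sketch that depends only on std. The name parse_int_literal is hypothetical. It assumes the input is the source text of a single integer literal token, and it does not check that the value fits the type named by the suffix.

```rust
/// Hypothetical helper (not a real `proc_macro` API): parse the source text of
/// an integer literal such as `1_000`, `0xff_u8` or `0b1010i32` into its value
/// and optional type suffix. The value is not range-checked against the suffix.
pub fn parse_int_literal(src: &str) -> Option<(u128, Option<&'static str>)> {
    const SUFFIXES: [&str; 12] = [
        "u8", "u16", "u32", "u64", "u128", "usize",
        "i8", "i16", "i32", "i64", "i128", "isize",
    ];

    // Split off a type suffix, if any. `u` and `i` are not hex digits, so this
    // cannot be confused with the digits of a hexadecimal literal.
    let (digits, suffix) = match SUFFIXES.iter().copied().find(|s| src.ends_with(*s)) {
        Some(s) => (&src[..src.len() - s.len()], Some(s)),
        None => (src, None),
    };

    // Detect the radix prefix.
    let (radix, body) = if let Some(rest) = digits.strip_prefix("0x") {
        (16, rest)
    } else if let Some(rest) = digits.strip_prefix("0o") {
        (8, rest)
    } else if let Some(rest) = digits.strip_prefix("0b") {
        (2, rest)
    } else {
        (10, digits)
    };

    // Underscores are purely visual separators.
    let cleaned: String = body.chars().filter(|&c| c != '_').collect();

    // Require at least one digit and nothing but digits of the chosen radix;
    // this also rejects the leading `+` that `from_str_radix` would accept.
    if cleaned.is_empty() || !cleaned.chars().all(|c| c.is_digit(radix)) {
        return None;
    }

    let value = u128::from_str_radix(&cleaned, radix).ok()?;
    Some((value, suffix))
}
```

Because nothing here touches compiler internals, the same function could run inside a proc-macro, in a test, or in a standalone tool, which is the property the refactoring point above asks for.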