ACP: Expose rustc_lexer::unescape Functionality in the proc_macro Crate for Standardized Literal Parsing #459

Open
@lucarlig

Proposal


Problem Statement

Currently, proc-macros that handle string literals receive the literal's source text: escape sequences are left unprocessed and the surrounding quotes are still in place. For example:

#[my_macro]
#[my_attr("\u{78} blabla")]
pub struct B;

In the my_attr proc-macro, the received value is "\u{78} blabla", with the escape sequence and quotes intact, instead of the string it denotes (x blabla). This makes working with string literals cumbersome, as proc-macro authors need to reimplement unescape logic that already exists within the Rust compiler.
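To make the burden concrete, here is a minimal sketch of the kind of helper a proc-macro author has to write by hand today. The name unescape_str_literal is hypothetical, the input is assumed to be the source text of a plain (non-raw) string literal, and line continuations (a backslash followed by a newline) are deliberately left out, so this is an illustration rather than a faithful copy of the compiler's rules:

```rust
/// Hypothetical helper (not part of any existing crate): turn the source text
/// of a plain string literal, quotes included, into the value it denotes.
/// Returns `None` for anything this sketch does not understand.
fn unescape_str_literal(src: &str) -> Option<String> {
    // Strip the surrounding double quotes.
    let inner = src.strip_prefix('"')?.strip_suffix('"')?;
    let mut out = String::with_capacity(inner.len());
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        match c {
            // An unescaped quote cannot appear inside the literal body.
            '"' => return None,
            '\\' => {}
            _ => {
                out.push(c);
                continue;
            }
        }
        // We just consumed a backslash: decode the escape that follows.
        match chars.next()? {
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            '0' => out.push('\0'),
            '\\' => out.push('\\'),
            '"' => out.push('"'),
            '\'' => out.push('\''),
            'x' => {
                // \xNN: exactly two hex digits, at most 0x7F in a string literal.
                let hi = chars.next()?.to_digit(16)?;
                let lo = chars.next()?.to_digit(16)?;
                let value = hi * 16 + lo;
                if value > 0x7F {
                    return None;
                }
                out.push(char::from_u32(value)?);
            }
            'u' => {
                // \u{...}: one to six hex digits, `_` allowed after the first digit.
                if chars.next()? != '{' {
                    return None;
                }
                let mut value = 0u32;
                let mut digits = 0;
                loop {
                    match chars.next()? {
                        '}' => break,
                        '_' if digits > 0 => {}
                        d => {
                            value = value * 16 + d.to_digit(16)?;
                            digits += 1;
                            if digits > 6 {
                                return None;
                            }
                        }
                    }
                }
                if digits == 0 {
                    return None;
                }
                // Rejects surrogates and values above U+10FFFF.
                out.push(char::from_u32(value)?);
            }
            _ => return None,
        }
    }
    Some(out)
}
```

Even this reduced version has several edge cases to get right, which is the duplication the proposal aims to remove.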

Motivating Examples or Use Cases

  • Simplifying syn Library: Libraries like syn need to manually reimplement string literal unescaping. Having the unescape functionality available in the proc_macro crate would allow syn::LitStr::value() to use the standardized unescape function directly, leading to simplified and more reliable code.

  • Consistency Across Tools: The Rust compiler already provides unescape functionality in rustc_lexer::unescape. Making this available publicly would ensure that tools and proc-macros handle escape sequences consistently.

  • Reducing Code Duplication: Many proc-macro authors currently need to implement their own logic to handle escape sequences, resulting in duplicated code and potential inconsistencies. Exposing the compiler's unescape functionality would reduce redundancy.

Solution Sketch

  • Expose Unescape Functionality in proc_macro Crate: The unescape functionality from rustc_lexer::unescape should be exposed in the proc_macro crate, making it accessible for use in proc-macros.

  • Public API for Literal Processing: A new API can be added to the proc_macro crate that allows developers to parse and unescape string literals in an ergonomic and standardized way. This would significantly simplify the process of handling string literals in attributes and proc-macros.
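One possible shape for such an API, sketched under stated assumptions: every name below (EscapeError, unescape_with, scan_escape, str_value) is hypothetical and is not the existing rustc_lexer or proc_macro interface. The sketch pairs a callback-based core that reports each character or error with its byte range, so callers can point diagnostics at the right place, with a convenience wrapper that collects the value. Only a subset of escapes is handled.

```rust
use std::ops::Range;

/// Hypothetical error type; the variants are illustrative only.
#[derive(Debug, PartialEq)]
enum EscapeError {
    /// A backslash at the very end of the literal body.
    LoneSlash,
    /// A backslash followed by a character that starts no known escape.
    UnknownEscape,
    /// A malformed `\x..` or `\u{..}` escape.
    Malformed,
}

/// Core routine: walks the body of a string literal (quotes already removed)
/// and reports each character, or each error, with the byte range it came from.
fn unescape_with<F>(body: &str, mut callback: F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    let mut iter = body.char_indices().peekable();
    while let Some((start, c)) = iter.next() {
        let res = if c == '\\' { scan_escape(&mut iter) } else { Ok(c) };
        // The range ends where the next unread character begins.
        let end = iter.peek().map_or(body.len(), |&(i, _)| i);
        callback(start..end, res);
    }
}

/// Decodes one escape; the leading backslash has already been consumed.
fn scan_escape(iter: &mut impl Iterator<Item = (usize, char)>) -> Result<char, EscapeError> {
    let (_, c) = iter.next().ok_or(EscapeError::LoneSlash)?;
    match c {
        'n' => Ok('\n'),
        'r' => Ok('\r'),
        't' => Ok('\t'),
        '0' => Ok('\0'),
        '\\' | '"' | '\'' => Ok(c),
        'x' => {
            // Exactly two hex digits, ASCII range only.
            let mut value = 0u32;
            for _ in 0..2 {
                let digit = iter
                    .next()
                    .and_then(|(_, d)| d.to_digit(16))
                    .ok_or(EscapeError::Malformed)?;
                value = value * 16 + digit;
            }
            char::from_u32(value)
                .filter(|c| c.is_ascii())
                .ok_or(EscapeError::Malformed)
        }
        'u' => {
            if !matches!(iter.next(), Some((_, '{'))) {
                return Err(EscapeError::Malformed);
            }
            let mut value = 0u32;
            let mut digits = 0;
            loop {
                match iter.next() {
                    Some((_, '}')) if digits > 0 => break,
                    Some((_, '_')) if digits > 0 => {}
                    Some((_, d)) if digits < 6 => {
                        value = value * 16 + d.to_digit(16).ok_or(EscapeError::Malformed)?;
                        digits += 1;
                    }
                    _ => return Err(EscapeError::Malformed),
                }
            }
            char::from_u32(value).ok_or(EscapeError::Malformed)
        }
        _ => Err(EscapeError::UnknownEscape),
    }
}

/// Convenience wrapper: the unescaped value, or the first error and its range.
fn str_value(body: &str) -> Result<String, (Range<usize>, EscapeError)> {
    let mut out = String::new();
    let mut first_err = None;
    unescape_with(body, |range, res| match res {
        Ok(c) if first_err.is_none() => out.push(c),
        Err(e) if first_err.is_none() => first_err = Some((range, e)),
        _ => {}
    });
    match first_err {
        Some(err) => Err(err),
        None => Ok(out),
    }
}
```

The callback form avoids allocating when the caller only needs validation, while the wrapper covers the common case of wanting the string value.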

Alternatives

  • Reimplement in Libraries: The current approach is for libraries like syn to reimplement the unescape logic. This is not ideal due to code duplication, maintenance burdens, and the potential for inconsistencies.

  • External Crate: Instead of adding the unescape functionality to the proc_macro crate, another option would be to create an external crate. However, because this functionality is tied to how the language itself defines literals, shipping it with the toolchain alongside proc_macro seems more suitable than relying on a third-party crate to track the compiler.

  • Leave as Is: Another alternative is to continue requiring proc-macro authors to implement their own unescape logic. However, this is not desirable due to the associated complexity and inconsistency.

Additional Considerations

  • Extend to All Literals: The same approach could be extended from unescaping strings to parsing the value of every literal type, such as C-strings, integers, and floats. This would improve consistency across literal types and make parsing easier for proc-macro authors working with diverse literals.
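Integers and floats have no escape sequences, so for them "parsing" means recovering the numeric value and the optional type suffix from the source text. A minimal sketch for integer literals follows; parse_int_literal is a hypothetical name, and the rules are simplified compared with the real lexer:

```rust
/// Hypothetical helper: split an integer literal such as `0xFF_u8` into its
/// value and optional type suffix. Returns `None` if the text is not a valid
/// integer literal under this sketch's simplified rules.
fn parse_int_literal(src: &str) -> Option<(u128, Option<&str>)> {
    const SUFFIXES: [&str; 12] = [
        "u8", "u16", "u32", "u64", "u128", "usize",
        "i8", "i16", "i32", "i64", "i128", "isize",
    ];
    // Every integer literal starts with a decimal digit (`0x..` starts with `0`).
    if !src.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    // Detect the radix prefix, if any.
    let (radix, rest) = match src.get(..2) {
        Some("0x") => (16, &src[2..]),
        Some("0o") => (8, &src[2..]),
        Some("0b") => (2, &src[2..]),
        _ => (10, src),
    };
    // Digits and `_` separators run until the first character that is neither.
    let split = rest
        .find(|c: char| c != '_' && !c.is_digit(radix))
        .unwrap_or(rest.len());
    let (digits, suffix) = rest.split_at(split);
    let digits: String = digits.chars().filter(|&c| c != '_').collect();
    if digits.is_empty() {
        return None;
    }
    // Fails on overflow of u128.
    let value = u128::from_str_radix(&digits, radix).ok()?;
    match suffix {
        "" => Some((value, None)),
        s if SUFFIXES.contains(&s) => Some((value, Some(s))),
        _ => None,
    }
}
```

Note that in hexadecimal the characters of a suffix-like tail can themselves be digits: 0x1f32 is the number 0x1f32, not 1 with an f32 suffix. Details like this are easy to get wrong when each macro author writes the parser independently.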

  • Refactoring to Work Outside Compiler: The proc_macro crate is being refactored to work even when run outside of the compiler. Therefore, the unescape functionality should be implemented in a way that does not depend on the compiler being available. This means making the unescape logic sufficiently library-agnostic so it can be used independently of the compiler context.

  • Library-First Approach: The unescape logic can likely live in a standalone library with no tight coupling to compiler internals. The compiler, the proc_macro crate, and other tools could then share a single implementation instead of each maintaining a copy.

Links and Related Work

Labels: T-libs-api, api-change-proposal (A proposal to add or alter unstable APIs in the standard libraries)
