Lexer accidentally(?) does not use is_ascii_whitespace for literal whitespace in string continuations #136600

Open
@hkBst

Description

#108403 proposed to fix this, but in this comment it was claimed that the current behavior was already documented in the reference. That claim looks incorrect to me: the page only describes whitespace escapes as being \r, \t, and \n, whereas the fix was about literal whitespace in string continuations. https://doc.rust-lang.org/reference/expressions/literal-expr.html#string-continuation-escapes does now describe this behavior, but that text was only added later, in Jan 2024. Indeed, this PR shows that the reference documented skipping all whitespace until Jun 13, 2022.
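
For readers unfamiliar with the feature, here is a minimal sketch of what a string continuation looks like. The function name `continuation_demo` is made up for illustration and does not come from the compiler or the linked PRs.

```rust
/// Illustrative helper (not from the issue): a string literal that uses a
/// string continuation escape, i.e. a backslash directly before the line break.
fn continuation_demo() -> &'static str {
    // The backslash, the line break, and the literal spaces that indent the
    // next line are all dropped from the resulting string.
    "abc\
     def"
}
```

Calling `continuation_demo()` yields `"abcdef"`. The question in this issue is which literal characters count as skippable whitespace after the line break, not how escapes such as `\t` written inside the literal are handled.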

Current behavior has this ui test. It seems that this behavior was implemented as it is now, then claimed to be canon, and then documented as canon. Anyway, I'm not sure why only most ASCII whitespace is skipped rather than all Unicode whitespace, but it seems important to pick an existing whitespace set instead of relying on an old, bad manual implementation of is_ascii_whitespace...
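
To make the "almost all ASCII whitespace" point concrete, the sketch below compares a hand-written predicate against `char::is_ascii_whitespace`. The helper `is_continuation_whitespace` is hypothetical: it is not the lexer's actual code, and its character set (U+0009, U+000A, U+000D, U+0020) is assumed from my reading of the string continuation section of the reference linked above.

```rust
/// Hypothetical stand-in for the manual check: the four characters that the
/// reference lists as skipped in a string continuation (assumption, see above).
fn is_continuation_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\n' | '\r')
}

/// Collects every ASCII character on which the hand-written predicate and
/// the standard library's `char::is_ascii_whitespace` disagree.
fn ascii_disagreements() -> Vec<char> {
    (0u8..=127)
        .map(char::from)
        .filter(|c| is_continuation_whitespace(*c) != c.is_ascii_whitespace())
        .collect()
}
```

Under these assumptions the only disagreement is U+000C (form feed), which `is_ascii_whitespace` accepts and the four-character set does not. Unicode whitespace such as U+00A0 is recognized only by `char::is_whitespace`, so it falls outside both sets.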

Perhaps we can at least get a crater run...

Metadata

    Labels

    A-parser (Area: The lexing & parsing of Rust source code to an AST)
    C-bug (Category: This is a bug.)
    T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
    T-lang (Relevant to the language team)
