Character and string token definitions need updating. #626

Open
@ehuss

There are multiple issues here. Some of this changed in 1.37 via rust-lang/rust#60793.

  • RAW_BYTE_STRING_LITERAL no longer allows bare CR (new in 1.37). (See #1459, "Input format".)

  • The "Raw string" and "raw byte string" descriptions need to be updated to state that CRLF is converted to LF (new in 1.37). (See #1459, "Input format".)

  • Several tokens need to sync the English text with the "Lexer" definition.

    • STRING_LITERAL indicates several rules (for example, isolated CRs are not allowed), but the text does not mention any of those restrictions.
    • CHAR_LITERAL says "single Unicode character…except U+0027", which is incomplete.
    • RAW_STRING_LITERAL does not allow bare CRs.
    • BYTE_LITERAL escapes are not described.
    • BYTE_STRING_LITERAL restrictions are not described.
    • In general, just make sure they are all in sync!
  • Typo in RAW_BYTE_STRING_CONTENT: it points to RAW_STRING_CONTENT when it should be RAW_BYTE_STRING_CONTENT. (See #818, "Fixes minor errors".)

  • I cannot find any place that mentions that CRLF in a string is converted to LF. Am I blind? (See #1459, "Input format".)

  • The description for string continuations says "\ immediately before U+000A", but it can also appear before CRLF. How should this be handled? I haven't looked at how it is implemented, but are all CRLFs translated everywhere? Should there just be a blanket statement somewhere about this, to avoid having to discuss it in every string literal definition? (See #1459, "Input format".)
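
To make the "blanket statement" idea concrete, here is a minimal sketch of what such a rule could look like if it were applied once, before tokenization. The function names `normalize_crlf` and `contains_bare_cr` are hypothetical and chosen for illustration; this is not taken from the compiler, and whether the implementation actually works this way is exactly the open question above.

```rust
/// Hypothetical helper: replace every CRLF pair with a single LF.
/// Under a blanket rule this would run once over the whole input,
/// so individual literal definitions would never need to mention CRLF.
fn normalize_crlf(src: &str) -> String {
    src.replace("\r\n", "\n")
}

/// Hypothetical helper: after normalization, any CR that is still
/// present in a literal body was not part of a CRLF pair, so it
/// counts as a "bare" (isolated) CR.
fn contains_bare_cr(literal_body: &str) -> bool {
    literal_body.contains('\r')
}

fn main() {
    // CRLF inside a literal body becomes LF.
    assert_eq!(normalize_crlf("a\r\nb"), "a\nb");

    // A backslash before CRLF ends up directly before LF, so the
    // "\ immediately before U+000A" wording would apply unchanged.
    assert_eq!(normalize_crlf("a\\\r\nb"), "a\\\nb");

    // A CR without a following LF survives normalization and is flagged.
    assert!(contains_bare_cr(&normalize_crlf("a\rb")));
    assert!(!contains_bare_cr(&normalize_crlf("a\r\nb")));
}
```

If normalization does happen up front like this, the per-literal text would only need to state the bare-CR restriction, not the CRLF conversion.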

I may be missing some things here. Everything needs a very thorough review to make sure it is correct and up to date with the changes from rust-lang/rust#60793.
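
As a possible starting point for that review, the following sketch collects a few literal behaviors that the English text should agree with. The function name `check_literals` is made up for this example, and the selection of cases is an assumption about what is worth checking, not a complete list. The bare-CR and CRLF cases are not covered here because they depend on the raw bytes of the source file and cannot be shown reliably in a pasted snippet.

```rust
/// Hypothetical checklist of literal behaviors to compare against the text.
fn check_literals() {
    // String continuation: the backslash, the line break, and the
    // leading whitespace on the next line are all dropped.
    let continued = "abc\
                     def";
    assert_eq!(continued, "abcdef");

    // An escaped CR is accepted in a string literal; the restriction
    // concerns a literal CR character in the source file.
    assert_eq!("a\rb".len(), 3);

    // CHAR_LITERAL: U+0027 must be escaped; a double quote need not be.
    assert_eq!('\'', '\u{27}');
    assert_eq!('"', '\u{22}');
    assert_eq!('\x41', 'A');

    // BYTE_LITERAL escapes.
    assert_eq!(b'\n', 10u8);
    assert_eq!(b'\x41', b'A');
    assert_eq!(b'\\', 92u8);
    assert_eq!(b'\'', 39u8);
    assert_eq!(b'\xFF', 255u8);

    // BYTE_STRING_LITERAL: byte escapes and an escaped double quote.
    assert_eq!(b"a\x00\"", &[b'a', 0u8, b'"']);

    // Raw literals: backslashes are not escapes.
    assert_eq!(r"\n".len(), 2);
    assert_eq!(r#"a"b"#, "a\"b");
    assert_eq!(br"\x41", b"\\x41");
}

fn main() {
    check_literals();
}
```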

Labels: A-lexer (Area: Lexical specification)
