Lexer chokes on certain kinds of whitespace #29590

Closed
@catern

Description

The presence in a Rust source file of unusual but useful kinds of whitespace, such as ASCII 0x0C (form feed), leads to the following error:

src/main.rs:1:1: 1:2 error: unknown start of token: \u{c}
src/main.rs:1 
              ^

I have a specific use case for form-feeds in source files. But I think in general it is nice to ignore the same whitespace that every other programming language and file format ignores; it lessens confusion for people coming from other languages and backgrounds.
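To make the request concrete, here is a minimal sketch of a whitespace predicate that accepts form feed alongside the usual characters. The names `is_skippable_whitespace` and `skip_whitespace` are hypothetical and chosen for illustration; this is not rustc's actual lexer code, only an assumption about what "ignoring form feed" could look like.

```rust
/// Hypothetical helper (not rustc's lexer): returns true for characters
/// a lexer could skip between tokens, including form feed (U+000C).
fn is_skippable_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\n' | '\r' | '\u{000C}')
}

/// Returns the remainder of `src` after skipping any leading whitespace,
/// as defined by `is_skippable_whitespace`.
fn skip_whitespace(src: &str) -> &str {
    src.trim_start_matches(is_skippable_whitespace)
}
```

With a predicate like this, a file that begins with a form feed would reach the first real token instead of failing at `\u{c}`. The standard library's `char::is_ascii_whitespace` also treats U+000C as whitespace, so it could serve as the predicate if the same ASCII set is wanted.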

My specific use case is the long-standing but somewhat uncommon use of the form-feed character (which semantically is a separator between pages of text) as a way to group together especially closely related functions or blocks in a file of source code. Text editors and IDEs such as vim, Emacs, and Xcode provide convenience features to display these form feeds in an aesthetically pleasing way, move between form-feed-delimited pages, and restrict editing to one form-feed-delimited page at a time. It's just a simple convenience feature, but it would be really nice to support it.
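As a rough illustration of what "form-feed-delimited pages" means, the sketch below splits source text on U+000C. The function name `pages` is hypothetical, and this only approximates how an editor might find page boundaries; it is not taken from any editor's implementation.

```rust
/// Hypothetical helper: splits source text into form-feed-delimited pages.
/// Each returned slice is the text between two form feeds (or a file edge).
fn pages(src: &str) -> Vec<&str> {
    src.split('\u{000C}').collect()
}
```

A file containing a single form feed between two functions would yield two pages, one per group of related code.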

Metadata

    Labels

    A-parser (Area: The lexing & parsing of Rust source code to an AST)
    T-lang (Relevant to the language team, which will review and decide on the PR/issue.)
