Too many errors on non-breaking space characters in source code #106101

@dtolnay

Rustc emits a separate error for every single time U+00A0 appears in the source file.

(See #106098 for how someone might very reasonably end up with non-breaking space characters in their source code. Even if that issue gets resolved in rustdoc, I still think rustc's parser needs to handle this better, because non-breaking spaces might be copied in from some other website, or from documentation rendered by older versions of rustdoc.)

My preferred behavior would be for rustc to emit just a single error, on the first non-breaking space in the entire file, and then silently interpret every subsequent non-breaking space in the file as an ordinary space.
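To make the preferred behavior concrete, here is a minimal standalone sketch of a "report once, then treat as space" pass. It is only an illustration and not rustc's lexer: the `Diagnostic` struct and the `report_first_nbsp` function are hypothetical names invented for this example, and a real fix would live inside the compiler's existing lexing and diagnostics code.

```rust
/// Hypothetical diagnostic record for this sketch; not rustc's real type.
#[derive(Debug, PartialEq)]
struct Diagnostic {
    line: usize,   // 1-based line number
    column: usize, // 1-based column, counted in chars
}

const NBSP: char = '\u{a0}';

/// Scans `src` and returns at most one diagnostic (for the first NBSP),
/// plus a copy of the source with every NBSP replaced by an ASCII space.
fn report_first_nbsp(src: &str) -> (Option<Diagnostic>, String) {
    let mut first = None;
    let mut cleaned = String::with_capacity(src.len());
    let (mut line, mut column) = (1, 1);
    for ch in src.chars() {
        if ch == NBSP {
            // Only the first occurrence produces a diagnostic.
            if first.is_none() {
                first = Some(Diagnostic { line, column });
            }
            cleaned.push(' ');
        } else {
            cleaned.push(ch);
        }
        if ch == '\n' {
            line += 1;
            column = 1;
        } else {
            column += 1;
        }
    }
    (first, cleaned)
}

fn main() {
    let src = "\u{a0}\u{a0}\u{a0}\u{a0}fn main() {}\n";
    let (diag, cleaned) = report_first_nbsp(src);
    println!("{:?}", diag);
    println!("{:?}", cleaned);
}
```

With the repro input from this issue, the sketch yields one diagnostic at line 1, column 1, and the remaining three non-breaking spaces are silently converted.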

If that is too tricky, a more conservative change that would still be an improvement is to emit a single error for each consecutive sequence of non-breaking space characters (i.e. this would typically result in one error per line, instead of one error per space).
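The more conservative option amounts to grouping adjacent non-breaking spaces into a single span and reporting once per span. A rough sketch of that grouping, assuming byte-offset spans and a hypothetical helper named `nbsp_runs` (not an existing rustc function), might look like this:

```rust
use std::ops::Range;

const NBSP: char = '\u{a0}';

/// Returns the byte range of each maximal run of consecutive NBSPs in `src`.
/// One diagnostic per returned range would give "one error per run".
fn nbsp_runs(src: &str) -> Vec<Range<usize>> {
    let mut runs: Vec<Range<usize>> = Vec::new();
    for (start, ch) in src.char_indices() {
        if ch != NBSP {
            continue;
        }
        let end = start + ch.len_utf8();
        // Extend the previous run when this NBSP begins exactly where it ended.
        if let Some(run) = runs.last_mut() {
            if run.end == start {
                run.end = end;
                continue;
            }
        }
        runs.push(start..end);
    }
    runs
}

fn main() {
    let src = "\u{a0}\u{a0}\u{a0}\u{a0}fn main() {}";
    for run in nbsp_runs(src) {
        println!("one error covering bytes {}..{}", run.start, run.end);
    }
}
```

U+00A0 occupies two bytes in UTF-8, so the four leading non-breaking spaces in the repro collapse into one span covering bytes 0 to 8.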

Repro:

$ echo -e '\u00a0\u00a0\u00a0\u00a0fn main() {}' | rustc /dev/stdin -o a.out
error: unknown start of token: \u{a0}
 --> /dev/stdin:1:1
  |
1 |     fn main() {}
  | ^
  |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
  |
1 |     fn main() {}
  | +

error: unknown start of token: \u{a0}
 --> /dev/stdin:1:2
  |
1 |     fn main() {}
  |  ^
  |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
  |
1 |     fn main() {}
  |  +

error: unknown start of token: \u{a0}
 --> /dev/stdin:1:3
  |
1 |     fn main() {}
  |   ^
  |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
  |
1 |     fn main() {}
  |   +

error: unknown start of token: \u{a0}
 --> /dev/stdin:1:4
  |
1 |     fn main() {}
  |    ^
  |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
  |
1 |     fn main() {}
  |    +

error: aborting due to 4 previous errors

Labels

A-diagnostics (Area: Messages for errors, warnings, and lints)
A-parser (Area: The lexing & parsing of Rust source code to an AST)
C-bug (Category: This is a bug.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)