
uncommon_codepoints is only checked post-NFC #120697

Open

Description

Code

#![forbid(uncommon_codepoints)]
pub const L·L: u32 = 7;

Current output

(compiles successfully)

Desired output

error: identifier contains an uncommon Unicode codepoint: '·'
 --> src/lib.rs:2:11
  |
2 | pub const L·L: u32 = 7;
  |           ^^^
  |

Rationale and extra context

The · in the code snippet above is U+0387 GREEK ANO TELEIA, which has an Identifier_Status of Restricted and should therefore trigger the uncommon_codepoints lint. However, U+0387 has a canonical singleton decomposition to U+00B7 ( · ) MIDDLE DOT, so NFC normalization replaces it with U+00B7, which has an Identifier_Status of Allowed and is not flagged by the lint. Because the compiler applies NFC normalization to identifiers before checking uncommon_codepoints, the lint never sees the original code point and fails to fire in this case.
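The ordering problem can be illustrated with a small standalone sketch. This is a toy model, not rustc's implementation: `toy_nfc`, `is_restricted`, and `uncommon_codepoints` are hypothetical helpers, and both lookup tables are hand-written stand-ins that only know about U+0387 and U+00B7 rather than the real Unicode data.

```rust
// Toy model of the check-ordering problem; NOT rustc's actual code.
// Both "tables" below are hand-written stand-ins for the real Unicode data.

/// Stand-in for NFC: applies only the one singleton mapping relevant here,
/// U+0387 GREEK ANO TELEIA -> U+00B7 MIDDLE DOT.
fn toy_nfc(ident: &str) -> String {
    ident
        .chars()
        .map(|c| if c == '\u{0387}' { '\u{00B7}' } else { c })
        .collect()
}

/// Stand-in for an Identifier_Status lookup: only U+0387 is treated as
/// Restricted; everything else (including U+00B7) is treated as Allowed.
fn is_restricted(c: char) -> bool {
    c == '\u{0387}'
}

/// Returns the Restricted code points found in `ident`.
fn uncommon_codepoints(ident: &str) -> Vec<char> {
    ident.chars().filter(|&c| is_restricted(c)).collect()
}

fn main() {
    // The identifier as spelled in the source file, with U+0387 written
    // as an escape so the code point is unambiguous.
    let source_ident = "L\u{0387}L";

    // Checking after normalization: U+0387 has already been replaced.
    let after = uncommon_codepoints(&toy_nfc(source_ident));

    // Checking the original spelling: U+0387 is still visible.
    let before = uncommon_codepoints(source_ident);

    println!("post-NFC check found: {:?}", after);
    println!("pre-NFC check found:  {:?}", before);
}
```

In this model, the post-normalization check reports nothing, while a check on the original spelling would report U+0387, which is the behavior requested under "Desired output".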

Rust Version

rustc 1.75.0 (82e1608df 2023-12-21)
binary: rustc
commit-hash: 82e1608dfa6e0b5569232559e3d385fea5a93112
commit-date: 2023-12-21
host: x86_64-unknown-linux-gnu
release: 1.75.0
LLVM version: 17.0.6

@rustbot label A-unicode


    Labels

    A-diagnostics: Area: Messages for errors, warnings, and lints
    A-lint: Area: Lints (warnings about flaws in source code) such as unused_mut.
    A-unicode: Area: Unicode
    C-bug: Category: This is a bug.
    L-uncommon_codepoints: Lint: uncommon_codepoints
    T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
