
Consider linting against 00B7 aka interpunct aka middle dot #120797

Open
@pnkfelix

Description

Code

#![allow(dead_code)]
#![deny(uncommon_codepoints)]
const COL·LECCIÓ: () = (); // This is Catalan

// The below is not allowed by the lexer today...
// const ·START: () = ();

// ... but this is allowed today ...
const MID·DLE: () = ();

// ... and this is also allowed today
const END·: () = ();


fn main() {
    println!("{}", r#"
COL·LECCIÓ
·START
MID·DLE
END·
"#)
}

Current output

COL·LECCIÓ
·START
MID·DLE
END·

Note that the appearance of the first line is font-dependent, in terms of how the columns of a fixed-width font line up: the playpen collapses the L·L into a single glyph that occupies one character width.
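Because the rendering depends on the font, it can help to look at the code points rather than the glyphs. The following is a minimal sketch, not part of the original report: `escape_non_ascii` is a hypothetical helper name, and the identifier is assumed to use the precomposed U+00D3 for `Ó`.

```rust
/// Hypothetical helper: replace every non-ASCII character with its
/// `\u{...}` escape so that U+00B7 is visible regardless of the font.
fn escape_non_ascii(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_ascii() {
                c.to_string()
            } else {
                // `char::escape_unicode` yields the `\u{NNNN}` form.
                c.escape_unicode().to_string()
            }
        })
        .collect()
}

fn main() {
    // Written with escapes so the source does not depend on the editor's font.
    let ident = "COL\u{00B7}LECCI\u{00D3}";
    println!("{}", escape_non_ascii(ident));
}
```

With this, the middle dot shows up as an explicit escape even where the font merges L·L into one glyph.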

Desired output

I'm not certain. I just want to make sure we follow up on PR #120695.

The options I see are:

  1. Leave things as they are (00B7 is hard-rejected as an initial character, and silently accepted in all other contexts).
  2. Adopt something like what was proposed in PR #120695 ("uncommon_codepoints: lint against 00B7 MIDDLE DOT in final position"): continue hard-rejecting 00B7 as an initial character, lint against its occurrence as a final character, and silently accept it as a "medial" character.
  3. Something more aggressive than PR #120695, like linting against 00B7 in all contexts (except perhaps when it occurs in between two L's, to accommodate Catalan, as suggested by Manish).
  4. Other options? (We probably don't get any benefit from deviating far from Unicode committee recommendations, so we probably do not want to start accepting 00B7 as an initial character.)
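To make the difference between options 2 and 3 concrete, here is a rough sketch of how each would treat the four identifiers above. It is only an illustration under stated assumptions: the names `DotPosition`, `Verdict`, `middle_dot_positions`, `option_2`, and `option_3` are hypothetical and are not taken from rustc or from PR #120695, and the "between two L's" check is a naive guess at what a Catalan exception might look like.

```rust
/// U+00B7 MIDDLE DOT, the code point under discussion.
const MIDDLE_DOT: char = '\u{00B7}';

/// Where a middle dot sits within an identifier (hypothetical classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DotPosition {
    Initial,
    /// Medial and directly between two `l`/`L` letters, as in "col·lecció".
    MedialBetweenLs,
    Medial,
    Final,
}

/// What a given policy would do with a middle dot in that position.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Reject,
    Lint,
    Accept,
}

/// Report the position of every middle dot in `ident`.
fn middle_dot_positions(ident: &str) -> Vec<DotPosition> {
    let chars: Vec<char> = ident.chars().collect();
    let is_l = |c: char| c == 'l' || c == 'L';
    let mut out = Vec::new();
    for (i, &c) in chars.iter().enumerate() {
        if c != MIDDLE_DOT {
            continue;
        }
        let pos = if i == 0 {
            DotPosition::Initial
        } else if i + 1 == chars.len() {
            DotPosition::Final
        } else if is_l(chars[i - 1]) && is_l(chars[i + 1]) {
            DotPosition::MedialBetweenLs
        } else {
            DotPosition::Medial
        };
        out.push(pos);
    }
    out
}

/// Option 2: reject initial, lint final, accept every medial use.
fn option_2(pos: DotPosition) -> Verdict {
    match pos {
        DotPosition::Initial => Verdict::Reject,
        DotPosition::Final => Verdict::Lint,
        DotPosition::Medial | DotPosition::MedialBetweenLs => Verdict::Accept,
    }
}

/// Option 3: reject initial, lint everywhere else except between two L's.
fn option_3(pos: DotPosition) -> Verdict {
    match pos {
        DotPosition::Initial => Verdict::Reject,
        DotPosition::MedialBetweenLs => Verdict::Accept,
        DotPosition::Medial | DotPosition::Final => Verdict::Lint,
    }
}

fn main() {
    for ident in ["COL·LECCIÓ", "·START", "MID·DLE", "END·"] {
        for pos in middle_dot_positions(ident) {
            println!(
                "{ident}: {pos:?} -> option 2: {:?}, option 3: {:?}",
                option_2(pos),
                option_3(pos)
            );
        }
    }
}
```

Under this sketch the two options only disagree on `MID·DLE`: option 2 accepts it silently, while option 3 would lint it.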

Rationale and extra context

No response

Other cases

No response

Rust Version

Stable channel

Build using the Stable version: 1.76.0

Anything else?

No response


Labels

A-diagnostics: Area: Messages for errors, warnings, and lints
T-lang: Relevant to the language team, which will review and decide on the PR/issue.
disposition-postpone: This issue / PR is in PFCP or FCP with a disposition to postpone it.
finished-final-comment-period: The final comment period is finished for this PR / Issue.
