Skip to content

Tracking issue for RFC 2457, "Allow non-ASCII identifiers" #55467

Closed
@Centril

Description

@Centril

This is a tracking issue for the RFC "Allow non-ASCII identifiers" (rust-lang/rfcs#2457).

Steps:

Unresolved questions:

  • Which context is adequate for confusable detection: file, current scope, crate?
  • Should ZWNJ and ZWJ be allowed in identifiers?
  • How are non-ASCII idents best supported in debuggers?
    Resolved: DWARF and debuggers handle UTF-8 just fine
  • Which name mangling scheme is used by the compiler? (Punycode, see RFC2603)
  • Is there a better name for the less_used_codepoints lint?
    Resolved in favour of uncommon_codepoints
  • Which lint should the global mixed scripts confusables detection trigger?
    Resolved in favor of mixed_script_confusables
  • How badly do non-ASCII idents exacerbate const pattern confusion
    (Statics shadow local variables causing "refutable pattern error", and non-obvious bugs. #7526, We shouldn't even try to resolve irrefutable patterns as constants #49680)?
    Can we improve precision of linting here?
  • In mixed_script_confusables, do we actually need to make an exception for Latin identifiers?
  • Terminal width is a tricky with unicode. Some characters are long, some have lengths dependent on the fonts installed (e.g. emoji sequences), and modifiers are a thing. The concept of monospace font doesn't generalize to other scripts as well. How does rustfmt deal with this when determining line width?
  • right-to-left scripts can lead to weird rendering in mixed contexts (depending on the software used), especially when mixed with operators. This is not something that should block stabilization, however we feel it is important to explicitly call out. Future RFCs (preferably put forth by RTL-using communities) may attempt to improve this situation (e.g. by allowing bidi control characters in specific contexts).
  • Tweak XID_Start / XID_Continue? XID_Start / XID_Continue might not be quite right #4928

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1518.htm

    The ISO JTC1/SC22/WG14 (C language) think that possibly UTR#31 didn't quite hit the nail on the head in terms of defining identifier syntax. They have a couple tweaks in mind. Consider following their lead.


zulip channel topic for real-time discussion:
https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/nonascii.20identifiers(rfc.202457)

Metadata

Metadata

Assignees

No one assigned

    Labels

    B-RFC-approvedBlocker: Approved by a merged RFC but not yet implemented.B-RFC-implementedBlocker: Approved by a merged RFC and implemented but not stabilized.B-unstableBlocker: Implemented in the nightly compiler and unstable.C-tracking-issueCategory: An issue tracking the progress of sth. like the implementation of an RFCF-non_ascii_idents`#![feature(non_ascii_idents)]`T-langRelevant to the language team, which will review and decide on the PR/issue.disposition-mergeThis issue / PR is in PFCP or FCP with a disposition to merge it.finished-final-comment-periodThe final comment period is finished for this PR / Issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions