Closed
This is a tracking issue for the RFC "Allow non-ASCII identifiers" (rust-lang/rfcs#2457).
Steps:
- Implement the RFC (cc @rust-lang/compiler @Manishearth)
- Normalize identifiers to NFC whilst parsing (Normalize ident #66670, Add symbol normalization for proc_macro_server. #67702)
- Ensure that `#![forbid(non_ascii_idents)]` works. (non_ascii_idents lint (part of RFC 2457) #61883)
- Lint: `confusable_idents` (Implement confusable_idents lint. #71542, Implement mixed script confusable lint. #72770)
- Lint: `uncommon_codepoints`, formerly `less_used_codepoints` (Implement uncommon_codepoints lint. #67810)
- Adjustments to the "bad style" / `non_standard_style` lints. (See Split and expand nonstandard-style lints unicode unit test. #73839)
- Lint: `mixed_script_confusables` (Implement mixed script confusable lint. #72770)
- Provide reusable crates for above lints and checks on crates.io. (unicode-security)
- Similarly to out-of-line modules (`mod фоо;`), extern crates and paths with a first segment naming a crate should not be able to do filesystem search using those non-ASCII identifiers (i.e. no `extern crate ьаг;` or `му_сгате::baz`). (Disallow loading crates with non-ascii identifier name. #73305)
- Disallow using non-ascii identifiers in extern blocks. (Disable using non-ascii identifiers in extern blocks. #83936)
- Adjust documentation (see instructions on forge) (Move non-ascii-idents content from unstable book to reference. reference#999)
- Stabilization PR (see instructions on forge) (Stablize `non-ascii-idents` #83799)
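As a rough illustration of what the steps above enable, here is a minimal sketch of code using non-ASCII identifiers. It assumes a toolchain on which `non_ascii_idents` is already stable; the function and parameter names are made up for this example and do not come from the RFC.

```rust
// Hypothetical names, for illustration only.
// Assumes a toolchain on which `non_ascii_idents` is stable.
//
// Adding `#![forbid(non_ascii_idents)]` at the crate root is expected to turn
// each of the non-ASCII names below into a hard error.

/// Converts a temperature from degrees Celsius to degrees Fahrenheit.
fn température_en_fahrenheit(degrés_celsius: f64) -> f64 {
    // The identifiers contain only Latin-script letters, so the
    // script-mixing lints listed above are not expected to fire here.
    degrés_celsius * 9.0 / 5.0 + 32.0
}
```

Calling `température_en_fahrenheit(100.0)` behaves like any other function call; identifiers are compared after NFC normalization, so a decomposed spelling of `é` should resolve to the same name.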
Unresolved questions:
- Which context is adequate for confusable detection: file, current scope, crate?
- Should ZWNJ and ZWJ be allowed in identifiers?
- How are non-ASCII idents best supported in debuggers?
  Resolved: DWARF and debuggers handle UTF-8 just fine
- Which name mangling scheme is used by the compiler? (Punycode, see RFC2603)
- Is there a better name for the `less_used_codepoints` lint?
  Resolved in favour of `uncommon_codepoints`
- Which lint should the global mixed scripts confusables detection trigger?
  Resolved in favor of `mixed_script_confusables`
- How badly do non-ASCII idents exacerbate const pattern confusion (Statics shadow local variables causing "refutable pattern error", and non-obvious bugs. #7526, We shouldn't even try to resolve irrefutable patterns as constants #49680)? Can we improve precision of linting here?
- In `mixed_script_confusables`, do we actually need to make an exception for `Latin` identifiers?
- Terminal width is tricky with Unicode. Some characters are wide, some have widths dependent on the fonts installed (e.g. emoji sequences), and modifiers are a thing. The concept of a monospace font doesn't generalize well to other scripts. How does rustfmt deal with this when determining line width?
- Right-to-left scripts can lead to weird rendering in mixed contexts (depending on the software used), especially when mixed with operators. This is not something that should block stabilization; however, we feel it is important to explicitly call out. Future RFCs (preferably put forth by RTL-using communities) may attempt to improve this situation (e.g. by allowing bidi control characters in specific contexts).
- Tweak `XID_Start`/`XID_Continue`? (XID_Start / XID_Continue might not be quite right #4928, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1518.htm)
  The ISO JTC1/SC22/WG14 (C language) working group thinks that UTR#31 possibly didn't quite hit the nail on the head in terms of defining identifier syntax. They have a couple of tweaks in mind. Consider following their lead.
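For the name mangling question above, which was resolved in favour of Punycode, a sketch of plain Punycode encoding as defined in RFC 3492 may help show what a non-ASCII identifier turns into. This is not the compiler's implementation: the function names are hypothetical, overflow handling is omitted, and the extra framing that RFC 2603 puts around the encoded text (to my understanding, a `u` marker, a length, and `_` in place of the `-` delimiter) is deliberately left out.

```rust
// Punycode parameters from RFC 3492, section 5.
const BASE: u32 = 36;
const T_MIN: u32 = 1;
const T_MAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// Maps a digit value 0..=35 to 'a'..='z' followed by '0'..='9'.
fn encode_digit(d: u32) -> char {
    debug_assert!(d < BASE);
    if d < 26 {
        (b'a' + d as u8) as char
    } else {
        (b'0' + (d - 26) as u8) as char
    }
}

/// Bias adaptation function from RFC 3492, section 6.1.
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    delta /= if first_time { DAMP } else { 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - T_MIN) * T_MAX) / 2 {
        delta /= BASE - T_MIN;
        k += BASE;
    }
    k + (BASE - T_MIN + 1) * delta / (delta + SKEW)
}

/// Hypothetical helper: encodes `input` as Punycode (no overflow checks).
fn punycode_encode(input: &str) -> String {
    let code_points: Vec<u32> = input.chars().map(|c| c as u32).collect();

    // Basic (ASCII) code points are copied first, followed by the delimiter.
    let mut output: String = input.chars().filter(|c| c.is_ascii()).collect();
    let basic_len = output.len() as u32;
    if basic_len > 0 {
        output.push('-');
    }

    let (mut n, mut delta, mut bias) = (INITIAL_N, 0u32, INITIAL_BIAS);
    let mut handled = basic_len;
    while (handled as usize) < code_points.len() {
        // Smallest code point that has not been encoded yet.
        let m = code_points.iter().copied().filter(|&c| c >= n).min().unwrap();
        delta += (m - n) * (handled + 1);
        n = m;
        for &c in &code_points {
            if c < n {
                delta += 1;
            }
            if c == n {
                // Emit `delta` as a variable-length integer.
                let mut q = delta;
                let mut k = BASE;
                loop {
                    let t = if k <= bias {
                        T_MIN
                    } else if k >= bias + T_MAX {
                        T_MAX
                    } else {
                        k - bias
                    };
                    if q < t {
                        break;
                    }
                    output.push(encode_digit(t + (q - t) % (BASE - t)));
                    q = (q - t) / (BASE - t);
                    k += BASE;
                }
                output.push(encode_digit(q));
                bias = adapt(delta, handled + 1, handled == basic_len);
                delta = 0;
                handled += 1;
            }
        }
        delta += 1;
        n += 1;
    }
    output
}
```

With this sketch, `punycode_encode("gödel")` yields `gdel-5qa`: the ASCII letters stay readable and the position and value of `ö` are packed into the suffix, which is what makes the scheme usable for linker symbols that must remain ASCII.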
Zulip channel topic for real-time discussion:
https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/nonascii.20identifiers(rfc.202457)