
canonicalize unicode identifiers #5434

Closed
@stevengj

Description

As discussed on the mailing list, it is very confusing that

const μ = 3
µ + 1

throws a "µ not defined" exception, because Unicode codepoints 0x00b5 and 0x03bc are rendered almost identically. This could easily happen in real usage: Option-M on a Mac produces 0x00b5 ("micro sign"), which is different from 0x03bc ("Greek small letter mu").

It would be good if Julia internally stored a table of easily confused Unicode codepoints, i.e. homoglyphs, and used them to help prevent these sorts of confusions. Three possibilities are:

  • "foo not defined" exceptions could check whether a homoglyph variant of foo is defined and, if so, tell the user.
  • Julia could issue a warning if a non-canonical homoglyph is used in an identifier.
  • Simply canonicalize all homoglyphs in identifiers, so that users can type them either way but they are treated as the same identifier.
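A minimal sketch of how a homoglyph table could support these options, assuming a simple character-to-character mapping. `HOMOGLYPHS`, `canonicalize_identifier`, and `find_homograph` are hypothetical names for illustration, not part of Julia's API, and the table lists only the µ/μ pair from this issue. `canonicalize_identifier` corresponds to the third option; `find_homograph` shows how the first option could look up a defined homoglyph variant; the second option would amount to warning whenever `canonicalize_identifier(name) != name`.

```julia
# Hypothetical homoglyph table: each non-canonical codepoint maps to its canonical form.
# Only the pair from this issue is listed; a real table would need many more entries.
const HOMOGLYPHS = Dict{Char,Char}(
    '\u00b5' => '\u03bc',  # MICRO SIGN -> GREEK SMALL LETTER MU
)

# Third option (hypothetical helper): replace every non-canonical homoglyph in an identifier.
# `map` over a string with a Char-returning function yields a new String.
canonicalize_identifier(name::AbstractString) = map(c -> get(HOMOGLYPHS, c, c), name)

# First option (hypothetical helper): when `name` is undefined, return a different
# defined identifier that canonicalizes to the same thing, or `nothing` if none exists.
function find_homograph(name::AbstractString, defined)
    key = canonicalize_identifier(name)
    for d in defined
        d != name && canonicalize_identifier(d) == key && return d
    end
    return nothing
end

# Usage: the micro sign and the Greek mu canonicalize to the same identifier.
println(canonicalize_identifier("\u00b5") == canonicalize_identifier("\u03bc"))
println(find_homograph("\u00b5", ["\u03bc", "x"]))
```

An alternative to a hand-maintained table is Unicode NFKC normalization (`Unicode.normalize(s, :NFKC)` in Julia's `Unicode` standard library), which does map U+00B5 to U+03BC, but it also folds characters that are not homoglyphs (for example, superscript digits to ordinary digits), so a narrower table is the more conservative choice for identifiers.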

My preference would be for the third option. I don't see any useful purpose being served by treating μ and µ as distinct identifiers.

Metadata

    Labels

    needs decision: A decision on this change is needed
    unicode: Related to unicode characters and encodings