make isuppercase and islowercase agree with Unicode standard #36618

@stevengj

Currently, islowercase checks whether a character is in general category Ll (Letter, Lowercase), and isuppercase checks for general category Lu (Letter, Uppercase) or Lt (Letter, Titlecase).

However, it was recently brought to my attention that there are actually official Unicode derived properties called Lowercase and Uppercase which differ from these definitions.

  • Titlecase characters like Dž (U+01C5) are not considered Uppercase. (Note that uppercase('Dž') yields a different character 'DŽ', so this makes a certain sense.)
  • Some Lo, Letter: Other characters like ª are included as Lowercase (or Uppercase in other cases like ).

The next version of utf8proc will provide islower and isupper functions compliant with these definitions (JuliaStrings/utf8proc#196), so we may want to switch to them.

(My guess is that it makes little difference in practice — I'm not clear how useful these functions are for general Unicode strings — but the standard here seems fairly sensible. Apparently this is what Python's isupper/islower functions do.)
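The difference between the two definitions can be checked on the two characters mentioned above. The following is a minimal Python sketch, not the Julia or utf8proc implementation: the helpers category_islower and category_isupper are hypothetical names that mimic the category-based checks described at the top of this issue, and the result depends on the Unicode data shipped with the Python version in use.

```python
import unicodedata


def category_islower(c: str) -> bool:
    """Hypothetical helper: category-based check, only Ll counts as lowercase."""
    return unicodedata.category(c) == "Ll"


def category_isupper(c: str) -> bool:
    """Hypothetical helper: category-based check, Lu or Lt counts as uppercase."""
    return unicodedata.category(c) in ("Lu", "Lt")


# U+01C5 is the titlecase digraph; U+00AA is the feminine ordinal indicator.
for c in ("\u01c5", "\u00aa"):
    print(
        f"U+{ord(c):04X}",
        "category:", unicodedata.category(c),
        "| category-based upper/lower:", category_isupper(c), category_islower(c),
        "| str.isupper/islower:", c.isupper(), c.islower(),
    )
```

On a recent Python 3, the category-based helpers and the built-in str methods should disagree for both characters, which is the discrepancy this issue is about.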

Labels: unicode (Related to unicode characters and encodings)
