Description
Currently, islowercase checks whether a character is in category Ll (Letter: Lowercase), and isuppercase checks for category Lu (Letter: Uppercase) or Lt (Letter: Titlecase).
However, it was recently brought to my attention that there are actually official Unicode derived properties called Lowercase and Uppercase which differ from these definitions.
- Titlecase characters like ǅ (U+01C5) are not considered uppercase. (Note that uppercase('ǅ') yields a different character 'Ǆ', so this makes a certain sense.)
- Some Lo (Letter: Other) characters like ª are included as Lowercase (or Uppercase in other cases like Ⓐ).
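
To make the category-based definitions concrete, here is a minimal sketch in Python using the standard `unicodedata` module. It is not Julia's implementation: the helpers `category_islower` and `category_isupper` are hypothetical names that only mirror the rule described above (Ll for lowercase, Lu or Lt for uppercase).

```python
import unicodedata


def category_islower(ch: str) -> bool:
    """Hypothetical helper: lowercase means general category Ll."""
    return unicodedata.category(ch) == "Ll"


def category_isupper(ch: str) -> bool:
    """Hypothetical helper: uppercase means general category Lu or Lt."""
    return unicodedata.category(ch) in ("Lu", "Lt")


# Example characters from this issue, plus plain ASCII for comparison.
samples = ["a", "A", "\u01c5", "\u00aa", "\u24b6"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        "lower" if category_islower(ch) else "-",
        "upper" if category_isupper(ch) else "-",
    )
```

Under this rule the titlecase letter U+01C5 counts as uppercase, while U+00AA and U+24B6 count as neither, which is exactly where the derived properties disagree.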
The next version of utf8proc will provide islower and isupper functions compliant with these definitions (JuliaStrings/utf8proc#196), so we may want to switch to them.
(My guess is that it makes little difference in practice — I'm not clear how useful these functions are for general Unicode strings — but the standard here seems fairly sensible. Apparently this is what Python's isupper/islower functions do.)
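
The Python comparison can be checked directly. The sketch below is Python, not Julia, and assumes a CPython 3 interpreter. It calls the built-in `str.isupper`, `str.islower`, and `str.istitle` on the same example characters; the helper `python_case_flags` is a hypothetical name used only for this illustration.

```python
def python_case_flags(ch: str) -> dict:
    """Hypothetical helper: collect Python's own case predicates for one character."""
    return {
        "isupper": ch.isupper(),
        "islower": ch.islower(),
        "istitle": ch.istitle(),
    }


titlecase_dz = "\u01c5"   # titlecase digraph from the first bullet
feminine_ord = "\u00aa"   # feminine ordinal indicator
circled_a = "\u24b6"      # circled Latin capital letter A

for name, ch in [
    ("U+01C5", titlecase_dz),
    ("U+00AA", feminine_ord),
    ("U+24B6", circled_a),
]:
    print(name, python_case_flags(ch))

# Uppercasing the titlecase digraph gives a different code point, U+01C4.
print(f"U+{ord(titlecase_dz.upper()):04X}")
```

On CPython 3 the titlecase digraph reports `istitle` but not `isupper`, while U+00AA reports `islower` and U+24B6 reports `isupper`, matching the two bullets above.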