make isuppercase and islowercase agree with Unicode standard #36618

Currently, `islowercase` checks whether a character is in category *Ll, Letter: Lowercase*, and `isuppercase` checks for category *Lu, Letter: Uppercase* or *Lt, Letter: Titlecase*.

However, it was recently brought to my attention that there are actually official Unicode [derived properties](https://unicode.org/reports/tr44/#Derived_Props) called *Lowercase* and *Uppercase* which differ from these definitions.

* Titlecase characters like `ǅ` (U+01C5) are not considered *Uppercase*. (Note that `uppercase('ǅ')` yields a different character, `'Ǆ'`, so this makes a certain sense.)
* Some characters outside the cased letter categories are included: `ª` (U+00AA, category *Lo, Letter: Other*) is *Lowercase*, and `Ⓐ` (U+24B6, category *So, Symbol: Other*) is *Uppercase*.

The next version of utf8proc will provide `islower` and `isupper` functions compliant with these definitions (https://github.com/JuliaStrings/utf8proc/pull/196), so we may want to switch to them.

(My guess is that it makes little difference in practice, and I'm not clear how useful these functions are for general Unicode strings, but the standard here seems fairly sensible. Apparently this is what Python's `isupper`/`islower` functions do.)
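For comparison, here is a small Python sketch that prints the general category of the three characters mentioned above next to the results of `str.isupper` and `str.islower`. It assumes a CPython 3 build whose Unicode database classifies these characters as current Unicode versions do. The helper name `case_report` is made up for illustration and is not part of any library.

```python
import unicodedata


def case_report(ch):
    """Return (general category, str.isupper(), str.islower()) for one character."""
    return unicodedata.category(ch), ch.isupper(), ch.islower()


# U+01C5 (titlecase digraph), U+00AA (feminine ordinal), U+24B6 (circled capital A)
for ch in ["\u01c5", "\u00aa", "\u24b6"]:
    cat, is_up, is_low = case_report(ch)
    print(f"U+{ord(ch):04X} {ch!r}: category={cat} isupper={is_up} islower={is_low}")
```

If Python does follow the derived properties, the titlecase character should report neither upper nor lower, while `ª` and `Ⓐ` should report lower and upper respectively despite their non-cased general categories.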
