Separate to_lowercase() into correct Unicode and simple implementations #26244

I think there are two distinct use cases for lowercasing a string:
1. displaying a lowercased string to a user
2. manipulating strings in string algorithms (e.g. building a "case-insensitive" trie or another kind of index; having only a Unicode-aware case-insensitive comparison function is often not enough)

Currently, the locale-unaware `to_lowercase` tries to serve both use cases, but does neither quite right. It is not fully correct for the first case (it handles Greek, see #26035, but not Turkish), and its quirks make it difficult to use [safely](https://labs.spotify.com/2013/06/18/creative-usernames/) in the second case.
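As a rough illustration of both problems, here is a small sketch that assumes the behaviour of a current Rust standard library (it may have differed when this issue was filed). The function name `std_lowercase_samples` is hypothetical and exists only for this example.

```rust
/// Lowercases three sample inputs with the locale-unaware `str::to_lowercase`.
/// Returns (Greek word, Latin capital I, dotted capital I) after lowercasing.
fn std_lowercase_samples() -> (String, String, String) {
    (
        // Greek: the trailing Σ is lowercased depending on its position in the word.
        "ΦΩΣ".to_lowercase(),
        // Latin capital I: a Turkish reader would expect dotless ı (U+0131),
        // but without a locale the result is always a plain "i".
        "I".to_lowercase(),
        // Dotted capital İ (U+0130): one code point becomes two (i + U+0307).
        "\u{130}".to_lowercase(),
    )
}
```

The first result ends in the final-form sigma `ς`, so the output depends on context rather than on each code point alone. The other two show that no locale is consulted and that the number of code points can change.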

Therefore I suggest splitting this function into two, e.g. `to_locale_lowercase(locale)` and `to_partial_lowercase()`. The first would fully implement Unicode lowercasing; it requires a locale to be specified and is suitable for displaying strings to people. The second would be incorrect in many cases and shouldn't be shown to users, but would preserve the simple invariants of ASCII lowercasing that make it useful and safe for algorithms that need code-point-wise lowercasing.

The partial implementation should satisfy the following invariants for all valid strings `a` and `b`:

```
lower(a) == lower(upper(a)) // No ß/SS
lower(a) == lower(lower(a))
lower(a) == lower(b) <=> upper(a) == upper(b)
lower(a + b) == lower(a) + lower(b) // No Σ/σ/ς
```
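To make the invariants concrete, here is a minimal sketch of what a code-point-wise implementation might look like. It is not a proposed final design. It assumes that any `char` whose standard-library mapping expands to several code points is simply left unchanged. `to_partial_uppercase`, `partial_lower_char`, `partial_upper_char` and `invariants_hold` are hypothetical helpers introduced only for this example.

```rust
/// Lowercases one code point; multi-code-point expansions are left unchanged.
fn partial_lower_char(c: char) -> char {
    let mut mapped = c.to_lowercase();
    match (mapped.next(), mapped.next()) {
        (Some(lower), None) => lower, // one-to-one mapping
        _ => c,                       // e.g. İ -> i + U+0307: keep the original
    }
}

/// Uppercases one code point; multi-code-point expansions are left unchanged.
fn partial_upper_char(c: char) -> char {
    let mut mapped = c.to_uppercase();
    match (mapped.next(), mapped.next()) {
        (Some(upper), None) => upper, // one-to-one mapping
        _ => c,                       // e.g. ß -> SS: keep the original
    }
}

/// Sketch of the proposed `to_partial_lowercase`: no context, no locale.
fn to_partial_lowercase(s: &str) -> String {
    s.chars().map(partial_lower_char).collect()
}

/// Hypothetical uppercase counterpart, needed to state the invariants.
fn to_partial_uppercase(s: &str) -> String {
    s.chars().map(partial_upper_char).collect()
}

/// Spot-checks the four invariants for one pair of strings.
fn invariants_hold(a: &str, b: &str) -> bool {
    let joined = format!("{}{}", a, b);
    let lower_a = to_partial_lowercase(a);
    let lower_b = to_partial_lowercase(b);
    let upper_a = to_partial_uppercase(a);
    let upper_b = to_partial_uppercase(b);

    lower_a == to_partial_lowercase(&upper_a)
        && lower_a == to_partial_lowercase(&lower_a)
        && (lower_a == lower_b) == (upper_a == upper_b)
        && to_partial_lowercase(&joined) == format!("{}{}", lower_a, lower_b)
}
```

Because every code point is mapped independently, the concatenation invariant holds by construction. The other invariants are only spot-checked here and depend on the mapping table: with the standard library's one-to-one mappings, for instance, dotless `ı` uppercases to `I`, which lowercases to `i`, so `invariants_hold("ı", "")` returns `false`. A real implementation would likely need its own curated table.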
