Closed
Description
The Chars iterator iterates over Unicode Scalar Values, but when people think about "characters" they usually mean something closer to what Unicode calls "grapheme clusters".
This leads to surprising results and thus potential errors, for example:
"éé".chars().count() == 3
"🇺🇸".chars().count() == 2
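The counts above depend on how the strings are encoded, which is hard to see in rendered text. Below is a minimal std-only sketch that spells the code points out with escapes. It assumes the first string is one precomposed é (U+00E9) followed by a decomposed one ("e" plus U+0301), which is the encoding that would give a count of 3. The helper name scalar_count is hypothetical and only used for illustration.

```rust
/// Counts Unicode scalar values, which is what `str::chars` iterates over.
/// (`scalar_count` is a hypothetical helper name, not a std API.)
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // Assumption: precomposed U+00E9, then "e" followed by combining acute U+0301.
    let accented = "\u{00E9}e\u{0301}";
    // Regional indicator symbols U and S, which together render as one flag.
    let flag = "\u{1F1FA}\u{1F1F8}";

    // Two user-perceived characters, but three scalar values.
    assert_eq!(scalar_count(accented), 3);
    // One user-perceived character, but two scalar values.
    assert_eq!(scalar_count(flag), 2);

    println!("accented: {} scalar values", scalar_count(accented));
    println!("flag: {} scalar values", scalar_count(flag));
}
```

With the external unicode-segmentation crate, counting extended grapheme clusters via graphemes(true) should instead match the user-perceived counts for these two strings (2 and 1); that call is not shown here so the sketch stays std-only.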
I suggest adding a warning to the documentation that this iterator isn't iterating over "characters", and that users should consider using the UnicodeSegmentation::graphemes iterator instead.