Word-final sigma in str::to_lowercase #26035

Closed
@SimonSapin

By design, str::to_lowercase and str::to_uppercase do not depend on the language of the text (which shouldn’t be assumed to be the same as the locale of the machine running the program).

In practice, this means ignoring the conditional mappings in Unicode’s SpecialCasing.txt, with one exception that does not depend on language: the Greek letter sigma is Σ in upper case and σ in lower case, except in word-final position, where the lower-case form is ς. The corresponding mapping in SpecialCasing.txt is:

# <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA

The Final_Sigma condition is defined in the Unicode standard as:

C is preceded by a sequence consisting of a cased letter and then zero or more case-ignorable characters, and C is not followed by a sequence consisting of zero or more case-ignorable characters and then a cased letter.

(“Cased letter”, “case-ignorable” and the other terms have precise definitions given earlier in the standard.)
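To make the condition concrete, here is a minimal sketch of how the Final_Sigma check could be written. The function names are hypothetical, and the two property helpers are deliberately crude stand-ins: the real “cased” and “case-ignorable” properties come from Unicode data tables that are not reproduced here.

```rust
/// Rough stand-in for Unicode's "cased" property.
/// Assumption: the real property also covers titlecase letters and more.
fn is_cased_approx(c: char) -> bool {
    c.is_lowercase() || c.is_uppercase()
}

/// Very rough stand-in for "case-ignorable".
/// Assumption: only a handful of characters; the real set is much larger
/// (combining marks, format characters, modifier letters, ...).
fn is_case_ignorable_approx(c: char) -> bool {
    matches!(c, '\'' | '.' | ':' | '^' | '`' | '\u{00AD}' | '\u{2019}')
}

/// Checks the Final_Sigma condition for the character starting at byte
/// offset `i` of `s`. `i` must be a character boundary.
fn is_final_sigma_position(s: &str, i: usize) -> bool {
    let c = match s[i..].chars().next() {
        Some(c) => c,
        None => return false,
    };
    // Nearest character before C that is not case-ignorable.
    let before = s[..i]
        .chars()
        .rev()
        .find(|&x| !is_case_ignorable_approx(x));
    // Nearest character after C that is not case-ignorable.
    let after = s[i + c.len_utf8()..]
        .chars()
        .find(|&x| !is_case_ignorable_approx(x));
    // Preceded by a cased letter, and NOT followed by one.
    before.map_or(false, is_cased_approx) && !after.map_or(false, is_cased_approx)
}

/// Lowercases `s`, mapping Σ to ς where the (approximate) condition holds.
fn lowercase_with_final_sigma(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for (i, c) in s.char_indices() {
        if c == 'Σ' && is_final_sigma_position(s, i) {
            out.push('ς');
        } else {
            out.extend(c.to_lowercase());
        }
    }
    out
}
```

With these approximations, a lone Σ stays σ (nothing cased precedes it), while a Σ that ends a word after a cased letter becomes ς. A Σ followed by a case-ignorable character such as `.` and then another letter is not treated as final.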

Since char::to_lowercase doesn’t know context, I think it should just return σ for Σ. But str::to_lowercase does have context and could implement this conditional mapping.
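The difference between the two levels can be illustrated with a small sketch. The helper `lowercase_char_by_char` is hypothetical; it only shows what a context-free, per-character mapping produces. The comparison with `str::to_lowercase` reflects the behaviour documented for recent Rust releases, where the word-final rule is applied; older releases may differ.

```rust
/// Lowercases each character independently via `char::to_lowercase`.
/// With no context available, every Σ becomes σ, even at the end of a word.
fn lowercase_char_by_char(s: &str) -> String {
    s.chars().flat_map(|c| c.to_lowercase()).collect()
}

/// Lowercases the whole string at once via `str::to_lowercase`, which can
/// look at neighbouring characters.
fn lowercase_whole_string(s: &str) -> String {
    s.to_lowercase()
}
```

For the input "ΣΟΦΟΣ", the per-character version ends in σ, while the whole-string version ends in ς in Rust releases that implement the conditional mapping.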
