
LC_CTYPE incorrectly references case sensitivity of "the functions of module string" #111276

Closed
@glyph

Description


Documentation

https://docs.python.org/3.12/library/locale.html#locale.LC_CTYPE says:

Locale category for the character type functions. Depending on the settings of this category, the functions of module string dealing with case change their behaviour.

I believe this refers to Python 2.7's string.lower() et al., which have been gone for quite some time. I think that since Python 3.3, Unicode case-conversion functions have quite intentionally been locale-independent.
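A minimal sketch of how one might check that claim: it switches LC_CTYPE to a Turkish locale if one happens to be installed, then lowercases and uppercases ASCII text. The candidate locale names are assumptions (availability and spelling vary by platform), and the check still runs, just less interestingly, when none of them exists.

```python
import locale

# Assumed Turkish locale names; which (if any) are installed depends on
# the operating system, so none of them is guaranteed to work.
TURKISH_CANDIDATES = ["tr_TR.UTF-8", "tr_TR.utf8", "tr_TR", "Turkish_Turkey.1254"]


def set_turkish_ctype():
    """Set LC_CTYPE to the first available Turkish locale; return its name or None."""
    for name in TURKISH_CANDIDATES:
        try:
            return locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue
    return None


saved = locale.setlocale(locale.LC_CTYPE)  # no second argument: query only
try:
    active = set_turkish_ctype()
    # str.lower()/str.upper() consult Unicode character data, not LC_CTYPE,
    # so neither result involves a dotless or dotted Turkish i.
    lowered = "INFO".lower()
    uppered = "info".upper()
    print("LC_CTYPE =", active or "(no Turkish locale installed)")
    print(lowered, uppered)  # info INFO
finally:
    locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting
```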

Confusion about this issue seems pervasive, even in CPython itself; consider this bit of code with a somewhat misleading comment:

```python
#The map below appears to be trivially lowercasing the key. However,
#there's more to it than meets the eye - in some locales, lowercasing
#gives unexpected results. See SF #1524081: in the Turkish locale,
#"INFO".lower() != "info"
priority_map = {
    "DEBUG" : "debug",
    "INFO" : "info",
    "WARNING" : "warning",
    "ERROR" : "error",
    "CRITICAL" : "critical"
}
```
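As a rough illustration of why the comment reads as misleading under Python 3 semantics, the sketch below re-types the table so it is self-contained and compares it with what plain str.lower() produces. It only shows the two are equal today; it says nothing about whether the explicit table should be kept.

```python
# The mapping quoted above, re-typed here so the snippet runs on its own.
priority_map = {
    "DEBUG": "debug",
    "INFO": "info",
    "WARNING": "warning",
    "ERROR": "error",
    "CRITICAL": "critical",
}

# In Python 3, str.lower() ignores the locale, so deriving the mapping
# with a comprehension yields an identical dict.
computed = {name: name.lower() for name in priority_map}
print(computed == priority_map)  # True
```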

So it would be good to clean up the docs. Earlier in the same document it does say:

There is no way to perform case conversions and character classifications according to the locale. For (Unicode) text strings these are done according to the character value only, while for byte strings, the conversions and classifications are done according to the ASCII value of the byte, and bytes whose high bit is set (i.e., non-ASCII bytes) are never converted or considered part of a character class such as letter or whitespace.
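To make the str/bytes distinction in that paragraph concrete, here is a small sketch comparing the two types. The sample characters (U+00C4 "Ä" and U+00A0 no-break space, plus their Latin-1 byte values 0xC4 and 0xA0) are arbitrary choices for illustration.

```python
# str: case conversion and classification use Unicode character data.
print("Ä".lower())          # ä
print("Ä".isalpha())        # True
print("\xa0".isspace())     # True (no-break space is Unicode whitespace)

# bytes: only ASCII bytes are converted or classified; bytes with the
# high bit set are never changed or treated as letters/whitespace.
print(b"ABC".lower())       # b'abc'
print(b"\xc4".lower())      # b'\xc4'
print(b"\xc4".isalpha())    # False
print(b"\xa0".isspace())    # False
```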
