improve character category predicates #5939

Closed
@stevengj

Description

As @jiahao suggested in #5576, it might be worthwhile to use utf8proc (which we are shipping with Julia anyway) to provide functions like isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, and possibly isblank in string.jl. The reason is that utf8proc seems to be more up-to-date on the Unicode standard than libc, and is unhampered by legacy issues (e.g. libc's isblank returns false for a non-breaking space, apparently for legacy reasons).

utf8proc's results are also locale-independent. This may be a plus or a minus; I don't really understand how the locale affects the results of the abovementioned predicates in libc.
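
To make the idea concrete, here is a minimal sketch of locale-independent predicates derived purely from Unicode general categories. It is written in Python and uses the standard `unicodedata` module as a stand-in for utf8proc's category lookup; it is not utf8proc's API or the proposed Julia implementation. The helper names (`is_alpha`, `is_blank`, and so on) and the exact category-to-predicate mapping are assumptions chosen for illustration.

```python
import unicodedata


def category(ch: str) -> str:
    """Return the two-letter Unicode general category of a single character."""
    return unicodedata.category(ch)


# Hypothetical predicates: each depends only on the Unicode category,
# never on the process locale. The mapping is an illustrative assumption.
def is_alpha(ch: str) -> bool:
    return category(ch).startswith("L")  # any letter category


def is_digit(ch: str) -> bool:
    return category(ch) == "Nd"  # decimal digits in any script


def is_upper(ch: str) -> bool:
    return category(ch) == "Lu"


def is_lower(ch: str) -> bool:
    return category(ch) == "Ll"


def is_punct(ch: str) -> bool:
    return category(ch).startswith("P")


def is_cntrl(ch: str) -> bool:
    return category(ch) == "Cc"


def is_blank(ch: str) -> bool:
    # Tab plus every space separator (Zs), which includes U+00A0.
    return ch == "\t" or category(ch) == "Zs"


if __name__ == "__main__":
    for ch in ["\u00a0", "\u00e9", "\u0663", "\u03a3", "!"]:
        print(f"U+{ord(ch):04X}", category(ch), is_blank(ch), is_alpha(ch), is_digit(ch))
```

Under this mapping, a non-breaking space (U+00A0, category Zs) counts as blank, and non-ASCII letters and digits are classified the same way regardless of locale settings.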

Labels: breaking (this change will break code), needs decision (a decision on this change is needed), unicode (related to Unicode characters and encodings)
