Unicode normalisation across apertium tools #24

@flammie

It seems to me that a good portion of apertium IRC traffic is people checking Unicode character variants, like:

10:43 +spectie> .u ô
10:43  begiak> U+006F LATIN SMALL LETTER O (o)
10:43  begiak> U+0302 COMBINING CIRCUMFLEX ACCENT (◌̂)
10:43 +spectie> .u ô
10:43  begiak> U+00F4 LATIN SMALL LETTER O WITH CIRCUMFLEX (ô)

I think this is something the tools should take care of somehow. I'd suggest NFC normalization for all input, perhaps with a warning in compiler-type tools. NFC is the most convenient form for most letter-based FSAs, since each precomposed character is a single symbol rather than a base letter followed by combining marks. If agreed, this might be a good starter task for GSoC candidates?
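
To illustrate the suggestion, here is a minimal sketch in Python of what NFC-normalizing input with an optional warning could look like. This is not apertium code: `to_nfc` and its `warn` flag are hypothetical names, and the only real API used is the standard library's `unicodedata.normalize`.

```python
import sys
import unicodedata


def to_nfc(text: str, warn: bool = False) -> str:
    """Return text in NFC form (hypothetical helper, not part of apertium).

    If warn is true and normalization changed the input, print a
    warning to stderr, as a compiler-type tool might do.
    """
    normalized = unicodedata.normalize("NFC", text)
    if warn and normalized != text:
        print("warning: input was not NFC-normalized", file=sys.stderr)
    return normalized


# The two variants from the IRC log above.
decomposed = "o\u0302"   # U+006F followed by U+0302
precomposed = "\u00f4"   # U+00F4

# The variants look the same but compare unequal until normalized.
assert decomposed != precomposed
assert to_nfc(decomposed, warn=True) == precomposed
assert to_nfc(precomposed) == precomposed
```

With this approach a dictionary entry typed with a combining accent and input text typed with the precomposed letter would end up as the same symbol sequence, which is the property the letter automata need.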

Labels: enhancement