Open
Labels: enhancement
Description
It seems to me that a good portion of Apertium IRC traffic is people checking Unicode character variants, like:
10:43 +spectie> .u ô
10:43 begiak> U+006F LATIN SMALL LETTER O (o)
10:43 begiak> U+0302 COMBINING CIRCUMFLEX ACCENT (◌̂)
10:43 +spectie> .u ô
10:43 begiak> U+00F4 LATIN SMALL LETTER O WITH CIRCUMFLEX (ô)
I think this is something the tools should take care of somehow. I'd suggest NFC normalization for all input, perhaps with a warning in compiler-type tools. NFC is the nicest form for most FSA letter automata, since a precomposed character like U+00F4 is a single symbol rather than a base letter plus a combining mark. If agreed, this might be a good starter task for GSoC candidates?
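To make the suggestion concrete, here is a rough sketch of what "normalize to NFC and warn" could look like. It is written in Python with the standard `unicodedata` module purely for illustration. The actual tools are not written this way, and the function name `normalize_nfc` and its `source` and `warn` parameters are made up for this example.

```python
import sys
import unicodedata


def normalize_nfc(text, source="<input>", warn=True):
    """Return text in NFC; optionally warn on stderr if it changed.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    normalized = unicodedata.normalize("NFC", text)
    if warn and normalized != text:
        print(f"warning: {source}: input was not NFC-normalized",
              file=sys.stderr)
    return normalized


# The two variants from the IRC log above.
decomposed = "o\u0302"   # U+006F + U+0302
precomposed = "\u00f4"   # U+00F4

# They look the same but compare unequal until normalized.
assert decomposed != precomposed
assert normalize_nfc(decomposed, warn=False) == precomposed
assert normalize_nfc(precomposed, warn=False) == precomposed
```

A compiler-type tool could call something like this on every line it reads, so that dictionaries and input text end up agreeing on one form regardless of how the characters were typed.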