Description
We should probably conform to both UAX #31 and UTS #55. Some of the missing pieces are already being worked on in #673 and the bidi-usability design doc, but we should do a more thorough review and provide explicit conformance statements somewhere.
When reviewing these, one requirement I noticed we do not meet is UAX31-R4:
Equivalent Normalized Identifiers: To meet this requirement, an implementation shall specify the Normalization Form and shall provide a precise specification of the characters that are excluded from normalization, if any. [...] Except for identifiers containing excluded characters, any two identifiers that have the same Normalization Form shall be treated as equivalent by the implementation.
This is specifically recommended in UTS #55:
It is recommended that all languages that use default identifiers meet requirement UAX31-R4 Equivalent Normalized Identifiers, with the normalization described in this section. [...] Case-sensitive computer languages should meet requirement UAX31-R4 with normalization form C. They should not ignore default ignorable code points in identifier comparison.
The easiest way to conform would be to normalize identifiers with form C before comparing them, so that e.g. tämä (NFC, with precomposed U+00E4) and tämä (NFD, with a followed by U+0308) are considered equal ("this" in Finnish).
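As a rough illustration of the comparison described above, here is a minimal Python sketch using the standard library's `unicodedata.normalize`. The function names `normalize_identifier` and `identifiers_equal` are hypothetical and not part of any existing implementation. The sketch assumes a case-sensitive comparison and does not strip default ignorable code points, in line with the UTS #55 recommendation quoted above.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Return the identifier in Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", name)


def identifiers_equal(a: str, b: str) -> bool:
    """Compare two identifiers after NFC normalization.

    The comparison stays case-sensitive, and default ignorable code points
    (such as U+200D ZERO WIDTH JOINER) are left in place, so they still
    distinguish identifiers.
    """
    return normalize_identifier(a) == normalize_identifier(b)


# "tämä" written with precomposed U+00E4 (NFC) and with a + U+0308 (NFD).
composed = "t\u00e4m\u00e4"
decomposed = "ta\u0308ma\u0308"

print(composed == decomposed)                   # raw code point comparison
print(identifiers_equal(composed, decomposed))  # comparison after NFC
```

A plain code point comparison treats the two spellings as different names, while the normalized comparison treats them as the same identifier. Whether normalization should happen at parse time or only at comparison time is a separate design choice that this sketch does not settle.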
I'm not completely certain whether further changes would be needed for other conformance requirements.