
Conformance with UAX #31 & UTS #55 #847

Closed
@eemeli

It would probably be a good idea for us to be conformant with both UAX #31 and UTS #55. We're already working on some of the missing pieces in #673 and the bidi-usability design doc, but we should do a more thorough review and provide explicit conformance statements somewhere.

When reviewing these, one part that I noticed us missing is UAX31-R4:

Equivalent Normalized Identifiers: To meet this requirement, an implementation shall specify the Normalization Form and shall provide a precise specification of the characters that are excluded from normalization, if any. [...] Except for identifiers containing excluded characters, any two identifiers that have the same Normalization Form shall be treated as equivalent by the implementation.

This is specifically recommended in UTS 55:

It is recommended that all languages that use default identifiers meet requirement UAX31-R4 Equivalent Normalized Identifiers, with the normalization described in this section. [...] Case-sensitive computer languages should meet requirement UAX31-R4 with normalization form C. They should not ignore default ignorable code points in identifier comparison.

The easiest way to be conformant with that would be to normalize identifiers with form C before comparing them, such that e.g. tämä (form C, using U+00E4) and tämä (form D, using U+0061 U+0308) are considered equal ("this" in Finnish).
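As a rough illustration of that comparison, here is a minimal Python sketch using the standard library's `unicodedata.normalize`. The helper name `same_identifier` is hypothetical and not part of any MessageFormat implementation; it only shows the idea of normalizing both sides to form C before comparing, without stripping default ignorable code points.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: compare two identifiers after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "tämä" written with precomposed U+00E4 (form C)
nfc_name = "t\u00e4m\u00e4"
# "tämä" written with "a" followed by combining diaeresis U+0308 (form D)
nfd_name = "ta\u0308ma\u0308"

# The raw strings differ code point by code point...
print(nfc_name == nfd_name)
# ...but compare as equal once both are normalized to form C.
print(same_identifier(nfc_name, nfd_name))

# NFC leaves default ignorable code points such as ZERO WIDTH JOINER (U+200D)
# in place, so identifiers that differ only by one are still distinct here.
print(same_identifier("a\u200db", "ab"))
```

Whether normalization would happen at parse time or only at comparison time is left open here; the sketch normalizes at comparison time simply because it is the smallest change.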

I'm not completely certain whether further changes would be needed for other conformance requirements.

Labels

Action-Item: Action item assigned by the WG
LDML46.1: MF2.0 Draft Candidate
blocker-candidate: The submitter thinks this might be a block for the next release
normative: Issue affects normative text in the specification
syntax: Issues related with syntax or ABNF
