Should character classes be mutually exclusive? #455
Character classes in the current jlreq are not mutually exclusive.
Is this intentional? If so, which class should take precedence when a character matches multiple rows or columns, e.g., in Table 1, Spacing between characters?

Background: Unicode draft #59: East Asian Spacing suggests different treatment of
Replies: 1 comment
Thanks for clarifying the problem. jlreq character classes are indeed not mutually exclusive. A Twitter user directed me to §3.9.2 aa, “Western characters (cl-27),” which states that “several marks are used both in a Japanese context and a Western context.” This could suggest that the ‘%’ symbol belongs to cl-13 in a Japanese context and cl-27 in a Western context. jlreq stops here, but in practice ‘％’ (U+FF05) is used in a Japanese context, while ‘%’ (U+0025) is used in a Western context.

So the simple solution is to treat ‘%’ (U+0025) as a Western character (cl-27) and insert a quarter-em space between it and most Japanese characters, and treat ‘％’ (U+FF05) as a Japanese postfixed abbreviation (cl-13).

For ‘‰’ (U+2030), which is also both cl-13 and cl-27, there is only one Unicode code point, and no such simple solution exists. YDX-2147483647 lists other examples. My intuition is that ‘‰’ is a Western character by default, and that ‘,’ and ‘!’ are a comma and a punctuation mark, but I hope users can set each character's class by some means.
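To make the suggestion concrete, here is a minimal sketch of a lookup that always returns exactly one class per code point while still letting users override it. The table contents are only the defaults proposed in this comment, not anything jlreq specifies, and `DEFAULT_CLASS`, `resolve_class`, and the `overrides` parameter are hypothetical names used for illustration.

```python
# Class labels reuse jlreq's class IDs; the mapping itself is an assumption
# based on the suggestion in this comment, not a rule stated by jlreq.
CL_13 = "cl-13"  # postfixed abbreviations
CL_27 = "cl-27"  # Western characters

DEFAULT_CLASS = {
    "\u0025": CL_27,  # PERCENT SIGN: treated as Western
    "\uFF05": CL_13,  # FULLWIDTH PERCENT SIGN: treated as Japanese
    "\u2030": CL_27,  # PER MILLE SIGN: single code point, Western by default
}


def resolve_class(ch, overrides=None):
    """Return a single class for ch, or None if ch is not in the table.

    A user-supplied override wins over the default, so a character that
    jlreq lists under two classes still resolves to exactly one.
    """
    if overrides and ch in overrides:
        return overrides[ch]
    return DEFAULT_CLASS.get(ch)
```

For example, `resolve_class("\u2030")` gives `"cl-27"`, while `resolve_class("\u2030", {"\u2030": "cl-13"})` gives `"cl-13"` for a document that wants ‘‰’ treated as a Japanese postfixed abbreviation.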