Should character classes be mutually exclusive? #455

YDX-2147483647 · 2025-02-12T06:03:29Z

Character classes in the current jlreq are not mutually exclusive.

% (U+0025) ∈ Postfixed abbreviations (cl-13) ∩ Western characters (cl-27).
! (U+0021) ∈ Dividing punctuation marks (cl-04) ∩ Western characters (cl-27).
, (U+002C) ∈ Commas (cl-07) ∩ Grouped numerals (cl-24) ∩ Western characters (cl-27).
…
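To make the overlap concrete, here is a minimal sketch that stores class membership as a set per character and reports every character listed in more than one class. The `CLASSES` table is a hypothetical excerpt that covers only the examples above, plus U+3001, which is assumed here to sit in cl-07 alone for contrast. It is not the full jlreq table.

```python
# Hypothetical excerpt of jlreq class membership: only the examples above,
# plus U+3001 (assumed to be in cl-07 only) for contrast.
CLASSES = {
    "%": {"cl-13", "cl-27"},           # U+0025 PERCENT SIGN
    "!": {"cl-04", "cl-27"},           # U+0021 EXCLAMATION MARK
    ",": {"cl-07", "cl-24", "cl-27"},  # U+002C COMMA
    "\u3001": {"cl-07"},               # U+3001 IDEOGRAPHIC COMMA
}

def overlapping(classes):
    """Return characters that belong to more than one class, sorted by code point."""
    return sorted((ch for ch, cls in classes.items() if len(cls) > 1), key=ord)

for ch in overlapping(CLASSES):
    print(f"U+{ord(ch):04X}", sorted(CLASSES[ch]))
```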

Is this intentional? If so, which class should take precedence when a character matches more than one row or column, e.g., in Table 1 (Spacing between characters)?
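A small sketch shows why the question matters: if a spacing table is keyed by (left class, right class) pairs, a character with two classes can hit two different entries. The two `SPACING` values are illustrative assumptions. The quarter em for a Western character next to an ideograph (cl-19) matches the answer below, while the zero for cl-13 is assumed and not quoted from Table 1.

```python
from itertools import product

# Hypothetical spacing entries in ems, keyed by (left class, right class).
SPACING = {
    ("cl-13", "cl-19"): 0.0,   # postfixed abbreviation before an ideograph (assumed)
    ("cl-27", "cl-19"): 0.25,  # Western character before an ideograph: quarter em
}

CLASSES = {
    "%": {"cl-13", "cl-27"},  # the overlapping character from the question
    "以": {"cl-19"},           # an ideographic character, as in 增加20%以后
}

def candidate_spacings(left, right):
    """Collect every spacing the table yields for some combination of the two characters' classes."""
    return {
        SPACING[pair]
        for pair in product(CLASSES[left], CLASSES[right])
        if pair in SPACING
    }

# More than one candidate means the lookup needs a precedence rule.
print(candidate_spacings("%", "以"))
```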

Background: Unicode draft #59, East Asian Spacing, suggests treating 增加20%以后 (Chinese, "after a 20% increase") and 進捗は20%です (Japanese, "progress is 20%") differently. On the Typst forum, we found that % belongs to both cl-13 and cl-27.

Answered by okumuralab

okumuralab · 2025-02-12T23:58:56Z

Thanks for clarifying the problem. jlreq character classes are indeed not mutually exclusive. A Twitter user directed me to §3.9.2 aa, “Western characters (cl-27),” which states that “several marks are used both in a Japanese context and a Western context.” This could suggest that the ‘%’ symbol belongs to cl-13 in a Japanese context and cl-27 in a Western context. jlreq stops here, but in practice ‘％’ (U+FF05) is used in a Japanese context, while ‘%’ (U+0025) is used in a Western context.

So the simple solution is to treat ‘%’ (U+0025) as a Western character (cl-27) and insert a quarter-em space between it and most Japanese characters, and treat ‘％’ (U+FF05) as a Japanese postfixed abbreviation (cl-13).
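The suggested rule can be sketched as a single-valued classifier plus a boundary check: U+0025 is always cl-27, U+FF05 is always cl-13, and a quarter em goes only where a Western character meets a Japanese one. The code-point ranges in `char_class` and the choice of hiragana, katakana, and ideographs as "Japanese characters" are simplifying assumptions. Every other Table 1 entry is ignored and treated as zero.

```python
# Japanese classes that trigger the quarter-em rule (assumed subset).
JAPANESE = {"cl-15", "cl-16", "cl-19"}  # hiragana, katakana, ideographs

def char_class(ch: str) -> str:
    """Assign exactly one class per character, using approximate ranges."""
    cp = ord(ch)
    if ch == "\uff05":              # FULLWIDTH PERCENT SIGN: postfixed abbreviation
        return "cl-13"
    if 0x21 <= cp <= 0x7E:          # printable ASCII, including U+0025: Western
        return "cl-27"
    if 0x3041 <= cp <= 0x309F:      # hiragana block (approximate)
        return "cl-15"
    if 0x30A0 <= cp <= 0x30FF:      # katakana block (approximate)
        return "cl-16"
    if 0x4E00 <= cp <= 0x9FFF:      # CJK unified ideographs
        return "cl-19"
    return "other"

def spacing_em(text: str) -> list[float]:
    """Return the space, in em, inserted between each adjacent pair of characters."""
    out = []
    for a, b in zip(text, text[1:]):
        pair = {char_class(a), char_class(b)}
        # Quarter em only at a Western/Japanese boundary; zero everywhere else.
        out.append(0.25 if "cl-27" in pair and pair & JAPANESE else 0.0)
    return out

print(spacing_em("進捗は20%です"))   # space on both sides of "20%"
print(spacing_em("進捗は20％です"))  # no space after the fullwidth ％
```

With U+0025 the percent sign behaves like the digits before it, so a quarter em also appears before で. With U+FF05 the ％ is treated as cl-13, and no space follows it.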

For ‘‰’ (U+2030), which is also in both cl-13 and cl-27, there is only one Unicode code point, so no such simple solution exists. YDX-2147483647 lists other examples. My intuition is that ‘‰’ is a Western character by default, and that ‘,’ and ‘!’ are a comma (cl-07) and a dividing punctuation mark (cl-04), respectively. Still, I hope users will have some way to set each character's class themselves.
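One way to combine those defaults with user settings is a lookup that checks user overrides before a default table. The sketch below uses Python's `collections.ChainMap` for that layering. `DEFAULT_CLASS` and `class_table` are hypothetical names, and the defaults simply follow the intuition stated above.

```python
from collections import ChainMap

# Hypothetical defaults for characters jlreq lists in several classes.
DEFAULT_CLASS = {
    "\u2030": "cl-27",  # PER MILLE SIGN: Western character by default
    ",": "cl-07",       # COMMA: comma
    "!": "cl-04",       # EXCLAMATION MARK: dividing punctuation mark
}

def class_table(overrides=None):
    """Return a mapping that consults user overrides first, then the defaults."""
    return ChainMap(dict(overrides or {}), DEFAULT_CLASS)

# Hypothetical user setting: treat ‰ like a Japanese postfixed abbreviation.
table = class_table({"\u2030": "cl-13"})
print(table["\u2030"])          # cl-13
print(table["!"])               # cl-04
print(class_table()["\u2030"])  # cl-27
```

Because the override layer sits in front of the defaults, a user can reclassify a single character without copying or editing the whole table.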
