Fullwidth punctuation missing from character classes appendix #185
@kidayasuo I don't think the fact that U+FF0E etc. are included in JIS X 0208 is related to #166, but I think we discussed at the F2F that we would not update the definitions to include the corresponding fullwidth characters (and would instead consider it in a future update, as in #166). Do you remember our discussions?
If we want to update them, here are the fullwidth punctuations that need to be updated:
And some fullwidth symbols, digits, and Latin letters in https://w3c.github.io/jlreq/#cl-19 might need updating too (Greek and Cyrillic letters seem to be correct). Note that the brackets appear in more than one character class.
Latin punctuation in the ASCII range should not be confused with fullwidth punctuation (not ASCII) in terms of its use or the mojikumi class it belongs to. '(' (U+0028) is not the same class or spacing as '（' (U+FF08). I would argue that ASCII '(' and ')' are not eligible for use in Japanese composition or warichuu and the text must use fullwidth '（' and '）'. Am I misreading the table?
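To make the distinction in the comment above concrete, here is a minimal sketch using Python's standard `unicodedata` module. It prints the code point, Unicode name and East Asian Width property for the ASCII and fullwidth parentheses. The `describe` helper and the `PAIRS` list are made up for illustration; East Asian Width is a Unicode property and is not the same thing as a JLReq character class.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, East Asian Width) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.east_asian_width(ch))

# Illustrative pairs only: ASCII parenthesis vs. its fullwidth counterpart.
PAIRS = [("\u0028", "\uFF08"), ("\u0029", "\uFF09")]

for ascii_ch, fullwidth_ch in PAIRS:
    # The two members of each pair are distinct code points with distinct
    # width properties ("Na" = narrow, "F" = fullwidth).
    print(describe(ascii_ch), describe(fullwidth_ch))
```

The two characters in each pair differ in both code point and width property, which is why the question of which one a table row labels matters.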
JLReq describes characters as if there is no such thing as a “fullwidth” version (i.e. characters in the fullwidth compatibility area in Unicode). This is part of its effort to make the description as independent as possible of the technology of the time. It tried to separate the concept of a “character” from its style, such as its width, following Unicode’s principle. However, this made the character class appendix confusing. Because this is, in a sense, inherent in how JLReq is written, changing it would be a major piece of work. I believe it is the kind of work that should be done in a major rewrite of JLReq, or as a new document. (JLReq is a record of what is and has been done in print. Its line layout rules assume, and sometimes depend on, a workflow that involves manual inspection and manual adjustment. It is clear that we need line layout rules for the digital architecture. That is what I meant by the major rewrite.)
But I think it is a clear error to have the name 002E | FULL STOP alongside ． (which is the fullwidth character) in the table. Alternatives may include:
The Unicode name of U+002E is FULL STOP, right? I think solution #1 is reasonable in that it is consistent with how JLReq is written and is the smallest change. Let me discuss this with the original authors in the TF. I personally believe ignoring fullwidth compatibility characters is confusing and it should be fixed at some point.
I also found that confusing, initially. (And I sometimes still trip up over it.) I also worry about changing the character to the ASCII full stop, since it suggests that that is what authors should use, which I believe is incorrect. The distinction between the two as described in jlreq may be logically feasible, but in practice, especially without all the clever handling described in jlreq about optimal character widths, I think people are better off using the fullwidth forms, and I think they do use them generally. Therefore, I'd be more inclined to change the label to U+FF0E FULLWIDTH FULL STOP rather than change the character displayed in the chart. That also makes it easier to understand the jlreq doc, because otherwise you have to get your head around the idea that this proportionally spaced character needs to be regarded as having width in order to follow the text.
Bin-sensei, on the JLReq TF mailing list, explained how this happened: JLReq inherited the character classes from JIS X 4051, which indicates code points by JIS X 0213 plane, column and row. The Japanese period is translated to U+002E. He explained the situation in the NOTE at the beginning of the appendix. Given that explanation, he believes we can leave it as-is. The discussion is continuing. You can jump in on the mailing list. I will translate.
Link to the note: https://w3c.github.io/jlreq/#h-note-283 (see especially the text after "To work around this issue...") The method in jlreq (and JIS X 4051) may be logically correct, but I still think the correct code point should be used in order to reduce confusion. (FWIW, clreq does not use this method. Personally, I think the methods in clreq and jlreq each have their own advantages, and are just different ways of thinking.)
It seems the JLReq TF agrees that it is OK to make this change. Let’s take approach #2 among the possible approaches Richard suggested. As this is not a small change and the changes need to be thoroughly reviewed, we are deferring it to the next update.
A.6 Full stops (cl-06)
A.7 Commas (cl-07)
https://w3c.github.io/jlreq/#cl-06
https://w3c.github.io/jlreq/#cl-07
These rows contain U+FF0E FULLWIDTH FULL STOP and U+FF0C FULLWIDTH COMMA in the first column, but ASCII code points and names in the second and third columns. This appears to be incorrect.
We seem to have a similar issue wrt parentheses too.
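The mismatch described above can be checked mechanically. The following is a minimal sketch, not a tool used by the jlreq editors: `ROWS` is a hand-written stand-in for the two table rows named in this issue, and `check_row` is a hypothetical helper. It compares the character shown in a row with the labelled code point and name, and also reports whether the label matches the character after NFKC compatibility normalization, which maps fullwidth forms onto their ASCII counterparts.

```python
import unicodedata

# Hypothetical stand-ins for the cl-06 and cl-07 rows discussed in this issue:
# (character shown, code point label, name label).
ROWS = [
    ("\uFF0E", "002E", "FULL STOP"),
    ("\uFF0C", "002C", "COMMA"),
]

def check_row(shown, cp_label, name_label):
    """Compare the displayed character against its labelled code point and name."""
    actual_cp = f"{ord(shown):04X}"
    actual_name = unicodedata.name(shown)
    # NFKC folds compatibility characters such as fullwidth forms.
    folded = unicodedata.normalize("NFKC", shown)
    return {
        "actual": (actual_cp, actual_name),
        "label_matches": actual_cp == cp_label and actual_name == name_label,
        "label_matches_after_nfkc": len(folded) == 1 and f"{ord(folded):04X}" == cp_label,
    }

for row in ROWS:
    print(row[1], row[2], check_row(*row))
```

For both rows the label does not match the displayed character directly, but it does match after compatibility folding, which is consistent with the explanation in the thread that the labels were derived without distinguishing fullwidth forms.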