Fullwidth punctuation missing from character classes appendix #185

r12a · 2020-02-26T08:07:25Z

A.6 Full stops (cl-06)
A.7 Commas (cl-07)
https://w3c.github.io/jlreq/#cl-06
https://w3c.github.io/jlreq/#cl-07

． | 002E | FULL STOP
， | 002C | COMMA

These rows contain U+FF0E FULLWIDTH FULL STOP and U+FF0C FULLWIDTH COMMA in the first column, but ASCII code points and names in the second and third columns. This appears to be incorrect.
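
The mismatch can be checked mechanically. Below is a minimal sketch in Python, assuming only the standard `unicodedata` module; the `rows` list and the helper names `describe` and `mismatches` are illustrative and not part of jlreq.

```python
import unicodedata

# Rows as they appear in the appendix:
# (displayed character, listed code point, listed name)
rows = [
    ("．", "002E", "FULL STOP"),
    ("，", "002C", "COMMA"),
]

def describe(ch):
    """Return the actual code point (hex) and Unicode name of a character."""
    return f"{ord(ch):04X}", unicodedata.name(ch)

def mismatches(rows):
    """Collect rows whose listed code point or name differs from the character shown."""
    out = []
    for ch, cp, name in rows:
        actual_cp, actual_name = describe(ch)
        if (actual_cp, actual_name) != (cp, name):
            out.append((ch, cp, name, actual_cp, actual_name))
    return out

for ch, cp, name, actual_cp, actual_name in mismatches(rows):
    print(f"{ch}: listed {cp} {name}, actual {actual_cp} {actual_name}")
```

Both rows are reported, since the characters shown are the fullwidth forms while the labels are the ASCII ones.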

We seem to have a similar issue wrt parentheses too.

himorin · 2020-02-27T05:43:16Z

@kidayasuo I don't think the fact that U+FF0E etc. are included in JIS X 0208 is related to #166, but I think we discussed this at the F2F and decided not to update the definitions to include the corresponding fullwidth characters (but to consider that in future updates, as in #166). Do you remember our discussions?

xfq · 2020-02-27T06:11:27Z

If we want to update them, here are the fullwidth punctuation marks that need to be updated:

（	0028	LEFT PARENTHESIS
［	005B	LEFT SQUARE BRACKET
｛	007B	LEFT CURLY BRACKET
）	0029	RIGHT PARENTHESIS
］	005D	RIGHT SQUARE BRACKET
｝	007D	RIGHT CURLY BRACKET
！	0021	EXCLAMATION MARK
？	003F	QUESTION MARK
：	003A	COLON
；	003B	SEMICOLON
．	002E	FULL STOP
，	002C	COMMA

And some fullwidth symbols, digits, and Latin letters in https://w3c.github.io/jlreq/#cl-19 might need updating too (Greek and Cyrillic letters seem to be correct). Note that the brackets appear in more than one character class.
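
For reference, the fullwidth forms U+FF01 to U+FF5E mirror ASCII U+0021 to U+007E at a fixed offset of 0xFEE0, so corrected labels for the rows above could be derived rather than typed by hand. This is a rough sketch in Python, assuming the fix is to change the labels and not the characters shown; `FULLWIDTH_OFFSET`, `listed`, and `corrected_row` are hypothetical names used only for illustration.

```python
import unicodedata

# U+FF01..U+FF5E mirror U+0021..U+007E at this fixed offset.
FULLWIDTH_OFFSET = 0xFEE0

# ASCII code points currently listed in the rows above.
listed = ["0028", "005B", "007B", "0029", "005D", "007D",
          "0021", "003F", "003A", "003B", "002E", "002C"]

def corrected_row(ascii_hex):
    """Build a tab-separated row using the fullwidth code point and name."""
    cp = int(ascii_hex, 16) + FULLWIDTH_OFFSET
    ch = chr(cp)
    return f"{ch}\t{cp:04X}\t{unicodedata.name(ch)}"

for code in listed:
    print(corrected_row(code))
```

The sketch only covers the twelve rows listed here. It does not account for the brackets appearing in more than one character class, nor for the cl-19 entries.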

macnmm · 2020-02-28T00:21:00Z

Latin punctuation in the ASCII range should not be confused with fullwidth punctuation (not ASCII) in terms of their use or the mojikumi class they belong to. '(' is not the same class or spacing as '（'. I would argue that '(' and ')' are not eligible for use in Japanese composition or warichuu and the text must be '（' and '）'. Am I misreading the table?

kidayasuo · 2020-06-11T02:16:57Z

JLReq describes characters as if there were no such thing as a “fullwidth” version (i.e. the characters in the fullwidth compatibility area of Unicode). This is part of its effort to make the description as independent as possible of the technology of the time. It tried to separate the concept of a “character” from its style, such as its width, following Unicode’s principle. However, this made the character class appendix confusing.

Since this is, in a sense, inherent in how JLReq is written, changing it would be a major piece of work. I believe it is the kind of work that should be done in a major rewrite of JLReq, or as a new document.

(JLReq is a record of what is and has been done in print. Its line layout rules assume, and sometimes depend on, a workflow that involves manual inspection and manual adjustment. It is clear that we need line layout rules for the digital architecture. That is what I meant by the major rewrite.)

r12a · 2020-06-11T09:37:39Z

But i think it is a clear error to have the name 002E | FULL STOP alongside ． (which is the fullwidth character) in the table. Alternatives may include:

1. change the character in the table
2. change the Unicode name and code point value in the table
3. replace the Unicode name & code point with text explaining that the character is ambiguous wrt its code point assignment in Unicode
4. include both ordinary and fullwidth characters, code points and names in the same row

kidayasuo · 2020-06-11T10:41:17Z

The Unicode name of U+002E is FULL STOP, right? I think solution #1 is reasonable in that it is in line with how JLReq is written and is the smallest change. Let me discuss this with the original authors in the TF.

I personally believe ignoring fullwidth compatibility characters is confusing and it should be fixed at some point.

r12a · 2020-06-11T10:53:56Z

I also found that confusing, initially. (And sometimes still trip up over it.)

I also worry about changing the character to ASCII full stop, since it suggests that that is what authors should use, which i believe is incorrect. The distinction between the two as described in jlreq may be logically feasible, but in practice, especially without all the clever handling described in jlreq about optimal character widths, i think people are better off using the fullwidth forms, and i think they do use them generally.

Therefore, i'd be more inclined to change the label to U+FF0E FULLWIDTH FULL STOP rather than change the character displayed in the chart. That also makes it easy to understand the jlreq doc, because otherwise you have to get your head around the idea that this proportionally spaced character needs to be regarded as having width in order to follow the text.

kidayasuo · 2020-06-12T01:37:32Z

Bin-sensei, on the JLReq TF mailing list, explained how this happened: JLReq inherited the character classes from JIS X 4051, which indicates code points by JIS X 0213 plane, column and row. The Japanese period was translated to U+002E. He explained the situation in the NOTE at the beginning of the appendix. Since it is explained there, he believes we can leave it as-is.

The discussion is continuing. You can jump in on the mailing list. I will translate.

xfq · 2020-06-12T04:37:41Z

Link to the note: https://w3c.github.io/jlreq/#h-note-283 (see especially the text after "To work around this issue...")

The method in jlreq (and JIS X 4051) may be logically correct, but I still think the correct code point should be used in order to reduce confusion.

(FWIW, clreq does not use this method. Personally, I think both the methods in clreq and jlreq have their own advantages, and are just different ways of thinking.)

kidayasuo · 2020-06-12T09:36:01Z

It seems there is agreement in the JLReq TF that it is OK to make this change. Let’s take approach #2 among the possible approaches Richard suggested.

As this is not a small change and the changes need to be thoroughly reviewed, I am deferring it to the next update.

kidayasuo added the jlreq-doc:future label ([JLReq-doc] Discussion items to be considered for future version(s) of JLreq document) Jun 11, 2020

himorin mentioned this issue Jan 5, 2021

[META] Reorganize character classes and its adoption of Unicode based definition #240

Open
