How to do bunsetu-separated rendering #17
Comments
No, it has been widely used for first- and second-grade students in elementary schools for years. It is also useful for students who have difficulties such as dyslexia. But the idea of using a single source for both wakachi-gaki rendering and non-wakachi-gaki rendering is new.
The JLreq WG of APL is discussing this topic. We will get back to you soon.
Bunsetsu-based spacing and bunsetsu-based line-breaking are related but they are different. I guess that we need a CSS property for enabling/disabling bunsetsu-based line-breaking.
I think that word-spacing of CSS is good enough for bunsetsu-based spacing. But we need a Unicode character as a boundary.
@murata2makoto: You say "I think that word-spacing of CSS is good enough".
@duerst First of all, I am open to suggestions. At this stage, I would like to first make the requirements very clear. Comments on Accessibility Requirements on Japanese Typography are very welcome. Having said that, I actually assumed some Unicode character that would occupy zero width when letter-spacing and word-spacing are both normal. I did not assume negative values for these properties.
My understanding is that
It's probably worth understanding why it says that, and whether there is going to be any webcompat impact if we change that behavior.
(Not sure what the -100 means, but I think it's just a cut-and-paste glitch.) I'm not so sure. The I'm thinking off the top of my head here, but I think it would be better to define a qualitative switch that says "turn on word separation for scriptio continua scripts". This would then allow us to apply accessibility improvements to ordinary text that hasn't been specially prepared in advance (i.e. with insertion of ZWSP or whatever). It would also allow us to apply the same property to those SE Asian scripts where we also cannot expect people to insert ZWSP as a general rule. Let's suppose we invent a new property called We could also define This is just brainstorming at this point.
It's not only for accessibility for Japanese either. I have heard similar requests for presentation slides or short text in UI, where people want line breaking only at "word" boundaries. In the example below, the 2nd item from the bottom breaks early because "ニュース/スタンド" looks much nicer than "ニューススタ/ンド". This site does this by: <span style="word-break: keep-all">ニュース​スタンド</span>
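A minimal sketch of generating that kind of markup, assuming the invisible separator between the two segments is U+200B ZERO WIDTH SPACE (the snippet above does not show it visibly). `keep_all_span` is a hypothetical helper name, not something the site is known to use. With `word-break: keep-all` the implicit break opportunities between the kana are suppressed, so the ZWSP is the only place the line may wrap.

```python
from html import escape

ZWSP = "\u200b"  # ZERO WIDTH SPACE: an invisible break opportunity


def keep_all_span(segments):
    """Join segments with ZWSP and wrap them in a word-break: keep-all span."""
    body = ZWSP.join(escape(s) for s in segments)
    return f'<span style="word-break: keep-all">{body}</span>'


markup = keep_all_span(["ニュース", "スタンド"])
# Print with the invisible character made visible as a character reference.
print(markup.replace(ZWSP, "&#x200B;"))
```

The segment boundaries still have to come from somewhere (an author or an analyzer); this only shows how they might be serialized.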
Me too, agree it's great if we can solve nicely. Maybe this has some similarity with the |
A recently announced DAISY reader supports bunsetsu-based line breaking and bunsetsu spacing. http://www.plextalk.com/jp/education/products/e-reader/ I heard from the developers that they use morphological analysis and some manual adjustment for creating HTML markup that represents bunsetsu boundaries. Then, their reading system uses such HTML for bunsetsu-based line breaking and bunsetsu spacing.
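To make the "morphological analysis, then markup" idea concrete, here is a toy sketch. It is not the developers' method: it assumes the text has already been split into (surface, part-of-speech) pairs by some analyzer, and it uses a simple heuristic in which particles and auxiliaries attach to the preceding independent word. The tag names, the `bunsetsu` class name, and both function names are hypothetical.

```python
from html import escape

# Parts of speech treated as dependent words that attach to the preceding
# independent word. This tag set is an assumption made for the sketch.
DEPENDENT = {"particle", "auxiliary"}


def group_bunsetsu(tokens):
    """Group (surface, pos) pairs so each bunsetsu starts at an independent word."""
    groups = []
    for surface, pos in tokens:
        if groups and pos in DEPENDENT:
            groups[-1] += surface   # attach to the current bunsetsu
        else:
            groups.append(surface)  # start a new bunsetsu
    return groups


def to_markup(groups):
    """Wrap each bunsetsu in a span so a reading system can find the boundaries."""
    return "".join(f'<span class="bunsetsu">{escape(g)}</span>' for g in groups)


tokens = [
    ("私", "pronoun"), ("は", "particle"),
    ("学校", "noun"), ("に", "particle"),
    ("行き", "verb"), ("ます", "auxiliary"),
]
groups = group_bunsetsu(tokens)
print(groups)
print(to_markup(groups))
```

A heuristic this simple will mis-segment compounds, prefixes, and similar cases, which is presumably where the manual adjustment mentioned above comes in.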
Inserting extra characters could have the side-effect of breaking existing mojikumi spacing, in that the adjacent character class logic looks at the Unicode value of the space and not the character after it. I agree the application of such a feature is useful for display type usage or social media graphics type layout, where breaking short lines on linguistic boundaries is more desirable than breaking anywhere. In such applications we are considering running the text through linguistic analysis to determine "desired" line breaks in addition to the strictly legal ones. If you put a special linguistic break marker (ignored by mojikumi processing) similar to how hyphens are inserted (and show or hide optionally like hyphens), that could work...
@macnmm wrote:
@frivoal, @fantasai, @r12a and other APL members discussed this. We are inclined to use
Did you mean "
Yes, both. @frivoal, could you explain why both?
Probably because the standard use of 200B is to mark invisible word boundaries and by default you don't want to add inter-word space there (The Unicode Standard suggests that adding inter-character space is expected, e.g. in justification). Not sure whether the CSS wording would need to remain if a new CSS parameter were added that explicitly calls for increased spacing, as long as the default remained.
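The "default stays invisible, spacing is opt-in" behaviour can be modelled in a few lines. This is only a plain-text model of the idea, not CSS and not any proposed property: `render`, its `expand` flag, and the choice of U+3000 IDEOGRAPHIC SPACE as the visible gap are all assumptions made for illustration.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE marking an invisible boundary


def render(text, expand=False, gap="\u3000"):
    """Model of display text: ZWSP stays invisible unless expansion is requested."""
    return text.replace(ZWSP, gap) if expand else text


source = f"わたしは{ZWSP}がっこうに{ZWSP}いきます"
print(render(source))               # default: boundaries add no visible space
print(render(source, expand=True))  # opt-in: boundaries become visible gaps
```

The same source string serves both renderings, which is the single-source property the thread is after.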
We meant that implementations would have to support both, not that authors would have to use both together. Authors can use either. As to why:
I've made a quick-and-dirty draft specification based on the discussion we had in Tokyo last month, including a few examples. Please have a look: https://specs.rivoal.net/css-space-expansion/
We should absolutely make sure that this is not the case. That sounds like an addition/clarification to https://drafts.csswg.org/css-text-4/#text-spacing-property
Question. I understood you want to use either ZWSP or I was guessing that you're planning to use |
What about using a nonbreaking space in these cases?
Using |
Sorry my question was misleading. I understand people here do not want to use What is the recommended way to prohibit normal break opportunities within "bunsetu"? Not only spaces, "bunsetsu" can include "365日の" (without spaces) or EAW=A characters, which |
Makoto Murata is working on Accessibility Requirements on Japanese Typography.
He says the following about adding space between bunsetsu (word-like phrases in Japanese - see https://en.wikipedia.org/wiki/Japanese_grammar#Sentences,_phrases_and_words):
Normal Japanese/Chinese text does not, in itself, indicate break opportunities for bunsetsu separation. It will be necessary to provide a mechanism that allows bunsetsu separation to be applied to normal text.
I have a number of questions around the topic:
Murata-san, is bunsetsu-spacing a recognised and widely used technique in existing text? Or is this a new idea?
Will this mechanism be different from the way line-breaking occurs in Japanese, since the grammatical particles are considered part of the bunsetsu unit?
Would we be looking at a new CSS property? Styling seems appropriate, since the intent appears to be to use the text as normal elsewhere, and to apply the accessibility changes to existing text (ruling out the possible use of spaces, zero-width or otherwise).
Given a new property for bunsetsu spacing, will it be necessary to change default line-breaking and justification behaviour, since presumably (?) the gaps will count as word separators?