-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Letter and word spacing #171
Letter and word spacing #171
Conversation
d33b18d
to
ad0428a
Compare
I've rebased it o it doesn't apply on the baseline patch anymore, because I am increasingly skeptical I can get that one to work. |
I think the letter spacing logic needs to be reworked. You currently iterate over input code points, but you need to iterate over grapheme clusters (not to be confused with HarfBuzz clusters) and add the letter spacing to the last code point in the cluster. Then, when adding the spacing to the glyphs themselves, you need to add it to the last glyph (in visual order) in any sequence of glyphs that has the same HarfBuzz cluster. You can test Latin text with decomposed combining marks (e.g. Furthermore, you need to disable ligatures when letter spacing is used (at least |
ad0428a
to
e85b395
Compare
Ok, so I had some trouble getting this to work. Namely, when I reuse raqm's grapheme break functionality, and then use the harfbuzz clusters to apply spacing, the spacing just did not get applied, especially with grapheme clusters that had more than two characters. Part of my confusion here is also that : if harfbuzz clusters and grapheme clusters are not the same thing, why are we applying these to the last entry in the grapheme cluster, furthermore, if we are only applying letter spacing once per harfbuzz cluster, why do we worry about grapheme clusters at all? I currently have offset added to RTL characters that aren't the last in their cluster, which works, but I now realize that's somewhat inconsistent and potentially very troublsome in the future, and will try to adjust this now.
I've implemented this in the letter-spacing function, using clig and liga, as those are what css-fonts specifies are the common ligatures. I could also turn off dlig, hlig and calt, as all those are also considered ligatures according to css-fonts. I am reusing the set_font_feature function here, as that reduces code duplication, but that does rely on letter-spacing to happen after setting the font-features. If you'd prefer this to happen just before shaping, please say so. Furthermore, I am having a bit of trouble with the tests: It seems deja vu sans doesn't do decomposition for latin with diacritics, so testing letterspacing on latin is somewhat tricky. Is this why we should be using the grapheme clusters? EDIT: I forgot to say that I did apply a check for grapheme breaks for wordspacing, as I've noticed that browsers don't consider a space with a combining mark a word space. I have also fixed RTL now. |
Ok, after a night of sleep, I managed to get grapheme breaking to work by checking whether the loop is at the start of a cluster instead of the end, and then adding the spacing to the first codepoint. I can't get my head around applying it to the last code point, even though I understand there might be a situation where, say, the grapheme cluster goes from 3 to 9 and the harfbuzz cluster from 3 to 5. I've tried the following, which would need adjustment for RTL, but should work on LTR: diff --git a/src/raqm.c b/src/raqm.c
index 2e7ac71..452532a 100644
--- a/src/raqm.c
+++ b/src/raqm.c
@@ -1086,28 +1086,25 @@ _raqm_set_spacing (raqm_t *rq,
for (size_t i = start; i < end; i++)
{
- bool set_spacing = i == 0;
+ bool set_spacing = i == rq->text_len - 1;
if (!set_spacing)
- set_spacing = _raqm_allowed_grapheme_boundary (rq->text[i-1], rq->text[i]);
+ set_spacing = _raqm_allowed_grapheme_boundary (rq->text[i], rq->text[i+1]);
if (set_spacing)
{
if (word_spacing)
{
- if (_raqm_allowed_grapheme_boundary (rq->text[i], rq->text[i+1]))
+ /* CSS word seperators, word spacing is only applied on these.*/
+ if (rq->text[i] == 0x0020 || /* Space */
+ rq->text[i] == 0x00A0 || /* No Break Space */
+ rq->text[i] == 0x1361 || /* Ethiopic Word Space */
+ rq->text[i] == 0x10100 || /* Aegean Word Seperator Line */
+ rq->text[i] == 0x10101 || /* Aegean Word Seperator Dot */
+ rq->text[i] == 0x1039F || /* Ugaric Word Divider */
+ rq->text[i] == 0x1091F) /* Phoenician Word Separator */
{
- /* CSS word seperators, word spacing is only applied on these.*/
- if (rq->text[i] == 0x0020 || /* Space */
- rq->text[i] == 0x00A0 || /* No Break Space */
- rq->text[i] == 0x1361 || /* Ethiopic Word Space */
- rq->text[i] == 0x10100 || /* Aegean Word Seperator Line */
- rq->text[i] == 0x10101 || /* Aegean Word Seperator Dot */
- rq->text[i] == 0x1039F || /* Ugaric Word Divider */
- rq->text[i] == 0x1091F) /* Phoenician Word Separator */
- {
- rq->text_info[i].spacing_after = spacing;
- rq->text_info[i].spacing_is_percentage = percentage;
- }
+ rq->text_info[i].spacing_after = spacing;
+ rq->text_info[i].spacing_is_percentage = percentage;
}
}
else
@@ -2175,11 +2172,22 @@ _raqm_shape (raqm_t *rq)
pos = hb_buffer_get_glyph_positions (run->buffer, &len);
info = hb_buffer_get_glyph_infos (run->buffer, &len);
+ unsigned int cluster_length = 0;
+ unsigned int current_cluster = 0;
for (unsigned int i = 0; i < len; i++)
{
_raqm_ft_transform (&pos[i].x_advance, &pos[i].y_advance, matrix);
_raqm_ft_transform (&pos[i].x_offset, &pos[i].y_offset, matrix);
+ if (current_cluster == info[i].cluster)
+ {
+ cluster_length += 1;
+ }
+ else
+ {
+ current_cluster = info[i].cluster;
+ cluster_length = 0;
+ }
bool set_spacing = false;
if (run->direction == HB_DIRECTION_RTL)
{
@@ -2193,7 +2201,7 @@ _raqm_shape (raqm_t *rq)
set_spacing = info[i].cluster != info[i+1].cluster;
}
- _raqm_text_info rq_info = rq->text_info[info[i].cluster];
+ _raqm_text_info rq_info = rq->text_info[current_cluster + cluster_length];
if (rq_info.spacing_after != 0 && set_spacing)
{ The problem I am seeing is that Deja Vu Sans and Noto Sans will give different results for letter spacing for letters with excessive combining marks, with the former spacing out each letter while the latter clumps some letters. With the current solution, all my test-cases work, though I am worried about the edge case identified earlier on in the post. I've also added disabling the other opentype features. |
It isn't actually parrt of CSS and seems odd in any case.
Ok, half a year later I haven't really found an example of the problem I was worried about, despite my other text work, and in hindsight it may have also been a bug somewhere else... I've removed 'percentage' based spacing because it is not actually in the CSS spec, and I must've mixed something up. |
Thanks! (and sorry for the late review). |
Implements letter and word spacing. This patch applies on top of #169
As discussed in #166 , LTR has the spacing added to the advance, RTL added to offset and advance, TTB spacing 'removed' from the advance because TTB advance is negative.
Fixes #166