Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kerns #2423

Merged
merged 6 commits into from
Sep 29, 2024
Merged

Kerns #2423

merged 6 commits into from
Sep 29, 2024

Conversation

brucemiller
Copy link
Owner

A small PR providing more consistent treatment of kerns and skips, and plausibly implements \unkern,\unskip,\lastkern,\lastskip. Several test cases are updated due to which & how many spacing codepoints find their way into text.

@brucemiller brucemiller requested a review from dginev September 28, 2024 20:27
@@ -31,7 +31,7 @@
<listingline xml:id="algx1.l2"><tags>
<tag><text fontsize="80%">2:</text></tag>
<tag role="refnum">2</tag>
</tags>     <text font="bold">return</text> 7
</tags>  <text font="bold">return</text> 7
Copy link
Collaborator

@dginev dginev Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leading space reduction in the algx test, and the algorithmic snippet in the figure_mixed_content test rang a bell, where we've had two arXiv reports mentioning the leading whitespace in listings is getting wrongly removed. That's arXiv/html_feedback 1712 and arXiv/html_feedback 2201 for reference.

Maybe the underlying trickiness comes from even trying to model whitespace as actual characters inside an XML document (rather than some <mspace>-like element with a spacing size attribute). But since we're doing characters, I wonder if it's worth taking a closer look at what exactly changed in those {algorithmic}-related tests, and if it is the same cause as the {lslistings} reports.

Copy link
Collaborator

@dginev dginev Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now understand that the line I am commenting on had 6 ascii spaces (U+0020) converted to two alternative space chars: (U+2003)(U+2002)

I should probably take some time next to study how successful various fonts are in getting the "expected" widths displayed. GitHub's review interface seems to be unsuccessful? Today it uses the ui-monospace font-family.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trigger for this PR was noticing that some kerns & skips were being dropped (they show up a lot in plain tex stuff); followed by improving the accuracy of converting them to spaces and filling in a few unimplemented commands.
That said, I too was surprised at being reminded that these spaces are (still) actually relied on for layout in listings. Even so, maybe that's enough to fix the cases you mention; I checked the 2nd one and the algorithms now come out indented nicely.
It would seem better in a few cases (not all kerns & skips!) to use an element w/width and CSS, rather than just clever spaces. The constructors for \hskip,\kern could do that, if it could detect the situation. But I'm not sure how to define it; in this case possibly following an ltx:tags would be sufficient, but I haven't thought it through.

Copy link
Collaborator

@dginev dginev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR code looks healthy on review ✅

I indulged in some discussion, you're welcome to merge whenever.

# a candidate for use by \hskip, \hspace, etc... ?
our @spaces = ( # Spaces to fake spacing, with width in ems
[0.167, "\x{2006}"], [0.222, "\x{2005}"], [0.278, "\x{2004}"],
[0.333, UTF(0xA0)], [0.5, "\x{2002}"], [1.0, "\x{2003}"]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading more into the details...

\x{2004} is three-per-em, but it won't get used at the 0.333 threshold? In favour of a no-break space, or (U+00A0).

It would be helpful to have a comment explaining how the values are chosen here, in case they get revised at some later point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another curious qeustion that comes to mind is whether we have a reason to use the "medium mathematical space" character, U+205F, which is 4/18 em.

if ($ems + 0.01 > $w) {
my $n = int(($ems + 0.01) / $w);
$ems -= $n * $w; $s .= $spaces[$i][1] x $n; } }
return $s; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this loop would avoid adding any character if the requested dimension is narrower than 0.167em, is that correct? Just wondering if there are legitimate cases where someone wants an even smaller amount of space. The hair space (U+200A) could fill in smaller gaps. I've seen one mention it is traditionally 1/16em or 0.0625.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, my revision makes it easy to add more spacing chars to the mix, but I didn't actively hunt down any extras. You have a list you'd like to include?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a list really. More of a learning-as-I-review process.

@brucemiller
Copy link
Owner Author

I tweaked the set of space chars to be a little cleaner. There's lots of space (cough) for over-thinking here; we're really just approximating the space requested.

@brucemiller
Copy link
Owner Author

Would be nice if you could check whether the 1st arXiv issue is also fixed; and record resolutions (however that's being done)

@dginev
Copy link
Collaborator

dginev commented Sep 29, 2024

Would be nice if you could check whether the 1st arXiv issue is also fixed; and record resolutions (however that's being done)

They're not latexml issues, so nothing to record here. The arXiv issue will get closed when an updated LaTeXML reconverts the article and it goes live, which is easily a month away from today, likely longer. The CSS issue Vincenzo identified is likely to get resolved a little quicker, but likely still weeks away.

@brucemiller brucemiller merged commit 2233d87 into master Sep 29, 2024
26 checks passed
@brucemiller brucemiller deleted the kerns branch September 29, 2024 23:14
for (my $i = $#LaTeXML::LIST ; $i > 0 ; $i--) {
my $box = $LaTeXML::LIST[$i];
last if !$box || !$box->getProperty('isSkip');
return $box->getProperty('width'); }
Copy link
Collaborator

@dginev dginev Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust's clippy gave me a fun message while porting this loop:

error: this loop never actually loops

I didn't notice that while reviewing -- may be worth a second look? @brucemiller
The current code only ever inspects the last element of the box list.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the last should have been a next ?

teepeemm pushed a commit to teepeemm/LaTeXML that referenced this pull request Oct 29, 2024
* Make kerns add hint in math, spacing in plain text; implement \unkern, \lastkern

* Improve converting a Dimension to space chars; Make \hskip add hints to math; implement \unskip, \lastskip

* Add \unskip,\unkern etc tests to test case

* Updates test cases for changed spacing conversion

* Slightly better choice of space chars, updating tests

* Whooops missed a teset
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants