Plain fonts #2411

Merged
merged 17 commits into master from plain-fonts
Sep 18, 2024


brucemiller
Owner

This PR gets most of the low-level font-related commands cooperating better. In particular, control sequences defined by \font are now represented by a FontDef, which switches to that font when invoked and is also recognized by \the and \meaning, as well as by the accessors \fontdimen, etc. These commands are also stored in \textfont, \scriptfont and \scriptscriptfont for use with \fam and the \mathcode mechanism, which chooses fonts for math tokens in a particularly interesting way and now mostly works. Consequently, most of plain.tex now works.

This mostly cooperates with the higher-level LaTeX style of font selection currently used by LaTeXML (specifying family, shape, etc., without explicitly naming fonts or font files); later work will need to integrate the two better.
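
For readers less familiar with the \fam and \mathcode mechanism mentioned in the description, here is a rough sketch in plain Perl of how TeX splits a math code into class, family and character position, and how class 7 defers to the current \fam. This is not LaTeXML's decodeMathChar; the name `split_mathcode` and the `$current_fam` argument are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT LaTeXML's decodeMathChar): split a TeX math code
# (0 .. 0x7FFF) into class, family and character position.
# $current_fam stands in for the value of \fam (-1 when unset).
sub split_mathcode {
  my ($mathcode, $current_fam) = @_;
  my $class    = ($mathcode >> 12) & 0x7;    # top 3 bits
  my $family   = ($mathcode >> 8) & 0xF;     # next 4 bits
  my $position = $mathcode & 0xFF;           # low 8 bits
  if ($class == 7) {
    # "Variable family": treated as class 0, and the family is taken from
    # \fam when that names a valid family (0..15).
    $class  = 0;
    $family = $current_fam if ($current_fam >= 0) && ($current_fam <= 15);
  }
  return ($class, $family, $position);
}

# plain.tex assigns letters math codes of the form "7100 + code; 'a' is "7161.
my ($class, $family, $position) = split_mathcode(0x7161, 2);
printf "class=%d fam=%d pos=0x%02X\n", $class, $family, $position;
# prints "class=0 fam=2 pos=0x61"
```

With `$current_fam` set to -1, the same call keeps the family stored in the math code (1 in this example).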

brucemiller requested a review from dginev on September 3, 2024, 23:58
brucemiller marked this pull request as draft on September 5, 2024, 22:58
@brucemiller
Owner Author

Interesting pragmatic, if counter-intuitive, development: when alphanumerics appear in math, we've been keeping their content ASCII but recording the (presumably semantic) styling, which gets dealt with appropriately by the post-processor. That's not so convenient for certain more exotic styles (Calligraphic, Blackboard bold) in text mode, where the style ultimately must be handled with CSS. This last set of commits makes this distinct handling more explicit and consistent, even when lower-level TeX commands such as \char are used.
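
To make the math/text distinction concrete, here is a minimal sketch in plain Perl. It is not LaTeXML's FontDecode or FontMap; the helper `decode_blackboard`, the `%BLACKBOARD` table and the style name 'blackboard' are all hypothetical. It only illustrates the idea of keeping ASCII plus a style in math while mapping to a Unicode glyph in text.

```perl
use strict;
use warnings;

# Hypothetical table: a few blackboard-bold capitals and their Unicode glyphs.
my %BLACKBOARD = (N => "\x{2115}", R => "\x{211D}", Z => "\x{2124}");

# Hypothetical helper (NOT LaTeXML's FontDecode): always returns a
# (glyph, style) pair so that callers see both pieces.
sub decode_blackboard {
  my ($char, $in_math) = @_;
  # Math: keep the ASCII content and record the style for post-processing.
  return ($char, 'blackboard') if $in_math;
  # Text: prefer the Unicode glyph, which then needs no extra styling.
  return ($BLACKBOARD{$char}, undef) if exists $BLACKBOARD{$char};
  # Unmapped characters in text fall back to ASCII plus style.
  return ($char, 'blackboard');
}

my ($math_glyph, $math_style) = decode_blackboard('R', 1);
my ($text_glyph, $text_style) = decode_blackboard('R', 0);
printf "math: glyph=%s style=%s\n", $math_glyph, $math_style;
printf "text: glyph=U+%04X style=%s\n", ord($text_glyph), $text_style // 'none';
```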

brucemiller marked this pull request as ready for review on September 16, 2024, 21:48
Collaborator

dginev left a comment

This is very impressive (and partially beyond my understanding).

I added some minor comments, largely just Perl tidbits.
I will study the new test some more; it looks very helpful.

Looks good to merge!


sub invoke {
my ($self, $stomach) = @_;
my $current = $STATE->lookupValue('font');
Collaborator

$current is never used?

if (my $fontinfo = $STATE->lookupValue($$self{fontID})) {
# Temporary hack for \the\font; remember the last font def executed
$STATE->assignValue(current_FontDef => $$self{cs}, 'local');
$STATE->assignValue(font => $STATE->lookupValue('font')->merge(%$fontinfo), 'local');
Collaborator

maybe use $current here, or better yet keep as-is and remove the outer lookup (since it's only needed if $fontinfo has a value).

Owner Author

Done! Thanks!

DefMacro('\DeclareTextSymbolDefault DefToken {}', ''); # '\DeclareTextSymbol{#1}{?}');
DefPrimitive('\DeclareTextSymbolDefault DefToken {}', sub {
my ($stomach, $cs, $encoding) = @_;
$encoding = ToString(Expand($encoding));
Collaborator

A question on "LaTeXML customization style" comes to me reading here. ToString(Expand(...)) is a common and very familiar pattern that I've used many times.
Nothing wrong with it per se.

But I wonder if there is value (maybe as a separate PR of my own, focused on refactoring) to switch to an Expanded parameter type instead, and only do $encoding = ToString($encoding) in the binding code.

Mostly wondering out loud when to reach for the advanced parameter types, and when for the Package subroutine utilities.

Owner Author

Interestingly, there's already an Expanded parameter type, but it's slightly special: requires braces and treats \the specially. Perhaps it wants a clearer name, since "expanded" is so generic and would be tempting to use for what you suggest here.

Collaborator

Right, what is the existing Expanded parameter type meant to be I wonder? I half-remember it being related to \edef, but that is the parameter DefExpanded nowadays.

In the spirit of the XToken parameter, maybe a name such as XPlain would be a good mnemonic for what I'm after? But maybe such a name is hard to "explain".

Owner Author

It's used in \pdfstrcmp, but I think XGeneralText is actually what should be used. I need to do a bit more checking and follow up with another PR. Maybe take up your initial suggestion?

my $glyph = FontDecode($code->valueOf, ($info ? $$info{encoding} : $class));
my $info = LookupValue('fontdeclaration@' . $class);
# my $glyph = FontDecode($code->valueOf, ($info ? $$info{encoding} : $class));
my ($glyph) = FontDecode($code->valueOf, ($info ? $$info{encoding} : $class));
Collaborator

This array unwrap looks a little scary/fragile...
Should we check wantarray in FontDecode?

We could even avoid computing the font details if we are only requesting the scalar $glyph back.

Owner Author

I'd been tempted to minimize the change to the API by doing that, but Perl tricks you. A previous call had been Box(FontDecode(...),...) where you'd think you were getting only the $glyph, but actually wantarray is true!

Collaborator

Well the Box() auto-unlisting pattern is also a dangerous thing to do in Perl :>

Box(scalar(FontDecode(...)),... doesn't have quite the same ring to it, I admit.
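
The context issue discussed in this thread can be reproduced with a small standalone Perl sketch. The subs `decode_ctx`, `decode_pair` and `count_args` are hypothetical stand-ins (for a wantarray-aware decoder, a decoder that always returns a pair, and a constructor such as Box); they are not LaTeXML code.

```perl
use strict;
use warnings;

# Hypothetical decoder that tries to be clever with wantarray.
sub decode_ctx { return wantarray ? ('A', 'font-info') : 'A'; }

# Hypothetical decoder that always returns a (glyph, font) pair.
sub decode_pair { return ('A', 'font-info'); }

# Stand-in for a constructor taking a glyph plus further arguments;
# it simply reports how many arguments it received.
sub count_args { return scalar(@_); }

# Function arguments are evaluated in list context, so wantarray is true
# and both values are spliced into the argument list.
my $n_plain = count_args(decode_ctx(), 'x');            # 3

# Forcing scalar context yields only the glyph.
my $n_scalar = count_args(scalar(decode_ctx()), 'x');   # 2

# List assignment keeps the first element; scalar assignment of a literal
# list returned by a sub yields the LAST element.
my ($glyph) = decode_pair();                            # 'A'
my $last    = decode_pair();                            # 'font-info'

print "plain call: $n_plain args; scalar() call: $n_scalar args\n";
print "list assignment: $glyph; scalar assignment: $last\n";
```

The last two lines of the sketch suggest why a call site written as `my $glyph = ...` has to become `my ($glyph) = ...` once the callee returns two values.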

$type =~ s/^LaTeXML:://;
if (my $fontinfo = LookupValue('fontinfo_' . ToString($definition))) {
$meaning = 'select font ' . ($$fontinfo{fontname} || 'fontname');
if ($type =~ /(fontdef)$/i) {
Collaborator

(fontdef) doesn't need the parens, I assume they're leftover

Owner Author

thanks!

$meaning = 'select font ' . ($$fontinfo{fontname} || 'fontname');
if ($type =~ /(fontdef)$/i) {
if (my $fontinfo = $definition->isFontDef) {
$meaning = 'select font ' . ($$fontinfo{name} || 'fontname');
Collaborator

Nice, this is a very healthy improvement for \meaning.

DefParameterType('FontDef', sub {
my ($gullet) = @_;
my $readtoken = $gullet->readToken;
my $token = $readtoken;
Collaborator

I think $readtoken is never used. Leftover?

Owner Author

yup; thanks!

brucemiller merged commit d3d3fcc into master on Sep 18, 2024
26 checks passed
brucemiller deleted the plain-fonts branch on September 18, 2024, 19:54
teepeemm pushed a commit to teepeemm/LaTeXML that referenced this pull request Oct 29, 2024
* New Definition class FontDef for font selecting commands defined by \font; modify \font accordingly

* Rename parameter type FontToken to FontDef for cs explicitly defined by \font (ie FontDef); update font info accessors which use this token ( \fontdimen, \fontname, \hyphenchar, \skewchar)

* Update \meaning to recognize new FontDef font commands

* Update \the to recognize FontDef commands defined by \font and process them correctly; remove badly named Register Parameter type which was only used for \the

* Moved decodeMathChar to Package.pm, updating it to more correctly decode mathcodes and recognize the \fam font family for the class=7 special case, and more reliably lookup the codepoint in the encoding appropriate to the selected font; Initialize \fam (font_family) when entering math mode, and use decodeMathChar when digesting simple tokens in math.

* Note that \mit doesn't REQUIRE math, but only has effect in math (sets \fam)

* Consistent use of font decoding makes apparent misuse of T_OTHER when reconstructing duals in siunitx

* Fix mangled renesting of if/else

* \cal also does not require math and does nothing in text

* Add test case for plain style font manipulations

* Improve decoding of font filenames into family/series/shape and IMPORTANTLY encoding; rearrange lists

* Update FontMap to provide options for alphanumerics to remain ASCII in math,
but with (semantic) font styling which is post-processed as before;
BUT while allowing those same chars to map to Unicode when in text mode,
since these cases are often rather exotic fonts not easily covered by pure CSS.

* Make FontDecode keep alphanumerics in math as ASCII w/font change, but Unicode in text. Now returns BOTH the glyph and font; corresponding changes to decodeMathChar

* Update all callers of FontDecode

* Make \cal return a Box so that it can revert

* Enhance and correct plain fonts test cases

* Code cleanup suggested by D.Ginev