Plain fonts (brucemiller#2411)
* New Definition class FontDef for font selecting commands defined by \font; modify \font accordingly

* Rename parameter type FontToken to FontDef for cs explicitly defined by \font (ie FontDef); update font info accessors which use this token ( \fontdimen, \fontname, \hyphenchar, \skewchar)

* Update \meaning to recognize new FontDef font commands

* Update \the to recognize FontDef commands defined by \font and process them correctly; remove badly named Register Parameter type which was only used for \the

* Moved decodeMathChar to Package.pm, updating it to more correctly decode mathcodes and recognize the \fam font family for the class=7 special case, and more reliably look up the codepoint in the encoding appropriate to the selected font; initialize \fam (font_family) when entering math mode, and use decodeMathChar when digesting simple tokens in math.

* Note that \mit doesn't REQUIRE math, but only has effect in math (sets \fam)

* Consistent use of font decoding makes apparent misuse of T_OTHER when reconstructing duals in siunitx

* Fix mangled renesting of if/else

* \cal also does not require math and does nothing in text

* Add test case for plain style font manipulations

* Improve decoding of font filenames into family/series/shape and IMPORTANTLY encoding; rearrange lists

* Update FontMap to provide options for alphanumerics to remain ASCII in math,
but with (semantic) font styling which is post-processed as before;
BUT while allowing those same chars to map to Unicode when in text mode,
since these cases are often rather exotic fonts not easily covered by pure CSS.

* Make FontDecode keep alphanumerics as ASCII with a font change in math, but map them to Unicode in text; it now returns BOTH the glyph and the font; corresponding changes to decodeMathChar

* Update all callers of FontDecode

* Make \cal return a Box so that it can revert

* Enhance and correct plain fonts test cases

* Code cleanup suggested by D.Ginev
brucemiller authored Sep 18, 2024
1 parent 041dcdc commit d3d3fcc
Showing 24 changed files with 727 additions and 306 deletions.
4 changes: 4 additions & 0 deletions MANIFEST
@@ -54,6 +54,7 @@ lib/LaTeXML/Core/Definition.pm
lib/LaTeXML/Core/Definition/Expandable.pm
lib/LaTeXML/Core/Definition/Conditional.pm
lib/LaTeXML/Core/Definition/Primitive.pm
lib/LaTeXML/Core/Definition/FontDef.pm
lib/LaTeXML/Core/Definition/Register.pm
lib/LaTeXML/Core/Definition/CharDef.pm
lib/LaTeXML/Core/Definition/Constructor.pm
@@ -1352,6 +1353,9 @@ t/fonts/mixed.xml
t/fonts/omencodings.pdf
t/fonts/omencodings.tex
t/fonts/omencodings.xml
t/fonts/plainfonts.pdf
t/fonts/plainfonts.tex
t/fonts/plainfonts.xml
t/fonts/sizes.pdf
t/fonts/sizes.tex
t/fonts/sizes.xml
151 changes: 87 additions & 64 deletions lib/LaTeXML/Common/Font.pm
@@ -52,6 +52,7 @@ my $FLAG_EMPH = 0x10;
# Mappings from various forms of names or component names in TeX
# Given a font, we'd like to map it to the "logical" names derived from LaTeX,
# (w/ loss of fine grained control).
# and (importantly) the encoding needed to lookup unicode in a FontMap!
# I'd like to use Karl Berry's font naming scheme
# (See http://www.tug.org/fontname/html/)
# but it seems to be a one-way mapping, and moreover, doesn't even fit CM fonts!
@@ -60,61 +61,58 @@ my $FLAG_EMPH = 0x10;
# NOTE: This probably doesn't really belong in here...

my %font_family = (
cmr => { family => 'serif' },
cmss => { family => 'sansserif' },
cmssq => { family => 'sansserif' }, # quote style?
cmssqi => { family => 'sansserif', shape => 'italic' }, # quote style?
cmtt => { family => 'typewriter' }, cmvtt => { family => 'typewriter' },
cmt => { family => 'serif' }, # for cmti "text italic"
cmfib => { family => 'serif' },
cmfr => { family => 'serif' },
cm => { family => 'serif' },
cmdh => { family => 'serif' },
cmr => { family => 'serif' },
cmdunh => { family => 'serif' }, # like cmr10 but with tall body heights
cmu => { family => 'serif' }, # unslanted italic ??
ptm => { family => 'serif' }, ppl => { family => 'serif' },
pnc => { family => 'serif' }, pbk => { family => 'serif' },
phv => { family => 'sansserif' }, pag => { family => 'serif' },
pcr => { family => 'typewriter' }, pzc => { family => 'script' },
put => { family => 'serif' }, bch => { family => 'serif' },
psy => { family => 'symbol' }, pzd => { family => 'dingbats' },
ccr => { family => 'serif' }, ccy => { family => 'symbol' },
cmbr => { family => 'sansserif' }, cmtl => { family => 'typewriter' },
cmbrs => { family => 'symbol' }, ul9 => { family => 'typewriter' },
txr => { family => 'serif' }, txss => { family => 'sansserif' },
txtt => { family => 'typewriter' }, txms => { family => 'symbol' },
txsya => { family => 'symbol' }, txsyb => { family => 'symbol' },
pxr => { family => 'serif' }, pxms => { family => 'symbol' },
pxsya => { family => 'symbol' }, pxsyb => { family => 'symbol' },
futs => { family => 'serif' },
uaq => { family => 'serif' }, ugq => { family => 'sansserif' },
eur => { family => 'serif' }, eus => { family => 'script' },
euf => { family => 'fraktur' }, euex => { family => 'symbol' },
# The following are actually math fonts.
ms => { family => 'symbol' },
ccm => { family => 'serif', shape => 'italic' },
cmm => { family => 'math', shape => 'italic', encoding => 'OML' },
cmex => { family => 'symbol', encoding => 'OMX' }, # Not really symbol, but...
cmsy => { family => 'symbol', encoding => 'OMS' },
ccitt => { family => 'typewriter', shape => 'italic' },
cmsltt => { family => 'typewriter', shape => 'slanted' },
cmbrm => { family => 'sansserif', shape => 'italic' },
futm => { family => 'serif', shape => 'italic' },
futmi => { family => 'serif', shape => 'italic' },
txmi => { family => 'serif', shape => 'italic' },
pxmi => { family => 'serif', shape => 'italic' },
bbm => { family => 'blackboard' },
bbold => { family => 'blackboard' },
bbmss => { family => 'blackboard' },
# some ams fonts
cmmib => { family => 'italic', series => 'bold' },
cmbsy => { family => 'symbol', series => 'bold' },
msa => { family => 'symbol', encoding => 'AMSa' },
msb => { family => 'symbol', encoding => 'AMSb' },
# Are these really the same?
msx => { family => 'symbol', encoding => 'AMSa' },
msy => { family => 'symbol', encoding => 'AMSb' },
# Computer Modern
cm => { family => 'serif' }, # base for synthesizing cmbx, cmsl ...
cmr => { family => 'serif' },
cmm => { family => 'math', shape => 'italic', encoding => 'OML' }, # cmmi
cmsy => { encoding => 'OMS' },
cmex => { encoding => 'OMX' },
cmss => { family => 'sansserif' },
cmtt => { family => 'typewriter' },
cmvtt => { family => 'typewriter' },
cmssq => { family => 'sansserif' }, # quote style?
cmssqi => { family => 'sansserif', shape => 'italic' }, # quote style?
cmt => { family => 'serif' }, # for cmti "text italic"
cmmib => { family => 'italic', series => 'bold' },
cmbsy => { series => 'bold', encoding => 'OMS' },
cmfib => { family => 'serif' },
cmfr => { family => 'serif' },
cmdh => { family => 'serif' },
cmdunh => { family => 'serif' }, # like cmr10 but with tall body heights
cmu => { family => 'serif' }, # unslanted italic ??
cmsltt => { family => 'typewriter', shape => 'slanted' },
cmbrm => { family => 'sansserif', shape => 'italic' },
# Some Blackboard Bold fonts
bbm => { family => 'blackboard' },
bbold => { family => 'blackboard' },
bbmss => { family => 'blackboard' },
# Computer Concrete
ccr => { family => 'serif' },
ccm => { family => 'serif', shape => 'italic' },
cct => { family => 'serif' },
ccitt => { family => 'typewriter', shape => 'italic' },
# AMS fonts
msa => { encoding => 'AMSa' },
msb => { encoding => 'AMSb' },
msx => { encoding => 'AMSa' }, # Are these really the same? (or even real?)
msy => { encoding => 'AMSb' },
# Euler
eur => { family => 'serif' },
eus => { family => 'script' },
euf => { family => 'fraktur' },
euex => { encoding => 'OMX' },
# TX Fonts (Times Roman)
txr => { family => 'serif' },
txmi => { family => 'serif', shape => 'italic' },
txss => { family => 'sansserif' },
txtt => { family => 'typewriter' },
txsya => { encoding => 'AMSa' },
txsyb => { encoding => 'AMSb' },
# PX Fonts (Palladio)
pxr => { family => 'serif' },
pxmi => { family => 'serif', shape => 'italic' },
pxsya => { encoding => 'AMSa' },
pxsyb => { encoding => 'AMSb' },
# Pretend to recognize xy's fonts
xydash => { family => 'graphic' },
xyatip => { family => 'graphic' },
@@ -125,17 +123,44 @@ my %font_family = (
xycmbt => { family => 'graphic' },
xyluat => { family => 'graphic' },
xylubt => { family => 'graphic' },
# Fourier
futm => { family => 'serif', shape => 'italic' },
futmi => { family => 'serif', shape => 'italic' },
# More fonts that need to be better sorted, classified & labelled
# family symbol, dingbats are nonsense: We need an encoding and FontMap!!!
ptm => { family => 'serif' }, ppl => { family => 'serif' },
pnc => { family => 'serif' }, pbk => { family => 'serif' },
phv => { family => 'sansserif' }, pag => { family => 'serif' },
pcr => { family => 'typewriter' }, pzc => { family => 'script' },
put => { family => 'serif' }, bch => { family => 'serif' },
psy => { family => 'symbol' }, pzd => { family => 'dingbats' },
cmbr => { family => 'sansserif' }, cmtl => { family => 'typewriter' },
cmbrs => { family => 'symbol' }, ul9 => { family => 'typewriter' },
futs => { family => 'serif' },
uaq => { family => 'serif' }, ugq => { family => 'sansserif' },
);

# Maps the "series code" to an abstract font series name
my %font_series = (
'' => { series => 'medium' }, m => { series => 'medium' }, mc => { series => 'medium' },
b => { series => 'bold' }, bc => { series => 'bold' }, bx => { series => 'bold' },
sb => { series => 'bold' }, sbc => { series => 'bold' }, bm => { series => 'bold' });
'' => {}, # default medium
m => { series => 'medium' },
mc => { series => 'medium' },
b => { series => 'bold' },
bc => { series => 'bold' },
bx => { series => 'bold' },
sb => { series => 'bold' },
sbc => { series => 'bold' },
bm => { series => 'bold' });

# Maps the "shape code" to an abstract font shape name.
my %font_shape = ('' => { shape => 'upright' }, n => { shape => 'upright' }, i => { shape => 'italic' }, it => { shape => 'italic' },
sl => { shape => 'slanted' }, sc => { shape => 'smallcaps' }, csc => { shape => 'smallcaps' });
my %font_shape = (
'' => {}, # default upright
n => { shape => 'upright' },
i => { shape => 'italic' },
it => { shape => 'italic' },
sl => { shape => 'slanted' },
sc => { shape => 'smallcaps' },
csc => { shape => 'smallcaps' });

# These could be exported...
sub lookupFontFamily {
@@ -181,7 +206,7 @@ my $FONTREGEXP
sub decodeFontname {
my ($name, $at, $scaled) = @_;
if ($name =~ /^$FONTREGEXP$/o) {
my %props;
my %props = (series => 'medium', shape => 'upright', encoding => 'OT1');
my ($fam, $ser, $shp, $size) = ($1, $2, $3, $4);
if (my $ffam = lookupFontFamily($fam)) { map { $props{$_} = $$ffam{$_} } keys %$ffam; }
if (my $fser = lookupFontSeries($ser)) { map { $props{$_} = $$fser{$_} } keys %$fser; }
@@ -191,8 +216,6 @@ sub decodeFontname {
$size = $size * $scaled if defined $scaled;
$props{name} = $name;
$props{size} = $size;
# Experimental Hack !?!?!?
$props{encoding} = 'OT1' unless defined $props{encoding};
return %props; }
else {
Info('unrecognized', 'font', undef, "Unrecognized fontname '$name'");
@@ -251,7 +274,7 @@ sub textDefault {
sub mathDefault {
my ($self) = @_;
return $self->new_internal('math', $DEFSERIES, 'italic', DEFSIZE(),
$DEFCOLOR, $DEFBACKGROUND, $DEFOPACITY, undef, $DEFLANGUAGE, 'text', 0); }
$DEFCOLOR, $DEFBACKGROUND, $DEFOPACITY, 'OT1', $DEFLANGUAGE, 'text', 0); }

# Accessors
# Using an array here is getting ridiculous!
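The table changes above mean that decodeFontname now starts from defaults (medium, upright, OT1) and lets each table entry override only the properties it actually knows. A minimal, self-contained sketch of that merge follows. The tables are trimmed copies, and `alternation`, `$name_re` and `decode_fontname_sketch` are hypothetical names: the real `$FONTREGEXP` is not shown in this diff, and the `$at`/`$scaled` size handling is omitted.

```perl
use strict;
use warnings;

# Trimmed copies of the tables from the diff; each entry supplies only
# the properties it actually knows about.
my %font_family = (
  cm   => { family => 'serif' },
  cmr  => { family => 'serif' },
  cmm  => { family => 'math', shape => 'italic', encoding => 'OML' },
  cmsy => { encoding => 'OMS' },
  cmtt => { family => 'typewriter' },
);
my %font_series = ('' => {}, m => { series => 'medium' },
  b => { series => 'bold' }, bx => { series => 'bold' });
my %font_shape = ('' => {}, n => { shape => 'upright' },
  i => { shape => 'italic' }, sl => { shape => 'slanted' });

# ASSUMPTION: a simplified stand-in for $FONTREGEXP.
# Longest codes are tried first, so 'cmm' is preferred over 'cm' + series 'm'.
sub alternation {
  my (@codes) = @_;
  return join('|', map { quotemeta($_) }
      sort { length($b) <=> length($a) || $a cmp $b } @codes); }

my $fam_re  = alternation(keys %font_family);
my $ser_re  = alternation(keys %font_series);
my $shp_re  = alternation(keys %font_shape);
my $name_re = qr/^($fam_re)($ser_re)($shp_re)(\d+)$/;

# Returns a property hash, or an empty list for an unrecognized name.
sub decode_fontname_sketch {
  my ($name) = @_;
  return unless $name =~ $name_re;
  my ($fam, $ser, $shp, $size) = ($1, $2, $3, $4);
  # Defaults first; family, series and shape entries then override them in turn.
  my %props = (series => 'medium', shape => 'upright', encoding => 'OT1');
  for my $entry ($font_family{$fam}, $font_series{$ser}, $font_shape{$shp}) {
    %props = (%props, %$entry); }
  return (%props, name => $name, size => $size); }

for my $name (qw(cmr10 cmbx12 cmmi10 cmsy10)) {
  my %p = decode_fontname_sketch($name);
  print join(', ', map { "$_=$p{$_}" } sort keys %p), "\n"; }
```

Because `cmsy` now contributes only an encoding, the decoded properties for `cmsy10` carry no family at all, which is the point of dropping the old `family => 'symbol'` entries.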
4 changes: 4 additions & 0 deletions lib/LaTeXML/Core/Definition.pm
@@ -23,6 +23,7 @@ use base qw(LaTeXML::Common::Object);
require LaTeXML::Core::Definition::Expandable;
require LaTeXML::Core::Definition::Conditional;
require LaTeXML::Core::Definition::Primitive;
require LaTeXML::Core::Definition::FontDef;
require LaTeXML::Core::Definition::Register;
require LaTeXML::Core::Definition::CharDef;
require LaTeXML::Core::Definition::Constructor;
@@ -52,6 +53,9 @@ sub isExpandable {
sub isRegister {
return ''; }

sub isFontDef { # ONLY FontDef handles this!
return ''; }

sub isPrefix {
return 0; }

7 changes: 4 additions & 3 deletions lib/LaTeXML/Core/Definition/CharDef.pm
@@ -54,10 +54,11 @@ sub invoke {
($local ? Tokens(T_CS('\mathchar'), $value->revert, T_CS('\relax')) : $$self{cs}),
role => $$self{role}); }
else { # else text; but note deferred font/encoding till digestion!
my ($char, %props) = LaTeXML::Package::FontDecode($value->valueOf);
return Box($char, undef, undef,
# Decode the codepoint using current font & encoding
my ($glyph, $adjfont) = LaTeXML::Package::FontDecode($value->valueOf);
return Box($glyph, $adjfont, undef,
($local ? Tokens(T_CS('\char'), $value->revert, T_CS('\relax')) : $$self{cs}),
%props); } }
); } }

sub equals {
my ($self, $other) = @_;
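The CharDef change relies on FontDecode now returning a (glyph, font) pair rather than a character plus properties. As a rough illustration of the behavior the commit message describes (alphanumerics stay ASCII with their font styling in math, but map to Unicode in text), here is a self-contained sketch. Everything in it is hypothetical: `%blackboard_map`, `font_decode_sketch`, the plain-hash font, and in particular the choice to reset the family to 'serif' once a Unicode glyph carries the style are assumptions, not the actual FontMap or FontDecode code.

```perl
use strict;
use warnings;

# HYPOTHETICAL font map for a blackboard-bold font: slot => Unicode character.
my %blackboard_map = (
  0x52 => "\x{211D}",    # DOUBLE-STRUCK CAPITAL R
  0x5A => "\x{2124}",    # DOUBLE-STRUCK CAPITAL Z
);

# Returns (glyph, adjusted font); fonts are plain hashes in this sketch.
sub font_decode_sketch {
  my ($code, $font, $in_math) = @_;
  my $ascii = chr($code);
  # In math: keep alphanumerics as ASCII and leave the styling on the font,
  # so that later post-processing can still see it.
  if ($in_math && $ascii =~ /^[A-Za-z0-9]$/) {
    return ($ascii, $font); }
  # In text: map to Unicode when the map has an entry.
  # ASSUMPTION: the glyph now carries the style, so the family is reset.
  my $glyph = $blackboard_map{$code};
  return ($ascii, $font) unless defined $glyph;
  return ($glyph, { %$font, family => 'serif' }); }

for my $in_math (1, 0) {
  my ($glyph, $font) = font_decode_sketch(0x52, { family => 'blackboard' }, $in_math);
  printf "%s: U+%04X in family %s\n",
    ($in_math ? 'math' : 'text'), ord($glyph), $$font{family}; }
```

Returning the font alongside the glyph lets a caller such as CharDef's invoke pass both straight to Box, which is what the new `Box($glyph, $adjfont, ...)` call does.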
73 changes: 73 additions & 0 deletions lib/LaTeXML/Core/Definition/FontDef.pm
@@ -0,0 +1,73 @@
# /=====================================================================\ #
# | LaTeXML::Core::Definition::FontDef | #
# | Representation of definitions of Fonts | #
# |=====================================================================| #
# | Part of LaTeXML: | #
# | Public domain software, produced as part of work done by the | #
# | United States Government & not subject to copyright in the US. | #
# |---------------------------------------------------------------------| #
# | Bruce Miller <bruce.miller@nist.gov> #_# | #
# | http://dlmf.nist.gov/LaTeXML/ (o o) | #
# \=========================================================ooo==U==ooo=/ #
package LaTeXML::Core::Definition::FontDef;
use strict;
use warnings;
use LaTeXML::Global;
use LaTeXML::Common::Object;
use LaTeXML::Common::Error;
use LaTeXML::Core::Token;
use LaTeXML::Core::Tokens;
use LaTeXML::Core::Box;
use base qw(LaTeXML::Core::Definition::Primitive);

# A FontDef is a specialized primitive for a control sequence defined by \font.
# You can't assign it; when you invoke the control sequence, it merges
# the font info stored under $fontid into the current font.
sub new {
my ($class, $cs, $fontid, %traits) = @_;
return bless { cs => $cs, parameters => undef,
fontID => $fontid,
locator => $STATE->getStomach->getGullet->getMouth->getLocator,
%traits }, $class; }

# Return the "font info" associated with the (TeX) font that this command selects (See \font)
sub isFontDef {
my ($self) = @_;
return $STATE->lookupValue($$self{fontID}); }

sub invoke {
my ($self, $stomach) = @_;
if (my $fontinfo = $STATE->lookupValue($$self{fontID})) {
# Temporary hack for \the\font; remember the last font def executed
$STATE->assignValue(current_FontDef => $$self{cs}, 'local');
$STATE->assignValue(font => $STATE->lookupValue('font')->merge(%$fontinfo), 'local');
}
return Box(undef, undef, undef, $$self{cs}); }

#===============================================================================
1;

__END__
=pod
=head1 NAME
C<LaTeXML::Core::Definition::FontDef> - Control sequence definitions for font symbols defined by \font.
=head1 DESCRIPTION
Representation for control sequences defined by \font.
It extends L<LaTeXML::Core::Definition::Primitive>.
=head1 AUTHOR
Bruce Miller <bruce.miller@nist.gov>
=head1 COPYRIGHT
Public domain software, produced as part of work done by the
United States Government & not subject to copyright in the US.
=cut
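The pattern in this new class is that `isFontDef` works both as a predicate and as an accessor: the base Definition returns a false value, while FontDef returns the stored font info, so callers such as \fontname or \the can test and fetch in one step. A minimal sketch of that pattern, assuming a plain hash in place of `$STATE` and plain hashes in place of Font objects; the `Sketch::` packages and the `fontinfo_cmtt10` key are hypothetical names.

```perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for the value table behind $STATE->lookupValue/assignValue.
my %VALUES = (
  font            => { family => 'serif', series => 'medium', shape => 'upright' },
  fontinfo_cmtt10 => { family => 'typewriter', size => 10 },
);

package Sketch::Definition;
sub new { my ($class, %fields) = @_; return bless {%fields}, $class; }
sub isFontDef { return ''; }    # base class: never a font definition

package Sketch::FontDef;
our @ISA = ('Sketch::Definition');

# Predicate doubling as accessor: true exactly when font info is stored.
sub isFontDef {
  my ($self) = @_;
  return $VALUES{ $$self{fontID} }; }

# Selecting the font merges its info into the current font
# (a hash merge stands in for Font->merge here).
sub invoke {
  my ($self) = @_;
  if (my $fontinfo = $self->isFontDef) {
    $VALUES{current_FontDef} = $$self{cs};
    $VALUES{font} = { %{ $VALUES{font} }, %$fontinfo }; }
  return; }

package main;
my $tentt = Sketch::FontDef->new(cs => '\tentt', fontID => 'fontinfo_cmtt10');
my $other = Sketch::Definition->new(cs => '\relax');
for my $defn ($other, $tentt) {
  if (my $info = $defn->isFontDef) {
    print "$$defn{cs} selects family $$info{family}\n"; }
  else {
    print "$$defn{cs} is not a font definition\n"; } }
$tentt->invoke;
print "current family: $VALUES{font}{family}\n";
```

Properties that the font info does not mention (series and shape here) survive the merge, which mirrors how the real invoke only overrides what \font recorded.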
20 changes: 14 additions & 6 deletions lib/LaTeXML/Core/Stomach.pm
@@ -242,8 +242,13 @@ sub invokeToken_simple {
return LaTeXML::Core::Comment->new($comment); }
else {
$STATE->clearPrefixes; # prefixes shouldn't apply here.
return Box(LaTeXML::Package::FontDecodeString($meaning->toString, undef, 1),
undef, undef, $meaning); } }
if (my $mathcode = $STATE->lookupValue('IN_MATH')
&& $STATE->lookupMathcode($meaning->toString)) {
my ($role, $glyph, $f, $reversion) = LaTeXML::Package::decodeMathChar($mathcode, $meaning);
return Box($glyph, $f, undef, $reversion, role => $role); }
else {
return Box(LaTeXML::Package::FontDecodeString($meaning->toString, undef, 1),
undef, undef, $meaning); } } }

# Regurgitate: steal the previously digested boxes from the current level.
sub regurgitate {
@@ -359,10 +364,13 @@ sub setMode {
# and save the text font for any embedded text.
$STATE->assignValue(savedfont => $curfont, 'local');
$STATE->assignValue(script_base_level => scalar(@{ $$self{boxing} })); # See getScriptLevel
$STATE->assignValue(font => $STATE->lookupValue('mathfont')->merge(
color => $curfont->getColor, background => $curfont->getBackground,
size => $curfont->getSize,
mathstyle => ($mode =~ /^display/ ? 'display' : 'text')), 'local'); }
my $mathfont = $STATE->lookupValue('mathfont')->merge(
color => $curfont->getColor, background => $curfont->getBackground,
size => $curfont->getSize,
mathstyle => ($mode =~ /^display/ ? 'display' : 'text'));
$STATE->assignValue(font => $mathfont, 'local');
$STATE->assignValue(initial_math_font => $mathfont, 'local');
$STATE->assignValue(fontfamily => -1, 'local'); }
else {
# When entering text mode, we should set the font to the text font in use before the math
# but inherit color and size
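Setting `fontfamily` to -1 on entry to math mode matters for the class=7 case mentioned in the commit message. Under TeX's standard rules a mathcode packs a class (3 bits), a family (4 bits) and a slot (8 bits); class 7 ("variable family") behaves as class 0, but takes its family from \fam whenever \fam is in the range 0..15. The sketch below shows only that arithmetic. `split_mathcode` is a hypothetical helper and does not reproduce decodeMathChar, which also looks up the glyph and font and is not part of this excerpt.

```perl
use strict;
use warnings;

# Split a 15-bit TeX mathcode into (class, family, slot), applying the
# variable-family rule: class 7 acts as class 0 and uses \fam when it is valid.
sub split_mathcode {
  my ($mathcode, $fam) = @_;
  my $class  = ($mathcode >> 12) & 0x7;
  my $family = ($mathcode >> 8) & 0xF;
  my $slot   = $mathcode & 0xFF;
  if ($class == 7) {
    $class  = 0;
    $family = $fam if defined $fam && $fam >= 0 && $fam <= 15; }
  return ($class, $family, $slot); }

# Plain TeX assigns "7161 to the letter a and "202B to +.
for my $case ([0x7161, -1], [0x7161, 0], [0x202B, 0]) {
  my ($mathcode, $fam) = @$case;
  my ($class, $family, $slot) = split_mathcode($mathcode, $fam);
  printf "mathcode %04X with fam=%d: class %d, family %d, slot %02X\n",
    $mathcode, $fam, $class, $family, $slot; }
```

With \fam at -1 the letter keeps the family from its own mathcode (math italic in plain TeX); once a command such as \rm or \mit sets \fam, the same letter is taken from that family instead, while `+` is unaffected because its class is not 7.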
28 changes: 0 additions & 28 deletions lib/LaTeXML/Engine/Base_ParameterTypes.pool.ltxml
@@ -283,34 +283,6 @@ DefParameterType('Variable', sub {
my $params = $defn->getParameters;
return Tokens($defn->getCS, ($params ? $params->revertArguments(@args) : ())); });

# Same, but not necessarily writable
DefParameterType('Register', sub {
my ($gullet) = @_;
my $token = $gullet->readXToken;
my $defn = $token && LookupDefinition($token);
if ((defined $defn) && $defn->isRegister) {
[$defn, ($$defn{parameters} ? $$defn{parameters}->readArguments($gullet) : ())]; }
else {
if ($token && ($token->getCatcode == CC_CS)) {
if ($token->getString eq '\font') {
# \font is a bit of a register-like exception
return [$defn]; }
Error('expected', '<register>', $gullet,
"A <register> was supposed to be here", "Got " . Stringify($token),
"Defining it now.");
DefRegisterI($token, undef, Dimension(0)); # Dimension, or what?
return [LookupDefinition($token)]; }
else {
Error('expected', '<register>', $gullet,
"A <register> was supposed to be here", "Got " . Stringify($token),
"But it is not even definable.");
return [LookupDefinition(T_CS('\lx@DUMMY@REGISTER'))]; } } },
reversion => sub {
my ($var) = @_;
my ($defn, @args) = @$var;
my $params = $defn->getParameters;
return Tokens($defn->getCS, ($params ? $params->revertArguments(@args) : ())); });

DefParameterType('TeXFileName', sub {
my ($gullet) = @_;
my ($token, $cc, @tokens) = ();