-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode math properties #2425
Unicode math properties #2425
Conversation
…ning, ...) associated with a unicode char
…ning, ...) associated with a unicode char
…d with the glyph, after decoding and font adjustment
…il digestsion; use new decodeMathChar API
…lass with any already assigned grammatical role
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually quite like the direction of the PR. Maybe I don't yet understand enough to register the controversial piece?
I left a few minor comments, and I can't say I understand the full details of the TeX interplay yet, but the grammar-near pieces seem encouraging.
I especially like the changes in the tests, where the math XML markup has become significantly more enhanced.
($local ? Tokens(T_CS('\char'), $value->revert, T_CS('\relax')) : $$self{cs})); } | ||
else { # Else math mode, mathDecode! | ||
my ($glyph, $f, $rev, %props) = LaTeXML::Package::decodeMathChar($nvalue); | ||
if (!defined $props{name}) { # TEMPORARY HACK ????? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything to add to that comment? E.g. what is missing for the hack to go away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified; thanks!
$whatsit->setProperty(font => LookupValue('font')->specialize($glyph)) if $glyph; | ||
my ($glyph, $f, $rev, %props) = decodeMathChar($n); | ||
$whatsit->setProperty(glyph => $glyph) if $glyph; | ||
$whatsit->setProperties(%props); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor, but you could also guard that setter call:
$whatsit->setProperties(%props) if %props;
DefConstructor('\mathclose Digested', "?#1(<ltx:XMWrap role='CLOSE'>#1</ltx:XMWrap>)()", bounded => 1); | ||
DefConstructor('\mathpunct Digested', "?#1(<ltx:XMWrap role='PUNCT'>#1</ltx:XMWrap>)()", bounded => 1); | ||
DefConstructor('\mathinner Digested', "?#1(<ltx:XMWrap role='ATOM'>#1</ltx:XMWrap>)()", bounded => 1); | ||
our %mathclass_subclass = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This nested hash has the feeling of a mini-grammar... Interesting.
How about three other ideas:
PUNCT => { PERIOD => 1 }
ID => { NUMBER => 1 }
BIGOP => { SUMOP => 1, INTOP => 1, LIMITOP => 1, DIFFOP => 1 }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!, yep, that's the idea!
my $font = $curfont->merge(%$fontinfo); | ||
$fontinfo = $defn && $defn->isFontDef; | ||
if ($$basefontinfo{size} != $curfont->getSize) { # If we've gotten an explicit font SIZE change; Adjust! | ||
$fontinfo = {%$fontinfo}; $$fontinfo{size} = $curfont->getSize; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a small theoretical chance $fontinfo
is undef
here, so maybe default it to {}
?
$fontinfo = ($defn && $defn->isFontDef) || {};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
$fontinfo = {%$fontinfo}; $$fontinfo{size} = $curfont->getSize; } } | ||
my $font = $curfont->merge(%$fontinfo); | ||
if ($downsize > 0) { $font = $curfont->merge(scripted => 1); } | ||
if ($downsize > 1) { $font = $curfont->merge(scripted => 1); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These merge calls have the same value? Do you intend it to do the same merge twice when >1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, twice for scriptscript
#====================================================================== | ||
UTF(0x5C) => { role => 'ADDOP', meaning => 'set-minus' }, # \backslash | ||
UTF(0xAC) => { role => 'BIGOP', meaning => 'not' }, # \lnot | ||
UTF(0xAC) => { role => 'BIGOP', meaning => 'not' }, # \neg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0xAC is given 3 separate times here? Unsure how that happened. There may be other cases, I see 0x5C
is also there twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
editing mistake. fixed.
"ker" => { role => 'OPFUNCTION', meaning => 'kernel' }, # \ker # | ||
"lg" => { role => 'OPFUNCTION' }, # \lg # | ||
"lim" => { role => 'LIMITOP', meaning => 'limit', need_scriptpos => 1 }, # \lim # | ||
"lim inf" => { role => 'LIMITOP', meaning => 'limit-infimum', need_scriptpos => 1 }, # \liminf # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, first I assume single characters, then single words (no spaces), but I see some entries even have a space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, these don't actually get fired, yet; it would need to be hooked into the math ligature that combines "s,i,n" into "sin"; and then looked up! I suspect "lim inf" will likely fail that test. I kinda left them all in as a reminder...
* New unicode_math_properties(char) gets the math properties (role, meaning, ...) associated with a unicode char * New unicode_math_properties(char) gets the math properties (role, meaning, ...) associated with a unicode char * Revise API for decodeMathChar to return the math properties associated with the glyph, after decoding and font adjustment * Revise CharDef->new API to take only the code, deferring decoding until digestsion; use new decodeMathChar API * Updates for improved decodeMathChar and CharDef->new API * Update fontmap for some bigops * Have \mathop and friends intelligently merge the requested TeX math class with any already assigned grammatical role * Update tests for more 'semantic' when low-level glyphs are accessed * Refinements & clarifications suggested by D.Ginev * Add guard
Here's an interesting, perhaps controversial, PR: It enhances
decodeMathChar
to also lookup math properties (role, meaning,...
) for the decoded glyph from the Unicode module. This provides some parity between symbols defined "semantically" by bindings and low-level\mathchar
, in the simple cases. This covers most of what TeX's plain defines, as it defines them!