Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] TeX-faitfhul errors on undefined command sequence expansion #1182

Closed

Conversation

dginev
Copy link
Collaborator

@dginev dginev commented Aug 6, 2019

Fixes #1180 .

And likely breaks a range of hard-to-foresee arXiv jobs, and third party raw TeX uses. I think my fixes definitely need some review and discussion/thought, as I touched bindings I am not closely familiar with.

The general concept is simple enough - expansion must always have defined command sequence arguments. And all good uses of expansion already do. The breakage in latexml happens at the fringes -- e.g. bindings that take very Perl-y approaches and leave certain command sequences undefined, or raw interpretation that silently ends up accepting invalid inputs where it should alert us with an error that latexml needs further improvement. I gave two examples in the issue of error cases, which now work just the same as in pdflatex with this PR. And tests pass with my modifications.

@dginev
Copy link
Collaborator Author

dginev commented Aug 6, 2019

Also, @brucemiller , I suspect rebasing on this branch may offer extra insight into debugging expl3 and other related raw interpretation cases, if we're lucky. Could make it very explicit that expansion is incorrect at a certain point in processing.

@dginev
Copy link
Collaborator Author

dginev commented Aug 6, 2019

I've verified the showcase Tikz images work with this branch as well, which is reassuring.

sub neutralizeTokens {
my ($self, @tokens) = @_;
my @result = ();
foreach my $token (@tokens) {
if ($$token[1] == CC_PARAM) { # Inline ->getCatcode!
push(@result, $token); }
elsif (defined(my $defn = LaTeXML::Core::State::lookupDefinition($STATE, $token))) {
elsif (!defined(my $meaning = LaTeXML::Core::State::lookupMeaning($STATE, $token)) ||
defined(my $defn = LaTeXML::Core::State::lookupDefinition($STATE, $token))) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this is best left without either check, just always add the \noexpand. We have an example in the tests that needs \noexpand added for undefined CSes.

@brucemiller
Copy link
Owner

OK, I mixed up responses to PR's. There may be some good fixes here, but there's definitely some fishy stuff (esp. calc, siunitx); it seems a regression. And the motivating issue is about "incorrect" documents, nothing indicates that the issue or patches would change the behavior of a correct document (at least not for the better). It might make debugging easier by failing earlier.

@dginev
Copy link
Collaborator Author

dginev commented Aug 7, 2019

Ok, I will try to restore at least calculus back to doing ->readXToken, but it may entail different parts of the binding need a refactor. The problem is that some of the bindings are silently leveraging the lenience of ->readXToken to return undefined tokens as a feature, so when I introduce the texy error message, that technique breaks. So some best practice refactor needs to be invented.

Also, I agree this PR shouldn't break existing correct behaviour, not my intention. Was fishing for some feedback 👍

@dginev dginev force-pushed the error-on-undefined-cs-expand branch from 9dd4b12 to f1c00b4 Compare August 8, 2019 01:13
@dginev dginev force-pushed the error-on-undefined-cs-expand branch from f1c00b4 to 52a797e Compare August 8, 2019 01:14
@dginev
Copy link
Collaborator Author

dginev commented Aug 8, 2019

Ok @brucemiller I struggled some more and partially managed to simplify the binding modifications. The calc in particular should not worry you at all anymore. For the siunitx, I found a piece of trickery as well, but got scared for alignments, and allowed them to keep the old readXToken behavior for now. To switch to consistently expanding I would need to rewrite a part of \@@open@inner@column which is above my pay grade at this instant. That said, it is probably with long-term benefit to get that bit of code closer to the TeX behavior as well (though your halign work may be touching on that already...).

Turns out there are two new best practices one can use to save failing bindings that were leveraging the "expanding undefined command sequences" behavior:

  1. Use a noexpand trick:
    DefMacro('\minof',    '\noexpand\minof');
  2. In tighter local scopes, let to the other catcode of the csname:
    Let($token, T_OTHER($token->getCSName));

I've only needed them in the files you can see changed in the PR, but we may be missing tests for some of the other bindings... I don't really feel comfortable merging this very soon, but would definitely consider it relevant for the 1.0 milestone.

@dginev
Copy link
Collaborator Author

dginev commented Aug 8, 2019

Note to self: I should add the two TeX files from the issue as tests, to avoid regressing.

@dginev dginev changed the title TeX-faitfhul errors on undefined command sequence expansion [WIP] TeX-faitfhul errors on undefined command sequence expansion Aug 8, 2019
@dginev
Copy link
Collaborator Author

dginev commented Aug 13, 2019

Closing in favor of #1188

@dginev dginev closed this Aug 13, 2019
@dginev dginev deleted the error-on-undefined-cs-expand branch August 17, 2019 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discrepancy in expansion of undefined command sequences
2 participants