Skip to content

Preliminary {math } #811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Preliminary {math } #811

wants to merge 3 commits into from

Conversation

giltho
Copy link
Contributor

@giltho giltho commented Jan 22, 2022

⚠️ work in progress ⚠️

This is just to make progress on the discussion for #123.
This is a very simple implementation, and there are several points to discuss:

  • Is it ok to reuse the target-specific markup?
  • Should KaTeX be vendored? I started doing it, but there are ~70 font files to vendor in addition to the CSS and JS files. Since, dune hardcodes the support files, and it means we'd have to sync with a dune update and add some restrictions on the versions...
  • A defer should probably be added to the css and js loading, to reduce the amount of time it takes to load the page.
  • How to differentiate inline and block math? A solution is to have two targets: math and mathi. From the katex point of view, the only difference is a displayMode: true parameter for the render function
  • It would be nice to optionally have server-side rendering, but that would require spinning a node instance to call katex which is deeply annoying. It could be an option though

@Drup
Copy link
Contributor

Drup commented Jan 23, 2022

I would tend to say that KaTeX should not be vendored, especially since it means we need a webserver to serve the content.

Regarding inline/block ... what about using the surrounding context ? If it's in a block context, it's a block, otherwise, it's inlined ?

The question of static generation remains. We should indeed make an option for that. Is there no other alternative math-to-mathml converter ? What about HeVea ?

@giltho
Copy link
Contributor Author

giltho commented Jan 24, 2022

Why would a web server be needed to serve the content more than what's currently done with highlight.pack.js or odoc.css if KaTeX was vendored? It's just having the files accessible with the right path in the end, just like any other page

For the inline/block, that sounds reasonable! Avoids syntax overload, and it's pretty clear.

I like the idea of depending on KaTeX, given that it is widely used, and maintained by people I tend trust.

HeVea seems risky, I trust the maintainers just as much as the KaTeX maintainers, but don't know how much time they can spend on it. Also, it is a quite big project, and I think it's only an executable, built with a Makefile, no dune etc.. so making it a library would be very painful, and I don't think I'm not sure it can handle tex fragments anyway.

The optimal case would be to have a reliable JS interpreter written in OCaml that could simply run KaTeX without needing node installed, but I don't think that exists.

@dbuenzli
Copy link
Contributor

For the inline/block, that sounds reasonable! Avoids syntax overload, and it's pretty clear.

It's quite heavy though, e.g. if you want to write the equivalent of $x$. Why not follow the suggestion I made here ?

@giltho
Copy link
Contributor Author

giltho commented Jan 26, 2022

Why not follow the suggestion I made here ?

As I said, it adds a lot to the parser for something that is kind of already there. Implementing {m } and {%math %} would be mostly duplicating the logic of raw_markup in the parser.
That being said, I actually agree with you because I did not take into consideration how heavy the notation is to write very small math bits.
I'm going to push that PR

@dbuenzli
Copy link
Contributor

{m } and {%math %}

Note I'm note suggesting {%math %} there. I'm suggesting {%math: %} which presumably should be easy to implement.

@giltho
Copy link
Contributor Author

giltho commented Jan 26, 2022

Yes, see the PR I drafted, the block part is actually quite trivial.
I mean, the whole PR isn't that much more complicated, the comment about using only {%math: %} was that there was no change required to the parser if we did that.

I actually prefer {%math %} to {%math: %} to avoid conflict with the target-specific syntax, but it's a character to change in the PR so whichever you prefer

@dbuenzli
Copy link
Contributor

I actually prefer {%math %} to {%math: %} to avoid conflict with the target-specific syntax

I don't see a problem with that, it makes sense to me. What I write here specifically targets "maths", which may or may not be supported by a given backend. It's easier and more regular if we don't introduce a new syntax for that.

@giltho
Copy link
Contributor Author

giltho commented Jan 26, 2022

It's the same syntax for a different node in the AST, but I'm happy changing it!

giltho added a commit to giltho/odoc-parser that referenced this pull request Jan 26, 2022
giltho added a commit to giltho/odoc-parser that referenced this pull request Jan 26, 2022
@giltho
Copy link
Contributor Author

giltho commented Feb 2, 2022

Hello, would it be possible to get a review on the syntax upgrade ? Thanks!

Copy link
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a small problem with Raw_markup, backends are free to ignore markup they don't recognize. I expect the math markup to always be rendered, at least as an unstyled code span.
There's potentially many backends, for example the future documentation website or the new markdown generator.

Is someone writing alternative markup for every backends (eg. the same text but formatted differently) ? I suspect this is not how it is most used in practice and this is not compatible with allowing third party backends to exist.
Should we change the meaning of Raw_markup to be rendered at least as a code span on every backends ?

I'm fine with fetching katex from a CDN (I'm also fine with either distributing it with Odoc or not providing it at all), however there's a huge problem:
I don't want Katex to be loaded if it is not used. It should be loaded dynamically if it is needed (script and style).

Just a thought: We could add options to html-generate to load arbitrary scripts and styles. This way, the katex part of this PR can be implemented separately.
For now Dune doesn't allow passing arbitrary options but this should be fixed in Dune rather than worked around here.
This works well with the change to Raw_markup I proposed above. The HTML backend could generate a class name from the raw markup's target to be able to recognize them from user scripts.

@giltho
Copy link
Contributor Author

giltho commented Feb 3, 2022

Answering the review on the syntax update here at the same time.

First of all answering these two comments:

Why is the Raw_markup syntax not enough for your use case ? Is it just to provide a nicer syntax ?

Anyway, I think the parsing of {%math: %} shouldn't be modified to allow the PR on ocaml/odoc to live whether or not this PR is accepted.
Even better, {m ... } can simply be a sugar for the raw markup syntax, to avoid introducing breaking changes.

and

There's a small problem with Raw_markup, backends are free to ignore markup they don't recognize. I expect the math markup to always be rendered, at least as an unstyled code span.
There's potentially many backends, for example the future documentation website or the new markdown generator.
Is someone writing alternative markup for every backends (eg. the same text but formatted differently) ? I suspect this is not how it is most used in practice and this is not compatible with allowing third party backends to exist.
Should we change the meaning of Raw_markup to be rendered at least as a code span on every backends ?

I do think that this is why we shouldn't be using Raw_markup actually, because it is not something that should be ignored by any backend, while raw_markup is specifically something that is only target at some backends and not others. Modifying the meaning of Raw_markup now seems a bit risky.
Plus, even in latex, `Math and `Raw_markup have different meanings.

Because odoc-parser is mostly only used by odoc itself, I don't think it's an issue to change the constraint on odoc-parser's version in this PR.

I'm fine with fetching katex from a CDN (I'm also fine with either distributing it with Odoc or not providing it at all), however there's a huge problem:
I don't want Katex to be loaded if it is not used. It should be loaded dynamically if it is needed (script and style).

That is a very good point actually! But I believe this could be decided at compile time? Because pages are generated separately, and the header is added to each page, we could simply detect the presence of `Math AST nodes and only in their presence add KaTeX to the header (cdn, script and CSS). Even better, the script would only be loaded on pages that need it (although once it's in cache, it wouldn't change much - but it would make a difference for documentations with no math at all).

@Julow
Copy link
Collaborator

Julow commented Feb 3, 2022

Indeed, changing `Raw_markup means removing a feature. I've looked at some packages (2300) and found not many (around 80) uses of raw markup.

Most of the matches are for images, which are always html-only. Some examples:

all_of_opam/chartjs-streaming.0.2.2/plugins/datalabels/chartjs_datalabels.mli:185:      alt="chartjs-datalabels"></img> %}
all_of_opam/oplot.0.50/lib/oplot.mli:44:        {%html:<img src="example.png" class="oplot" alt="oplot example">%}
all_of_opam/vg.0.9.4/src/vg.mli:1236:{%html: <img src="../_assets/doc-anz.png" style="width:90mm; height:30mm;"/> %}}
all_of_opam/obandit.0.3.4/src/obandit.mli:15:  {%html: <img src="http://freux.fr/kroliki.jpg" alt="rabbits" width="600"> %} 
all_of_opam/tree_layout.0.2/src/treemaps.mli:9:    {%html: <a href="https://drup.github.io/tree_layout/treemap.svg"><img style="margin : 0 auto; display: block; max-width:60%" src="https://drup.github.io/tree_layout/treemap.svg" /></a> %}
all_of_opam/preface.0.1.0/lib/preface_specs/preface_specs.mli:13:      </center>%} *)

A math syntax seems needed. There's dozens of instances of math notations written in code spans or other weird ways.

all_of_opam/dose3.7.0.0/src/common/edosSolver.mli:71:  (** [add_rule st l] add a disjuction to the solver of type {%html: \Bigvee l %} *)
all_of_opam/sundialsml.3.1.1p1/src/idas/idas.mli:105:      \mathtt{t} \leq t_n$%}—where $t_n$ denotes {!Ida.get_current_time}

There's literally one instance of the raw markup syntax being used to have alternative rendering in different backends. Of course, it is for math notation:

all_of_opam/coq.8.13.2/kernel/constr.mli:125:    {%html:(f t<sub>1</sub> ... t<sub>n</sub>)%}
all_of_opam/coq.8.13.2/kernel/constr.mli:126:    {%latex:$(f~t_1\dots f_n)$%}. *)

Is Opium using its own language ? There's weird things like:

{%html: <pre>
HTTP/1.1 200 
Content-Type: text/plain

Hello World </pre>%} *)`

and

    {% See <<a_manual chapter="functors"|the manual of the functorial interface>>. %} *)

I was surprised to not find many <table> there's a few instances of:

{%html:
<table>
<tr> <td>Some a <td>-> <td>return a
<tr> <td>None <td>-> <td>empty
</table>
%}

I think it's worth reconsidering this feature. Having new syntaxes for images and math would remove almost all the current uses for raw markup.

Plus, even in latex, Math and Raw_markup have different meanings.

How do they differ? There's no `Math in your current PR however.

Because odoc-parser is mostly only used by odoc itself

It is used by mdx and ocamlformat (for which an AST change is breaking). Though, if we actually want to change the parser, we can fix this by vendoring the parser in ocamlformat for example.

But I believe this could be decided at compile time?

Compiler time is even better. The runtime solution is more suitable to be implemented outside of Odoc, with my raw_markup proposal.

@giltho
Copy link
Contributor Author

giltho commented Feb 3, 2022

How do they differ? There's no `Math in your current PR however.

In what I'd want to do, writing {m x}, from the point of view of the latex generator, is the equivalent of writing {%latex: $x$}. Obviously, the latter would only be rendered in latex while the former would be rendered everywhere.
Indeed, `Math` is not in this PR, because I'd need to wait for the existence of the Math node in odoc-parser before adding it here.

Ah, I just checked mdx and ocamlformat opam files, and it is a bit annoying that they depend on ocamlformat >= 1.0.0 and not = 1.0.0. That being said, a PR on opam could fix this, and an upgrade in odoc-parser would not be too much of an issue.

I do agree that raw_markup would be worth reconsidering, but maybe simply deprecated? Maybe there's some very old code out there that uses it.

@Drup
Copy link
Contributor

Drup commented Feb 8, 2022

Raw_markup is pretty essential for lot's of things. Until we have better stories for pictures and table, let's not try to remove it. :)
And yes, the semantic of Raw_markup is precisely that each backend is free to render as it pleases, including ignoring it. In particular some of the "weird" stuff than @Julow reported come from the use of custom ocamldoc backends.

We can decide (and I'm rather in favour) that math is sufficiently important to deserve some mainline syntax and dedicated handling. cc @jonludlam :)

Compile-time detection for call to katex is a good idea.

@jonludlam
Copy link
Member

I like having a specific math-specific syntax rather than just hijacking the backend-specific syntax.

{m ... } seems good for inline (with latex math syntax). Do we definitely want {%math ...} for display syntax? What about {math ...}?

I don't think we should change the meaning Raw_markup, nor deprecate it. It seems a useful escape hatch to have, plus we still want to be able to produce the docs for all those packages that are currently using it. Perhaps though we can provide alternatives for those people still wanting it - obviously I'll be very happy to recommend people start using the specific math markup in preference, as well as any img or table that may make it into odoc-parser at some point in the future. These 'first-class' markup constructs can have well defined meanings in all backends, even if that is simply being rendered as verbatim/code blocks. We should eventually get to the point that all current users of Raw_markup have better alternatives and at that point we can deprecate it.

@dbuenzli
Copy link
Contributor

Compile-time detection for call to katex is a good idea.

+1, but please no CDN for katex or mathjax (it's unclear to me which one is better, I think both are essentially compatible).

odig's docsets have always worked offline and I don't think it should change – if only for air gapped settings like you might find yourself if you use stuff like bap.

{%math ...} for display syntax? What about {math ...}?

If the % is dropped I'm fine with {math …}. The only thing I wanted to avoid is to have a subtle syntax {%id …} vs {%id: …} syntax.

@Drup
Copy link
Contributor

Drup commented Feb 11, 2022

We discussed this yesterday. Everyone agree that we should not use Raw_markup for this, and it's more than time to add new constructs for the things we want to properly support.

Since we are making it a "proper" construct, Let's drop the % like all other first-class constructs, so {m and {math.

@giltho
Copy link
Contributor Author

giltho commented Feb 11, 2022

+1, but please no CDN for katex or mathjax (it's unclear to me which one is better, I think both are essentially compatible).

The issue is that the alternative is to vendor ~70 files for KaTeX or MathJax. In the case of odig, it would probably mean that these 70 files are copy-pasted for every library, which is probably wasted space

One work-around could be:

  • An environment variable such as ODOC__LATEX_SUPPORT_FOLDER, if set, replaces the CDN with a concrete location for the support files
  • The support-files command of odoc is extended to provide these files
  • odig makes sure to place the katex files in one place and points, using the env var, to that place
  • Dune, by default, creates the support files, but can be asked not to (in the case of odig for example)

The annoying thing is that, detecting if KaTeX is needed is easy for individual pages, so we can decide if we need the header at compile time, but it's difficult for dune to decide if the support files are needed or not... But well, it's 70 files, but they will only be made the first time you build the doc

[EDIT: I originally suggested default behaviour should be to use a CDN, but if the things are vendored anyway, the only thing we need to go around is the massive copy-pasting for odig]

@dbuenzli
Copy link
Contributor

In the case of odig, it would probably mean that these 70 files are copy-pasted for every library

Why would that be the case ? There's a single assets directory for the whole doc set.

This should simply be added to odoc support-files, all the bits are already in place. There's no need to make that more complicated than it should be.

@giltho
Copy link
Contributor Author

giltho commented Feb 11, 2022

Ah, I didn't know it was the case, then it's perfect. Sorry for over-thinking this haha

Good - then I'll add that to the PR, and I'll modify the syntax PR to use {math ...}

@dbuenzli dbuenzli changed the title Preliminary {%math: } Preliminary {math } Feb 11, 2022
jonludlam pushed a commit to ocaml-doc/odoc-parser that referenced this pull request Jun 9, 2022
@jonludlam
Copy link
Member

This work is continued in #886 - thanks, @giltho !

@jonludlam jonludlam closed this Aug 22, 2022
@giltho
Copy link
Contributor Author

giltho commented Aug 22, 2022

Gah, sorry I dropped the ball on that one again :( Thank you for continuing it :)

@jonludlam
Copy link
Member

No worries, your contributions live on in the new PR - if you felt like reviewing it that'd be useful though :-)

jonludlam pushed a commit to jonludlam/odoc that referenced this pull request Jun 19, 2023
jonludlam pushed a commit that referenced this pull request Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants