Skip to content

Experimental --as-json option for the HTML renderer #908

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 30, 2022

Conversation

sabine
Copy link

@sabine sabine commented Nov 23, 2022

This adds an experimental flag --as-json to odoc html to emit HTML fragments (preamble, content) together with JSON metadata (whether the page uses katex, breadcrumbs, table of contents).

--as-json replaces the --omit-toc, --omit-breadcrumbs, and --content-only options of odoc html.

Purpose:
Make it simpler and more robust to embed the HTML output of odoc in websites, such as ocaml.org, which is currently missing all preambles and has no reliable way to distinguish between .mld pages and modules.

Notes:

  • In order to abstract breadcrumb rendering, we treat breadcrumbs the same as the table of contents: there is now an intermediate data type breadcrumb used by the HTML renderer, from which either HTML or a JSON representation is rendererd.
  • In order to avoid introducing a JSON encoder dependency to odoc, @dbuenzli contributed the JSON encoder contained in this PR. All remaining additions and changes are to be blamed on me.

This PR enables us to fix the following issues on ocaml.org:

@jonludlam
Copy link
Member

I like this, it's a lot neater than the more general approach I was initially taking. My only objection, as before, is the json encoder. I'm content to live with it though. I also am aware that the compiler is also looking to output structured data in a parsable format for error/warning reporting for tooling, so I hope that encoder ends up exposed in a reasonable way and we should aim to switch to it as soon as we can.

@sabine
Copy link
Author

sabine commented Nov 29, 2022

Indeed, it looks to me like the next best alternative to wrapping up the HTML preamble and content block together with the corresponding metadata (what this PR does) would be to emit the structured data of the document and have the embedding site/product perform all of the HTML rendering.

However, it looks to me like the HTML+metadata emitted here is likely to be more convenient to integrate into web-browser-based products.

@dbuenzli
Copy link
Contributor

Is the html-targets command updated accordingly by this PR ? (a very quick look seems to indicate that it's not the case).

@sabine
Copy link
Author

sabine commented Nov 29, 2022

Ah, sorry about html-targets, you're correct, it's not updated. I'm fixing that.

This adds an experimental flag `--as-json` to `odoc html` to
emit HTML fragments (preamble, content) together with
JSON metadata (whether the page uses katex, breadcrumbs,
table of contents).

`--as-json` replaces the `--omit-toc`, `--omit-breadcrumbs`,
and `--content-only` options of `odoc html`.

Purpose:
Make it simpler and more robust to embed the HTML output
of odoc in websites, such as ocaml.org.

Notes:
- In order to abstract breadcrumb rendering, we treat breadcrumbs
the same as the table of contents: there is now an intermediate
data type `breadcrumb` used by the HTML renderer,
from which either HTML or a JSON representation is rendererd.
- In order to avoid introducing a JSON encoder dependency to odoc,
@dbuenzli contributed the JSON encoder contained in this PR.
All remaining additions and changes are to be blamed on @sabine.

Co-authored-by: Daniel Bünzli <daniel.buenzli@erratique.ch>
@sabine
Copy link
Author

sabine commented Nov 29, 2022

I was too quick to assume a problem with html-targets. There seems to be no need to update anything, as the filenames are obtained from the renderer. In the force-push, I added a test that confirms that html-targets returns the .html.json filepaths (https://github.com/ocaml/odoc/pull/908/files#diff-8f34ee7dd9df5690704237b21e8125187f6da349ce8be06a25446635b2c5a36fR24).

Copy link
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the API that I dreamed to see. I think it makes the API more complicated than needed to have a tree of JSON files instead of a single structured output.
I believe a new sub-command that outputs its output on stdout would be more convenient. It would also allow to output the list of subpages, parent-child, canonical path, whether it has source code (coming soon) in a structured way.

With that said, I think the approach here is fine and I don't want to block it.

I've submitted a PR to your PR to add a test about subpages: sabine#1

Test --as-json in the presence of subpages
@sabine
Copy link
Author

sabine commented Nov 29, 2022

Thank you @Julow, your PR that adds better tests is merged.

I believe the issue of wanting to have one single structured output is orthogonal to the HTML+metadata output and could be made as a separate PR, improving on this one.

@jonludlam
Copy link
Member

That's not the API that I dreamed to see. I think it makes the API more complicated than needed to have a tree of JSON files instead of a single structured output.
I believe a new sub-command that outputs its output on stdout would be more convenient. It would also allow to output the list of subpages, parent-child, canonical path, whether it has source code (coming soon) in a structured way.

Though a benefit to the approach in this PR is that dynamically loading pages becomes an awful lot quicker, especially for large libraries that contain a lot of modules. A build of core produces 51Mb of html files, the largest of which is 400k.

@jonludlam
Copy link
Member

It's important to note that this interface is experimental, and we should feel we can change the output format without preserving any kind of backward compatibility. I've seen the warnings in the CHANGES and in the help, but I'm quite tempted to output something to stderr when running html-generate too.

@dbuenzli
Copy link
Contributor

but I'm quite tempted to output something to stderr when running html-generate too.

Please don't, that will just annoy people who are using the stuff anyways.

@Julow
Copy link
Collaborator

Julow commented Nov 29, 2022

could be made as a separate PR, improving on this one.

I'm not sure. This API will be used I suppose so such a change would either be breaking or add a new command without removing the --as-json option.

Though a benefit to the approach in this PR is that dynamically loading pages becomes an awful lot quicker, especially for large libraries that contain a lot of modules

A different format could be used. For example, parsexp allows reading concatenated sexp one by one. There would be a first object containing the list of output, then one per output.

@jonludlam
Copy link
Member

but I'm quite tempted to output something to stderr when running html-generate too.

Please don't, that will just annoy people who are using the stuff anyways.

I just really don't want people to be upset when we inevitably change something here. Maybe I'm being paranoid? Perhaps --void-warranty to enable the feature :-)

@dbuenzli
Copy link
Contributor

I just really don't want people to be upset when we inevitably change something here.

People get upset for all sorts of valid and invalid reasons. The only thing you can do is have your own peace of mind. Marking the feature as unstable both in the release notes and in the docs seems enough to counter any future anger.

@sabine
Copy link
Author

sabine commented Nov 30, 2022

To clarify about the output format: For ocaml.org, having these individual .html.json files (that directly correspond to the .html files odoc would otherwise output) is convenient to work with: every URL can be served by looking up one .html.json file and rendering the full page. So if there is a different output format, it is highly likely that our documentation pipeline would break up the output to store it in individual files.

@Julow
Copy link
Collaborator

Julow commented Nov 30, 2022

The pipeline has all the time it needs to present the data like it wants, Odoc shouldn't worry about that.
However, it is useful to consider very large outputs, the tree of json doesn't look so bad after all.

@jonludlam
Copy link
Member

The tests were broken on 4.02/3 due to ordering issues on the command line. Have pushed a fix.

@jonludlam jonludlam merged commit 08ea62d into ocaml:master Nov 30, 2022
@sabine sabine deleted the as_json branch December 1, 2022 13:34
mseri pushed a commit to ocaml/opam-repository that referenced this pull request Dec 13, 2022
CHANGES:

Additions
- New unstable option `--as-json` for the HTML renderer that emits HTML
  fragments (preamble, content) together with metadata (table of contents,
  breadcrumbs, whether katex is used) in JSON format. (@sabine, ocaml/odoc#908)
- New maths support via `{m ... }` and `{math ... }` tags. (@giltho, @gpetiot, ocaml/odoc#886)
- Various optimisations (@jonludlam, ocaml/odoc#870, ocaml/odoc#883)
- Better handling of alerts and deprecation notices. (@panglesd, ocaml/odoc#828)
- Handle language tags on code blocks (@Julow, ocaml/odoc#848)

Bugfixes
- Shadowing issues (@jonludlam, ocaml/odoc#853)
- Layout fixes and improvements (@panglesd, ocaml/odoc#832, ocaml/odoc#839, ocaml/odoc#847)
- Handle comments on class constraints and inherit (@Julow, ocaml/odoc#844)
- Disable the missing root warning (@jonludlam, ocaml/odoc#881)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants