
Handling documentation that takes a long time to build #2881

Open
@JoeZiminski

Description


At present, a few examples in the documentation take a long time to build or require files that are not available to the user. These are located in the examples/how_to folder. Ideally, these would use an easily obtainable dataset so the user could run the scripts alongside the example. However, this is relatively easy to address (e.g. using the SI generate-recording machinery or making specific datasets publicly available through si.download_dataset) and is not the purpose of this issue.

Partly because these pages (at least the handle drift page) take a long time to run, they need to be handled differently from the other sphinx-gallery pages. Even with the file swapped out for the SI generate-recording machinery (as in this PR), the handle drift page takes around 25 minutes to build, which is too long for everyday use. At present, these how_to pages must be built manually and the output moved to the docs/how_to folder. The benefit is that this part of the documentation is only built when it needs to be. However, it would be nice to extend this method such that:

  1. we can use the same sphinx-gallery machinery that builds the rest of the docs
  2. we avoid having to build manually

but still:

  3. we build this longer part of the docs only when required.

Two methods are suggested below. The first achieves (1) and (2), but results in some unnecessary rebuilds of the long docs in the CI; I am not familiar enough with the docs-building CI, so it would be good to get feedback. The second method achieves (1) and (3) and simplifies the build to a single command, but it must still be run manually and the resulting page cannot easily be included in the existing sphinx-gallery pages.

Method 1

This approach is demonstrated in #2879, using the 'Handle drift' how-to as an example, and it would be great to get feedback on it. The page is still written as a .py file, now in sphinx-gallery style and stored with the rest of the tutorials. However, the file is named long_plot_handle_drift.py, so, because the name does not start with plot, sphinx-gallery does not execute the script by default. Instead, the page is only converted to RST, which builds very quickly: the code examples are shown but no plots are generated. To build with plots, a tag is added to the sphinx-build command and handled in conf.py (e.g. sphinx-build -b html doc ./doc/_build/ -t handle_drift). When the tag is provided, "long_plot_handle_drift" is added to the regexp that selects which docs to fully build, and the page is built with plots.

```python
if tags.has("handle_drift") or tags.has("all_long_plot"):
    sphinx_gallery_conf["filename_pattern"] += '|/long_plot_handle_drift'
```
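To make the effect of the tag concrete, here is a small stand-alone sketch of which files the pattern selects. It assumes, as the sphinx-gallery documentation describes, that `filename_pattern` is matched with `re.search` against each example's path and that the default pattern is `'/plot'`. `FakeTags` is a hypothetical stand-in for the `tags` object Sphinx makes available in conf.py, and the example paths are illustrative only.

```python
import re


class FakeTags:
    """Hypothetical stand-in for the `tags` object Sphinx exposes in conf.py."""

    def __init__(self, *names):
        self._names = set(names)

    def has(self, name):
        return name in self._names


def build_filename_pattern(tags, base_pattern="/plot"):
    """Return the filename_pattern that would result for the given build tags."""
    pattern = base_pattern
    # Mirrors the conf.py logic: the long example is only opted in when tagged.
    if tags.has("handle_drift") or tags.has("all_long_plot"):
        pattern += "|/long_plot_handle_drift"
    return pattern


def executed_examples(paths, pattern):
    """Return the example paths whose scripts would be executed."""
    return [p for p in paths if re.search(pattern, p)]


# Illustrative paths: "/long_plot..." does not contain "/plot", so it is skipped by default.
EXAMPLES = [
    "examples/tutorials/plot_quick_example.py",
    "examples/tutorials/long_plot_handle_drift.py",
]

if __name__ == "__main__":
    print(executed_examples(EXAMPLES, build_filename_pattern(FakeTags())))
    print(executed_examples(EXAMPLES, build_filename_pattern(FakeTags("handle_drift"))))
```

Real paths are OS-specific, so for portability the sphinx-gallery documentation suggests building such patterns with `re.escape(os.sep)` rather than a literal `/`.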

This way, for day-to-day use the docs can be built without fully running the long-form tutorials, while for a release, or when working specifically on these pages, the tag can be used to build them. The problem is that I am not sure exactly when the docs are built in the CI. If the full (release) docs build goes up to around 25 minutes but only runs on merge or on version release, is that a problem? The downside compared to the current method is that these pages will be rebuilt in the CI at some stage, even if they have not changed. The upside is that the CI at least checks that they still work, which is easy to forget when building manually.
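One possible way to limit the unnecessary CI rebuilds would be to pass a tag only when its long example has actually changed, or on release. The sketch below assumes the CI can supply the list of changed files (e.g. from `git diff --name-only`); the `LONG_EXAMPLES` mapping and the `sphinx_build_command` helper are hypothetical, while the tag names are the ones handled in conf.py for method 1.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from each long example to the tag that fully builds it.
LONG_EXAMPLES = {
    "examples/tutorials/long_plot_handle_drift.py": "handle_drift",
}


def sphinx_build_command(changed_files, release=False):
    """Build a sphinx-build argument list, adding tags only where needed."""
    cmd = ["sphinx-build", "-b", "html", "doc", "./doc/_build/"]
    if release:
        # Release builds fully run every long example.
        return cmd + ["-t", "all_long_plot"]
    # Normalise paths such as "./examples/..." before comparing.
    changed = {PurePosixPath(f).as_posix() for f in changed_files}
    for path, tag in LONG_EXAMPLES.items():
        if path in changed:
            cmd += ["-t", tag]
    return cmd


if __name__ == "__main__":
    print(sphinx_build_command(["README.md"]))
    print(sphinx_build_command(["./examples/tutorials/long_plot_handle_drift.py"]))
```

With this approach, everyday PRs that do not touch the long examples keep the fast build, and the 25-minute run only happens when it can actually catch something.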

Method 2

Method 2 is similar to method 1: it basically replicates the current approach but makes it easier to run. In this case, long-to-build pages are stored in a separate examples/long_tutorials folder, and sphinx-gallery outputs them to another folder (docs/long_tutorials) when they are built. Unlike docs/tutorials, docs/long_tutorials is not added to the .gitignore (just as docs/how_to is currently not in the .gitignore). Now, instead of using the regexp to fully build the pages, conf.py contains:

```python
if tags.has("handle_drift") or tags.has("all_long_plot"):
    sphinx_gallery_conf['examples_dirs'].append('../examples/long_tutorials/handle_drift')
    sphinx_gallery_conf['gallery_dirs'].append('long_tutorials/handle_drift')
```
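sphinx-gallery accepts either a single string or a list for `examples_dirs` and `gallery_dirs` and pairs the two up entry by entry, so calling `.append` directly would fail if conf.py sets them as strings. The following is a hedged sketch of a helper that normalises both to lists and extends them together; `add_long_gallery` is a hypothetical name (not part of sphinx-gallery) and the starting config values are illustrative.

```python
def add_long_gallery(conf, examples_dir, gallery_dir):
    """Register one extra (long-running) gallery in a sphinx-gallery config.

    examples_dirs[i] is paired with gallery_dirs[i], so both are converted
    to lists first and then extended together to keep them aligned.
    """
    for key in ("examples_dirs", "gallery_dirs"):
        if isinstance(conf[key], str):
            conf[key] = [conf[key]]
    # Avoid registering the same gallery twice (e.g. two tags both enable it).
    if examples_dir not in conf["examples_dirs"]:
        conf["examples_dirs"].append(examples_dir)
        conf["gallery_dirs"].append(gallery_dir)
    return conf


# In conf.py this would be called inside the tag check, e.g.:
# if tags.has("handle_drift") or tags.has("all_long_plot"):
#     add_long_gallery(sphinx_gallery_conf,
#                      "../examples/long_tutorials/handle_drift",
#                      "long_tutorials/handle_drift")

if __name__ == "__main__":
    conf = {"examples_dirs": "../examples/tutorials", "gallery_dirs": "tutorials"}
    add_long_gallery(conf, "../examples/long_tutorials/handle_drift",
                     "long_tutorials/handle_drift")
    print(conf)
```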

Now, when building with sphinx-build -b html doc ./doc/_build/ -t handle_drift, these pages are rebuilt as a sphinx-gallery and the outputs in docs/long_tutorials are overwritten (unless nothing has changed since the last build). The outputs sit permanently in docs/long_tutorials (just as they currently do in docs/how_to) and can be rebuilt and pushed at will.
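The "unless nothing has changed" behaviour relies on sphinx-gallery keeping an MD5 checksum next to each copied example and skipping examples whose checksum still matches. The sketch below is a simplified illustration of that idea, not sphinx-gallery's actual code; the helper names and the `.md5` file naming here are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path):
    """Return the hex MD5 digest of a file's bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def needs_rebuild(src, checksum_file):
    """True if src changed since its checksum was last recorded (or never was)."""
    checksum_file = Path(checksum_file)
    if not checksum_file.exists():
        return True
    return checksum_file.read_text().strip() != file_md5(src)


def record_build(src, checksum_file):
    """Store the checksum of src after a successful build."""
    Path(checksum_file).write_text(file_md5(src))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = Path(d) / "long_plot_handle_drift.py"
        md5 = Path(d) / "long_plot_handle_drift.py.md5"
        src.write_text("print('drift')\n")
        print(needs_rebuild(src, md5))  # no checksum recorded yet
        record_build(src, md5)
        print(needs_rebuild(src, md5))  # unchanged since the last build
```

Because the checksums live alongside the committed outputs in docs/long_tutorials, rerunning the tagged build after an unrelated change should be cheap.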

A downside is that these pages form a separate gallery, so they need to be linked to manually (e.g. from How To). Alternatively, we could have two tutorial galleries, one quick to run and the other slower to run (on the user's end, even though they can now follow along with the tutorial, it still takes around 25 minutes to run).

I am slightly in favour of method 1 because it integrates fully with the existing pages, is a bit simpler and is fully automated. However, it may cause problems with the CI. It would be great to hear what people think.

Metadata

Assignees

No one assigned

    Labels

    documentation (Improvements or additions to documentation)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
