Build: output `_readthedocs/` directories feedback after deploy #9931

humitos · 2023-01-24T17:51:54Z

We were able build other formats (PDF, HTMLZip) by outputting the files into _readthedocs/{pdf,htmlzip} directories: https://test-builds.readthedocs.io/en/all-formats-build-jobs/

We found some limitations:

the name of the PDF must be _readthedocs/pdf/<project-slug>.pdf
the name of the HTMLZip must be _readthedocs/htmlzip/project-slug>.zip
double-check if it happens the same for ePUB 😄
we are only checking for the existence of _readthedocs/<format> but we are not checking if it's a directory at

readthedocs.org/readthedocs/projects/tasks/builds.py

Line 539 in e1e7a18

pdf = os.path.exists(

Also, while debugging this we hit #9468

The text was updated successfully, but these errors were encountered:

humitos · 2023-01-24T18:08:27Z

I just realized that we had another issue: the build process raised an exception at Uploading state while uploading things to the storage backend and the built got stuck. This is because the upload step happens inside on_success instead of execute as the other things.

I've already moved this to the right place in #9800

Move upload to storage function inside Celery.execute handler to allow us raising UnsupportedSymlinkFileError and using on_failure to clearly communicate the error to the user

humitos · 2023-01-25T10:45:27Z

the name of the PDF must be _readthedocs/pdf/<project-slug>.pdf

the name of the HTMLZip must be _readthedocs/htmlzip/project-slug>.zip

double-check if it happens the same for ePUB smile

I'm not convinced about how to remove this restriction, yet. I don't want to rename the file on behalf the user because that would be incompatible with "Multiple pdf files for single project" since we don't want to rename all the user's PDF filenames (e.g. if they have Tutorial-Basic.pdf and Tutorial-Advanced.pdf we want to keep those names).

We may want to prefix/suffix the project/version slug, tho. In any case, we will need to save the filename in our database to know what's the PDF the user wants to download.

However, for now, we could do a listdir on S3 bucket and serve the first filename found --since it will always be one.

humitos · 2023-01-25T11:21:54Z

I think my initial proposal to solve this problem would be to return a .pdf or a .zip file depending on the number of files in the output directory:

one (1) file
- rename the file at upload time (e.g. Tutorial-Basic.pdf -> $READTHEDOCS_PROJECT_SLUG) --this is the new part
- return the .pdf with a name changed (this is the current behavior)
multiple (>1) files
- zip all the .pdf files without renaming them into a .zip file with the name we choose
- return a .zip file

This is the simpler implementation for this, since we don't have to change the UI (add one link per .pdf file, for example) nor change the filenames we are already using. At serving time, we will need to know if the version has one or multiple PDFs to know the extension of the file we have to serve (.zip or .pdf). The name will always be $READTHEDOCS_PROJECT_SLUG

It checks for `_readthedocs/<format>` output directory to be a directory. If it's not a directory, it silently skip that format for now. This behavior will change once we move out `store_build_artifacts` from `on_success` and we can communicate errors easily to the user. Related #9931

humitos · 2023-01-25T12:58:08Z

I just realized that we had another issue: the build process raised an exception at Uploading state while uploading things to the storage backend and the built got stuck. This is because the upload step happens inside on_success instead of execute as the other things.

I've done this in #9941 👍🏼

agjohnson · 2023-01-25T18:37:22Z

@humitos What about doing this more explicitly and adding a zip file output download format instead?

It feels maybe a bit hacky to do this automatically for projects. It seems like we might be building an edge case that will frustrate us at some point -- indeterminately returning non-pdf content from our pdf endpoints.

If our long term plan does not include distributing zip archives, this will be something we have to work around later too. If we do ever support multiple output files natively, we'll have single pdf builds, multiple pdf builds, and legacy zip archive pdfs too.

Supporting zip archives for downloads actually might be a fine way to solve this UX for good. Instead of supporting arbitrary numbers of arbitrary files in our UX, users can just opt to output a zip file with whatever they want.

I might lean towards this approach at this point. In the case we support multiple PDF files in output, we have to change all the UI/UX around "Download PDF" to now also list the PDFs to download. If we add a zip format, we don't, and this batch of work is infinitely easier.

Also, it's worth noting that we don't need to rush on supporting multiple PDF files. We don't support this now, and it's fine if our first implementation of _readthedocs/pdf/ doesn't yet support this too.

humitos · 2023-01-30T12:30:51Z

@agjohnson what I'm proposing here in this issue comes from a small chat we had with Eric at #9888 (comment)

humitos · 2023-01-30T12:57:11Z

the name of the PDF must be _readthedocs/pdf/<project-slug>.pdf

the name of the HTMLZip must be _readthedocs/htmlzip/project-slug>.zip

double-check if it happens the same for ePUB smile

I'm not convinced about how to remove this restriction, yet. I don't want to rename the file on behalf the user because that would be incompatible with "Multiple pdf files for single project" since we don't want to rename all the user's PDF filenames (e.g. if they have Tutorial-Basic.pdf and Tutorial-Advanced.pdf we want to keep those names).

I won't remove this restriction yet without more conversation. For now, I think it will be enough by documenting how it works. Basically, people need to use READTHEDOCS_PROJECT environment variable for the filename of them. Example:

 cd _readthedocs ; zip -r -j htmlzip/$READTHEDOCS_PROJECT.zip $READTHEDOCS_PROJECT
 mv _readthedocs/simplepdf/"Test Builds.pdf" _readthedocs/pdf/$READTHEDOCS_PROJECT.pdf

(example from https://readthedocs.org/projects/test-builds/builds/19268359/)

agjohnson · 2023-01-30T17:22:37Z

Requiring a specific filename still seems like a straightforward approach, I agree that documenting this behavior is enough for this first pass. It keeps the implementation outcome the same, but is still a step towards more user control.

I'm also still fine taking more time on multiple pdf/epub support, we don't need this feature to wrap up this change. There are UI and UX questions for both maintainers and readers with all three approaches here, and I'm not very convinced on any of them yet.

* Build: check if the output directory is a directory It checks for `_readthedocs/<format>` output directory to be a directory. If it's not a directory, it silently skip that format for now. This behavior will change once we move out `store_build_artifacts` from `on_success` and we can communicate errors easily to the user. Related #9931 * Build: refactor logic to validate artifacts output paths Wrap all the logic into a class method so it's easier to follow and read. It also remove some duplicated checks of `os.path.exists`, making sure we are always using _validated_ paths already. * Build: communicate uploading errors to users Running `store_build_artifacts` from inside `.execute()` method instead of from `.on_sucess()` allow us to raise any exception and handle it properly. We can raise a `BuildUserError` to clearly communicate the specific error to the user, or any other exception to show the generic message: "Try again or contact us". Besides, this allows us to *fail the build* instead of continuing silently without the user noticing this: - there more than 1 file of formats that don't support multiple files - the output path is not a directory - there were errors uploading the artifacts to the storage * Build: validate the output directory has at least 1 file In case the `_readthedocs/<format>` directory is created by it contains 0 files, we fail the build and communicate this to the user. * Build: constant for artifact types that don't support multiple files

humitos · 2023-01-31T12:07:49Z

This is already fixed. We've created other issues to track the missing parts. I'm closing it.

humitos added Bug A bug Accepted Accepted issue on our roadmap labels Jan 24, 2023

humitos self-assigned this Jan 24, 2023

This was referenced Jan 25, 2023

Build: check if the output directory is a directory #9939

Closed

Build: communicate uploading errors to users #9941

Merged

humitos mentioned this issue Jan 25, 2023

Build: refactor logic to validate artifacts output paths #9940

Closed

humitos closed this as completed Jan 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Build: output `_readthedocs/` directories feedback after deploy #9931

Build: output `_readthedocs/` directories feedback after deploy #9931

humitos commented Jan 24, 2023

humitos commented Jan 24, 2023 •

edited

Loading

humitos commented Jan 25, 2023

humitos commented Jan 25, 2023

humitos commented Jan 25, 2023

agjohnson commented Jan 25, 2023 •

edited

Loading

humitos commented Jan 30, 2023

humitos commented Jan 30, 2023

agjohnson commented Jan 30, 2023

humitos commented Jan 31, 2023

Build: output _readthedocs/ directories feedback after deploy #9931

Build: output _readthedocs/ directories feedback after deploy #9931

Comments

humitos commented Jan 24, 2023

humitos commented Jan 24, 2023 • edited Loading

humitos commented Jan 25, 2023

humitos commented Jan 25, 2023

humitos commented Jan 25, 2023

agjohnson commented Jan 25, 2023 • edited Loading

humitos commented Jan 30, 2023

humitos commented Jan 30, 2023

agjohnson commented Jan 30, 2023

humitos commented Jan 31, 2023

Build: output `_readthedocs/` directories feedback after deploy #9931

Build: output `_readthedocs/` directories feedback after deploy #9931

humitos commented Jan 24, 2023 •

edited

Loading

agjohnson commented Jan 25, 2023 •

edited

Loading