Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: output _readthedocs/ directories feedback after deploy #9931

Closed
humitos opened this issue Jan 24, 2023 · 9 comments
Closed

Build: output _readthedocs/ directories feedback after deploy #9931

humitos opened this issue Jan 24, 2023 · 9 comments
Assignees
Labels
Accepted Accepted issue on our roadmap Bug A bug

Comments

@humitos
Copy link
Member

humitos commented Jan 24, 2023

We were able build other formats (PDF, HTMLZip) by outputting the files into _readthedocs/{pdf,htmlzip} directories: https://test-builds.readthedocs.io/en/all-formats-build-jobs/

We found some limitations:

  • the name of the PDF must be _readthedocs/pdf/<project-slug>.pdf
  • the name of the HTMLZip must be _readthedocs/htmlzip/project-slug>.zip
  • double-check if it happens the same for ePUB 😄
  • we are only checking for the existence of _readthedocs/<format> but we are not checking if it's a directory at

Also, while debugging this we hit #9468

@humitos humitos added Bug A bug Accepted Accepted issue on our roadmap labels Jan 24, 2023
@humitos humitos self-assigned this Jan 24, 2023
@humitos
Copy link
Member Author

humitos commented Jan 24, 2023

I just realized that we had another issue: the build process raised an exception at Uploading state while uploading things to the storage backend and the built got stuck. This is because the upload step happens inside on_success instead of execute as the other things.

I've already moved this to the right place in #9800

Move upload to storage function inside Celery.execute handler to allow us raising UnsupportedSymlinkFileError and using on_failure to clearly communicate the error to the user

@humitos
Copy link
Member Author

humitos commented Jan 25, 2023

  • the name of the PDF must be _readthedocs/pdf/<project-slug>.pdf
  • the name of the HTMLZip must be _readthedocs/htmlzip/project-slug>.zip
  • double-check if it happens the same for ePUB smile

I'm not convinced about how to remove this restriction, yet. I don't want to rename the file on behalf the user because that would be incompatible with "Multiple pdf files for single project" since we don't want to rename all the user's PDF filenames (e.g. if they have Tutorial-Basic.pdf and Tutorial-Advanced.pdf we want to keep those names).

We may want to prefix/suffix the project/version slug, tho. In any case, we will need to save the filename in our database to know what's the PDF the user wants to download.

However, for now, we could do a listdir on S3 bucket and serve the first filename found --since it will always be one.

@humitos
Copy link
Member Author

humitos commented Jan 25, 2023

I think my initial proposal to solve this problem would be to return a .pdf or a .zip file depending on the number of files in the output directory:

  • one (1) file
    • rename the file at upload time (e.g. Tutorial-Basic.pdf -> $READTHEDOCS_PROJECT_SLUG) --this is the new part
    • return the .pdf with a name changed (this is the current behavior)
  • multiple (>1) files
    • zip all the .pdf files without renaming them into a .zip file with the name we choose
    • return a .zip file

This is the simpler implementation for this, since we don't have to change the UI (add one link per .pdf file, for example) nor change the filenames we are already using. At serving time, we will need to know if the version has one or multiple PDFs to know the extension of the file we have to serve (.zip or .pdf). The name will always be $READTHEDOCS_PROJECT_SLUG

humitos added a commit that referenced this issue Jan 25, 2023
It checks for `_readthedocs/<format>` output directory to be a directory.
If it's not a directory, it silently skip that format for now.
This behavior will change once we move out `store_build_artifacts` from
`on_success` and we can communicate errors easily to the user.

Related #9931
@humitos
Copy link
Member Author

humitos commented Jan 25, 2023

I just realized that we had another issue: the build process raised an exception at Uploading state while uploading things to the storage backend and the built got stuck. This is because the upload step happens inside on_success instead of execute as the other things.

I've done this in #9941 👍🏼

@agjohnson
Copy link
Contributor

agjohnson commented Jan 25, 2023

@humitos What about doing this more explicitly and adding a zip file output download format instead?

It feels maybe a bit hacky to do this automatically for projects. It seems like we might be building an edge case that will frustrate us at some point -- indeterminately returning non-pdf content from our pdf endpoints.

If our long term plan does not include distributing zip archives, this will be something we have to work around later too. If we do ever support multiple output files natively, we'll have single pdf builds, multiple pdf builds, and legacy zip archive pdfs too.

Supporting zip archives for downloads actually might be a fine way to solve this UX for good. Instead of supporting arbitrary numbers of arbitrary files in our UX, users can just opt to output a zip file with whatever they want.

I might lean towards this approach at this point. In the case we support multiple PDF files in output, we have to change all the UI/UX around "Download PDF" to now also list the PDFs to download. If we add a zip format, we don't, and this batch of work is infinitely easier.

Also, it's worth noting that we don't need to rush on supporting multiple PDF files. We don't support this now, and it's fine if our first implementation of _readthedocs/pdf/ doesn't yet support this too.

@humitos
Copy link
Member Author

humitos commented Jan 30, 2023

@agjohnson what I'm proposing here in this issue comes from a small chat we had with Eric at #9888 (comment)

@humitos
Copy link
Member Author

humitos commented Jan 30, 2023

  • the name of the PDF must be _readthedocs/pdf/<project-slug>.pdf
  • the name of the HTMLZip must be _readthedocs/htmlzip/project-slug>.zip
  • double-check if it happens the same for ePUB smile

I'm not convinced about how to remove this restriction, yet. I don't want to rename the file on behalf the user because that would be incompatible with "Multiple pdf files for single project" since we don't want to rename all the user's PDF filenames (e.g. if they have Tutorial-Basic.pdf and Tutorial-Advanced.pdf we want to keep those names).

I won't remove this restriction yet without more conversation. For now, I think it will be enough by documenting how it works. Basically, people need to use READTHEDOCS_PROJECT environment variable for the filename of them. Example:

 cd _readthedocs ; zip -r -j htmlzip/$READTHEDOCS_PROJECT.zip $READTHEDOCS_PROJECT
 mv _readthedocs/simplepdf/"Test Builds.pdf" _readthedocs/pdf/$READTHEDOCS_PROJECT.pdf 

(example from https://readthedocs.org/projects/test-builds/builds/19268359/)

@agjohnson
Copy link
Contributor

Requiring a specific filename still seems like a straightforward approach, I agree that documenting this behavior is enough for this first pass. It keeps the implementation outcome the same, but is still a step towards more user control.

I'm also still fine taking more time on multiple pdf/epub support, we don't need this feature to wrap up this change. There are UI and UX questions for both maintainers and readers with all three approaches here, and I'm not very convinced on any of them yet.

humitos added a commit that referenced this issue Jan 31, 2023
* Build: check if the output directory is a directory

It checks for `_readthedocs/<format>` output directory to be a directory.
If it's not a directory, it silently skip that format for now.
This behavior will change once we move out `store_build_artifacts` from
`on_success` and we can communicate errors easily to the user.

Related #9931

* Build: refactor logic to validate artifacts output paths

Wrap all the logic into a class method so it's easier to follow and read.
It also remove some duplicated checks of `os.path.exists`,
making sure we are always using _validated_ paths already.

* Build: communicate uploading errors to users

Running `store_build_artifacts` from inside `.execute()` method
instead of from `.on_sucess()` allow us to raise any exception and handle it
properly.

We can raise a `BuildUserError` to clearly communicate the specific error to the
user, or any other exception to show the generic message: "Try again or contact
us".

Besides, this allows us to *fail the build* instead of continuing silently
without the user noticing this:

- there more than 1 file of formats that don't support multiple files
- the output path is not a directory
- there were errors uploading the artifacts to the storage

* Build: validate the output directory has at least 1 file

In case the `_readthedocs/<format>` directory is created by it contains 0 files,
we fail the build and communicate this to the user.

* Build: constant for artifact types that don't support multiple files
@humitos
Copy link
Member Author

humitos commented Jan 31, 2023

This is already fixed. We've created other issues to track the missing parts. I'm closing it.

@humitos humitos closed this as completed Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Accepted issue on our roadmap Bug A bug
Projects
Archived in project
Development

No branches or pull requests

2 participants