Build and API: Backend changes to handle multiple PDFs #10424

benjaoming · 2023-06-12T19:53:48Z

Referencing high-level issue #2045 and especially #2045 (comment)

Adapt Sphinx builder to enable multiple PDFs with the latex_documents confval (Build: Allow multiple PDFs #10438)
Detect and move all PDF files in $READTHEDOCS_OUTPUT/pdf/ for all builders (Build: Allow multiple PDFs #10438)
Detect and store file names in ImportedFiles (Build: Allow multiple PDFs #10438)
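A minimal sketch of the detection step, assuming the builder writes its PDFs directly into `$READTHEDOCS_OUTPUT/pdf/` and that only top-level `*.pdf` files count. The helper name `collect_pdfs` is hypothetical and not taken from the Read the Docs codebase.

```python
from pathlib import Path


def collect_pdfs(output_dir: str) -> list[str]:
    """Return the sorted names of top-level PDF files in <output_dir>/pdf/.

    Hypothetical helper: subdirectories are not searched, so nested
    files are never picked up.
    """
    pdf_dir = Path(output_dir) / "pdf"
    if not pdf_dir.is_dir():
        # No pdf/ output at all: nothing to move or record.
        return []
    # glob("*.pdf") only matches entries directly inside pdf_dir.
    return sorted(p.name for p in pdf_dir.glob("*.pdf") if p.is_file())
```

The resulting list is what would then be moved and stored (for example in ImportedFile), rather than being rediscovered later.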

API changes:

Add a new API endpoint that returns multiple files with URLs and verbose labels

El Proxito changes:

New El Proxito URLs
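To make the proposed endpoint concrete, here is a sketch of the payload it might return. Everything in it is an assumption: the field names (`filename`, `url`, `label`), the rule that derives a verbose label from the file name, and the per-file URL layout under `/_/downloads/<language>/<version>/pdf/`. The issue only specifies "multiple files with URLs and verbose labels".

```python
from dataclasses import asdict, dataclass
from pathlib import PurePosixPath


@dataclass
class DownloadableFile:
    # Illustrative field names, not an existing Read the Docs schema.
    filename: str
    url: str
    label: str


def verbose_label(filename: str) -> str:
    """Turn 'user-guide.pdf' into 'User guide (PDF)' (hypothetical rule)."""
    stem = PurePosixPath(filename).stem
    text = " ".join(stem.replace("_", " ").replace("-", " ").split())
    return text[:1].upper() + text[1:] + " (PDF)"


def downloads_payload(domain: str, language: str, version: str, filenames: list[str]) -> list[dict]:
    """Build the JSON-serializable list such an endpoint could return."""
    payload = []
    for name in sorted(filenames):
        # Hypothetical El Proxito URL: one URL per file under the version's pdf/ prefix.
        url = f"https://{domain}/_/downloads/{language}/{version}/pdf/{name}"
        payload.append(asdict(DownloadableFile(name, url, verbose_label(name))))
    return payload
```

Keeping the label separate from the file name lets the frontend show readable titles while El Proxito only has to resolve the file name.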

I'll add more details once the implementation is open. Let me know if I've missed anything here.

Regarding @humitos's question:

either by doing a listdir on S3 for that version, or by saving this data into the db

I'm leaning towards saving this data in the DB at build time. That's when we know which files exist and can record them correctly.

Open question:

How do we store data about PDF files? DB? Or just pure S3 listdirs?
Do we want to generalize this for ePub?


benjaoming · 2023-06-12T20:06:43Z

Legacy frontend support

Currently, builds fail if there are multiple PDFs in the output. Once we start supporting this in the backend (through a feature flag?), we should make sure to note that we have ONLY implemented backend support, meaning that only the API can be used to list these files.

While we are adding backend support, the frontend will still only work for 1 PDF file.

I think that's okay since we can put a direct reference in that message to a GitHub issue covering frontend support.

Meanwhile, we might continue to publish the "first" file of the output (chosen, for instance, by alphanumeric order).
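A small sketch of the "first file" fallback, assuming "alphanumeric order" means plain string ordering. The helper name `legacy_pdf` is hypothetical. Note that plain string order is not natural order, which matters for numbered file names.

```python
from typing import Optional


def legacy_pdf(filenames: list[str]) -> Optional[str]:
    """Return the one PDF the legacy single-PDF frontend would keep showing.

    Uses plain string order, so "manual-10.pdf" sorts before "manual-2.pdf".
    """
    return min(filenames) if filenames else None
```

If numbered outputs are common, a natural sort key would be needed to make "manual-2.pdf" win over "manual-10.pdf".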

humitos · 2023-06-13T07:59:18Z

Currently, builds fail if there are multiple PDFs in the output. Once we start supporting it in the backend (through a feature flag?),

I think implementing a feature flag for internal usage for now is fine. Once the frontend UI is exposed, we can enable it for users who have this requirement as a small rollout, and then globally.
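The current behavior and the proposed internal feature flag can be sketched as one check on the build's PDF outputs. The flag, the exception class, and the function name are all hypothetical; the real build code and feature-flag mechanism are not shown in this thread.

```python
class MultiplePDFsError(Exception):
    """Hypothetical error standing in for today's build failure."""


def check_pdf_outputs(filenames: list[str], multiple_pdfs_enabled: bool) -> list[str]:
    """Validate a build's PDF outputs against a hypothetical per-project feature flag."""
    if len(filenames) > 1 and not multiple_pdfs_enabled:
        # Flag off: keep today's behavior, where more than one PDF fails the build.
        raise MultiplePDFsError(
            f"Found {len(filenames)} PDF files; multiple PDFs are not enabled for this project."
        )
    # Flag on (or at most one file): accept every PDF.
    return sorted(filenames)
```

With the flag off there is no "first file" fallback, so the single-PDF and multi-PDF behaviors never mix.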

we might continue to publish the "first" file of the output

I'd avoid dealing with both implementations at the same time. That would be pretty confusing and hard to explain to users.

benjaoming · 2023-06-15T16:01:56Z

@humitos @stsewd it would be great to have your input on whether or not to store information about collected and copied files during the build, maybe in the ImportedFile model?

Is there also a search indexing aspect to this?

From the description:

@humitos:

either by doing a listdir on S3 for that version, or by saving this data into the db

@benjaoming:
I'm leaning towards saving this data in the DB during build time. That's when we know about which files exist and can save it correctly.

stsewd · 2023-06-15T16:45:04Z

I'm +1 on having that in the DB, and yeah, I could see a way of reusing the ImportedFile model for that, maybe adding a new field to identify the file as a downloadable PDF.
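A sketch of the "store it in the DB at build time" approach, using a plain dataclass as a stand-in for a database row. This is not the real ImportedFile model; the `is_downloadable_pdf` field name and both helper functions are hypothetical illustrations of the extra field suggested in this comment.

```python
from dataclasses import dataclass


@dataclass
class ImportedFileRecord:
    """Simplified, hypothetical stand-in for a DB row; not the real ImportedFile model."""

    project: str
    version: str
    path: str
    # Hypothetical extra field marking the file as a downloadable PDF.
    is_downloadable_pdf: bool = False


def records_for_build(project: str, version: str, pdf_names: list[str]) -> list[ImportedFileRecord]:
    """Create one record per PDF at build time, when the produced files are known."""
    return [
        ImportedFileRecord(project, version, f"pdf/{name}", is_downloadable_pdf=True)
        for name in sorted(pdf_names)
    ]


def downloadable_pdfs(records: list[ImportedFileRecord], project: str, version: str) -> list[str]:
    """Answer 'which PDFs does this version have?' from stored rows, not an S3 listing."""
    return [
        r.path
        for r in records
        if r.project == project and r.version == version and r.is_downloadable_pdf
    ]
```

The API then only needs a filtered query instead of a storage listdir per request.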

+1 on extending this to more file types.

There is also the question of how these files will be stored. I think it's fine to let users define the name, but maybe we shouldn't allow nested paths.

humitos · 2023-06-15T17:25:51Z

I think the correct way is to store the filenames in the database, yeah 👍🏼

I agree we should not allow nested paths. We should only upload and serve $READTHEDOCS_OUTPUT/pdf/*.pdf and nothing else.
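A sketch of the "only $READTHEDOCS_OUTPUT/pdf/*.pdf, no nested paths" rule as a validation function, assuming paths are given relative to the `pdf/` directory and that the `.pdf` suffix is matched case-sensitively. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def is_servable_pdf(relative_path: str) -> bool:
    """Accept only a plain '<name>.pdf' directly inside $READTHEDOCS_OUTPUT/pdf/.

    `relative_path` is relative to that pdf/ directory.
    """
    if "\\" in relative_path:
        # Reject backslashes so Windows-style separators can't smuggle in nesting.
        return False
    path = PurePosixPath(relative_path)
    return (
        len(path.parts) == 1  # no directories, no leading "/" or ".."
        and path.name not in ("", ".", "..")
        and path.suffix == ".pdf"  # ".pdf" alone has no suffix, so it is rejected
    )
```

Rejecting rather than flattening nested paths keeps user-chosen names unambiguous in storage and in URLs.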

I'm not familiar with the ImportedFile model, but I thought we wanted to delete it because we weren't using it? Maybe I'm wrong, though 🤷🏼. So whatever Santos says about that model is probably the best path to follow 😄

benjaoming · 2023-06-15T17:30:03Z

rclone has some really nice filtering options, so we can tell it to only sync $READTHEDOCS_OUTPUT/pdf/*.pdf 💯
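A sketch of such an rclone invocation, built as an argument list so it can be passed to `subprocess.run`. In rclone filter patterns, a leading `/` anchors the match to the root of the source and `*` does not match `/`, so `--include "/*.pdf"` selects only top-level PDFs (and `--include` implicitly excludes everything else). The remote name and destination path in the usage are hypothetical.

```python
import shlex


def rclone_pdf_sync_command(local_pdf_dir: str, remote_path: str) -> list[str]:
    """Build an rclone command that syncs only top-level *.pdf files.

    "/*.pdf" is anchored at the source root, and "*" does not cross "/",
    so PDFs in subdirectories are not uploaded.
    """
    return ["rclone", "sync", local_pdf_dir, remote_path, "--include", "/*.pdf"]


def as_shell(cmd: list[str]) -> str:
    """Render the command for logs, quoting the glob so a shell won't expand it."""
    return shlex.join(cmd)
```

For example, `as_shell(rclone_pdf_sync_command("_readthedocs/pdf", "remote:bucket/pdf/latest"))` yields a copy-pasteable command line with the filter pattern quoted.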

benjaoming · 2023-06-15T17:32:32Z

Thanks for the input ❤️ This should be great to get started 🏃‍♂️

stsewd · 2023-06-15T17:38:58Z

The ImportedFile model is useful for re-indexing, since it has the rank and ignored fields (though we could find a way to rely on the config file metadata from the build for this instead). What we can remove are the SphinxDomain models.

benjaoming added the Feature (New feature) and Accepted (Accepted issue on our roadmap) labels Jun 12, 2023

benjaoming self-assigned this Jun 12, 2023

This was referenced Jun 14, 2023:

Build: Allow multiple PDFs (WIP) #10437 (closed)
Build: Allow multiple PDFs #10438 (open)

humitos unassigned benjaoming Jul 11, 2023
