Create a nopdf version for day to day use #29

Open
ericholscher opened this issue Mar 8, 2017 · 6 comments
Labels
Needed: design decision (a core team decision is required)

Comments

@ericholscher
Member

Most of the image size and build time comes from the LaTeX tooling required to build PDFs. We should create a nopdf version of the images that excludes it. We should perhaps also have a no-libraries version that has only the basic Python tooling. We could use this for Sphinx and conda builds, without needing the large image.

ericholscher added a commit that referenced this issue Mar 8, 2017
This addresses #29,
and gives us a `base` image that just has the Python environment,
VCS tools,
and a couple other standard utilities.

Then it has a `full` image that adds all of the PDF/C libraries/etc.
This is where most of the time and bloat comes from,
and isn't actually required for most testing,
HTML builds,
conda builds,
and lots of other use cases.

This would allow us to iterate and test base builds faster,
while keeping the environment the same in production.

NOTE: I'm mostly looking for feedback on this PR,
not specific nitpicking.
agjohnson added the "Needed: design decision" label Oct 19, 2017
@agjohnson
Contributor

We ran into issues splitting things up, so this is on hold. I still think it's a good idea; perhaps someone more familiar with Dockerfiles can offer some guidance on how to modularize the containers.

@eine

eine commented Dec 24, 2017

Hi @agjohnson!

On the one hand, the base image is huge, astonishingly so: play-with-docker breaks when trying to pull it because it reaches the disk quota (4 GB). I know you have very heavy dependencies, which are hard to manage, but I think you are pushing Docker images to the limit of the concept behind containers, which is modularization. Indeed, I think that the default limit for images is 10 GB, and this one requires 8.9 GB. Luckily, there are two approaches you can take to alleviate this:

The first one is straightforward: use Docker multi-stage builds to get the nopdf or other versions at no extra cost. Note that the main feature of multi-stage builds is that intermediate images are cached, so the total build time to generate multiple staged images is the same as you would need to build a single one. I cloned this repo and made minimal modifications to show the concept: https://github.com/1138-4EB/readthedocs-docker-images/blob/multi-stage/Dockerfile

Now, to have both images built:

docker build -t readthedocs/build:nolatex --target rtd-base .
docker build -t readthedocs/build --target rtd-latex .

You can see the output in this travis build: https://travis-ci.org/1138-4EB/readthedocs-docker-images/jobs/321011442
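
A minimal sketch of how the two `--target` names above could map onto stages of a single Dockerfile. Only the stage names come from the commands in this comment; the base image, the package lists, and the file name `Dockerfile.sketch` are assumptions for illustration, not the contents of the linked branch.

```shell
#!/bin/sh
set -eu

# Write an illustrative two-stage Dockerfile. The stage names (rtd-base,
# rtd-latex) match the --target values above; everything else is assumed.
cat > Dockerfile.sketch <<'EOF'
# Stage 1: Python tooling and VCS clients only (illustrative package list).
FROM ubuntu:20.04 AS rtd-base
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      git python3 python3-pip \
 && rm -rf /var/lib/apt/lists/*

# Stage 2: starts from rtd-base and adds the LaTeX toolchain on top.
FROM rtd-base AS rtd-latex
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      latexmk texlive-latex-extra texlive-fonts-recommended \
 && rm -rf /var/lib/apt/lists/*
EOF
```

With a file like this, passing `-f Dockerfile.sketch --target rtd-base` to `docker build` would stop after the first stage, while `--target rtd-latex` would reuse the cached layers of that stage and only add the LaTeX layers.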

The second one might be a little trickier: make a lightweight base image that acts as an orchestrator, and have it execute every task in its own container (one for LaTeX, a different one for Python, another one for JS...). This can be done by mounting the Docker socket in the orchestrator and using named volumes. It is difficult to make a specific enhancement proposal here, because I don't really know when or how you use the tools installed in the base image.
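
A rough sketch of the socket-mounting idea, not a concrete proposal. It assumes the official `docker` image as the orchestrator and takes the task image as an argument; the function name and the volume name `rtd-work` are hypothetical. Setting `DOCKER=echo` prints the command instead of running it.

```shell
#!/bin/sh
set -eu

# Run one task in its own container, launched from inside an orchestrator
# container. All names here are illustrative assumptions.
run_task_via_orchestrator() {
  docker_cmd=${DOCKER:-docker}   # set DOCKER=echo for a dry run
  task_image=$1
  shift
  # The orchestrator receives the host's Docker socket, so the inner
  # "docker run" starts a sibling container on the host daemon.
  # The named volume is resolved by that same daemon, so the orchestrator
  # and the task container both see the same /work directory.
  "$docker_cmd" run --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v rtd-work:/work \
    docker:latest \
    docker run --rm -v rtd-work:/work -w /work "$task_image" "$@"
}
```

For example, `run_task_via_orchestrator python:3-slim python --version` would start the Python container as a sibling of the orchestrator rather than nested inside it. Named volumes matter here because a bind-mount path given by the orchestrator would be interpreted on the host, not inside the orchestrator's filesystem.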

@humitos
Member

humitos commented Oct 20, 2020

I opened a design document that talks a little about this at readthedocs/readthedocs.org#7566

@eine

eine commented Oct 20, 2020

@humitos, you might find buildthedocs/docker and/or buildthedocs/btd inspiring.

@humitos
Member

humitos commented Sep 2, 2021

Our new Docker image is about 5 GB (#166), which is still big. I did a small test by building an ubuntu20-nopdf version of that PR and it ended up being ~1.5 GB 😮 and it built in 1m30s 😮

So, considering that we will have only 1 image per OS version supported and it shouldn't be rebuilt too often, we can definitely expose a -nopdf image to developers using the Local Environment.

@eine

eine commented Sep 2, 2021

@humitos, nice work! Keep it up!

Note that the trick we use in buildthedocs is not only the -nopdf image, but also a complementary latex image (https://hub.docker.com/r/btdi/latex/tags?page=1&ordering=last_updated). The point is that we use one container for running Sphinx (which needs to contain only Sphinx and the user's extensions/dependencies), and then we run a different container for building the PDF (which does not need Sphinx or Python, just LaTeX packages). Overall, we are decoupling source generation and document compilation.
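
A minimal sketch of that two-container flow, under several assumptions: the `sphinxdoc/sphinx` image stands in for the Sphinx container, `btdi/latex` is used without a specific tag (pick one from the tags page linked above), the image is assumed to ship `latexmk`, and `conf.py` is assumed to sit at the root of the mounted directory. The function name is hypothetical, and this is not how buildthedocs itself is implemented. Setting `DOCKER=echo` prints the commands instead of running them.

```shell
#!/bin/sh
set -eu

# Build a PDF in two decoupled steps. Image names, tags and paths are
# illustrative assumptions, not the actual buildthedocs configuration.
build_pdf() {
  docker_cmd=${DOCKER:-docker}          # set DOCKER=echo for a dry run
  latex_image=${LATEX_IMAGE:-btdi/latex}
  src_dir=$1

  # Step 1: source generation. The Sphinx container only needs Sphinx and
  # the project's extensions; it writes LaTeX sources to _build/latex.
  "$docker_cmd" run --rm -v "$src_dir:/docs" -w /docs \
    sphinxdoc/sphinx sphinx-build -b latex . _build/latex

  # Step 2: document compilation. The LaTeX container needs neither
  # Sphinx nor Python; it compiles the generated .tex files.
  "$docker_cmd" run --rm -v "$src_dir:/docs" -w /docs/_build/latex \
    "$latex_image" latexmk -pdf
}
```

Calling `build_pdf "$PWD/docs"` would run both steps in sequence; the only thing shared between the two containers is the mounted source directory.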

Since you seem to be reevaluating your current stack, I hope that might be inspiring for you.
