Think about multistage docker build #1096

Open
VincentGuyader opened this issue Aug 10, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@VincentGuyader
Member

VincentGuyader commented Aug 10, 2023

Today, the add_dockerfile_with_renv function creates two Dockerfiles so that the second image can reuse the renv cache from the first one.

Like this:

docker build -f Dockerfile_base --progress=plain -t wootwoot3_base .
docker build -f Dockerfile --progress=plain -t wootwoot3:latest .

Instead, we could use a single multistage Dockerfile:


FROM rocker/verse:4.2.2 AS base
RUN apt-get update -y && apt-get install -y  make zlib1g-dev git && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /usr/local/lib/R/etc/ /usr/lib/R/etc/
RUN echo "options(renv.config.pak.enabled = TRUE, repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl', Ncpus = 4)" | tee /usr/local/lib/R/etc/Rprofile.site | tee /usr/lib/R/etc/Rprofile.site
RUN R -e 'install.packages(c("renv","remotes"))'
COPY renv.lock.prod renv.lock
RUN R -e 'renv::restore()'


FROM base AS final
COPY renv.lock.prod renv.lock
RUN R -e 'renv::restore()'
COPY wootwoot3_*.tar.gz /app.tar.gz
RUN R -e 'remotes::install_local("/app.tar.gz",upgrade="never")'
RUN rm /app.tar.gz
EXPOSE 3838
CMD R -e "options('shiny.port'=3838,shiny.host='0.0.0.0');library(wootwoot3);wootwoot3::run_app()"

It works, but as far as I know, if you change your renv.lock.prod file it's not possible to rerun only final.

I have tried --target, --cache-from, etc., without success.

Other people have the same need as us; see this Python example:

https://pythonspeed.com/articles/faster-multi-stage-builds/

But the solution is not clean at all.

So I think we will keep the current solution.
If anyone else wants to look into it, I'd be happy for the help, since I have no experience with Docker multistage builds.

https://stackoverflow.com/questions/52697948/artifact-caching-for-multistage-docker-builds

VincentGuyader added the enhancement label Aug 10, 2023
@dijitali

dijitali commented Oct 10, 2023

it works, but as far as i know if you change your renv.lock.prod file it's not possible to only rerun final.

Is that a problem? If you change which packages your project uses, you want docker to install those changes, right?

If you have the previous docker image pulled then it should only run subsequent layers from the COPY renv.lock.prod renv.lock command onwards (not the entire base stage)
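To illustrate the layer caching described here, the base stage from the first comment is repeated below with comments marking which instructions Docker can reuse when only renv.lock.prod changes. This is a sketch that assumes the earlier layers are still present in the local build cache; the Rprofile.site setup is left out for brevity.

```dockerfile
# These instructions are unchanged, so their layers come from the build cache
FROM rocker/verse:4.2.2 AS base
RUN apt-get update -y && apt-get install -y make zlib1g-dev git && rm -rf /var/lib/apt/lists/*
RUN R -e 'install.packages(c("renv","remotes"))'

# COPY is cached on the content of the copied file:
# editing renv.lock.prod invalidates this layer...
COPY renv.lock.prod renv.lock

# ...and every layer after it, so the full restore runs again
RUN R -e 'renv::restore()'
```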

@VincentGuyader
Member Author

Hi, COPY renv.lock.prod renv.lock is in both stages ... :(

@LDSamson

I feel like the following lines are not needed in the second stage:

COPY renv.lock.prod renv.lock
RUN R -e 'renv::restore()'

Is there a specific reason why you added these lines twice? If you remove them, shouldn't that give the intended behavior, installing the packages only once (in the first stage)? The second stage will still be invalidated after an renv.lock update but should not take too long to run, I think.
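The suggestion above could look like the following sketch, which only rearranges the Dockerfile from the first comment and has not been tested on the actual project. The final stage drops the repeated COPY/restore pair and inherits the library installed in base. The Rprofile.site setup is left out for brevity. Note that any change to renv.lock.prod would then rerun the complete restore in base.

```dockerfile
FROM rocker/verse:4.2.2 AS base
RUN apt-get update -y && apt-get install -y make zlib1g-dev git && rm -rf /var/lib/apt/lists/*
RUN R -e 'install.packages(c("renv","remotes"))'
# Dependencies are restored once, in this stage only
COPY renv.lock.prod renv.lock
RUN R -e 'renv::restore()'

FROM base AS final
# No second COPY/restore: this stage starts from the library built in base
COPY wootwoot3_*.tar.gz /app.tar.gz
RUN R -e 'remotes::install_local("/app.tar.gz",upgrade="never")' && rm /app.tar.gz
EXPOSE 3838
CMD R -e "options('shiny.port'=3838,shiny.host='0.0.0.0');library(wootwoot3);wootwoot3::run_app()"
```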

@VincentGuyader
Member Author

Hi, the first renv::restore() in the base image sets up the foundation by installing all the dependencies, which takes a long time (~1 hour in this case). Once this lengthy step is done, it is cached.

Then, during development or adjustments (for example, when updating your golem package to version 2), the second renv::restore() in the final Dockerfile allows you to take advantage of this cache. This step will be much faster because it doesn’t reinstall everything from scratch. It only adjusts the environment by installing any missing packages or applying necessary updates. This way, you can efficiently iterate on just step 2 without having to restart the entire restore process each time.

In summary, this two-step process allows for continuous deployment of the app while making adjustments, without wasting too much time.
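One way the two-step idea might carry over to a single multistage Dockerfile is to give each stage its own lockfile, so that the slow layer is cached on a file that rarely changes. The sketch below is an assumption, not something golem generates: renv.lock.base is a hypothetical file name for an occasional snapshot of renv.lock.prod, and the approach relies on the machine keeping its Docker build cache between builds. The Rprofile.site setup is left out for brevity.

```dockerfile
FROM rocker/verse:4.2.2 AS base
RUN apt-get update -y && apt-get install -y make zlib1g-dev git && rm -rf /var/lib/apt/lists/*
RUN R -e 'install.packages(c("renv","remotes"))'
# ASSUMPTION: renv.lock.base is a hypothetical, rarely updated copy of renv.lock.prod.
# While this file is unchanged, the slow restore layer below stays cached.
COPY renv.lock.base renv.lock
RUN R -e 'renv::restore()'

FROM base AS final
# The current lockfile changes often; this restore should only need to
# install what differs from the library already built in base.
COPY renv.lock.prod renv.lock
RUN R -e 'renv::restore()'
COPY wootwoot3_*.tar.gz /app.tar.gz
RUN R -e 'remotes::install_local("/app.tar.gz",upgrade="never")' && rm /app.tar.gz
EXPOSE 3838
CMD R -e "options('shiny.port'=3838,shiny.host='0.0.0.0');library(wootwoot3);wootwoot3::run_app()"
```

With this layout, editing renv.lock.prod invalidates only the final stage. When the incremental restore itself becomes slow, copying renv.lock.prod over renv.lock.base would trigger one full rebuild of base and reset the baseline.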
