Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix github resource issue when build latest docker #7450

Merged
merged 26 commits into from
Feb 23, 2024

Conversation

KumoLiu
Copy link
Contributor

@KumoLiu KumoLiu commented Feb 7, 2024

Fixes #7449

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Copy link
Member

@ericspod ericspod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to back to what we had before with deleting the whole directory so it should be fine. Deleting the whole hostedtoolcache directory won't have the Python version issue you addressed earlier?

@KumoLiu
Copy link
Contributor Author

KumoLiu commented Feb 7, 2024

Hi @ericspod,

Deleting the whole hostedtoolcache directory won't have the Python version issue you addressed earlier?

Yes, since this task is based on the base PyTorch image, I believe we could be able to remove the entire directory without worrying about the Python version inside the Docker image.

But delete whole dir still can't solve the issue, I guess it may due to the update of the PyTorch.
https://github.com/Project-MONAI/MONAI/actions/runs/7816004608/job/21320679116?pr=7450#step:4:519

Do you have any suggestion?

@ericspod
Copy link
Member

ericspod commented Feb 7, 2024

We might have to poke around in the image to see what else we can delete to reduce its size. Within docker run -ti --rm nvcr.io/nvidia/pytorch:23.08-py3 you can install ncdu:

apt update
apt install ncdu
ncdu /

You can use this to browse around and see what can be removed. Maybe we can delete everything in /opt?

Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
@KumoLiu
Copy link
Contributor Author

KumoLiu commented Feb 19, 2024

After a deeper investigation, there is no more space to free up.
https://github.com/Project-MONAI/MONAI/actions/runs/7901892556/job/21566379926?pr=7450#step:4:23

I discussed this with @YanxuanLiu offline, and he will help move the build docker workflow from GitHub to Blossom.
There are two differences following the move. First, we may need to manually trigger the workflow on the blossom after each PR is merged. Second, users cannot trigger the workflow themselves, which is currently supported.

cc @ericspod, do you have any concerns about this update?

@ericspod
Copy link
Member

It would be easier to keep the docker build on github of course. Is this actually an issue with the space in the runner itself? The Dockerfile could use optimising to produce fewer layers than is present now, or use a multi-stage build process. This should save space and make a smaller final image.

@ericspod
Copy link
Member

This may reduce the number of layers. I haven't checked it thoroughly but it may help to update the Dockerfile with something like this:

# Copyright (c) MONAI Consortium
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# To build with a different base image
# please run `docker build` using the `--build-arg PYTORCH_IMAGE=...` flag.
ARG PYTORCH_IMAGE=nvcr.io/nvidia/pytorch:23.08-py3
FROM ${PYTORCH_IMAGE} as build

LABEL maintainer="monai.contact@gmail.com"

WORKDIR /opt/monai

COPY LICENSE CHANGELOG.md CODE_OF_CONDUCT.md CONTRIBUTING.md README.md versioneer.py setup.py setup.cfg runtests.sh \
  MANIFEST.in requirements-min.txt requirements.txt requirements-dev.txt ./
COPY tests ./tests
COPY monai ./monai

# install full deps
RUN awk '!/torch/' requirements.txt > /tmp/tmp \
  && mv /tmp/tmp requirements.txt\
  && python -m pip install --upgrade --no-cache-dir pip \
  && python -m pip install --no-cache-dir -r requirements-dev.txt

# compile ext and remove temp files
# TODO: remark for issue [revise the dockerfile #1276](https://github.com/Project-MONAI/MONAI/issues/1276)
# please specify exact files and folders to be copied -- else, basically always, the Docker build process cannot cache
# this or anything below it and always will build from at most here; one file change leads to no caching from here on...

RUN BUILD_MONAI=1 FORCE_CUDA=1 python setup.py develop \
  && rm -rf build __pycache__

# NGC Client
ARG NGC_CLI_URI="https://ngc.nvidia.com/downloads/ngccli_linux.zip"
WORKDIR /opt/tools
RUN wget -q ${NGC_CLI_URI} && unzip ngccli_linux.zip && chmod u+x ngc-cli/ngc && \
    find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5 && \
    rm -rf ngccli_linux.zip ngc-cli.md5

RUN apt-get update \
  && DEBIAN_FRONTEND="noninteractive" apt-get install -y libopenslide0 zip \
  && rm -rf /var/lib/apt/lists/*
# append /opt/tools to runtime path for NGC CLI to be accessible from all file system locations
ENV PATH=${PATH}:/opt/tools:/opt/tools/ngc-cli

WORKDIR /opt
RUN zip -r opt.zip ./monai ./tools/ngc-cli

# create a final stage with fewer layers
FROM ${PYTORCH_IMAGE} as final

WORKDIR /opt

COPY --from=build /opt/opt.zip .

RUN apt-get update \
  && DEBIAN_FRONTEND="noninteractive" apt-get install -y libopenslide0 zip \
  && unzip opt.zip \
  && python -m pip install --upgrade --no-cache-dir pip \
  && (cd monai && python -m pip install --no-cache-dir -r requirements-dev.txt) \
  && rm -rf opt.zip /var/lib/apt/lists/*

WORKDIR /opt/monai

KumoLiu and others added 2 commits February 20, 2024 11:01
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
@KumoLiu
Copy link
Contributor Author

KumoLiu commented Feb 20, 2024

Hi @ericspod, thanks for your suggestions, I tried but it's still not working.
Looks like the first line build from the base PyTorch image caused the error, as our final image only required 10G, despite the PyTorch image growing to be larger than 20G.

@ericspod
Copy link
Member

Looks like the first line build from the base PyTorch image caused the error, as our final image only required 10G, despite the PyTorch image growing to be larger than 20G.

Thanks @KumoLiu, I guess there's not much that we can do if the first operation creates this much data. Even using --no-cache with the build command line probably wouldn't save us here if it gets that big immediately. There doesn't appear to be a "slim" version of the base Pytorch image either so let's go with blossom.

KumoLiu and others added 4 commits February 22, 2024 14:33
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Copy link
Member

@ericspod ericspod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as a fix, we will come back to this later if possible.

@KumoLiu
Copy link
Contributor Author

KumoLiu commented Feb 23, 2024

/build

@KumoLiu KumoLiu enabled auto-merge (squash) February 23, 2024 02:25
@KumoLiu KumoLiu merged commit 9c77a04 into Project-MONAI:dev Feb 23, 2024
28 checks passed
@KumoLiu KumoLiu deleted the ci branch February 23, 2024 05:56
juampatronics pushed a commit to juampatronics/MONAI that referenced this pull request Mar 25, 2024
Fixes Project-MONAI#7449

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Juan Pablo de la Cruz Gutiérrez <juampatronics@gmail.com>
Yu0610 pushed a commit to Yu0610/MONAI that referenced this pull request Apr 11, 2024
Fixes Project-MONAI#7449

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Yu0610 <612410030@alum.ccu.edu.tw>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Github resource issue when build latest docker
2 participants