Build a python environment within the docker image for the viz workflow #30

Closed
julietcohen (Collaborator) opened this issue Feb 9, 2024 · 3 comments
Labels: enhancement (New feature or request)

Initial progress in the docker & kubernetes workflow includes creating a first draft of the Dockerfile, which installs the requirements for the workflow from a requirements.txt file that is copied into the container and installed with pip:

COPY requirements.txt .
RUN pip install -r requirements.txt

When I build with a Dockerfile that installs the requirements this way, pip outputs a warning:

Running pip as the 'root' user can result in broken permissions
and conflicting behavior with the system package manager. 
It is recommended to use a virtual environment instead

The workflow can be improved by creating a new environment within the container and then installing the requirements into that environment.

The feature detection team working on MAPLE developed a draft of their Dockerfile that creates a conda environment and installs the requirements:

MAPLE Dockerfile
FROM ubuntu:22.04
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
RUN apt-get update
RUN apt-get install -y sudo

RUN apt-get install -y wget && rm -rf /var/lib/apt/lists/*

RUN mkdir -p ~/miniconda3
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
RUN bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
RUN rm -rf ~/miniconda3/miniconda.sh
RUN conda --version

RUN conda clean -a

RUN echo $CONDA_PREFIX

COPY environment_maple.yml .
COPY config.py .
COPY hyp_best_train_weights_final.h5 .
COPY maple_workflow.py .
COPY mpl_clean_inference.py .
COPY mpl_divideimg_234_water_new.py .
COPY mpl_infer_tiles_GPU_new.py .
COPY mpl_process_shapefile.py .
COPY mpl_stitchshpfile_new.py .
COPY mpl_config.py .
COPY utils.py .
COPY model.py .

RUN ls

RUN conda env create -f environment_maple.yml

RUN sudo apt-get update
RUN apt-get clean
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 libxrender-dev libgl1-mesa-glx -y

SHELL ["conda", "run", "-n", "maple_py310", "/bin/bash", "-c"]

CMD ["conda", "run", "--no-capture-output", "-n", "maple_v310", "python", "-u", "maple_workflow_create_dir_struct.py"]
CMD ["conda", "run", "--no-capture-output", "-n", "maple_v310", "python", "-u", "maple_workflow.py"]

One consideration is whether we want to use conda or virtualenv. Code that may be a starting point for creating a venv within the container:

create venv within Dockerfile
RUN apt-get update && apt-get install \
  -y --no-install-recommends python3 python3-virtualenv

# set an environment variable called VIRTUAL_ENV to the path /opt/venv
ENV VIRTUAL_ENV=/opt/venv

# creates the virtual environment 
RUN python3 -m virtualenv --python=/usr/bin/python3 $VIRTUAL_ENV

# prepend the venv's bin directory to PATH so its python and pip are used by default
ENV PATH="$VIRTUAL_ENV/bin:$PATH"


This issue is a sub-task towards the ultimate goal of issue #1.

julietcohen (Collaborator, Author) commented Jun 4, 2024

I integrated setup for a conda environment in the container, along with a user that has the same permissions as my account on the Datateam server. See issue #39.

Dockerfile
FROM python:3.9
SHELL ["/bin/bash", "-c"]
# metadata info:
LABEL org.opencontainers.image.source https://github.com/permafrostdiscoverygateway/viz-workflow

WORKDIR /home/pdgk8suser

RUN apt update && apt -y install wget sudo vim nano iproute2 tree
# pip should already be installed after installing python, so no need to install here

# Create new group called pdgk8sgroup and add new user to that group
# both with same ID number as jcohen for permissions after container runs.
# Do this before miniconda operations because we want to install miniconda in the
# user's home directory
RUN groupadd --gid 1040 -r pdgk8sgroup && useradd --uid 1040 -r -g pdgk8sgroup pdgk8suser
# make dir that matches WORKDIR
RUN mkdir -p /home/pdgk8suser && chown pdgk8suser:pdgk8sgroup /home/pdgk8suser
# make dir that matches the PV to store output data
RUN mkdir -p /mnt/k8s-dev-pdg && chown pdgk8suser:pdgk8sgroup /mnt/k8s-dev-pdg

# activate that user account
USER pdgk8suser:pdgk8sgroup

# define miniconda installation path based on WORKDIR
ENV CONDA_HOME="/home/pdgk8suser/miniconda3"
ENV PATH="${CONDA_HOME}/bin:${PATH}"

RUN mkdir -p ${CONDA_HOME} && \
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ${CONDA_HOME}/miniconda.sh && \
    bash ${CONDA_HOME}/miniconda.sh -b -u -p ${CONDA_HOME} && \
    rm -rf ${CONDA_HOME}/miniconda.sh && \
    conda --version

# create new conda env
RUN conda create -n pdg_k8s_env python=3.9 && \
    conda init && \
    conda activate pdg_k8s_env
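# NOTE: `conda activate` inside a RUN layer generally requires the shell to be
# initialized first (e.g. by sourcing conda.sh or running `conda init bash` and
# opening a new shell), which is likely the source of the error described below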

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY workflow_config.py .
COPY iwp_2_files .
COPY parsl_workflow.py .
COPY parsl_config.py .

The build for this image worked well until the very end, when I got an error:
[screenshot of the build error]

I included the line conda init in that Dockerfile, but I need to make it a full command like conda init bash or something similar. I am also considering starting the RUN command as suggested here or here.
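
For reference, one way to make the activation work inside a RUN layer (a sketch that assumes the CONDA_HOME path defined in the Dockerfile above) is to source conda's shell hook before activating:

# make `conda activate` available in this non-interactive shell, then install into the env
RUN source ${CONDA_HOME}/etc/profile.d/conda.sh && \
    conda activate pdg_k8s_env && \
    pip install -r requirements.txt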

julietcohen (Collaborator, Author) commented Jun 4, 2024

The following Dockerfile built with no warnings or errors. It is included in the pushed version 0.2.5 of the package.

Dockerfile
# for parsl_workflow.py:

# FROM ubuntu:22.04 is also an option, but it would make the image larger and we would need to install python too
FROM python:3.9
SHELL ["/bin/bash", "-c"]
# metadata info:
LABEL org.opencontainers.image.source https://github.com/permafrostdiscoverygateway/viz-workflow

# WORKDIR /usr/local/share/app # a generalized option
# Keep in mind WORKDIR can use environment variables previously set
# using ENV, like ENV DIRPATH=/path followed by WORKDIR $DIRPATH/$DIRNAME
WORKDIR /home/pdgk8suser

RUN apt update && apt -y install wget sudo vim nano iproute2 tree
# pip should already be installed after installing python, so no need to install here

# Create new group called pdgk8sgroup and add new user to that group
# both with same ID number as jcohen for permissions after container runs.
# Do this before miniconda operations because we want to install miniconda in the
# user's home directory
RUN groupadd --gid 1040 -r pdgk8sgroup && useradd --uid 1040 -r -g pdgk8sgroup pdgk8suser
# make dir that matches WORKDIR
RUN mkdir -p /home/pdgk8suser && chown pdgk8suser:pdgk8sgroup /home/pdgk8suser
# make dir that matches the PV to store output data
RUN mkdir -p /mnt/k8s-dev-pdg && chown pdgk8suser:pdgk8sgroup /mnt/k8s-dev-pdg

# activate that user account
USER pdgk8suser:pdgk8sgroup

# define miniconda installation path based on WORKDIR
ENV CONDA_HOME="/home/pdgk8suser/miniconda3"
ENV PATH="${CONDA_HOME}/bin:${PATH}"

RUN mkdir -p ${CONDA_HOME} && \
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ${CONDA_HOME}/miniconda.sh && \
    bash ${CONDA_HOME}/miniconda.sh -b -u -p ${CONDA_HOME} && \
    rm -rf ${CONDA_HOME}/miniconda.sh && \
    conda --version

# create new conda env
RUN conda create -n pdg_k8s_env python=3.9

SHELL ["conda", "run", "-n", "pdg_k8s_env", "/bin/bash", "-c"]

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY workflow_config.py .
COPY iwp_2_files .
COPY parsl_workflow.py .
COPY parsl_config.py .

# maybe we don't want to run a command because we need to use the terminal to
# do it now that we are using parsl and k8s
# CMD [ "python", "./parsl_workflow.py" ]


# ------------------------------------------------------------

# # for simple_workflow.py:

# # base image
# FROM python:3.9

# WORKDIR /home/jcohen/viz-worflow/docker-parsl_workflow/

# # python script to run
# ADD simple_workflow.py .
# # add the input data
# COPY data/test_polygons.gpkg .
# COPY requirements.txt .

# # packages to install
# RUN pip install -r requirements.txt

# CMD [ "python", "./simple_workflow.py" ]

julietcohen (Collaborator, Author) commented

This issue has been resolved, and can be re-opened if a deep dive determines that the way it was achieved in the Dockerfile could be improved or generalized.
