Updated Dockerfiles For 2.22.0 Release #68

Merged: 2 commits, Apr 4, 2025

docker/jax/training/0.5/Dockerfile.neuronx
@@ -1,14 +1,14 @@
FROM public.ecr.aws/docker/library/ubuntu:22.04

LABEL dlc_major_version="1"

Check failure on line 3 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): DL3048 style: Invalid label key.
LABEL maintainer="Amazon AI"

# Neuron SDK components version numbers
-ARG NEURONX_RUNTIME_LIB_VERSION=2.23.112.0-9b5179492
-ARG NEURONX_COLLECTIVES_LIB_VERSION=2.23.135.0-3e70920f2
-ARG NEURONX_TOOLS_VERSION=2.20.204.0
-ARG NEURONX_CC_VERSION=2.16.372.0
-ARG NEURONX_JAX_TRAINING_VERSION=0.1.2
+ARG NEURONX_RUNTIME_LIB_VERSION=2.24.53.0-f239092cc
+ARG NEURONX_COLLECTIVES_LIB_VERSION=2.24.59.0-838c7fc8b
+ARG NEURONX_TOOLS_VERSION=2.22.61.0
+ARG NEURONX_CC_VERSION=2.17.194.0
+ARG NEURONX_JAX_TRAINING_VERSION=0.1.3

ARG PYTHON=python3.10
ARG PYTHON_VERSION=3.10.12
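
On the DL3048 failure reported for line 3: hadolint checks label keys against Docker's recommended key format (lowercase alphanumerics, periods and hyphens), so the underscores in `dlc_major_version` are most likely what triggers it. A minimal sketch of two ways to clear the check follows. The `# hadolint ignore=` pragma is hadolint's inline suppression syntax; the renamed key `dlc-major-version` is hypothetical and only safe if nothing downstream reads the existing key.

```dockerfile
# Option 1: keep the existing key and suppress the rule for this instruction only.
# hadolint ignore=DL3048
LABEL dlc_major_version="1"

# Option 2 (hypothetical rename; assumes no tooling depends on the old key):
# LABEL dlc-major-version="1"
```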
@@ -31,7 +31,7 @@
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/amazon/openmpi/lib64"
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib"

RUN apt-get update \

Check failure on line 34 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): DL3008 warning: Pin versions in apt get install. Instead of `apt-get install <package>` use `apt-get install <package>=<version>`
&& apt-get upgrade -y \
&& apt-get install -y --no-install-recommends \
build-essential \
@@ -74,7 +74,7 @@
&& apt-get clean
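
DL3008 asks for pinned apt packages. A sketch of one way to do that, following the `package=$VERSION` pattern this PR already uses for the Neuron apt packages. `BUILD_ESSENTIAL_VERSION` is a hypothetical build argument (it is not in the original file) and has no default here, because the correct version string depends on the Ubuntu 22.04 archive at build time.

```dockerfile
# Hypothetical build argument; supply it with:
#   docker build --build-arg BUILD_ESSENTIAL_VERSION=<version> ...
ARG BUILD_ESSENTIAL_VERSION

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
    build-essential="${BUILD_ESSENTIAL_VERSION}" \
 && rm -rf /var/lib/apt/lists/* \
 && apt-get clean
```

Running `apt-cache policy build-essential` inside the base image lists the candidate versions. If pinning every package is not practical, a `# hadolint ignore=DL3008` comment directly above the `RUN` suppresses the warning for that instruction.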

# Install Open MPI
RUN mkdir -p /tmp/openmpi \

Check failure on line 77 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): DL3003 warning: Use WORKDIR to switch to a directory
Check failure on line 77 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): SC2046 warning: Quote this to prevent word splitting.
&& cd /tmp/openmpi \
&& wget --quiet https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OMPI_VERSION}.tar.gz \
&& tar zxf openmpi-${OMPI_VERSION}.tar.gz \
@@ -86,7 +86,7 @@
&& rm -rf /tmp/openmpi
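
DL3003 on the Open MPI step is about the `cd /tmp/openmpi` inside `RUN`. A sketch of the download-and-extract portion using `WORKDIR` instead follows. The build steps hidden in the collapsed part of the diff are not reproduced, and `OMPI_VERSION` is assumed to be declared by an earlier `ARG` outside this hunk.

```dockerfile
# Assumes OMPI_VERSION is declared earlier in the Dockerfile (outside this hunk).
# WORKDIR creates the directory if it does not exist, so mkdir is not needed.
WORKDIR /tmp/openmpi
RUN wget --quiet "https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OMPI_VERSION}.tar.gz" \
 && tar zxf "openmpi-${OMPI_VERSION}.tar.gz"

# The configure/build/install steps (collapsed in the diff) would go here.

# Reset the working directory before removing the build tree.
WORKDIR /
RUN rm -rf /tmp/openmpi
```

One trade-off: each `RUN` is its own layer, so deleting `/tmp/openmpi` in a later `RUN` does not shrink the earlier layers. Keeping the single `RUN` with `cd` and adding `# hadolint ignore=DL3003` above it is a reasonable alternative when image size matters more than the lint result.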

# Install packages and configure SSH for MPI operator in k8s
RUN apt-get update && apt-get install -y openmpi-bin openssh-server \

Check failure on line 89 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): DL3008 warning: Pin versions in apt get install. Instead of `apt-get install <package>` use `apt-get install <package>=<version>`
Check failure on line 89 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): DL3015 info: Avoid additional packages by specifying `--no-install-recommends`
&& mkdir -p /var/run/sshd \
&& echo " UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config \
&& echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config \
@@ -95,7 +95,7 @@
&& apt-get clean
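
For the DL3015 note on the SSH/MPI step, a sketch of the install portion with `--no-install-recommends` added. The SSH configuration commands from the original `RUN` are omitted here, and the diff does not show whether any recommended package was being relied on implicitly, so this would need a test build before adoption.

```dockerfile
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
    openmpi-bin \
    openssh-server \
 && rm -rf /var/lib/apt/lists/* \
 && apt-get clean
```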

# install Python
RUN wget -q https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz \

Check failure on line 98 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): SC2046 warning: Quote this to prevent word splitting.
Check failure on line 98 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): DL3003 warning: Use WORKDIR to switch to a directory
&& tar -xzf Python-$PYTHON_VERSION.tgz \
&& cd Python-$PYTHON_VERSION \
&& ./configure --enable-shared --prefix=/usr/local \
@@ -114,13 +114,13 @@
# ompi_info to fail. This is only observed in CPU containers
ENV PATH="$PATH:/home/.openmpi/bin"
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/.openmpi/lib/"
RUN ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

Check failure on line 117 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): DL4006 warning: Set the SHELL option -o pipefail before RUN with a pipe in it. If you are using /bin/sh in an alpine image or if your shell is symlinked to busybox then consider explicitly setting your SHELL to /bin/ash, or disable this check
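
The DL4006 warning can be addressed once, ahead of the first piped `RUN`. A minimal sketch, assuming `/bin/bash` is available (it is in the `ubuntu:22.04` base image used here):

```dockerfile
# Applies to every RUN that follows: a failure anywhere in a pipeline
# now fails the build step instead of being masked by the last command.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

RUN ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```

Without `pipefail`, only the exit status of `grep` decides whether this step passes, so a failing `ompi_info` could go unnoticed.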

RUN mkdir -p /etc/pki/tls/certs && cp /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt

# Install Neuron Driver, Runtime and Tools
RUN echo "deb https://apt.repos.neuron.amazonaws.com focal main" > /etc/apt/sources.list.d/neuron.list
RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -

Check failure on line 123 in docker/jax/training/0.5/Dockerfile.neuronx (GitHub Actions / dockerfile-linter): DL4006 warning: Set the SHELL option -o pipefail before RUN with a pipe in it. If you are using /bin/sh in an alpine image or if your shell is symlinked to busybox then consider explicitly setting your SHELL to /bin/ash, or disable this check

RUN apt-get update \
&& apt-get install -y \
100 changes: 48 additions & 52 deletions docker/pytorch/inference/2.5.1/Dockerfile.neuronx
@@ -4,20 +4,11 @@ LABEL dlc_major_version="1"
LABEL maintainer="Amazon AI"
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true

-# Neuron SDK components version numbers
-ARG NEURONX_CC_VERSION=2.16.372.0
-ARG NEURONX_FRAMEWORK_VERSION=2.5.1.2.4.0
-ARG NEURONX_TRANSFORMERS_VERSION=0.13.380
-ARG NEURONX_COLLECTIVES_LIB_VERSION=2.23.135.0-3e70920f2
-ARG NEURONX_RUNTIME_LIB_VERSION=2.23.112.0-9b5179492
-ARG NEURONX_TOOLS_VERSION=2.20.204.0
-ARG NEURONX_DISTRIBUTED_VERSION=0.10.1
-ARG NEURONX_DISTRIBUTED_INFERENCE_VERSION=0.1.1

ARG PIP=pip3
ARG PYTHON=python3.10
ARG PYTHON_VERSION=3.10.12
ARG TORCHSERVE_VERSION=0.11.0
-ARG SM_TOOLKIT_VERSION=2.0.21
+ARG SM_TOOLKIT_VERSION=2.0.25
ARG MAMBA_VERSION=23.1.0-4

# See http://bugs.python.org/issue19846
@@ -37,7 +28,6 @@ RUN apt-get update \
curl \
emacs \
git \
-gnupg2 \
gpg-agent \
jq \
libgl1-mesa-glx \
@@ -56,18 +46,6 @@ RUN apt-get update \
&& rm -rf /tmp/tmp* \
&& apt-get clean

-RUN echo "deb https://apt.repos.neuron.amazonaws.com focal main" > /etc/apt/sources.list.d/neuron.list
-RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -

-RUN apt-get update \
-&& apt-get install -y \
-aws-neuronx-tools=$NEURONX_TOOLS_VERSION \
-aws-neuronx-collectives=$NEURONX_COLLECTIVES_LIB_VERSION \
-aws-neuronx-runtime-lib=$NEURONX_RUNTIME_LIB_VERSION \
-&& rm -rf /var/lib/apt/lists/* \
-&& rm -rf /tmp/tmp* \
-&& apt-get clean

# https://github.com/docker-library/openjdk/issues/261 https://github.com/docker-library/openjdk/pull/263/files
RUN keytool -importkeystore -srckeystore /etc/ssl/certs/java/cacerts -destkeystore /etc/ssl/certs/java/cacerts.jks -deststoretype JKS -srcstorepass changeit -deststorepass changeit -noprompt; \
mv /etc/ssl/certs/java/cacerts.jks /etc/ssl/certs/java/cacerts; \
@@ -100,9 +78,10 @@ RUN conda install -c conda-forge \
&& ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \
&& pip install packaging \
enum-compat \
-ipython
+ipython \
+&& rm -rf ~/.cache/pip/*

-RUN pip install --no-cache-dir -U \
+RUN ${PIP} install --no-cache-dir -U \
opencv-python>=4.8.1.78 \
"numpy<1.24,>1.21" \
"scipy>=1.8.0" \
@@ -111,43 +90,30 @@ RUN pip install --no-cache-dir -U \
"awscli<2" \
pandas==1.* \
boto3 \
-cryptography

-RUN pip install -U --extra-index-url https://pip.repos.neuron.amazonaws.com \
-neuronx-cc==$NEURONX_CC_VERSION \
-torch-neuronx==$NEURONX_FRAMEWORK_VERSION \
-transformers-neuronx==$NEURONX_TRANSFORMERS_VERSION \
-&& pip install -U "protobuf>=3.18.3,<4" \
+cryptography \
+"protobuf>=3.18.3,<4" \
"transformers==4.45.*" \
torchserve==${TORCHSERVE_VERSION} \
torch-model-archiver==${TORCHSERVE_VERSION} \
-&& pip install --no-deps --no-cache-dir -U torchvision==0.20.* \
-&& pip install --no-deps -U --extra-index-url https://pip.repos.neuron.amazonaws.com neuronx_distributed==$NEURONX_DISTRIBUTED_VERSION \
-&& pip install -U --extra-index-url https://pip.repos.neuron.amazonaws.com neuronx_distributed_inference==$NEURONX_DISTRIBUTED_INFERENCE_VERSION
+&& ${PIP} install --no-deps --no-cache-dir -U torchvision==0.20.* \
+&& rm -rf ~/.cache/pip/*

RUN useradd -m model-server \
&& mkdir -p /home/model-server/tmp /opt/ml/model \
&& chown -R model-server /home/model-server /opt/ml/model

-COPY neuron-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
-COPY neuron-monitor.sh /usr/local/bin/neuron-monitor.sh
-COPY torchserve-neuron.sh /usr/local/bin/entrypoint.sh
+COPY --chmod=755 neuron-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
+COPY --chmod=755 neuron-monitor.sh deep_learning_container.py /usr/local/bin/
+COPY --chmod=755 torchserve-neuron.sh /usr/local/bin/entrypoint.sh
COPY config.properties /home/model-server

-RUN chmod +x /usr/local/bin/dockerd-entrypoint.py \
-&& chmod +x /usr/local/bin/neuron-monitor.sh \
-&& chmod +x /usr/local/bin/entrypoint.sh

-ADD https://raw.githubusercontent.com/aws/deep-learning-containers/master/src/deep_learning_container.py /usr/local/bin/deep_learning_container.py

-RUN chmod +x /usr/local/bin/deep_learning_container.py
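
The change above folds the separate `chmod` layers into the `COPY` instructions. A small before/after sketch for one of the files, assuming the image is built with BuildKit, which `COPY --chmod` requires:

```dockerfile
# Before: copy, then a separate layer just to set the mode.
# COPY neuron-monitor.sh /usr/local/bin/neuron-monitor.sh
# RUN chmod +x /usr/local/bin/neuron-monitor.sh

# After: one instruction, one layer (BuildKit only).
COPY --chmod=755 neuron-monitor.sh /usr/local/bin/neuron-monitor.sh
```

The same change replaces the remote `ADD` of `deep_learning_container.py` with a local `COPY`, so that file now has to be present in the build context.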

-RUN pip install --no-cache-dir "sagemaker-pytorch-inference==${SM_TOOLKIT_VERSION}"

-# patch default_pytorch_inference_handler.py to import torch_neuronx
-RUN DEST_DIR=$(python -c "import os.path, sagemaker_pytorch_serving_container; print(os.path.dirname(sagemaker_pytorch_serving_container.__file__))") \
+RUN ${PIP} install --no-cache-dir "sagemaker-pytorch-inference==${SM_TOOLKIT_VERSION}" \
+# patch default_pytorch_inference_handler.py to import torch_neuronx
+&& DEST_DIR=$(python -c "import os.path, sagemaker_pytorch_serving_container; print(os.path.dirname(sagemaker_pytorch_serving_container.__file__))") \
&& DEST_FILE=${DEST_DIR}/default_pytorch_inference_handler.py \
-&& sed -i "s/import torch/import torch, torch_neuronx/" ${DEST_FILE}
+&& sed -i "s/import torch/import torch, torch_neuronx/" ${DEST_FILE} \
+&& rm -rf ~/.cache/pip/*

RUN HOME_DIR=/root \
&& curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \
@@ -162,9 +128,39 @@ RUN HOME_DIR=/root \

RUN curl -o /license.txt https://aws-dlc-licenses.s3.amazonaws.com/pytorch-2.5/license.txt

+# Neuron SDK pre-release packages
+ARG NEURON_ARTIFACT_PATH=/root/neuron_artifacts
+ARG NEURONX_RUNTIME_LIB_VERSION=2.24.53.0-f239092cc
+ARG NEURONX_COLLECTIVES_LIB_VERSION=2.24.59.0-838c7fc8b
+ARG NEURONX_TOOLS_VERSION=2.22.61.0

+RUN --mount=type=bind,source=apt,target=${NEURON_ARTIFACT_PATH}/apt \
+apt-get install -y \
+${NEURON_ARTIFACT_PATH}/apt/${NEURONX_TOOLS_VERSION} \
+${NEURON_ARTIFACT_PATH}/apt/${NEURONX_COLLECTIVES_LIB_VERSION} \
+${NEURON_ARTIFACT_PATH}/apt/${NEURONX_RUNTIME_LIB_VERSION} \
+&& rm -rf /var/lib/apt/lists/* \
+&& rm -rf /tmp/tmp* \
+&& apt-get clean

+ARG NEURONX_FRAMEWORK_VERSION=2.5.1.2.6.0
+ARG NEURONX_TRANSFORMERS_VERSION=0.13.470
+ARG NEURONX_CC_VERSION=2.17.194.0
+ARG NEURONX_DISTRIBUTED_VERSION=0.11.0
+ARG NEURONX_DISTRIBUTED_INFERENCE_VERSION=0.2.0

+RUN --mount=type=bind,source=pip,target=${NEURON_ARTIFACT_PATH}/pip \
+${PIP} install --no-cache-dir --find-links ${NEURON_ARTIFACT_PATH}/pip \
+${NEURON_ARTIFACT_PATH}/pip/${NEURONX_CC_VERSION} \
+${NEURON_ARTIFACT_PATH}/pip/${NEURONX_FRAMEWORK_VERSION} \
+${NEURON_ARTIFACT_PATH}/pip/${NEURONX_TRANSFORMERS_VERSION} \
+&& ${PIP} install --no-deps --find-links -U ${NEURON_ARTIFACT_PATH}/pip/${NEURONX_DISTRIBUTED_VERSION} \
+&& ${PIP} install --no-deps --find-links -U ${NEURON_ARTIFACT_PATH}/pip/${NEURONX_DISTRIBUTED_INFERENCE_VERSION} \
+&& rm -rf ~/.cache/pip/*
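
The two `RUN --mount=type=bind` steps above read packages from `apt` and `pip` directories in the build context without copying them into an image layer; bind mounts of this kind need BuildKit. One detail worth checking in the last two installs: pip's `--find-links` takes a path or URL as its argument, so in `--find-links -U <file>` the `-U` is likely consumed as that argument rather than acting as the upgrade flag. A sketch of how those installs might be written, assuming the intent was to resolve against the mounted `pip` directory (the PR does not confirm this):

```dockerfile
# Assumes NEURON_ARTIFACT_PATH, PIP and the two version ARGs are declared as in the diff above.
RUN --mount=type=bind,source=pip,target=${NEURON_ARTIFACT_PATH}/pip \
    ${PIP} install --no-deps --no-cache-dir -U \
        --find-links "${NEURON_ARTIFACT_PATH}/pip" \
        "${NEURON_ARTIFACT_PATH}/pip/${NEURONX_DISTRIBUTED_VERSION}" \
        "${NEURON_ARTIFACT_PATH}/pip/${NEURONX_DISTRIBUTED_INFERENCE_VERSION}"
```

Giving `--find-links` its directory explicitly keeps `-U` as a separate flag, and `--no-cache-dir` makes the trailing `rm -rf ~/.cache/pip/*` unnecessary for this step.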

EXPOSE 8080 8081

ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["/usr/local/bin/entrypoint.sh"]

-HEALTHCHECK CMD curl --fail http://localhost:8080/ping || exit 1
+HEALTHCHECK CMD curl --fail http://localhost:8080/ping || exit 1