
Use CircleCI docker+nvidia capable images (pytorch#1885)
Use CircleCI docker+nvidia capable ubuntu-16.04 image
Kill nvidia driver and docker installation, rely on the docker runtime provided by CircleCI, and install only `expect-dev` and `moreutils` for the `ts` and `unbuffer` tools
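The `ts` and `unbuffer` tools named above are commonly combined when streaming CI logs. `unbuffer` (a script shipped with Expect) runs a command under a pseudo-terminal so it flushes output line by line instead of block-buffering into a pipe. `ts` (from moreutils) prefixes each line with a timestamp. The following is a minimal sketch, assuming bash. The helper name `run_logged` and the timestamp format are illustrative assumptions, since this commit does not show how the config invokes the tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this commit): run a command with
# line-buffered, timestamped output.
#   unbuffer (Expect): gives the command a pty so it flushes per line
#   ts (moreutils):    prefixes each input line with a timestamp

run_logged() (
  # Subshell body, so pipefail does not leak into the caller's shell.
  set -o pipefail
  if command -v unbuffer >/dev/null 2>&1 && command -v ts >/dev/null 2>&1; then
    # With pipefail, a failing command is not masked by ts's exit status.
    unbuffer "$@" | ts '[%Y-%m-%d %H:%M:%S]'
  else
    # Fallback when the tools are not installed: run the command as-is.
    "$@"
  fi
)
```

The main design point is `set -o pipefail`. Without it, a failing command piped into `ts` would report `ts`'s (successful) exit status, and a step running under `set -e` would carry on.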

This is a preparatory change for the Ubuntu 20.04 update.
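This change stops installing the NVIDIA driver, Docker, and nvidia-docker2, and relies instead on what the CircleCI image ships. Given that, a setup step could fail fast with a clear message when a tool it depends on is missing. The following is a minimal sketch, assuming bash. `require_tools` is a hypothetical helper, and the tool list in the usage comment is an assumption based on the commands that appear in this commit.

```shell
#!/usr/bin/env bash
# Hypothetical fail-fast check (not part of the config): report every
# requested tool that is not on PATH and return non-zero if any is missing.

require_tools() {
  local missing=()
  local tool
  for tool in "$@"; do
    # `command -v` succeeds only if the name resolves to something runnable.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing+=("$tool")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing required tools: ${missing[*]}" >&2
    return 1
  fi
}

# Example use in the setup step (not executed here):
#   require_tools docker ts unbuffer
#   if [ -n "${CUDA_VERSION}" ]; then require_tools nvidia-smi; fi
```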
malfet authored Apr 12, 2022
1 parent aab1dae, commit 7b2d5dd
Showing 1 changed file with 6 additions and 37 deletions.
.circleci/config.yml (43 changes)
@@ -61,7 +61,7 @@ setup_linux_system_environment: &setup_linux_system_environment
pytorch_tutorial_build_defaults: &pytorch_tutorial_build_defaults
machine:
-image: ubuntu-1604:201903-01
+image: ubuntu-1604-cuda-10.2:202012-01
steps:
- checkout
- run:
@@ -72,45 +72,14 @@ pytorch_tutorial_build_defaults: &pytorch_tutorial_build_defaults
command: |
set -e
-# Set up NVIDIA docker repo
-curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
-echo "deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
-echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
-echo "deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get -y update
-sudo apt-get -y remove linux-image-generic linux-headers-generic linux-generic docker-ce
-# WARNING: Docker version is hardcoded here; you must update the
-# version number below for docker-ce and nvidia-docker2 to get newer
-# versions of Docker. We hardcode these numbers because we kept
-# getting broken CI when Docker would update their docker version,
-# and nvidia-docker2 would be out of date for a day until they
-# released a newer version of their package.
-#
-# How to figure out what the correct versions of these packages are?
-# My preferred method is to start a Docker instance of the correct
-# Ubuntu version (e.g., docker run -it ubuntu:16.04) and then ask
-# apt what the packages you need are. Note that the CircleCI image
-# comes with Docker.
-sudo apt-get -y install \
-linux-headers-$(uname -r) \
-linux-image-generic \
-moreutils \
-docker-ce=5:18.09.4~3-0~ubuntu-xenial \
-nvidia-container-runtime=2.0.0+docker18.09.4-1 \
-nvidia-docker2=2.0.3+docker18.09.4-1 \
-expect-dev
-sudo pkill -SIGHUP dockerd
+sudo apt-get -y install expect-dev moreutils
sudo pip -q install awscli==1.16.35
-if [ -n "${CUDA_VERSION}" ]; then
-DRIVER_FN="NVIDIA-Linux-x86_64-460.39.run"
-wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
-sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
-nvidia-smi
-fi
+if [ -n "${CUDA_VERSION}" ]; then
+nvidia-smi
+fi
# This IAM user only allows read-write access to ECR
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_ONLY}
@@ -138,7 +107,7 @@ pytorch_tutorial_build_defaults: &pytorch_tutorial_build_defaults
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
docker pull ${DOCKER_IMAGE} >/dev/null
if [ -n "${CUDA_VERSION}" ]; then
-export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
+export id=$(docker run --gpus all -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
fi
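The last hunk swaps nvidia-docker2's `--runtime=nvidia` for Docker's native `--gpus all` flag. That flag is available since Docker 19.03 when the NVIDIA Container Toolkit is present on the host, which fits relying on the image's preinstalled Docker rather than installing nvidia-docker2. The following is a minimal sketch, assuming bash, of building the same `docker run` arguments in an array so that the GPU flag is added only for CUDA jobs. `build_docker_run_args` and `DOCKER_RUN_ARGS` are hypothetical names, not part of the config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the `docker run` arguments used in the CI
# step, adding `--gpus all` only when CUDA_VERSION is set and non-empty.

build_docker_run_args() {
  local image="$1"
  DOCKER_RUN_ARGS=(run)
  if [ -n "${CUDA_VERSION:-}" ]; then
    # Native Docker GPU flag (Docker >= 19.03 plus the NVIDIA Container
    # Toolkit); replaces nvidia-docker2's `--runtime=nvidia`.
    DOCKER_RUN_ARGS+=(--gpus all)
  fi
  # Same TTY-allocated, detached container and working directory as the
  # original command.
  DOCKER_RUN_ARGS+=(-t -d -w /var/lib/jenkins "$image")
}

# Usage in the CI step (needs a Docker daemon, so not executed here):
#   build_docker_run_args "${DOCKER_IMAGE}"
#   export id=$(docker "${DOCKER_RUN_ARGS[@]}")
```

Keeping the flags in an array, rather than interpolating a string, passes `--gpus` and `all` as separate, correctly quoted arguments. It also avoids duplicating the whole `docker run` line in each branch of the `if`.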