These instructions set up a deep learning environment for GPU-powered TensorFlow.
See here for PyTorch GPU install instructions.
After following these instructions you'll have:
- Ubuntu 16.04.
- CUDA 9.0 and the NVIDIA drivers installed.
- A conda environment with Python 3.6.
- The latest TensorFlow version with GPU support.
Before you begin, you may need to disable nouveau, the open-source NVIDIA driver that ships with Ubuntu.
Option 1: Modify modprobe file
- After you boot the Linux system and are sitting at a login prompt, press ctrl+alt+F1 to get to a terminal screen. Log in via this terminal screen.
- Create the file /etc/modprobe.d/nouveau-blacklist.conf, e.g. with
sudo touch /etc/modprobe.d/nouveau-blacklist.conf
- Put the following in that file:
blacklist nouveau
options nouveau modeset=0
- Regenerate the kernel initramfs
sudo update-initramfs -u
- Reboot the system
sudo reboot
- On reboot, verify that the nouveau driver is not loaded
lsmod | grep nouveau
If this prints anything, the nouveau driver is still loaded. Do not proceed with the installation guide until you have worked out why it is still loaded.
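The lsmod check above can also be scripted. The following is a minimal sketch, assuming the standard Linux /proc/modules layout (one module per line, module name in the first field). The helper name is_module_loaded is made up for illustration and is not part of any tool mentioned in this guide.

```python
from pathlib import Path


def is_module_loaded(name, modules_text):
    """Return True if `name` is the first field of any line in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    # /proc/modules lists one loaded kernel module per line (Linux only).
    proc_modules = Path("/proc/modules")
    text = proc_modules.read_text() if proc_modules.exists() else ""
    if is_module_loaded("nouveau", text):
        print("nouveau is still loaded; troubleshoot before continuing")
    else:
        print("nouveau is not loaded")
```

Matching on the first field rather than on a substring avoids false positives from modules that merely mention nouveau in their dependency column.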
Option 2: Modify Grub load command
From this Stack Overflow solution:
- When the GRUB boot menu appears: highlight the Ubuntu menu entry and press the E key. Add the nouveau.modeset=0 parameter to the end of the linux line ... Then press F10 to boot.
- When the login page appears, press ctrl+alt+F1
- Enter your username and password
- Uninstall all NVIDIA-related software (the pattern is quoted so the shell does not expand it against files in the current directory):
sudo apt-get purge 'nvidia*'
sudo reboot
- Update the apt package lists
sudo apt-get update
- Install dependencies with apt-get
sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev curl
- Install the NVIDIA drivers
# The 16.04 installer works with 16.10.
# download drivers
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
# download key to allow installation
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
# install actual package
sudo dpkg -i ./cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
# install cuda (it may prompt to install other deps; if the install fails, run the update and the install again)
sudo apt-get update
sudo apt-get install cuda-9-0
- Reboot Ubuntu
sudo reboot
- Check the NVIDIA driver install
nvidia-smi
# you should see a list of gpus printed
# if not, the previous steps failed.
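If you would rather script the driver check than read the nvidia-smi table by eye, the sketch below counts the GPUs reported by `nvidia-smi -L`. It assumes each GPU is listed on its own line starting with "GPU " (as in "GPU 0: ..."). The function names parse_gpu_list and detect_gpus are illustrative only.

```python
import shutil
import subprocess


def parse_gpu_list(listing):
    """Return the lines of `nvidia-smi -L` output that describe a GPU."""
    return [line.strip() for line in listing.splitlines()
            if line.strip().startswith("GPU ")]


def detect_gpus():
    """Run `nvidia-smi -L` if it is on PATH; return [] when it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = detect_gpus()
    print("found %d GPU(s)" % len(gpus))
    for gpu in gpus:
        print(" ", gpu)
```

An empty result means either that nvidia-smi is not installed or that the driver could not talk to the card. In both cases the driver install steps need another look.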
- Install cuDNN
wget https://s3.amazonaws.com/open-source-william-falcon/cudnn-9.0-linux-x64-v7.3.1.20.tgz
sudo tar -xzvf cudnn-9.0-linux-x64-v7.3.1.20.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
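To confirm that the header was copied to the expected location, you can read the version macros out of cudnn.h. This is a sketch, assuming the cuDNN 7.x header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros. The helper name parse_cudnn_version is hypothetical.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from cudnn.h text, or None if a macro is missing."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+%s\s+(\d+)" % macro,
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Path used by the cp commands in this guide.
    header = Path("/usr/local/cuda/include/cudnn.h")
    if header.exists():
        print("cuDNN version:", parse_cudnn_version(header.read_text()))
    else:
        print("cudnn.h not found at", header)
```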
- Add these lines to the end of ~/.bashrc:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
export PATH="$PATH:/usr/local/cuda/bin"
- Reload bashrc
source ~/.bashrc
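After reloading, it can help to check that the three variables actually reached your shell. The sketch below compares the environment against the directories this guide adds. The names EXPECTED and missing_entries are made up for this example.

```python
import os

# Directories the ~/.bashrc lines in this guide are meant to add.
EXPECTED = {
    "LD_LIBRARY_PATH": ["/usr/local/cuda/lib64",
                        "/usr/local/cuda/extras/CUPTI/lib64"],
    "PATH": ["/usr/local/cuda/bin"],
}


def missing_entries(environ):
    """Return {variable: [expected values that are absent]} for the CUDA settings."""
    missing = {}
    for var, wanted in EXPECTED.items():
        present = environ.get(var, "").split(":")
        absent = [d for d in wanted if d not in present]
        if absent:
            missing[var] = absent
    if environ.get("CUDA_HOME") != "/usr/local/cuda":
        missing["CUDA_HOME"] = ["/usr/local/cuda"]
    return missing


if __name__ == "__main__":
    print(missing_entries(os.environ) or "CUDA environment variables look right")
```

Run it from the same terminal in which you sourced ~/.bashrc, since a terminal opened earlier will not have picked up the new values.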
- Install Miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# press s to skip terms
# Do you approve the license terms? [yes|no]
# yes
# Miniconda3 will now be installed into this location:
# accept the location
# Do you wish the installer to prepend the Miniconda3 install location
# to PATH in your /home/ghost/.bashrc ? [yes|no]
# yes
- Reload bashrc
source ~/.bashrc
- Create a Python 3.6 conda env to install TensorFlow into
conda create -n tensorflow python=3.6
# press y a few times
- Activate env
source activate tensorflow
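Before installing anything with pip, you may want to confirm that the python on your PATH is the one from the new env. This is a small sketch, assuming conda exports the active env name in the CONDA_DEFAULT_ENV variable. The function names describe_environment and is_expected are illustrative.

```python
import os
import sys


def describe_environment(version_info, environ):
    """Summarise the interpreter version and the active conda env name."""
    return {
        "python": "%d.%d" % (version_info[0], version_info[1]),
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),  # set by conda on activation
    }


def is_expected(info, python="3.6", conda_env="tensorflow"):
    """True when the interpreter and env name match what this guide creates."""
    return info["python"] == python and info["conda_env"] == conda_env


if __name__ == "__main__":
    info = describe_environment(sys.version_info, os.environ)
    print(info, "OK" if is_expected(info) else "unexpected environment")
```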
- Update pip (it might already be up to date, but just in case...)
pip install --upgrade pip
- Install stable TensorFlow with GPU support for Python 3.6
pip install --upgrade tensorflow-gpu
# If the above fails, try the part below
# pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl
- Test the TensorFlow install
# start python shell
python
# run test script
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
# when the session is created, the log output should include a bunch of lines with the word gpu in them (if the install worked)
# otherwise, TensorFlow is not running on the GPU
sess = tf.Session()
print(sess.run(hello))
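Or alternatively, from the shell rather than the Python session above (eager execution has to be enabled at program startup):
python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"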
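For a more explicit check than scanning log output for the word gpu, you can ask TensorFlow which devices it sees. The sketch below assumes a TensorFlow 1.x install and uses device_lib.list_local_devices() from tensorflow.python.client. The helper names gpu_names and list_tensorflow_gpus are made up for this example.

```python
def gpu_names(devices):
    """Return the names of devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tensorflow_gpus():
    """Return GPU device names seen by TensorFlow, or None if it cannot be imported."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_names(device_lib.list_local_devices())


if __name__ == "__main__":
    names = list_tensorflow_gpus()
    if names is None:
        print("tensorflow is not installed in this environment")
    elif names:
        print("TensorFlow sees:", names)
    else:
        print("TensorFlow is installed but sees no GPU")
```

Run it inside the activated tensorflow env. If it reports no GPU, the usual suspects are the driver, the cuDNN copy step, or the LD_LIBRARY_PATH setting.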