While this guide mentions about installing Driverless AI from scratch on Azure, it can be used on bare-metal machine or on any other cloud VM from the
Install Nvidia driver
step onwards.
- Select OS
- Login to Azure console and create a new compute instance.
- Select Ubuntu Server provided by Canonical.
- Select Ubuntu version as 16.04 LTS and Deployment Method as Resource Manager.
- Select Azure VM Size
- Provide necessary details like VM Name, Regions etc. as applicable to your Azure subscription.
- Most important selection here is the Instance Size. For this exercise I selected the least costly instance with a GPU card NC6 as seen in the image below. Consider the proper Azure instance sizing recommendation based on your use case.
- Configure authentication
- Configure authentication settings either using password or public-private key pair.
- Configure Storage
- SSD is recommended persistance store for Driverless AI.
- For this setup, I installed DAI on the same disk where the OS is installed.
- By default, Azure VMs are configured with OS disk size of 30GB approx. This is not sufficient for DAI.
- To increase the OS disk size, once the VM is running you will need to stop it. Once stopped, resize the OS disk partition to at-least 500GB and then restart the server.
- If you are doing the DEB install, DAI will put the bulk of its data in the
/opt/h2oai/dai
directory. So if you are attaching an additional drive, ensure that you mount it at/opt
. - If you are going with a docker based approach, you can mount the disk to any mount point as you will be mapping the host directories as volumes in the docker container.
- For real use cases it is strongly recommended to not persist any application information on the OS drive, but to attach a data disk to the Azure VM and to use this data drive for persisting DAI information. Premium SSD are recommended.
- Configure Networking
- Configure networking as needed.
- At a minimum, ensure that your compute instance would have a public IP.
- Configure the Network Security Group to allow incoming connections to port 22 (for SSH connection) and 12345 (for Driverless AI web UI).
- I accepted defaults for Monitoring and Management options, but you can configure as your need.
- Guest Configs
- Azure provides capability to install Nvidia drivers and CUDA libraries as a guest extension.
- I decided to install Nvidia drivers and CUDA manually to ensure that everything is compatible and per our preference.
H2O Driverless AI uses Tensorflow 1.11 built against CUDA 9.0, hence this is the recommended CUDA version to use. Per Nvidia Compatibility Matrix, Nvidia driver version 384.XX is the minimum version needed and was the default when CUDA 9.0 was shipped. Per Nvidia Hardware Support, driver 384.xx does not support the latest Turing architecture cards. The latest Nvidia Driver we have tested to work with CUDA 9.0 and Driverless AI is the 418.39+ branch. We install 418.XX in the steps below.
- Once the server is up, ssh to it.
- Run the following commands to get it up to date
sudo apt-get update
sudo apt-get upgrade
- Check if
nouveau
nvidia driver is installed on the systemsudo lsmod | grep nouveau
. - If the above command does not return anything, then you dont have Nouveau drivers installed and can proceed to install Nvidia drivers.
- On the other hand, if Nouveau drivers are installed and loaded, then you need to follow the steps for your Linux version to Disable Nouveau Drivers. For Ubuntu, the steps are
sudo touch /etc/modprobe.d/blacklist-nouveau.conf
sudo echo 'blacklist nouveau' >> /etc/modprobe.d/blacklist-nouveau.conf
sudo echo 'options nouveau modeset=0' >> /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u
- Ensure to restart the server before proceeding ahead.
- Once the server is up, ssh to it.
- Navigate to Nvidia CUDA download archive, and select
Linux
>x86_64
>Ubuntu
>16.04
>deb (network)
. - Copy the link to the
cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
file. - On the ssh session, download the file using
wget <put the copied link here>
to the server. - Install the downloaded package
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
- Add the apt key
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
- Update the apt package list
sudo apt-get update
- We install CUDA 9.0 and the requisite Nvidia driver using CUDA Meta-packages. To install needed meta-packages issue command
sudo apt install cuda-9-0
- The above step would install CUDA libraries in
/usr/local/cuda
directory, wherecuda
is a soft link to the currently used CUDA version. This means that one can install more than one CUDA versions on the same machine. - At this point you will need to restart the machine. This will ensure that nvidia drivers are correctly loaded to the kernel.
- Once the machine restarts, check if Nvidia drivers are correctly loaded in the kernel
lsmod | grep nvidia
should result in some output. Also, issue the commandnvidia-smi
to check if the GPUs are correctly detected. If you get a response for both these commands, your Nvidia driver is vaildated to be installed correctly. - We update
$PATH
to include the CUDAbin
directory. Issue the commandexport PATH=/usr/local/cuda/bin:$PATH
. - To validate CUDA installation we will install CUDA sample code in
$HOME
directory, compile a CUDA program and test if it works. In the below steps we compile thedeviceQuery
sample and execute it. If it displays details about the CUDA interface and GPU details then we have successfully installed CUDA library
cd $HOME
cuda-install-samples-9.0.sh .
cd NVIDIA_Sam*
cd 1_Utilities/deviceQuery
make
./deviceQuery
- If the output of
deviceQuery
shows the GPUs installed on the system then CUDA is validated to be installed as expected.
- To install cuDNN issue the below commands
cd $HOME
wget https://s3-us-west-2.amazonaws.com/h2o-internal-release/libcudnn7_7.3.1.20-1%2Bcuda9.0_amd64.deb
wget https://s3-us-west-2.amazonaws.com/h2o-internal-release/libcudnn7-doc_7.3.1.20-1%2Bcuda9.0_amd64.deb
wget https://s3-us-west-2.amazonaws.com/h2o-internal-release/libcudnn7-dev_7.3.1.20-1%2Bcuda9.0_amd64.deb
sudo dpkg -i libcudnn7_7.3.1.20-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev*.deb
sudo dpkg -i libcudnn7-doc*.deb
- Driverless AI requires the persistance mode to be enabled on each GPU that would be used with DAI
- To manually enable persistance mode on all GPUs issue the command
sudo nvidia-smi -pm 1
- To validate, issue the command
nvidia-smi
and verify that persistance mode setting is turned ON. - We recommend setting up Nvidia Persistance daemon to manage the persistance mode setting so that you do not need to set it up after each restart. This would require you to setup a service based on the init system on your machine. For systemd, which should be available on Ubuntu 16.04 the steps are as below
- Create a file
/etc/systemd/system/nvidia-persistenced.service
- The contents of the above file should be from https://raw.githubusercontent.com/NVIDIA/nvidia-persistenced/master/init/systemd/nvidia-persistenced.service.template
- Replace text
__USER__
in the line that starts with ExecStart ...... withnvidia-persistenced
. - Save the file.
- Reload the changes in systemd
sudo systemctl daemon-reload
. - Enable the service to start during server startup
sudo systemctl enable nvidia-persistenced
- Start the service
sudo systemctl start nvidia-persistenced
- If you want to test this, you can restart the server at this point and check for the value of persistance mode using
nvidia-smi
command.
- Create a file
- OpenCL is required for LightGBM to run on GPUs.
sudo apt-get install opencl-headers clinfo ocl-icd-opencl-dev
mkdir -p /etc/OpenCL/vendors && \
echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
At this point your system setup tasks are completed. You can now proceed with a native deb package install of Driverless AI or proceed to install
Docker CE
andnvidia-docker
for a docker based installation of Driverless AI.
- If you want docker container based Driverless AI install, skip this section and proceed from Install Docker CE onwards.
- If you want a deb based install, follow the steps in this section and do not follow any of the docker installation sections below.
- Download latest Driverless AI deb package from https://www.h2o.ai/download/#driverless-ai. You can get the URL and issue the command
wget <paste URL here>
to download the file. - Issue the command
sudo dpkg -i <dai file downloaded>.deb
to install Driverless AI. - Proceed to Driverless AI documentation to understand the steps to manage Driverless AI i.e. start, stop, uninstall, update
Great, you should be done with native installation of Driverless AI.
Follow on from here in case you are doing a Docker install for H2O Driverless AI
- Update the system
sudo apt-get update
- Install needed packages
sudo apt install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
- Add docker GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
- Verify fingerprint is of docker
sudo apt-key fingerprint 0EBFCD88
- Add repository
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
- Update the packages again
sudo apt update
- Install docker
sudo apt-get install docker-ce docker-ce-cli containerd.io
- To execute docker commands the user needs to be part of the
docker
group. To add the user to thedocker
group issue the commandusermod -aG docker $USER
- Exit your shell and reconnect.
- Issue the command
id
, and verify the user is part ofdocker
group. - To verify all is ok issue the command
docker run --rm hello-world
. It will pull a docker image from the docker hub and finally display aHello World
message
- To install nvidia-docker2, we need to get the repository added to the apt list Reference
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
- Next, install nvidia-docker2 using the command
sudo apt install nvidia-docker2
- Restart the docker daemon using the command
sudo pkill -SIGHUP dockerd
- To validate, execute the command
nvidia-docker2 run --rm nvidia/cuda nvidia-smi
and this should give you desired output. It is critical to make note of the value of persistance mode being detected withing the docker environment. Ensure that the value isON
and notOFF
.
nvidia-docker2 install reference
- Download latest Driverless AI docker image from https://www.h2o.ai/download/#driverless-ai
- Load the download image to docker using command
docker load < dai_image_name.tar.gz
. Substitute the correct file name. - Proceed with installing Driverless AI following the directions step 5 onwards on that page.