Containerized Nvidia Mellanox drivers
This repository provides the means to build driver containers for various distributions.
Driver containers offered:
- Mellanox OFED driver container: Mellanox out-of-tree networking driver
- NV Peer Memory driver container: Nvidia peer memory client driver for GPU-Direct
Driver containers are containers that allow provisioning of a driver on the host. They provide several benefits over a standard driver installation, for example:
- Ease of deployment
- Fast installation
The Mellanox OFED driver container is intended as an alternative to host installation. Once the container image is deployed on the host, the container will:
- Reload the kernel modules provided by Mellanox OFED
- Mount the container's root fs to /run/mellanox/drivers/. If this directory is mapped to the host, the container's content becomes available to the host and to other containers. One use case is compiling the Nvidia Peer Memory client modules.
The image must be built on the same OS and kernel as the host on which it will be deployed.
The provided Dockerfiles accept several build arguments, which make it possible to build a container image for various driver versions and platforms.
- D_OFED_VERSION: Mellanox OFED version as it appears on the Mellanox OFED download page, e.g. 5.0-2.1.8.0
- D_OS: Operating system version as it appears on the Mellanox OFED download page, e.g. ubuntu20.04
- D_ARCH: CPU architecture as it appears on the Mellanox OFED download page, e.g. x86_64
- D_BASE_IMAGE: Base image to be used for the driver container image build. Default: ubuntu:20.04
# docker build -t ofed-driver \
--build-arg D_BASE_IMAGE=ubuntu:20.04 \
--build-arg D_OFED_VERSION=5.0-2.1.8.0 \
--build-arg D_OS=ubuntu20.04 \
--build-arg D_ARCH=x86_64 \
ubuntu/
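Since the image has to match the OS of the target host, it can help to derive the D_OS and D_ARCH values from the machine the build runs on. The following is a minimal sketch, not part of this repository: the function name suggest_build_args is hypothetical, and it assumes that concatenating ID and VERSION_ID from the os-release file yields the naming used on the Mellanox OFED download page (as it does for ubuntu20.04). Verify the result against the download page before building.

```shell
# Hypothetical helper (not provided by this repository): print suggested
# --build-arg values for D_OS and D_ARCH.
suggest_build_args() {
    # First argument: path to an os-release file (defaults to /etc/os-release).
    os_release_file="${1:-/etc/os-release}"
    # Source the file in a subshell so the caller's environment is untouched.
    # Assumption: ID + VERSION_ID matches the download page naming.
    d_os=$(. "$os_release_file" && printf '%s%s' "$ID" "$VERSION_ID")
    # Machine architecture as reported by the kernel, e.g. x86_64.
    d_arch=$(uname -m)
    printf '%s\n' "--build-arg D_OS=$d_os --build-arg D_ARCH=$d_arch"
}
```

Running suggest_build_args with no argument reads /etc/os-release on the build host; the printed arguments can then be pasted into the docker build command.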
Coming soon...
# docker run --rm -it \
-v /run/mellanox/drivers:/run/mellanox/drivers:shared \
-v /etc/network:/etc/network \
-v /etc:/host/etc \
-v /lib/udev:/host/lib/udev \
--net=host --privileged ofed-driver
The NV Peer Memory driver container is intended as an alternative to host installation. Once the container image is deployed on the host, the container will:
- Compile the nv_peer_mem kernel module
- Reload the nv_peer_mem kernel module
Because the Nvidia peer memory client module must be compiled against the Mellanox OFED and Nvidia drivers currently installed on the machine, the container expects the root fs where the Mellanox OFED drivers are installed to be mounted at /run/mellanox/drivers and the root fs where the Nvidia drivers are installed to be mounted at /run/nvidia/drivers.
This works best when both the Mellanox NIC and Nvidia GPU drivers are provisioned via driver containers, since those containers can expose their rootfs.
- D_BASE_IMAGE: Base image to be used when building the container image (Default: ubuntu:20.04)
- D_NV_PEER_MEM_BRANCH: Branch/Tag of the nv_peer_memory repository (Default: master)
# docker build -t nv-peer-mem \
--build-arg D_BASE_IMAGE=ubuntu:20.04 \
--build-arg D_NV_PEER_MEM_BRANCH=1.0-9 \
gpu-direct/ubuntu/
Coming soon...
In the example below, the Mellanox driver container rootfs is mounted on the host at /run/mellanox/drivers and the Nvidia driver container rootfs is mounted on the host at /run/nvidia/driver:
# docker run --rm -it \
-v /run/mellanox/drivers:/run/mellanox/drivers \
-v /run/nvidia/driver:/run/nvidia/drivers \
--privileged nv-peer-mem
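Because compilation depends on both driver root filesystems being present, a pre-flight check on the host can catch a missing mount before the container starts. The sketch below is an illustration only: check_rootfs_dirs is a hypothetical name, and the check only confirms that each directory exists and is non-empty, not that it is an actual mount point or contains a complete driver installation.

```shell
# Hypothetical pre-flight check (not provided by this repository): succeed
# only if every directory given as an argument exists and is non-empty.
check_rootfs_dirs() {
    rc=0
    for dir in "$@"; do
        # ls -A lists all entries except . and .. ; empty output means empty dir.
        if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "missing or empty: $dir" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

With the host paths from the example above, this could be called as check_rootfs_dirs /run/mellanox/drivers /run/nvidia/driver before issuing docker run.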
A driver container loads kernel modules into the running kernel, possibly preceded by a compilation step.
The process is not atomic as:
- A driver is often composed of multiple modules which are loaded sequentially into the kernel.
- Compilation (if it takes place) takes time.
To mark the completion of the driver loading phase by the driver container,
a file is created at the container's root directory: /.driver-ready.
Its existence indicates that the driver has been successfully loaded into the running kernel.
This can be used by a container orchestrator to probe for readiness of a driver container.
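A readiness probe built on this file can be sketched as a simple polling loop. The helper below is a hypothetical illustration rather than something this repository ships: wait_for_file, its one-second polling interval, and the default 300-second timeout are all assumptions.

```shell
# Hypothetical helper (not provided by this repository): poll until a file
# exists or the timeout, given in seconds, has elapsed.
wait_for_file() {
    file="$1"
    timeout="${2:-300}"   # assumed default, adjust to the expected build time
    elapsed=0
    while [ ! -e "$file" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1      # timed out, file never appeared
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0              # file exists, driver reported as ready
}
```

Inside the driver container this would be called as wait_for_file /.driver-ready. From the host, a one-shot check such as docker exec with the container name and test -e /.driver-ready serves the same purpose; an orchestrator would typically run the equivalent command as its readiness probe.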
Having the rdma-core package installed on the host may prevent the Mellanox OFED driver container
from properly loading drivers. This is because rdma-core installs udev rules that trigger
driver module loading from the host and also load storage modules on system startup.