
NVIDIA GPU support #23917

@3XX0

Description

Hello, author of nvidia-docker here.

As many of you may know, we recently released our NVIDIA Docker project in our effort to enable GPUs in containerized applications (mostly Deep Learning). The project currently consists of two parts:

  • A Docker volume plugin to mount NVIDIA driver files inside containers at runtime.
  • A small wrapper around Docker to ease the deployment of our images.

More information on this is available in the nvidia-docker project.
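
Roughly, the two parts look like this in use today (the commands below are illustrative; the driver volume name depends on the installed driver version, and exact flags may differ between releases):

    # Using the volume plugin directly: mount the driver volume it exposes and
    # pass the NVIDIA device nodes explicitly.
    docker run --volume-driver=nvidia-docker \
        --volume=nvidia_driver_<version>:/usr/local/nvidia:ro \
        --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 \
        nvidia/cuda nvidia-smi

    # Or using the wrapper, which sets all of this up automatically and honors
    # NV_GPU to select which GPUs are exposed.
    NV_GPU=0,1 nvidia-docker run --rm nvidia/cuda nvidia-smi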

While it has been working great so far, now that Docker 1.12 is coming out with configurable runtimes and complete OCI support, we would like to move away from this admittedly hacky approach and work on something that is better integrated with Docker.

The way I see it, we would provide a prestart OCI hook that triggers our implementation and configures the cgroups/namespaces correctly.
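
For reference, here is a minimal sketch of what registering such a hook could look like in an OCI bundle's config.json (the hook path and arguments are placeholders, not an actual implementation). The runtime passes the container state, including its PID, on the hook's stdin, which is what the hook would use to configure the right cgroups and namespaces:

    {
      "hooks": {
        "prestart": [
          {
            "path": "/path/to/nvidia-prestart-hook",
            "args": ["nvidia-prestart-hook"],
            "env": ["NV_GPU=0,1"]
          }
        ]
      }
    }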

However, there are several things we need to solve first, specifically:

  1. How to detect if a given image needs GPU support
    Currently, we are using a special label com.nvidia.volumes.needed, but it is not exported as an OCI annotation (see Clarify how OCI configuration ("config.json") will be handled #21324)
  2. How to pass down to the hook which GPU should be isolated
    Currently, we are using an environment variable NV_GPU
  3. How to check whether the image is compatible with the installed driver
    Currently, we are using a special label XXX_VERSION (see the inspection example after this list)
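
For illustration, this is the kind of lookup that currently has to happen outside of Docker/runC (the first label name is the one used today; the version label name is a placeholder, as noted in point 3):

    # Point 1: detect whether the image needs GPU support via its label.
    docker inspect --format '{{ index .Config.Labels "com.nvidia.volumes.needed" }}' nvidia/cuda

    # Point 3: read the version label to check compatibility with the
    # installed driver (label name is a placeholder).
    docker inspect --format '{{ index .Config.Labels "XXX_VERSION" }}' nvidia/cuda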

All of the above could be solved using environment variables, but I'm not particularly fond of this idea (e.g. docker run -e NVIDIA_GPU=0,1 nvidia/cuda)

So, is there a way to pass runtime/hook parameters from the Docker command line, and if not, would it be worth adding (e.g. --runtime-opt)?
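
To make the question concrete, I am thinking of something along these lines; note that the flag and option name are purely hypothetical and do not exist in Docker today:

    # Hypothetical syntax, for illustration only.
    docker run --runtime-opt "nvidia.gpu=0,1" nvidia/cuda nvidia-smi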
