Description
Please provide us with the following information:
This issue is a: (mark with an x)
- bug report -> please search issues before submitting
- documentation issue or request
- regression (a behavior that used to work and stopped in a new release)
Issue description
ACA containers with serverless GPU support do not have Vulkan drivers injected (by the NVIDIA Container Runtime), even when graphics driver capabilities are requested via NVIDIA_DRIVER_CAPABILITIES=all or NVIDIA_DRIVER_CAPABILITIES=graphics.
This is preventing us from running rendering and visualisation workloads.
Steps to reproduce
- Create an ACA app using the Microsoft sample image as per this tutorial https://learn.microsoft.com/en-us/azure/container-apps/gpu-image-generation?pivots=azure-portal
- Add the NVIDIA_DRIVER_CAPABILITIES=all environment variable
- Start or restart the app
- Use the console to install the Vulkan utilities (apt update && apt-get -y install vulkan-tools)
- Run nvidia-smi to confirm GPU is mounted
- Run vulkaninfo to confirm Vulkan client drivers are working
Expected behavior [What you expected to happen.]
nvidia-smi enumerates T4 GPU
and
vulkaninfo displays information about the T4 GPU
Actual behavior [What actually happened.]
nvidia-smi enumerates the T4
but
vulkaninfo shows that no driver can be loaded
===========
VULKAN INFO
===========
Vulkan Instance Version: 1.1.70
Cannot create Vulkan instance.
/build/vulkan-UL09PJ/vulkan-1.1.70+dfsg1/demos/vulkaninfo.c:768: failed with VK_ERROR_INCOMPATIBLE_DRIVER
Additional context
Our ultimate aim is to deploy this as a Container Apps job, but since apps and jobs share the same environment, I assume that whatever is preventing the NVIDIA Container Runtime from correctly injecting the Vulkan drivers is the same in both cases.
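To help narrow this down, here is a rough diagnostic sketch that could be run from the container console. VK_ERROR_INCOMPATIBLE_DRIVER typically means the Vulkan loader found no usable driver (ICD), so the script lists any ICD manifests that mention NVIDIA. The function names are hypothetical, and the two directories are the usual loader search paths on Linux; whether the runtime is expected to populate either of them in ACA is an assumption, not something confirmed here.

```shell
#!/usr/bin/env bash
# Diagnostic sketch (hypothetical helper names, not part of any NVIDIA or Azure tool).

# find_nvidia_icds DIR...
# Print every *.json manifest in the given directories whose contents
# mention "nvidia" (case-insensitive). Missing directories are skipped.
find_nvidia_icds() {
  local dir file
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    for file in "$dir"/*.json; do
      # An unmatched glob stays literal, so confirm the file really exists.
      [ -f "$file" ] || continue
      if grep -qi 'nvidia' "$file"; then
        printf '%s\n' "$file"
      fi
    done
  done
}

# report_vulkan_injection
# Print the capability setting seen inside the container and any NVIDIA
# ICD manifests found in the assumed default loader directories.
report_vulkan_injection() {
  echo "NVIDIA_DRIVER_CAPABILITIES=${NVIDIA_DRIVER_CAPABILITIES:-<unset>}"
  local icds
  icds="$(find_nvidia_icds /etc/vulkan/icd.d /usr/share/vulkan/icd.d)"
  if [ -n "$icds" ]; then
    echo "NVIDIA ICD manifests found:"
    echo "$icds"
  else
    echo "No NVIDIA ICD manifest found in the default loader directories"
  fi
}
```

After sourcing the script in the console, calling report_vulkan_injection prints the capability setting and any manifests found. If the variable is set as expected but no manifest is listed, that would be consistent with the graphics libraries not being injected, although it does not prove the cause.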