Error when restarting minikube (v1.32.0-beta0) and using gpu #17405

Closed
rafariossaa opened this issue Oct 11, 2023 · 4 comments · Fixed by #17488

@rafariossaa

What Happened?

When I try to stop and then start minikube v1.32.0-beta0 it gives me an error:

$ minikube start --driver docker --container-runtime docker --gpus all --cpus=8 --memory=8G
😄  minikube v1.31.2 on Ubuntu 23.04
✨  Using the docker driver based on user configuration
📌  Using Docker driver with root privileges
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🔥  Creating docker container (CPUs=8, Memory=8192MB) ...
❗  Using GPUs with the Docker driver is experimental, if you experience any issues please report them at: https://github.com/kubernetes/minikube/issues/new/choose
🛠   Installing the NVIDIA Container Toolkit...
🐳  Preparing Kubernetes v1.28.2 on Docker 24.0.6 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
    ▪ Using image nvcr.io/nvidia/k8s-device-plugin:v0.14.1
🔎  Verifying Kubernetes components...
🌟  Enabled addons: storage-provisioner, nvidia-device-plugin, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default


$ minikube stop
✋  Stopping node "minikube"  ...
🛑  Powering off "minikube" via SSH ...
🛑  1 node stopped.


$ minikube start --driver docker --container-runtime docker --gpus all --cpus=8 --memory=8G
😄  minikube v1.31.2 on Ubuntu 23.04
✨  Using the docker driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🔄  Restarting existing docker container for "minikube" ...
❗  Using GPUs with the Docker driver is experimental, if you experience any issues please report them at: https://github.com/kubernetes/minikube/issues/new/choose
🛠   Installing the NVIDIA Container Toolkit...

❌  Exiting due to RUNTIME_ENABLE: Failed to enable container runtime: failed installing the NVIDIA Container Toolkit: /bin/bash -c "curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg": Process exited with status 2
stdout:

stderr:
gpg: cannot open '/dev/tty': No such device or address
curl: (23) Failed writing body


╭───────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                           │
│    😿  If the above advice does not help, please let us know:                             │
│    👉  https://github.com/kubernetes/minikube/issues/new/choose                           │
│                                                                                           │
│    Please run `minikube logs --file=logs.txt` and attach logs.txt to the GitHub issue.    │
│                                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────╯

The same error occurs in the last step whether I run plain minikube start or pass the full set of parameters shown above.
To make it work, I need to delete the cluster and then start it again.
There are no problems if I don't use the --gpus parameter.

This comes from #17380

Attach the log file

--

Operating System

Ubuntu

Driver

Docker

@doker78

doker78 commented Oct 22, 2023

Hi,
I was able to start and stop minikube successfully with an experimental setup (minikube config set driver none and minikube config set container-runtime containerd). Everything works fine, and the GPU workflow tests pass on it.

When I switch to driver=docker, I get the same error after stopping. I found the following bug in the Debian distribution:

gpg: cannot open '/dev/tty': No such device or address

More information here:
debian bug #913614

How to fix: add the --no-tty flag to the gpg command.

So I guess the fix is to ship a new build that runs the command with this flag included:

/bin/bash -c "curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --no-tty --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg"
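
For anyone who wants to try this by hand inside the node, here is a minimal sketch of the same command made safe to re-run. It assumes, without confirmation in this thread, that the restart fails because gpg finds the keyring written by the first start and tries to prompt on /dev/tty. The helper names keyring_missing and install_keyring are hypothetical and are not part of minikube. --batch and --yes are standard gpg options that disable prompts and allow overwriting the output file.

```shell
# Path taken from the failing command above.
KEYRING=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Hypothetical helper: succeeds when the keyring file is missing or empty.
keyring_missing() {
  [ ! -s "$1" ]
}

# Hypothetical wrapper around the command from this thread.
# It skips the download when a non-empty keyring already exists, and it
# never needs a TTY because of --no-tty, --batch and --yes.
install_keyring() {
  if keyring_missing "$KEYRING"; then
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
      | sudo gpg --no-tty --batch --yes --dearmor -o "$KEYRING"
  fi
}
```

Calling install_keyring a second time should then be a no-op instead of an error. This is not necessarily how the fix in minikube itself was implemented.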

@spowelljr
Member

Thanks for the fix suggestion @doker78, I'm going to try this out later today and update the command if it resolves it for me as well.

@wings2020

wings2020 commented Oct 25, 2023

Hi @doker78 ,

I use Red Hat 7.5 and my server is on an air-gapped network. I tried your solution:

/bin/bash -c "cat gpgkey" | sudo gpg --no-tty --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

But I still get the same failure:

$ minikube start --driver docker --container-runtime docker --gpus all --cpus=8 --memory=8G
😄 minikube v1.31.2 on Redhat 7.5
✨ Using the docker driver based on existing profile
👍 Starting control plane node minikube in cluster minikube
🚜 Pulling base image ...
🔄 Restarting existing docker container for "minikube" ...
❗ Using GPUs with the Docker driver is experimental, if you experience any issues please report them at: https://github.com/kubernetes/minikube/issues/new/choose
🛠 Installing the NVIDIA Container Toolkit...

❌ Exiting due to RUNTIME_ENABLE: Failed to enable container runtime: failed installing the NVIDIA Container Toolkit: /bin/bash -c "curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg": Process exited with status 2
stdout:

stderr:
gpg: cannot open '/dev/tty': No such device or address
curl: (6) Could not resolve host: nvidia.github.io

Do you have any other solution?
I have also run the commands from this page:
https://minikube.sigs.k8s.io/docs/tutorials/nvidia/

Operating System
Redhat 7.5

Driver
Docker

NVIDIA-SMI
515.65.01

CUDA Version
11.7

@rafariossaa
Author

This worked with the build from #17488.
Thanks.
