
NVIDIA GPU no longer working on Jailmaker v1.1.0 #95

Closed
dalgibbard opened this issue Mar 2, 2024 · 3 comments · Fixed by #96
@dalgibbard (Contributor)

As per the title: I've just updated the jailmaker script from v1.0.1 to v1.1.0, and the NVIDIA GPU on my system is no longer accessible inside the nspawn jail/container, even though it was working prior to the update.

To recreate the issue with a new jail:

# jlmkr create -gn 1 -gi 0 --docker_compatible 1 --distro ubuntu --release jammy  test
Creating jail test with default config.
Overriding distro config value with ubuntu.
Overriding docker_compatible config value with 1.
Overriding gpu_passthrough_nvidia config value with 1.
Overriding release config value with jammy.
The cached copy has expired, re-downloading...
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs

---
You just created an Ubuntu jammy amd64 (20240302_07:42) container.

Starting the container on v1.1.0 looks like:

Starting jail test with the following command:

systemd-run --collect --property=Delegate=yes --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=TasksMax=infinity --property=Type=notify --setenv=SYSTEMD_NSPAWN_LOCK=0 --property=KillMode=mixed --unit=jlmkr-test --working-directory=./jails/test '--description=My nspawn jail test [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --bind-ro=/sys/module --boot --inaccessible=/sys/module/apparmor --quiet --keep-unit --machine=test --directory=rootfs --capability=all '--property=DeviceAllow=char-drm rw'

I don't see any NVIDIA-related mounts here, and nvidia-smi is not available inside the container. If I bind-mount it in manually, it errors about missing libraries.
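A quick way to confirm this from inside the jail (a minimal check, assuming the jail is running and named test as above; machinectl is used here since it ships with systemd):

# machinectl shell test
# ls /dev/nvidia*
# command -v nvidia-smi

On v1.1.0 neither the device nodes nor nvidia-smi are found, whereas with v1.0.1 they show up as expected.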

Reverting back to the v1.0.1 script and starting the jail again, the appropriate NVIDIA mounts are back:

Starting jail test with the following command:

systemd-run --collect --property=Delegate=yes --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=TasksMax=infinity --property=Type=notify --setenv=SYSTEMD_NSPAWN_LOCK=0 --property=KillMode=mixed --unit=jlmkr-test --working-directory=./jails/test '--description=My nspawn jail test [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --bind-ro=/sys/module --bind=/usr/bin/nvidia-smi --boot --inaccessible=/sys/module/apparmor --quiet --keep-unit --machine=test --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-nvvm.so.535.54.03 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.535.54.03 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.535.54.03 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.535.54.03 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.535.54.03 --bind-ro=/usr/bin/nvidia-persistenced --bind=/dev/nvidia0 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.535.54.03 --bind=/dev/nvidia-uvm-tools --bind=/dev/nvidiactl --bind=/dev/nvidia-uvm --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.535.54.03 --bind-ro=/usr/bin/nvidia-smi --bind-ro=/usr/lib/nvidia/current/nvidia-smi

This behaviour is confirmed on TrueNAS SCALE 23.10.2 and 23.10.1.1 with an RTX 3060 12GB.
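For context on where that long list of flags comes from: the bound paths are simply the host's NVIDIA device nodes, binaries and driver libraries. A rough sketch of how such a list can be enumerated on the host, assuming nvidia-container-cli (part of the NVIDIA container toolkit shipped with TrueNAS SCALE) is used as the source; this illustrates the idea rather than jailmaker's exact implementation:

nvidia-container-cli list | while read -r f; do
    case "$f" in
        /dev/*) printf -- ' --bind=%s' "$f" ;;     # device nodes need to be writable
        *)      printf -- ' --bind-ro=%s' "$f" ;;  # binaries and libraries can be read-only
    esac
done

The paths it reports (/dev/nvidia0, /usr/bin/nvidia-smi, the libnvidia-*.so.535.54.03 libraries, and so on) correspond to the binds visible in the v1.0.1 command above and missing from the v1.1.0 one.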

@Jip-Hop (Owner) commented Mar 2, 2024

Sorry about that! Can you test the latest commit on the develop branch please? It should have fixed this mistake.
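In case it helps anyone else hitting this before the next tagged release, testing the develop-branch copy of the script would look roughly like this (assuming the script file is jlmkr.py in the repository root; back up the current copy first):

# cp jlmkr.py jlmkr.py.bak
# curl -fLo jlmkr.py https://raw.githubusercontent.com/Jip-Hop/jailmaker/develop/jlmkr.py
# chmod +x jlmkr.py
# ./jlmkr.py start test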

@dalgibbard (Contributor, Author)

Yup! Fixed, thanks for the quick turnaround!

@mooglestiltzkin

> Reverting back to the v1.0.1 script and starting the jail again, the appropriate NVIDIA mounts are back: […]
>
> This behaviour is confirmed on TrueNAS SCALE 23.10.2 and 23.10.1.1 with an RTX 3060 12GB.

I looked at your config and noticed this:

--capability=all

Isn't that dangerous? I thought someone here had a serious networking issue because of that setting, so I thought I should warn you: #119
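For reference, systemd-nspawn's --capability= option also accepts a comma-separated list of specific capabilities rather than all, so a jail that doesn't need Docker-style privileges could in principle run with a much narrower grant. A purely illustrative example (the right set depends entirely on the workload):

# instead of the blanket --capability=all, grant only what the jail actually needs,
# e.g. network interface management inside the jail:
systemd-nspawn --machine=test --directory=rootfs --boot --capability=CAP_NET_ADMIN

The --capability=all in the commands above presumably comes from the --docker_compatible 1 setting used when the jail was created.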
