Signed-off-by: Andrew Rynhard <andrew@rynhard.io> Signed-off-by: Noel Georgi <git@frezbo.dev>
1 parent 495cabb, commit 215aa82
Showing 26 changed files with 707 additions and 22 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,12 @@ | ||
-# syntax = ghcr.io/talos-systems/bldr:v0.2.0-alpha.6-frontend | ||
+# syntax = ghcr.io/siderolabs/bldr:v0.2.0-alpha.7-1-g9d49478-frontend | ||
|
||
format: v1alpha2 | ||
|
||
vars: | ||
-  TOOLS_IMAGE: ghcr.io/talos-systems/tools:v0.10.0-alpha.0-5-g8197edb | ||
-  LINUX_FIRMWARE_IMAGE: ghcr.io/talos-systems/linux-firmware:v0.9.0-2-g447ce75 | ||
+  TOOLS_IMAGE: ghcr.io/siderolabs/tools:v1.1.0-alpha.0-2-gbfc99ca | ||
+  LINUX_FIRMWARE_IMAGE: ghcr.io/siderolabs/linux-firmware:v1.0.0-5-g615d1a0 | ||
+  NVIDIA_DRIVER_VERSION_MAJOR: 510 | ||
+  NVIDIA_DRIVER_VERSION_MINOR: 54 | ||
|
||
labels: | ||
-  org.opencontainers.image.source: https://github.com/talos-systems/extensions | ||
+  org.opencontainers.image.source: https://github.com/siderolabs/extensions |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
[runsc_config] | ||
-# See https://github.com/talos-systems/extensions/issues/4 | ||
+# See https://github.com/siderolabs/extensions/issues/4 | ||
ignore-cgroups = "true" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
-module github.com/talos-systems/hello-world | ||
+module github.com/siderolabs/hello-world | ||
|
||
go 1.17 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# development | ||
|
||
This document is intended as a guide to updating the `nvidia-container-toolkit` dependencies. | ||
|
||
## Components | ||
|
||
### [nvidia-container-cli](./nvidia-container-cli/) | ||
|
||
`nvidia-container-cli` is called by the `nvidia-container-runtime` to set up the required NVIDIA library mounts and NVIDIA device files for a workload container. | ||
|
||
### [nvidia-container-runtime](./nvidia-container-runtime/) | ||
|
||
`nvidia-container-runtime` is the runtime used by `containerd` to run workload containers. It is mostly a wrapper around `runc`. | ||
|
||
It also ships a tool called `nvidia-container-runtime-hook`, which is used to set up OCI hooks. It is a symlink to `nvidia-container-toolkit`, which eventually calls `nvidia-container-cli`. | ||
|
||
### [nvidia-device-create](./nvidia-device-create/) | ||
|
||
This is used to create the required NVIDIA device files under `/dev`. This requires udev rules. | ||
|
||
### [glibc](./glibc/) | ||
|
||
`nvidia-container-cli` is fully dependent on `glibc` to be able to access the NVIDIA shared objects. | ||
|
||
## Updating the nvidia driver version | ||
|
||
- Update the driver version in the `pkgs` repo [here](https://github.com/siderolabs/pkgs/blob/master/nonfree/kmod-nvidia/pkg.yaml) | ||
- Update the driver version [here](../Pkgfile) | ||
|
||
## Updating the nvidia-container-toolkit version | ||
|
||
- Update the `libnvidia-container` version [here](./nvidia-container-cli/pkg.yaml) | ||
- Update the `container-toolkit` version [here](./nvidia-container-runtime/pkg.yaml) | ||
|
||
Make sure to also update the `nvidia-device-create` version [here](./nvidia-device-create/pkg.yaml). | ||
|
||
### Patches | ||
|
||
- [nvidia-container-cli](./nvidia-container-cli/patches/libnvidia-container/) | ||
  - `common.h.patch` - use custom glibc interpreter path | ||
  - `Makefile.patch` - build statically linked with `libcap` and `libseccomp` | ||
  - `nvc_ldcache.c.patch` - use the standard `ld.so.cache` path inside the container | ||
- [container-runtime](./nvidia-container-runtime/patches/nvidia-container-runtime/) | ||
  - `main.go.patch` - use custom path for the nvidia-container-runtime config | ||
- [container-toolkit](./nvidia-container-runtime/patches/nvidia-container-toolkit/) | ||
  - `hook_config.go.patch` - use custom path for the nvidia-container-runtime config | ||
- [nvidia-device-create](./nvidia-device-create/patches/nvidia-graphics-drivers-build/) | ||
  - `Makefile.patch` - build statically linked with `libpciaccess` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
# NVIDIA Container Toolkit extension | ||
|
||
## Usage | ||
|
||
Enable the extension in the machine configuration before installing Talos: | ||
|
||
```yaml | ||
machine: | ||
  install: | ||
    extensions: | ||
      - image: ghcr.io/siderolabs/nvidia-container-toolkit:<VERSION> | ||
``` | ||
The following NVIDIA kernel modules need to be loaded, so add this to the Talos machine configuration: | ||
```yaml | ||
machine: | ||
  kernel: | ||
    modules: | ||
      - name: nvidia | ||
      - name: nvidia_uvm | ||
      - name: nvidia_drm | ||
      - name: nvidia_modeset | ||
``` | ||
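Once the node has booted, one way to confirm that all four modules are present is to compare the list against a `/proc/modules` listing. The sketch below is illustrative only: the helper name `missing_nvidia_modules` is hypothetical, and it assumes the listing is available as a local file (for example, saved from the output of something like `talosctl -n <node> read /proc/modules`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print every required
# NVIDIA module that is absent from a /proc/modules-style listing.
# In that format the module name is the first space-separated field.
missing_nvidia_modules() {
  local listing="${1:-/proc/modules}"
  local required=(nvidia nvidia_uvm nvidia_drm nvidia_modeset)
  local mod
  for mod in "${required[@]}"; do
    # Anchor at line start and require a trailing space so that
    # "nvidia" does not match "nvidia_uvm".
    if ! grep -q "^${mod} " "$listing"; then
      echo "$mod"
    fi
  done
}
```

An empty result means every module from the configuration above appears in the listing.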
`nvidia-container-cli` loads BPF programs and requires a relaxed KSPP setting for [bpf_jit_harden](https://sysctl-explorer.net/net/core/bpf_jit_harden/), so the Talos default | ||
should be overridden: | ||
|
||
```yaml | ||
machine: | ||
  sysctls: | ||
    net.core.bpf_jit_harden: 1 | ||
``` | ||
|
||
> Warning! This relaxes a setting recommended by the [KSPP best practices](https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings#sysctls). | ||
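For context, the kernel documents three values for `net.core.bpf_jit_harden`: `0` disables JIT hardening, `1` enables it for unprivileged users only, and `2` enables it for all users (the value KSPP recommends). The following sketch maps a value to that meaning so the effect of the override is explicit; the function name `describe_bpf_jit_harden` is hypothetical and only serves the illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a net.core.bpf_jit_harden value into the
# meaning given in the kernel's sysctl documentation.
describe_bpf_jit_harden() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "unprivileged-only" ;;   # the value this extension needs
    2) echo "all-users" ;;           # the KSPP recommendation
    *) echo "unknown"; return 1 ;;
  esac
}
```

On a machine with a shell, the current value could be passed in with `describe_bpf_jit_harden "$(cat /proc/sys/net/core/bpf_jit_harden)"`.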
|
||
## Testing | ||
|
||
Apply the following manifest to create a runtime class that uses the extension: | ||
|
||
```yaml | ||
--- | ||
apiVersion: node.k8s.io/v1 | ||
kind: RuntimeClass | ||
metadata: | ||
  name: nvidia | ||
handler: nvidia | ||
``` | ||
|
||
Install the NVIDIA device plugin: | ||
|
||
```bash | ||
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin | ||
helm repo update | ||
helm install nvidia-device-plugin nvdp/nvidia-device-plugin --version=0.11.0 --set=runtimeClassName=nvidia | ||
``` | ||
|
||
Apply the following manifest to run a CUDA pod via the NVIDIA runtime: | ||
|
||
```yaml | ||
--- | ||
apiVersion: v1 | ||
kind: Pod | ||
metadata: | ||
  name: cuda-vector-add | ||
spec: | ||
  restartPolicy: OnFailure | ||
  runtimeClassName: nvidia | ||
  containers: | ||
    - name: cuda-vector-add | ||
      image: "quay.io/giantswarm/nvidia-gpu-demo:latest" | ||
      resources: | ||
        limits: | ||
          nvidia.com/gpu: 1 | ||
``` | ||
|
||
The pod should run to completion: | ||
|
||
```bash | ||
❯ kubectl get pods | ||
NAME READY STATUS RESTARTS AGE | ||
cuda-vector-add 0/1 Completed 0 17m | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# libc default configuration | ||
/usr/local/lib | ||
|
||
/usr/local/glibc/lib | ||
/usr/lib | ||
/lib |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
name: glibc | ||
variant: scratch | ||
shell: /bin/bash | ||
dependencies: | ||
  - image: ubuntu:22.04 | ||
steps: | ||
  - sources: | ||
      - url: https://ftpmirror.gnu.org/libc/glibc-2.35.tar.gz | ||
        destination: glibc.tar.gz | ||
        sha256: 3e8e0c6195da8dfbd31d77c56fb8d99576fb855fafd47a9e0a895e51fd5942d4 | ||
        sha512: 45bf782aeda508e17fd51b45cf5ad96bd1067cf96b758b5c2d5def681af713df15e75c253d9c85de047f0a1dd22cf4f2239d70ae392cdb9291092e6570734d43 | ||
    env: | ||
      DEBIAN_FRONTEND: noninteractive | ||
    prepare: | ||
      - | | ||
        apt update && \ | ||
        apt install -y \ | ||
          bison \ | ||
          build-essential \ | ||
          gawk \ | ||
          gettext \ | ||
          openssl \ | ||
          python3 \ | ||
          texinfo | ||
      - | | ||
        mkdir -p glibc glibc-build | ||
        tar -xzf glibc.tar.gz --strip-components=1 -C glibc | ||
    build: | ||
      - | | ||
        # unset the variables bldr sets by default | ||
        unset CXXFLAGS | ||
        unset LDFLAGS | ||
        unset CFLAGS | ||
        unset TARGET | ||
        unset HOST | ||
        cd glibc-build | ||
        ../glibc/configure \ | ||
          --prefix=/usr/local/glibc \ | ||
          --libdir=/usr/local/glibc/lib \ | ||
          --libexecdir=/usr/local/glibc/lib \ | ||
          --enable-stack-protector=strong | ||
        make -j $(nproc) | ||
    install: | ||
      - | | ||
        mkdir -p /rootfs | ||
        cd glibc-build | ||
        make install DESTDIR=/rootfs | ||
        cp /pkg/ld.so.conf /rootfs/usr/local/glibc/etc/ld.so.conf | ||
        # cleanup include, var and share | ||
        rm -rf /rootfs/usr/local/glibc/include | ||
        rm -rf /rootfs/usr/local/glibc/share | ||
        rm -rf /rootfs/usr/local/glibc/var | ||
finalize: | ||
  - from: /rootfs | ||
    to: /rootfs | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
version: v1alpha1 | ||
metadata: | ||
  name: nvidia-container-toolkit | ||
  # the first part is the driver version and the second the container-toolkit version | ||
  version: 510.54-v1.9.0 | ||
  author: Andrew Rynhard | ||
  description: | | ||
    This system extension provides the NVIDIA runtime and its dependencies using NVIDIA's runtime handler. | ||
  compatibility: | ||
    talos: | ||
      version: "> v0.15.0-alpha.0" |
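The comment on the `version` field implies a `<driver>-<toolkit>` layout. A small sketch of splitting such a string in shell follows. The function names are hypothetical, and the sketch assumes the driver part never contains a hyphen.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a "<driver>-<toolkit>" extension version string.

# Everything before the first "-" is the driver version.
driver_version() { echo "${1%%-*}"; }

# Everything after the first "-" is the container-toolkit version.
toolkit_version() { echo "${1#*-}"; }

ext_version="510.54-v1.9.0"
driver="$(driver_version "$ext_version")"
# The driver part splits further into the two Pkgfile variables,
# NVIDIA_DRIVER_VERSION_MAJOR and NVIDIA_DRIVER_VERSION_MINOR.
echo "driver major: ${driver%%.*}, minor: ${driver#*.}"
echo "toolkit: $(toolkit_version "$ext_version")"
```

Splitting on the first hyphen keeps any hyphens in the toolkit part intact, which matters if a pre-release tag is ever used there.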
13 changes: 13 additions & 0 deletions
nvidia-container-toolkit/nvidia-container-cli/patches/libnvidia-container/Makefile.patch
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
diff --git Makefile Makefile | ||
index 6fb6976..c7b9ffa 100644 | ||
--- Makefile | ||
+++ Makefile | ||
@@ -184,7 +184,7 @@ LIB_LDLIBS = $(LIB_LDLIBS_STATIC) $(LIB_LDLIBS_SHARED) | ||
BIN_CPPFLAGS = -include $(BUILD_DEFS) $(CPPFLAGS) | ||
BIN_CFLAGS = -I$(SRCS_DIR) -fPIE -flto $(CFLAGS) | ||
BIN_LDFLAGS = -L. -pie $(LDFLAGS) -Wl,-rpath='$$ORIGIN/../$$LIB' | ||
-BIN_LDLIBS = -l:$(LIB_SHARED) -ldl -lcap $(LDLIBS) | ||
+BIN_LDLIBS = -l:$(LIB_STATIC) -ldl -l:libcap.a -l:libseccomp.a $(LDLIBS) | ||
|
||
$(word 1,$(LIB_RPC_SRCS)): RPCGENFLAGS=-h | ||
$(word 2,$(LIB_RPC_SRCS)): RPCGENFLAGS=-c |
22 changes: 22 additions & 0 deletions
nvidia-container-toolkit/nvidia-container-cli/patches/libnvidia-container/common.h.patch
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
diff --git src/common.h src/common.h | ||
index c91d349..461b2a5 100644 | ||
--- src/common.h | ||
+++ src/common.h | ||
@@ -24,7 +24,7 @@ | ||
#define LDCONFIG_PATH "/sbin/ldconfig" | ||
#define LDCONFIG_ALT_PATH "/sbin/ldconfig.real" | ||
|
||
-#define LIB_DIR "/lib64" | ||
+#define LIB_DIR "/usr/local/glibc/lib" | ||
#define USR_BIN_DIR "/usr/bin" | ||
#define USR_LIB_DIR "/usr/lib64" | ||
#define USR_LIB32_DIR "/usr/lib32" | ||
@@ -33,7 +33,7 @@ | ||
#if defined(__x86_64__) | ||
# define LIB_ARCH LD_X8664_LIB64 | ||
# define LIB32_ARCH LD_I386_LIB32 | ||
-# define USR_LIB_MULTIARCH_DIR "/usr/lib/x86_64-linux-gnu" | ||
+# define USR_LIB_MULTIARCH_DIR "/usr/local/lib" | ||
# define USR_LIB32_MULTIARCH_DIR "/usr/lib/i386-linux-gnu" | ||
# if !defined(__NR_execveat) | ||
# define __NR_execveat 322 |
13 changes: 13 additions & 0 deletions
...ia-container-toolkit/nvidia-container-cli/patches/libnvidia-container/nvc_ldcache.c.patch
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
diff --git src/nvc_ldcache.c src/nvc_ldcache.c | ||
index d73d0f1..c28e982 100644 | ||
--- src/nvc_ldcache.c | ||
+++ src/nvc_ldcache.c | ||
@@ -349,7 +349,7 @@ nvc_ldcache_update(struct nvc_context *ctx, const struct nvc_container *cnt) | ||
if (validate_args(ctx, cnt != NULL) < 0) | ||
return (-1); | ||
|
||
- argv = (char * []){cnt->cfg.ldconfig, cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL}; | ||
+ argv = (char * []){cnt->cfg.ldconfig, cnt->cfg.libs_dir, cnt->cfg.libs32_dir, "-C", "/etc/ld.so.cache", NULL}; | ||
if (*argv[0] == '@') { | ||
/* | ||
* We treat this path specially to be relative to the host filesystem. |