Handle nvidia boards in kairos-init #229
base: main
Changes from all commits
8cc1eb8
aed7839
e9e5c8e
07fc136
42f89cc
dd76241
ad6357c
8586435
1f8991b
904401d
29fa43f
fd4a46f
7d5eb72
8d497f5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -62,17 +62,37 @@ func GetInstallStage(sis values.System, logger types.KairosLogger) ([]schema.Sta | |
|
|
||
| // Read the NVIDIA env variables, use defaults if not set | ||
| nvidiaRelease := os.Getenv("NVIDIA_RELEASE") | ||
| if nvidiaRelease == "" { | ||
| nvidiaRelease = "35" | ||
| } | ||
| if nvidiaRelease == "" { | ||
| // This was just introduced in PR #211, however if you check the | ||
| // Dockerfile.nvidia-orin-nx it says 36 :shrug:, do we actually need a | ||
| // default or should the user always set it? if we have a default, should it | ||
| // ever change? | ||
| nvidiaRelease = "35" | ||
| } | ||
|
|
||
| nvidiaVersion := os.Getenv("NVIDIA_VERSION") | ||
| if nvidiaVersion == "" { | ||
| // This was just introduced in PR #211, however if you check the | ||
| // Dockerfile.nvidia-orin-nx it says 4.4 :shrug:, do we actually need a | ||
| // default or should the user always set it? if we have a default, should it | ||
| // ever change? | ||
| nvidiaVersion = "3.1" | ||
|
Member
Author
@jordankrp I see that the default here is 3.1, but in https://github.com/kairos-io/kairos/blob/c3011d352d0648df6829352a06eb9e4b96ff1892/images/Dockerfile.nvidia-orin-nx#L12 it's 4.4. Do you know why there's a difference?
Contributor
Same, see my comment above.
||
| } | ||
|
|
||
| l4tVersion := os.Getenv("L4T_VERSION") | ||
| if l4tVersion == "" { | ||
| l4tVersion = "36.4" | ||
| } | ||
|
|
||
| nvidiaVersion := os.Getenv("NVIDIA_VERSION") | ||
| if nvidiaVersion == "" { | ||
| nvidiaVersion = "3.1" | ||
| } | ||
| // Get board model from environment or config | ||
| boardModel := os.Getenv("BOARD_MODEL") | ||
| if boardModel == "" { | ||
| // Does it make sense that both AGX Orin and Orin NX use the same board model? | ||
|
Contributor
Ultimately the main differences are how we flash and the partition layout, so using the same board model probably doesn't matter at this stage, as long as the URL request later on doesn't complain.
||
| boardModel = "t234" | ||
| } | ||
|
|
||
| // Prepare NVIDIA L4T extraction script | ||
| l4tScript := fmt.Sprintf(`#!/bin/bash | ||
| l4tScript := fmt.Sprintf(`#!/bin/bash | ||
| set -e | ||
|
|
||
| NVIDIA_RELEASE="%s" | ||
|
|
@@ -149,6 +169,72 @@ func GetInstallStage(sis values.System, logger types.KairosLogger) ([]schema.Sta | |
| l4tScript, | ||
| }, | ||
| }, | ||
| { | ||
| Name: "Setup NVIDIA L4T repositories", | ||
|
Member
Author
@Itxaka I'm not sure if I'm doing this properly, can you validate? Basically, I'm trying to mimic the steps in https://github.com/kairos-io/kairos/blob/master/images/Dockerfile.nvidia-orin-nx but I couldn't find a way to say e.g.
Nevertheless, this seems to build, so maybe it's OK? @jordankrp how strict does the order in that Dockerfile need to be? 😅
Contributor
I think the order is more about optimising Docker caching when building a new OS image.
||
| If: fmt.Sprintf(`[ "%s" = "nvidia-jetson-agx-orin" ] || [ "%s" = "nvidia-jetson-orin-nx" ]`, config.DefaultConfig.Model, config.DefaultConfig.Model), | ||
| Commands: []string{ | ||
| // Clean up existing NVIDIA repository files | ||
| "rm -rf /etc/apt/sources.list.d/nvidia-l4t-apt-source.list", | ||
| // Create NVIDIA L4T packages directory | ||
| "mkdir -p /opt/nvidia/l4t-packages", | ||
| "touch /opt/nvidia/l4t-packages/.nv-l4t-disable-boot-fw-update-in-preinstall", | ||
| // Add NVIDIA GPG keys | ||
| "curl -fSsL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub | gpg --dearmor | tee /usr/share/keyrings/nvidia-drivers-2004.gpg > /dev/null 2>&1", | ||
| "curl -fSsL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub | gpg --dearmor | tee /usr/share/keyrings/nvidia-drivers-2204.gpg > /dev/null 2>&1", | ||
| "curl -fSsL https://repo.download.nvidia.com/jetson/jetson-ota-public.asc | gpg --dearmor | tee /usr/share/keyrings/jetson-ota.gpg > /dev/null 2>&1", | ||
| // Add NVIDIA repositories | ||
| "echo 'deb [signed-by=/usr/share/keyrings/nvidia-drivers-2204.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /' | tee -a /etc/apt/sources.list.d/nvidia-drivers.list", | ||
| "echo 'deb [signed-by=/usr/share/keyrings/nvidia-drivers-2004.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /' | tee -a /etc/apt/sources.list.d/nvidia-drivers.list", | ||
| fmt.Sprintf("echo 'deb [signed-by=/usr/share/keyrings/jetson-ota.gpg] https://repo.download.nvidia.com/jetson/common/ r%s main' | tee -a /etc/apt/sources.list.d/nvidia-drivers.list", l4tVersion), | ||
| fmt.Sprintf("echo 'deb [signed-by=/usr/share/keyrings/jetson-ota.gpg] https://repo.download.nvidia.com/jetson/%s/ r%s main' | tee -a /etc/apt/sources.list.d/nvidia-drivers.list", boardModel, l4tVersion), | ||
| }, | ||
| }, | ||
| { | ||
| Name: "Setup OpenCV symlink for NVIDIA devices", | ||
| If: fmt.Sprintf(`[ "%s" = "nvidia-jetson-agx-orin" ] || [ "%s" = "nvidia-jetson-orin-nx" ]`, config.DefaultConfig.Model, config.DefaultConfig.Model), | ||
| Commands: []string{ | ||
| "ln -s /usr/include/opencv4/opencv2 /usr/include/opencv2", | ||
| }, | ||
| }, | ||
| { | ||
| Name: "Configure CUDA paths for NVIDIA devices", | ||
| If: fmt.Sprintf(`[ "%s" = "nvidia-jetson-agx-orin" ] || [ "%s" = "nvidia-jetson-orin-nx" ]`, config.DefaultConfig.Model, config.DefaultConfig.Model), | ||
| Commands: []string{ | ||
| // Move CUDA out of the way to /opt so kairos can occupy /usr/local without workarounds | ||
| "update-alternatives --remove-all cuda || true", | ||
| "update-alternatives --remove-all cuda-12 || true", | ||
| "mv /usr/local/cuda-12.6 /opt/cuda-12.6 || true", | ||
| "update-alternatives --install /opt/cuda cuda /opt/cuda-12.6 1 || true", | ||
| "update-alternatives --install /opt/cuda-12 cuda-12 /opt/cuda-12.6 1 || true", | ||
| }, | ||
| }, | ||
| { | ||
| Name: "Configure NVIDIA L4T USB device mode for NVIDIA devices", | ||
| If: fmt.Sprintf(`[ "%s" = "nvidia-jetson-agx-orin" ] || [ "%s" = "nvidia-jetson-orin-nx" ]`, config.DefaultConfig.Model, config.DefaultConfig.Model), | ||
| Commands: []string{ | ||
| // Change mountpoint for l4t usb device mode, as rootfs is mounted ro | ||
| // /srv/data is made through cloud-config | ||
| "sed -i -e 's|mntpoint=\"/mnt|mntpoint=\"/srv/data|' /opt/nvidia/l4t-usb-device-mode/nv-l4t-usb-device-mode-start.sh || true", | ||
| }, | ||
| }, | ||
| { | ||
| Name: "Disable ISCSI for NVIDIA devices", | ||
| If: fmt.Sprintf(`[ "%s" = "nvidia-jetson-agx-orin" ] || [ "%s" = "nvidia-jetson-orin-nx" ]`, config.DefaultConfig.Model, config.DefaultConfig.Model), | ||
| Files: []schema.File{ | ||
| { | ||
| Path: "/etc/dracut.conf.d/iscsi.conf", | ||
| Content: "omit_dracutmodules+=\" iscsi \"", | ||
| }, | ||
| }, | ||
| }, | ||
| { | ||
| Name: "Disable ISCSI services for NVIDIA devices", | ||
| If: fmt.Sprintf(`[ "%s" = "nvidia-jetson-agx-orin" ] || [ "%s" = "nvidia-jetson-orin-nx" ]`, config.DefaultConfig.Model, config.DefaultConfig.Model), | ||
| Commands: []string{ | ||
| // iscsid causes delays on the login shell, and we don't need it, so we'll disable it | ||
| "systemctl disable iscsi open-iscsi iscsid.socket || true", | ||
| }, | ||
| }, | ||
| } | ||
| return stage, nil | ||
| } | ||
@jordankrp I see that the default here is 35, but in https://github.com/kairos-io/kairos/blob/master/images/Dockerfile.nvidia-orin-nx#L11 it's 36, do you know why there's a difference?
We were in the midst of upgrading our JetPack version from L4T 35.3.1 to 36.4.4, so the former variables are probably leftovers from our initial setup. We're now using nvidiaRelease = "36" and nvidiaVersion = "4.4", which we can update here.