6 changes: 6 additions & 0 deletions launcher/image/preload.sh
@@ -3,6 +3,7 @@
 readonly OEM_PATH='/usr/share/oem'
 readonly CS_PATH="${OEM_PATH}/confidential_space"
 readonly EXPERIMENTS_BINARY="confidential_space_experiments"
+readonly GPU_INSTALLER_IMAGE_REF="cos_gpu_installer_image_reference"

 copy_launcher() {
   cp launcher "${CS_PATH}/cs_container_launcher"
@@ -19,6 +20,10 @@ setup_launcher_systemd_unit() {
   cp exit_script.sh "${CS_PATH}/exit_script.sh"
 }

+set_cos_gpu_installer_image_reference() {
+  cos-extensions list -- --gpu-installer >> "${CS_PATH}"/"${GPU_INSTALLER_IMAGE_REF}"
+}
Comment on lines +23 to +25
Collaborator

I'm thinking about an alternative approach:
You're essentially sharing a config with the launcher process. Is it possible to pass it as a kernel command-line argument instead of writing it to a file? That would also make the GPU installation more measurable in some sense. Waiting for input from @alexmwu and @jkl73 on this.
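For reference, reading such a setting back in the launcher would amount to scanning `/proc/cmdline` for a `key=value` pair. This is a minimal sketch, not part of the PR: the parameter name `cs.gpu_installer_image_ref` and the image reference are hypothetical placeholders, and the parser splits on whitespace only, so it does not handle the quoted values with spaces that the kernel also accepts.

```go
package main

import (
	"fmt"
	"strings"
)

// cmdlineValue returns the value of the first key=value field in a kernel
// command line (for example the contents of /proc/cmdline) whose key matches.
// Fields are split on whitespace, so quoted values containing spaces are not
// supported. Bare flags without "=" never match.
func cmdlineValue(cmdline, key string) (string, bool) {
	for _, field := range strings.Fields(cmdline) {
		k, v, found := strings.Cut(field, "=")
		if found && k == key {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical parameter name and image reference; the PR defines neither.
	const key = "cs.gpu_installer_image_ref"
	cmdline := "BOOT_IMAGE=/syslinux/vmlinuz.A cos.protected_stateful_partition=m " +
		"cs.gpu_installer_image_ref=example.com/cos-gpu-installer:v1"
	if ref, ok := cmdlineValue(cmdline, key); ok {
		fmt.Println(ref) // prints example.com/cos-gpu-installer:v1
	}
}
```

Returning the first match is a simplification; how repeated parameters are resolved varies by consumer.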

Contributor

jkl73, Dec 31, 2024

> is it possible to pass it as a kernel cmd arg instead

This means we would determine the GPU installer version at image build time. That was my original thought as well, but I remember @meetrajvala saying there was a reason to do this at runtime, because the version may change? @meetrajvala, could you elaborate?

Contributor Author

@yawangwang The kernel command line has a length limit of 4096 characters, but we are currently at less than 50% of that (~1900 characters), so it should not be a big concern. But is it OK to put config that is needed only by a userspace program (rather than config that customizes kernel behavior) on the kernel command line? I agree that it would make the GPU driver installation somewhat more measurable, though the current implementation also addresses this by putting the file under the OEM partition, which is sealed in a later step.

@jkl73 Earlier we discussed hardcoding the installer version, which would not be appropriate because we would need to keep updating it whenever the base COS image changes (via the image-family flag). Now we derive it with the cos-extensions command, which always returns the installer image reference supported by the current base COS image, so deriving it at image build time works.
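The length budget discussed in this thread is simple arithmetic: appending one argument costs its length plus a separating space. This is a minimal sketch that treats the 4096-character limit quoted here as an assumption (the real limit depends on the architecture and bootloader) and uses a hypothetical parameter name and image reference.

```go
package main

import "fmt"

// cmdlineLimit is the limit quoted in this review thread; treat it as an
// assumption, since the effective limit is architecture- and bootloader-dependent.
const cmdlineLimit = 4096

// remainingAfterAppend reports how many characters of cmdlineLimit would remain
// after appending arg to a command line of currentLen characters, counting one
// separating space. A negative result means the argument would not fit.
func remainingAfterAppend(currentLen int, arg string) int {
	return cmdlineLimit - (currentLen + 1 + len(arg))
}

func main() {
	// ~1900 characters is the current length mentioned in the thread.
	// The argument below is hypothetical (59 characters).
	arg := "cs.gpu_installer_image_ref=example.com/cos-gpu-installer:v1"
	fmt.Println(remainingAfterAppend(1900, arg)) // prints 2136
}
```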

Collaborator

We have existing examples of adding userspace program configs to the kernel command line (e.g., systemd, launcher), so it should be OK to add the GPU installer config as well. Also, the main purpose of storing files under the OEM partition is to encrypt and protect them with dm-crypt against tampering by the operator. IMO, it doesn't provide a very explicit form of measurement.

Contributor

The whole OEM partition is sealed and measured, and its hash is part of the cmdline. So I think the GPU ref file is technically still measured and bound to the specific CS image.

Collaborator

Thanks for the info. My original thought was to provide an explicit form of measurement (the presence of the GPU installer ref in the kernel cmdline), but as you pointed out, since the hash of the OEM partition is also measured into the kernel cmdline, I'm also okay with the current approach.

Contributor

The goal here is to know exactly which installer we run for a given CS version. Explicit measurement is not required, since we won't surface the installer version to customers. It's mostly a nice-to-have for debugging, but I don't see a strong reason to change this implementation.

+
 append_cmdline() {
   local arg="$1"
   if [[ ! -d /mnt/disks/efi ]]; then
@@ -116,6 +121,7 @@ main() {
   append_cmdline "cos.protected_stateful_partition=m"
   # Increase wait timeout of the protected stateful partition.
   append_cmdline "systemd.default_timeout_start_sec=900s"
+  set_cos_gpu_installer_image_reference

   if [[ "${IMAGE_ENV}" == "debug" ]]; then
     configure_systemd_units_for_debug
2 changes: 2 additions & 0 deletions launcher/internal/gpu/config.go
@@ -5,4 +5,6 @@ const (
 	InstallationHostDir = "/var/lib/nvidia"
 	// InstallationContainerDir is the directory where gpu drivers will be available on the workload container.
 	InstallationContainerDir = "/usr/local/nvidia"
+	// GpuInstallerImageRefFilepath is a filename which has the container image reference of cos_gpu_installer.
+	GpuInstallerImageRefFilepath = "/usr/share/oem/confidential_space/cos_gpu_installer_image_reference"
 )
20 changes: 10 additions & 10 deletions launcher/internal/gpu/driverinstaller.go
@@ -4,7 +4,6 @@ package gpu
 import (
 	"context"
 	"fmt"
-	"log"
 	"os"
 	"os/exec"
 	"strings"
@@ -14,6 +13,7 @@ import (
 	"github.com/containerd/containerd/cio"
 	"github.com/containerd/containerd/namespaces"
 	"github.com/containerd/containerd/oci"
+	"github.com/google/go-tpm-tools/launcher/internal/logging"
 	"github.com/google/go-tpm-tools/launcher/spec"
 	"github.com/opencontainers/runtime-spec/specs-go"
 )
@@ -35,11 +35,11 @@ var supportedGpuTypes = []deviceinfo.GPUType{
 type DriverInstaller struct {
 	cdClient   *containerd.Client
 	launchSpec spec.LaunchSpec
-	logger     *log.Logger
+	logger     logging.Logger
 }

 // NewDriverInstaller instanciates an object of driver installer
-func NewDriverInstaller(cdClient *containerd.Client, launchSpec spec.LaunchSpec, logger *log.Logger) *DriverInstaller {
+func NewDriverInstaller(cdClient *containerd.Client, launchSpec spec.LaunchSpec, logger logging.Logger) *DriverInstaller {
 	return &DriverInstaller{
 		cdClient:   cdClient,
 		launchSpec: launchSpec,
@@ -69,11 +69,11 @@ func (di *DriverInstaller) InstallGPUDrivers(ctx context.Context) error {
 	ctx = namespaces.WithNamespace(ctx, namespaces.Default)
 	installerImageRef, err := getInstallerImageReference()
 	if err != nil {
-		di.logger.Printf("failed to get the installer container image reference: %v", err)
+		di.logger.Info(fmt.Sprintf("failed to get the installer container image reference: %v", err))
 		return err
 	}

-	di.logger.Printf("cos gpu installer version : %s", installerImageRef)
+	di.logger.Info(fmt.Sprintf("cos gpu installer version : %s", installerImageRef))
 	image, err := di.cdClient.Pull(ctx, installerImageRef, containerd.WithPullUnpack)
 	if err != nil {
 		return fmt.Errorf("failed to pull installer image: %v", err)
@@ -95,7 +95,7 @@ func (di *DriverInstaller) InstallGPUDrivers(ctx context.Context) error {

 	hostname, err := os.Hostname()
 	if err != nil {
-		di.logger.Printf("cannot get hostname: %v", err)
+		di.logger.Info(fmt.Sprintf("cannot get hostname: %v", err))
 	}

 	container, err := di.cdClient.NewContainer(
@@ -138,20 +138,20 @@ func (di *DriverInstaller) InstallGPUDrivers(ctx context.Context) error {
 	code, _, _ := status.Result()

 	if code != 0 {
-		di.logger.Printf("Gpu driver installation task ended and returned non-zero status code %d", code)
+		di.logger.Info(fmt.Sprintf("Gpu driver installation task ended and returned non-zero status code %d", code))
 		return fmt.Errorf("gpu driver installation task ended with non-zero status code %d", code)
 	}

-	di.logger.Println("Gpu driver installation task exited with status: 0")
+	di.logger.Info("Gpu driver installation task exited with status: 0")
 	return nil
 }

 func getInstallerImageReference() (string, error) {
-	installerImageRefBytes, err := exec.Command("cos-extensions", "list", "--", "--gpu-installer").Output()
+	imageRefBytes, err := os.ReadFile(GpuInstallerImageRefFilepath)
 	if err != nil {
 		return "", fmt.Errorf("failed to get the cos-gpu-installer version: %v", err)
 	}
-	installerImageRef := strings.TrimSpace(string(installerImageRefBytes))
+	installerImageRef := strings.TrimSpace(string(imageRefBytes))
 	return installerImageRef, nil
 }

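Switching the field from the concrete `*log.Logger` to the `logging.Logger` interface means any type with the right methods can be passed to NewDriverInstaller, including a test fake. This is a minimal sketch under stated assumptions: `infoLogger` is a hypothetical one-method interface standing in for the subset of `logging.Logger` used in this diff (the real interface may have more methods or a different signature), and `reportExitCode` reproduces the exit-code handling at the end of InstallGPUDrivers so the logged messages can be checked.

```go
package main

import (
	"fmt"
	"strings"
)

// infoLogger is a hypothetical stand-in for the part of logging.Logger that
// DriverInstaller calls in this diff: a single Info method taking a message.
type infoLogger interface {
	Info(msg string)
}

// recordingLogger is a test fake that keeps every message in memory.
type recordingLogger struct {
	msgs []string
}

func (r *recordingLogger) Info(msg string) { r.msgs = append(r.msgs, msg) }

// reportExitCode mirrors the end of InstallGPUDrivers: it logs the outcome and
// returns an error for a non-zero exit code. The code is uint32 to match the
// first value returned by containerd's ExitStatus.Result.
func reportExitCode(logger infoLogger, code uint32) error {
	if code != 0 {
		logger.Info(fmt.Sprintf("Gpu driver installation task ended and returned non-zero status code %d", code))
		return fmt.Errorf("gpu driver installation task ended with non-zero status code %d", code)
	}
	logger.Info("Gpu driver installation task exited with status: 0")
	return nil
}

func main() {
	rec := &recordingLogger{}
	err := reportExitCode(rec, 1)
	fmt.Println(err)
	fmt.Println(strings.Join(rec.msgs, "\n"))
}
```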
4 changes: 2 additions & 2 deletions launcher/launcher/main.go
@@ -192,12 +192,12 @@ func startLauncher(launchSpec spec.LaunchSpec, serialConsole *os.File) error {
 	if launchSpec.InstallGpuDriver {
 		if launchSpec.Experiments.EnableGpuDriverInstallation {
 			installer := gpu.NewDriverInstaller(containerdClient, launchSpec, logger)
-			err = installer.InstallGPUDrivers(ctx)
+			err = installer.InstallGPUDrivers(context.Background())
 			if err != nil {
 				return fmt.Errorf("failed to install gpu drivers: %v", err)
 			}
 		} else {
-			logger.Println("Gpu installation experiment flag is not enabled for this project. Ensure that it is enabled when tee-install-gpu-driver is set to true")
+			logger.Info("Gpu installation experiment flag is not enabled for this project. Ensure that it is enabled when tee-install-gpu-driver is set to true")
 			return fmt.Errorf("gpu installation experiment flag is not enabled")
 		}
 	}