Plugin GPU Checker

Docker CLI plugin for verifying NVIDIA GPU availability and configuration for Docker containers.

Overview

Plugin: docker-smi (System Management Interface)
Location: ~/.docker/cli-plugins/docker-smi

docker smi checks three things, in this order:

Docker daemon accessibility
NVIDIA Container Toolkit installation
GPU availability, by running nvidia-smi in a test container

Installation

# Copy as your own user (with sudo the file ends up owned by root)
mkdir -p ~/.docker/cli-plugins
cp /srv/compose/docker/cli-plugins/docker-smi ~/.docker/cli-plugins/
chmod +x ~/.docker/cli-plugins/docker-smi
docker smi --help

Usage

# Run GPU check
docker smi

# With options (run docker smi --help to list them)
docker smi [options]

Output

Successful Check

Checking Docker GPU configuration...

✓ Docker daemon is accessible
✓ NVIDIA Container Toolkit is installed
✓ Testing GPU access...

GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-...)
  Driver Version: 535.154.05
  CUDA Version: 12.2

✓ GPU is accessible from Docker containers

Failed Check

Checking Docker GPU configuration...

✓ Docker daemon is accessible
✗ NVIDIA Container Toolkit is NOT installed or not configured

Error: nvidia-container-runtime not found in Docker
Please install NVIDIA Container Toolkit:
  https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/

Checks Performed

1. Docker Daemon Access

Verifies that the Docker daemon is running and that the current user can reach it.

Pass: Docker commands execute successfully
Fail: Connection refused, permission denied
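
The two failure modes above call for different fixes, so it can help to tell them apart in a script. The following is a minimal sketch, not the plugin's own code: classify_docker_error and diagnose_docker are hypothetical helper names, and the matching relies on the wording of the Docker CLI's usual error messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map Docker CLI error text to a likely cause.
classify_docker_error() {
    local msg=$1
    case "$msg" in
        "")
            echo "ok" ;;
        *"permission denied"*)
            echo "permission" ;;
        *"Cannot connect to the Docker daemon"*|*"connection refused"*)
            echo "not-running" ;;
        *)
            echo "unknown" ;;
    esac
}

# Hypothetical helper: run `docker info`, capture only stderr, and classify it.
diagnose_docker() {
    local err
    if err=$(docker info 2>&1 >/dev/null); then
        echo "ok"
    else
        classify_docker_error "$err"
    fi
}
```

Calling diagnose_docker prints ok, permission, not-running, or unknown. A result of permission points at docker group membership, and not-running points at the systemd service.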

2. NVIDIA Container Toolkit

Checks whether the NVIDIA Container Toolkit is installed and registered with Docker.

Pass: nvidia-container-runtime is found in Docker
Fail: Runtime not found or not configured
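
How the plugin detects the runtime is not shown here. One way to reproduce the check by hand is to ask the daemon which runtimes it has registered. The sketch below assumes that approach; has_nvidia_runtime and check_toolkit are hypothetical names, and the grep is a simple text match on the JSON key rather than a real JSON parse.

```shell
#!/usr/bin/env bash
# Reads the JSON printed by `docker info --format '{{json .Runtimes}}'` on stdin
# and succeeds when a runtime keyed "nvidia" is present.
has_nvidia_runtime() {
    grep -q '"nvidia"'
}

# Hypothetical wrapper: query the daemon and report the result.
check_toolkit() {
    if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime registered"
    else
        echo "nvidia runtime not registered"
        return 1
    fi
}
```

If check_toolkit reports that the runtime is not registered even though the package is installed, running sudo nvidia-ctk runtime configure --runtime=docker and restarting Docker is the usual next step.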

3. GPU Test Container

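Runs a short-lived test container with GPU access and executes nvidia-smi inside it. The command below uses an example image tag (nvidia/cuda tags on Docker Hub carry a patch version and an OS suffix); the tag the plugin pulls may differ:

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi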

Pass: nvidia-smi output shows GPU
Fail: No GPU detected, permission issues
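
For scripting, a pass/fail decision is easier to derive from nvidia-smi -L, which prints one line per GPU, than from the full table. This is a sketch under that assumption, not the plugin's implementation; count_gpus and gpu_test are hypothetical names, and the image is passed in by the caller.

```shell
#!/usr/bin/env bash
# Count lines of the form "GPU <n>: ..." read from stdin (the `nvidia-smi -L` format).
# grep -c exits non-zero when the count is 0, so the status is normalised with `|| true`.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Hypothetical helper: run `nvidia-smi -L` in a throwaway container of the given image.
gpu_test() {
    local image=$1
    local n
    n=$(docker run --rm --gpus all "$image" nvidia-smi -L 2>/dev/null | count_gpus)
    if [ "$n" -gt 0 ]; then
        echo "PASS: $n GPU(s) visible"
    else
        echo "FAIL: no GPU visible"
        return 1
    fi
}
```

Example call: gpu_test nvidia/cuda:12.0.0-base-ubuntu22.04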

Use Cases

Initial Setup Verification

# Install the NVIDIA Container Toolkit and register it with Docker
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify it works
docker smi

Troubleshooting GPU Issues

# Service can't access GPU
composectl logs genai-ollama | grep -i gpu

# Check Docker GPU config
docker smi

# If that fails, check nvidia-smi on the host
nvidia-smi

Pre-Flight Check

# Before starting GPU services
docker smi

# If successful, start services
sudo composectl start genai-ollama genai-swarmui

CI/CD Pipeline

#!/bin/bash
# Ensure GPU is available before deployment
if docker smi; then
    echo "GPU check passed, deploying AI services"
    sudo composectl start genai-ollama genai-openwebui
else
    echo "GPU check failed, skipping AI services"
    exit 1
fi

Troubleshooting

Plugin Not Found

# Check installation
ls -la ~/.docker/cli-plugins/docker-smi

# Make executable
chmod +x ~/.docker/cli-plugins/docker-smi

# Verify Docker sees it
docker --help | grep smi

Docker Daemon Not Accessible

✗ Docker daemon is not accessible

Solutions:

# Check Docker is running
sudo systemctl status docker

# Start Docker
sudo systemctl start docker

# Check user in docker group
groups | grep docker

# Add your user to the docker group
sudo usermod -aG docker "$USER"
# Log out and back in (or run: newgrp docker) so the new group membership applies

NVIDIA Toolkit Not Found

✗ NVIDIA Container Toolkit is NOT installed

Solutions:

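# Install NVIDIA Container Toolkit (Ubuntu/Debian)
# The nvidia-docker repository and apt-key are deprecated; use the
# libnvidia-container repository with a dedicated keyring instead
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list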

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker
sudo systemctl restart docker

# Test again
docker smi

GPU Not Detected in Container

✗ GPU test failed: no GPU detected

Solutions:

# Check GPU on host
nvidia-smi

# If GPU works on host, check Docker config
cat /etc/docker/daemon.json

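# Should contain a runtimes.nvidia entry like the one below.
# "default-runtime" is optional: set it only if containers should use the
# NVIDIA runtime without passing --gpus or --runtime=nvidia.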
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

# Restart Docker after config change
sudo systemctl restart docker

# Test again
docker smi

Common Issues

Outdated NVIDIA Drivers

Error: CUDA version mismatch

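Solution: Update the NVIDIA driver on the host. Driver 450+ is the minimum for CUDA 11.0 images; CUDA 12.x images, such as the one used by the test container, need the 525 series or newer.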
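
To check whether the installed driver meets a given minimum, the version strings can be compared with sort -V. The sketch below assumes GNU sort is available; version_at_least and report_driver are hypothetical helper names, and the minimum is supplied by the caller (confirm the right value for your image in NVIDIA's CUDA release notes).

```shell
#!/usr/bin/env bash
# Succeeds when dotted version $1 is greater than or equal to version $2.
# The smaller of the two sorts first, so $1 >= $2 exactly when $2 comes out on top.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print a verdict for a current and a minimum driver version.
report_driver() {
    local current=$1 minimum=$2
    if version_at_least "$current" "$minimum"; then
        echo "driver $current OK (>= $minimum)"
    else
        echo "driver $current too old (< $minimum)"
        return 1
    fi
}
```

On the host, the current version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed as the first argument, for example report_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 525.60.13.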

Docker Runtime Not Configured

Error: unknown runtime specified nvidia

Solution: Run sudo nvidia-ctk runtime configure --runtime=docker

Permission Denied

Error: permission denied while trying to connect

Solution: Add your user to the docker group (see Docker Daemon Not Accessible above), or run the command with sudo

Best Practices

Run after driver updates - Verify GPU still works with Docker
Use in setup scripts - Automated verification
Check before GPU services - Prevent startup failures
Test after Docker changes - Config or daemon.json updates
Include in documentation - Help users verify GPU setup
Use in health checks - CI/CD pipeline verification

Integration Examples

Startup Script

#!/bin/bash
# Safe GPU service startup

if ! docker smi; then
    echo "GPU not available, cannot start AI services"
    exit 1
fi

echo "GPU available, starting services..."
sudo composectl start genai-ollama genai-swarmui genai-embedding

Health Check Script

#!/bin/bash
# Periodic GPU health check

if docker smi > /dev/null 2>&1; then
    echo "GPU OK"
else
    echo "GPU PROBLEM - Restarting Docker"
    sudo systemctl restart docker
    sleep 5
    docker smi
fi

Service Wrapper

#!/bin/bash
# Wrap service start with GPU check

# Abort with a usage message when no service name is given
service=${1:?"usage: $0 <service>"}

if [[ "$service" == genai-* ]] || [[ "$service" == "swarmui" ]]; then
    if ! docker smi > /dev/null 2>&1; then
        echo "Warning: GPU not detected, service may not function correctly"
        read -p "Continue anyway? (y/n) " -n 1 -r
        echo
        [[ ! $REPLY =~ ^[Yy]$ ]] && exit 1
    fi
fi

sudo composectl start "$service"

Quick Reference

# Check GPU availability
docker smi

# Compare with host GPU
nvidia-smi

# Use in scripts
if docker smi; then
    echo "GPU OK"
else
    echo "GPU NOT OK"
fi

# Check specific output
docker smi | grep "Driver Version"
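
When a script needs the value itself rather than the whole matching line, the field can be stripped out with sed. This sketch assumes the output keeps the "Name: value" layout shown in the Successful Check example; smi_field is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Reads `docker smi` output on stdin and prints the value of the first
# line that looks like "<name>: <value>", ignoring leading whitespace.
smi_field() {
    local name=$1
    sed -n "s/^[[:space:]]*${name}:[[:space:]]*//p" | head -n 1
}
```

Example: docker smi | smi_field "Driver Version"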
