Plugin GPU Checker
A Docker CLI plugin that verifies NVIDIA GPU availability and configuration for Docker containers.
Plugin: docker-smi (System Management Interface)
Location: ~/.docker/cli-plugins/docker-smi
docker smi runs three checks:
- Docker daemon accessibility
- NVIDIA Container Toolkit installation
- GPU availability by running nvidia-smi in test container
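For background on how the plugin hooks into the CLI: executables named docker-<command> under ~/.docker/cli-plugins become docker subcommands, and the Docker CLI probes each one with a special metadata subcommand. A minimal sketch of what the docker-smi entry point plausibly looks like (illustrative only; the Vendor/Version strings are placeholders, not taken from the shipped script):

#!/bin/bash
# Sketch of a docker-smi style entry point (assumed structure, not the shipped script).
# The Docker CLI probes plugins with this subcommand and expects JSON metadata back.
if [ "$1" = "docker-cli-plugin-metadata" ]; then
    echo '{"SchemaVersion":"0.1.0","Vendor":"example","Version":"0.1.0","ShortDescription":"Check NVIDIA GPU availability for Docker containers"}'
    exit 0
fi
# When run as `docker smi`, the CLI invokes this executable with "smi" as $1;
# the three checks described above would run here.
echo "Checking Docker GPU configuration..."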
# Install the plugin (the copy is root-owned, so chmod must also run as root)
sudo cp /srv/compose/docker/cli-plugins/docker-smi ~/.docker/cli-plugins/
sudo chmod +x ~/.docker/cli-plugins/docker-smi
docker smi --help

# Run GPU check
docker smi
# With options
docker smi [options]

Example output (all checks passing):
Checking Docker GPU configuration...
✓ Docker daemon is accessible
✓ NVIDIA Container Toolkit is installed
✓ Testing GPU access...
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-...)
Driver Version: 535.154.05
CUDA Version: 12.2
✓ GPU is accessible from Docker containers
Example output (toolkit missing):
Checking Docker GPU configuration...
✓ Docker daemon is accessible
✗ NVIDIA Container Toolkit is NOT installed or not configured
Error: nvidia-container-runtime not found in Docker
Please install NVIDIA Container Toolkit:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/
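The wrapper scripts later on this page use docker smi directly in if conditions, so the plugin presumably exits 0 only when every check passes and non-zero otherwise; a quick way to confirm:

# Inspect the exit status directly (0 = all checks passed)
docker smi > /dev/null 2>&1; echo "exit status: $?"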
Check 1: Docker daemon
Verifies the Docker daemon is running and accessible.
Pass: Docker commands execute successfully
Fail: Connection refused or permission denied
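One plausible implementation of this check, assuming it simply probes the daemon (docker info exits non-zero on connection refused or permission denied):

# Daemon check sketch (assumed implementation, not the shipped code)
if docker info > /dev/null 2>&1; then
    echo "✓ Docker daemon is accessible"
else
    echo "✗ Docker daemon is not accessible"
    exit 1
fi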
Check 2: NVIDIA Container Toolkit
Checks whether the NVIDIA Container Toolkit is installed and configured.
Pass: nvidia-container-runtime exists
Fail: Runtime not found or not configured
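A sketch of how this check could work; a configured NVIDIA Container Toolkit registers an nvidia runtime with the daemon, which docker info reports:

# Toolkit check sketch (assumed implementation, not the shipped code)
if docker info --format '{{json .Runtimes}}' | grep -q nvidia; then
    echo "✓ NVIDIA Container Toolkit is installed"
else
    echo "✗ NVIDIA Container Toolkit is NOT installed or not configured"
    exit 1
fi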
Check 3: GPU access
Runs a test container with GPU access:
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi

Pass: nvidia-smi output shows the GPU
Fail: No GPU detected, permission issues
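A sketch of how this test might be wrapped, propagating the container's exit status so the pass/fail result is scriptable:

# GPU test sketch (assumed implementation, not the shipped code)
if docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi; then
    echo "✓ GPU is accessible from Docker containers"
else
    echo "✗ GPU test failed: no GPU detected"
    exit 1
fi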
# After installing NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
# Verify it works
docker smi

# Service can't access GPU
composectl logs genai-ollama | grep -i gpu
# Check Docker GPU config
docker smi
# If fails, check nvidia-smi on host
nvidia-smi

# Before starting GPU services
docker smi
# If successful, start services
sudo composectl start genai-ollama genai-swarmui

#!/bin/bash
# Ensure GPU is available before deployment
if docker smi; then
    echo "GPU check passed, deploying AI services"
    sudo composectl start genai-ollama genai-openwebui
else
    echo "GPU check failed, skipping AI services"
    exit 1
fi

# Check installation
ls -la ~/.docker/cli-plugins/docker-smi
# Make executable
chmod +x ~/.docker/cli-plugins/docker-smi
# Verify Docker sees it
docker --help | grep smi

✗ Docker daemon is not accessible
Solutions:
# Check Docker is running
sudo systemctl status docker
# Start Docker
sudo systemctl start docker
# Check user in docker group
groups | grep docker
# Add user to docker group
sudo usermod -aG docker $USER
# Log out and back in

✗ NVIDIA Container Toolkit is NOT installed
Solutions:
# Install NVIDIA Container Toolkit (Ubuntu/Debian)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker
sudo systemctl restart docker
# Test again
docker smi

✗ GPU test failed: no GPU detected
Solutions:
# Check GPU on host
nvidia-smi
# If GPU works on host, check Docker config
cat /etc/docker/daemon.json
# Should have:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
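A syntax error in daemon.json will stop the daemon from starting at all, so it is worth validating the file after editing; python3 -m json.tool ships with any standard Python install:

# Validate daemon.json syntax before restarting Docker
python3 -m json.tool /etc/docker/daemon.json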
# Restart Docker after config change
sudo systemctl restart docker
# Test again
docker smi

Error: CUDA version mismatch
Solution: Update NVIDIA drivers to version 450 or newer for container support

Error: unknown runtime specified nvidia
Solution: Run sudo nvidia-ctk runtime configure --runtime=docker

Error: permission denied while trying to connect
Solution: Add your user to the docker group or run with sudo
- Run after driver updates - Verify GPU still works with Docker
- Use in setup scripts - Automated verification
- Check before GPU services - Prevent startup failures
- Test after Docker changes - Config or daemon.json updates
- Include in documentation - Help users verify GPU setup
- Use in health checks - CI/CD pipeline verification
#!/bin/bash
# Safe GPU service startup
if ! docker smi; then
    echo "GPU not available, cannot start AI services"
    exit 1
fi
echo "GPU available, starting services..."
sudo composectl start genai-ollama genai-swarmui genai-embedding

#!/bin/bash
# Periodic GPU health check
if docker smi > /dev/null 2>&1; then
    echo "GPU OK"
else
    echo "GPU PROBLEM - Restarting Docker"
    sudo systemctl restart docker
    sleep 5
    docker smi
fi

#!/bin/bash
# Wrap service start with GPU check
service=$1
if [[ "$service" == genai-* ]] || [[ "$service" == "swarmui" ]]; then
if ! docker smi > /dev/null 2>&1; then
echo "Warning: GPU not detected, service may not function correctly"
read -p "Continue anyway? (y/n) " -n 1 -r
echo
[[ ! $REPLY =~ ^[Yy]$ ]] && exit 1
fi
fi
sudo composectl start "$service"- Docker CLI Plugins - Plugin overview
- Docker Configuration - GPU setup
- Prerequisites - NVIDIA requirements
- GenAI Overview - GPU-using services
# Check GPU availability
docker smi
# Compare with host GPU
nvidia-smi
# Use in scripts
if docker smi; then
    echo "GPU OK"
else
    echo "GPU NOT OK"
fi
# Check specific output
docker smi | grep "Driver Version"

Next: Troubleshooting - Common issues and solutions →