A reproducible, production-minded guide for running GPU-accelerated containers on Ubuntu using Docker + NVIDIA Container Toolkit.
This repository focuses exclusively on containerized GPU workflows and assumes a correctly configured host system.
For host-level Ubuntu performance tuning and native CUDA / PyTorch validation, see:
https://github.com/vikram2327/ubuntu-performance-ml-setup
This guide covers:
- Installing Docker Engine on Ubuntu
- Installing and configuring NVIDIA Container Toolkit
- Enabling GPU passthrough into Docker containers
- Verifying GPU access inside containers with `nvidia-smi` (a one-line example follows this list)
- Building and running a CUDA-enabled PyTorch container
- Minimal, explicit verification scripts for correctness
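For example, once the toolkit is configured, in-container GPU visibility can be confirmed with a single command. The CUDA image tag below is illustrative; pick one compatible with your installed driver:

```bash
# Run nvidia-smi inside a disposable CUDA container.
# A GPU table in the output proves GPU passthrough is working.
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```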
This guide prioritizes correctness, reproducibility, and debuggability over maximum optimization.
- This repository does not tune the host system
- GPU access is treated as explicit and verifiable
- All steps are written to be:
  - Observable
  - Repeatable
  - Easy to debug
Design decisions and trade-offs are documented rather than hidden.
Clone the repository and run:
```bash
bash scripts/setup.sh
bash scripts/verify.sh
```

- `setup.sh` installs Docker and configures NVIDIA GPU support
- `verify.sh` validates GPU access inside containers and runs a PyTorch CUDA test
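For context, the GPU-support step in `setup.sh` corresponds to NVIDIA's documented runtime registration. A minimal sketch of that step (not necessarily the script's exact contents):

```bash
# Register the NVIDIA runtime in Docker's daemon configuration
# using the official NVIDIA Container Toolkit CLI.
sudo nvidia-ctk runtime configure --runtime=docker

# Restart the Docker daemon so the runtime change takes effect.
sudo systemctl restart docker
```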
⚠️ If you add your user to the `docker` group, log out and log back in before running verification.
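Adding your user to that group is done the standard way:

```bash
# Add the current user to the docker group (re-login required to take effect)
sudo usermod -aG docker "$USER"
```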
```text
docker-nvidia-gpu-ml/
├── README.md
├── scripts/
│   ├── setup.sh              # Install Docker + NVIDIA Container Toolkit
│   ├── verify.sh             # Validate GPU access inside containers
│   └── cleanup.sh            # Optional cleanup of test artifacts
├── docker/
│   ├── Dockerfile            # CUDA + PyTorch base image
│   └── run.sh                # Example GPU-enabled run command
├── examples/
│   ├── pytorch_gpu_test.py   # Minimal PyTorch CUDA verification
│   └── cuda_smoke_test.sh    # nvidia-smi smoke test
└── docs/
    ├── design-decisions.md   # Architectural and design choices
    └── troubleshooting.md    # Common failure modes and fixes
```
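For illustration, a GPU-enabled run command of the kind `docker/run.sh` wraps generally looks like the following. The image tag and volume mount are placeholders, not the repository's exact invocation:

```bash
# Launch an interactive PyTorch container with all host GPUs exposed.
# Pin a CUDA-matched image tag in practice instead of :latest.
docker run --rm -it \
  --gpus all \
  -v "$PWD/examples:/workspace" \
  pytorch/pytorch:latest \
  bash
```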
Running GPU workloads inside containers introduces an additional abstraction layer.
In practice, failures often stem from:
- Missing runtime configuration
- Implicit assumptions about GPU availability
- Silent CPU fallbacks
- Driver / runtime mismatches
This repository exists to make those interactions explicit, observable, and reproducible.
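As a concrete example of making GPU access explicit, a check like the following fails loudly instead of silently falling back to CPU (the image tag is again illustrative):

```bash
# Exit non-zero if PyTorch cannot see CUDA, rather than silently running on CPU.
docker run --rm --gpus all pytorch/pytorch:latest \
  python -c "import torch; assert torch.cuda.is_available(), 'CUDA not visible in container'; print(torch.cuda.get_device_name(0))"
```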
This guide may be useful if you:
- Use NVIDIA GPUs on Ubuntu
- Run ML or compute workloads inside Docker
- Want a reliable GPU container baseline
- Care about system correctness and debuggability
- Prefer explicit verification over implicit assumptions
Vikram Pratap Singh
- GitHub: https://github.com/vikram2327
- LinkedIn: https://www.linkedin.com/in/vikrampratapsingh2
This repository is intentionally conservative:
- It uses officially supported NVIDIA tooling
- It avoids runtime hacks or undocumented flags
- It favors clarity over aggressive optimization
The goal is a containerized GPU workflow that behaves predictably and can be reasoned about when things go wrong.