Self-contained dashboard for monitoring NVIDIA GPUs on remote servers. Access utilization and health metrics from a browser without SSH.
Runs in a single container on one port. No configuration required - start the container and open a browser.
docker run -d --name gpu-hot --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest
Open http://localhost:1312
docker-compose up --build
Open http://localhost:1312
Requirements: Docker, NVIDIA Container Toolkit (install guide)
nvidia-smi CLI:
- Requires SSH access
- No historical data or charts
- Manual refresh only
- Hard to compare multiple GPUs
prometheus/grafana:
- Complex setup (exporters, databases, dashboard configs)
- Overkill for simple monitoring needs
- Higher resource usage
This is the middle ground: web interface with charts, zero configuration.
7 Charts per GPU:
- Utilization, Temperature, Memory, Power Draw
- Fan Speed, Clock Speeds (graphics/SM/memory), Power Efficiency
Monitoring:
- Automatic multi-GPU detection
- GPU process tracking (PID, memory usage)
- System CPU/RAM monitoring
- Threshold indicators (temp: 75°C/85°C, util: 80%, memory: 90%)
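A sketch of the documented threshold indicators (the logic and function names are illustrative, not the actual gpu-hot implementation): temperature has warning and critical levels, while utilization and memory each have a single flag level.

```python
# Assumed threshold logic for the indicators listed above:
# temp warns at 75°C and goes critical at 85°C; utilization flags
# at 80%; memory flags at 90%. Names are hypothetical.

def temp_status(celsius: float) -> str:
    if celsius >= 85:
        return "critical"
    if celsius >= 75:
        return "warning"
    return "ok"

def util_status(percent: float) -> str:
    return "high" if percent >= 80 else "ok"

def memory_status(percent: float) -> str:
    return "high" if percent >= 90 else "ok"
```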
Metrics Collected:
Core Metrics
- GPU & Memory Utilization (%)
- Temperature - GPU core & memory (°C)
- Memory - used/free/total (MB)
- Power - draw & limits (W)
- Fan Speed (%)
- Clock Speeds - graphics, SM, memory, video (MHz)
Advanced Metrics
- PCIe Generation & Lane Width (current/max)
- Performance State (P-State)
- Compute Mode
- Encoder/Decoder sessions & statistics
- Driver & VBIOS versions
- Throttle status
docker run -d --name gpu-hot --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest
git clone https://github.com/psalias2006/gpu-hot
cd gpu-hot
docker-compose up --build
pip install -r requirements.txt
python app.py
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
If this fails, install NVIDIA Container Toolkit first.
None required. Optional customization:
Environment Variables:
NVIDIA_VISIBLE_DEVICES=0,1 # Specific GPUs (default: all)
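For example, to expose only the first two GPUs to the container (a variation on the Quick Start command; whether the environment variable overrides `--gpus all` depends on your NVIDIA Container Toolkit setup):

```shell
docker run -d --name gpu-hot --gpus all \
  -e NVIDIA_VISIBLE_DEVICES=0,1 \
  -p 1312:1312 \
  ghcr.io/psalias2006/gpu-hot:latest
```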
Application (app.py):
eventlet.sleep(2) # Update interval (seconds)
socketio.run(app, port=1312) # Port
Charts (static/js/charts.js):
if (data.labels.length > 30) // History length (data points)
GET / # Dashboard UI
GET /api/gpu-data # JSON metrics
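A minimal consumer sketch for the JSON endpoint. The payload shape below is an assumption inferred from the WebSocket fields (`data.gpus`, `data.processes`, `data.system`), not the actual gpu-hot schema; verify field names against a live response.

```python
# Hypothetical payload shape for GET /api/gpu-data; field names are
# assumptions, not the documented schema.
def hottest_gpu(payload: dict) -> dict:
    """Return the GPU entry with the highest core temperature."""
    return max(payload["gpus"], key=lambda g: g["temperature"])

sample = {
    "gpus": [
        {"index": 0, "name": "GPU 0", "utilization": 42, "temperature": 61},
        {"index": 1, "name": "GPU 1", "utilization": 88, "temperature": 74},
    ],
    "processes": [{"pid": 1234, "memory_mb": 2048}],
    "system": {"cpu_percent": 12.5, "ram_percent": 40.1},
}

print(hottest_gpu(sample)["index"])
```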
socket.on('gpu_data', (data) => {
  // Real-time updates every 2s
  // data.gpus, data.processes, data.system
});
1. Backend (app.py):
def parse_nvidia_smi(self):
    result = subprocess.run([
        'nvidia-smi',
        '--query-gpu=index,name,your.new.metric',
        '--format=csv,noheader,nounits'
    ], ...)
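With `--format=csv,noheader,nounits`, nvidia-smi emits one CSV line per GPU with fields in query order. A hedged sketch of the parsing step (function and key names are illustrative, not the actual app.py code; `your_new_metric` stands in for the added column):

```python
# Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output.
# One line per GPU; fields arrive in the order they were queried.
def parse_gpu_csv(output: str) -> list[dict]:
    gpus = []
    for line in output.strip().splitlines():
        index, name, metric = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "your_new_metric": metric})
    return gpus

print(parse_gpu_csv("0, NVIDIA RTX 3090, 42\n1, NVIDIA RTX 3090, 17"))
```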
2. Frontend (static/js/gpu-cards.js):
// Add to createGPUCard() template
<div class="metric-value" id="new-metric-${gpuId}">
  ${gpuInfo.new_metric}
</div>
3. Chart (optional, static/js/charts.js):
chartConfigs.newMetric = {
  type: 'line',
  data: { ... },
  options: { ... }
};
gpu-hot/
├── app.py                  # Flask + WebSocket server
├── static/js/
│   ├── charts.js           # Chart configuration
│   ├── gpu-cards.js        # UI rendering
│   ├── socket-handlers.js  # WebSocket events
│   ├── ui.js               # View switching
│   └── app.js              # Bootstrap
├── templates/index.html    # Dashboard
├── Dockerfile              # nvidia/cuda:12.1-devel-ubuntu22.04
└── docker-compose.yml
GPU not detected:
# Verify drivers
nvidia-smi
# Test Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
# Restart Docker daemon
sudo systemctl restart docker
Debug logging:
# app.py
socketio.run(app, debug=True)
Pull requests welcome. For major changes, open an issue first.
git checkout -b feature/NewFeature
git commit -m 'Add NewFeature'
git push origin feature/NewFeature
MIT - see LICENSE