
psalias2006/gpu-hot


GPU Hot

Real-Time NVIDIA GPU Monitoring Dashboard

Single-container web dashboard for NVIDIA GPU monitoring with real-time charts.



Overview

Self-contained dashboard for monitoring NVIDIA GPUs on remote servers. Access utilization and health metrics from a browser without SSH.

Runs in a single container on one port. No configuration is required: start the container and open a browser.


Quick Start

Using Pre-built Docker Image (Recommended)

docker run -d --name gpu-hot --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest

Open http://localhost:1312

Building from Source

docker-compose up --build

Open http://localhost:1312

Requirements: Docker and the NVIDIA Container Toolkit (see NVIDIA's install guide).


Why Not Just Use...

nvidia-smi CLI:

  • Requires SSH access
  • No historical data or charts
  • Manual refresh only
  • Hard to compare multiple GPUs

Prometheus/Grafana:

  • Complex setup (exporters, databases, dashboard configs)
  • Overkill for simple monitoring needs
  • Higher resource usage

GPU Hot is the middle ground: a web interface with charts and zero configuration.


Features

7 Charts per GPU:

  • Utilization, Temperature, Memory, Power Draw
  • Fan Speed, Clock Speeds (graphics/SM/memory), Power Efficiency

Monitoring:

  • Automatic multi-GPU detection
  • GPU process tracking (PID, memory usage)
  • System CPU/RAM monitoring
  • Threshold indicators (temp: 75°C/85°C, util: 80%, memory: 90%)
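The threshold indicators can be approximated with a few comparisons. This is a minimal Python sketch, not the dashboard's actual logic. The numbers come from the list above. Several things are assumptions: that 75°C means "warning" and 85°C means "critical", the use of `>=`, and the helper names `temperature_level`, `memory_percent` and `is_high`.

```python
# Threshold values are from the README. Treating 75 °C as "warning" and
# 85 °C as "critical", and comparing with ">=", are assumptions.
TEMP_WARNING_C = 75
TEMP_CRITICAL_C = 85
UTIL_HIGH_PCT = 80
MEMORY_HIGH_PCT = 90


def temperature_level(temp_c: float) -> str:
    """Classify a GPU core temperature as 'normal', 'warning' or 'critical'."""
    if temp_c >= TEMP_CRITICAL_C:
        return "critical"
    if temp_c >= TEMP_WARNING_C:
        return "warning"
    return "normal"


def memory_percent(used_mb: float, total_mb: float) -> float:
    """Convert used/total memory (MB) into a percentage; 0 if total is unknown."""
    return 100.0 * used_mb / total_mb if total_mb else 0.0


def is_high(value_pct: float, threshold_pct: float) -> bool:
    """Return True when a percentage metric is at or above its threshold."""
    return value_pct >= threshold_pct
```

For example, a card using 7600 of 8192 MB is at about 92.8% memory. `is_high(memory_percent(7600, 8192), MEMORY_HIGH_PCT)` is therefore `True`.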

Metrics Collected:

Core Metrics
  • GPU & Memory Utilization (%)
  • Temperature - GPU core & memory (°C)
  • Memory - used/free/total (MB)
  • Power - draw & limits (W)
  • Fan Speed (%)
  • Clock Speeds - graphics, SM, memory, video (MHz)
Advanced Metrics
  • PCIe Generation & Lane Width (current/max)
  • Performance State (P-State)
  • Compute Mode
  • Encoder/Decoder sessions & statistics
  • Driver & VBIOS versions
  • Throttle status
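Most of the metrics listed above correspond to `nvidia-smi --query-gpu` properties. The sketch below builds the command line from those field names. Grouping the fields into "core" and "advanced" simply mirrors this README and is an assumption, as is the helper name `build_query_command`. This is not necessarily how app.py queries the GPU. Field availability varies by driver, so check `nvidia-smi --help-query-gpu` on your machine. Decoder statistics are not covered here.

```python
from __future__ import annotations

# Field names are nvidia-smi --query-gpu properties. The grouping mirrors the
# README's "Core" / "Advanced" lists and is an assumption, not app.py's code.
GPU_QUERY_FIELDS = {
    "core": [
        "index", "name",
        "utilization.gpu", "utilization.memory",
        "temperature.gpu", "temperature.memory",
        "memory.used", "memory.free", "memory.total",
        "power.draw", "power.limit",
        "fan.speed",
        "clocks.gr", "clocks.sm", "clocks.mem", "clocks.video",
    ],
    "advanced": [
        "pcie.link.gen.current", "pcie.link.gen.max",
        "pcie.link.width.current", "pcie.link.width.max",
        "pstate", "compute_mode",
        "encoder.stats.sessionCount",
        "driver_version", "vbios_version",
        # Newer drivers may report this as clocks_event_reasons.active.
        "clocks_throttle_reasons.active",
    ],
}


def build_query_command(groups: tuple[str, ...] = ("core", "advanced")) -> list[str]:
    """Return an argv list for subprocess.run() that queries the chosen field groups."""
    fields = [field for group in groups for field in GPU_QUERY_FIELDS[group]]
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]
```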

Installation

Pre-built Image (Easiest)

docker run -d --name gpu-hot --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest

Build from Source

git clone https://github.com/psalias2006/gpu-hot
cd gpu-hot
docker-compose up --build

Local Development

pip install -r requirements.txt
python app.py

Verify GPU Access

docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

If this fails, install the NVIDIA Container Toolkit first.
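You may run `python app.py` directly instead of in Docker. In that case, a quick way to confirm the host can see its GPUs is to call `nvidia-smi -L`, which prints one line per GPU. The helper below is a hypothetical sketch and is not part of the project. It returns an empty list when `nvidia-smi` is missing, exits with an error, or times out.

```python
from __future__ import annotations

import shutil
import subprocess


def list_gpus(timeout: float = 10.0) -> list[str]:
    """Return the lines printed by `nvidia-smi -L`, or [] if GPUs are not visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver utilities not installed or not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    except (subprocess.SubprocessError, OSError):
        return []  # covers CalledProcessError and TimeoutExpired
    return [line for line in result.stdout.splitlines() if line.strip()]
```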


Configuration

None required. Optional customization:

Environment Variables:

NVIDIA_VISIBLE_DEVICES=0,1    # Specific GPUs (default: all)

Application (app.py):

eventlet.sleep(2)              # Update interval (seconds)
socketio.run(app, port=1312)   # Port
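The update interval controls how often fresh metrics are pushed to connected browsers. The loop below is a plain-Python sketch of that pattern and assumes nothing about the internals of app.py. `collect` and `emit` are hypothetical callables. The sleep function is injectable so that, inside an eventlet server, you could pass `eventlet.sleep` and let the loop yield to other green threads instead of blocking.

```python
import time
from typing import Any, Callable, Optional


def broadcast_loop(
    collect: Callable[[], Any],
    emit: Callable[[str, Any], None],
    interval: float = 2.0,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Collect metrics and emit them as 'gpu_data' every `interval` seconds.

    Runs forever when max_iterations is None; returns the number of messages sent.
    """
    sent = 0
    while max_iterations is None or sent < max_iterations:
        emit("gpu_data", collect())
        sent += 1
        sleep(interval)
    return sent
```

With Flask-SocketIO, a loop like this would typically be started via `socketio.start_background_task(...)`, passing `socketio.emit` as `emit`.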

Charts (static/js/charts.js):

if (data.labels.length > 30)   // History length (data points)
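The `> 30` check caps each chart at 30 data points, dropping the oldest as new ones arrive. At the default 2-second update interval, that is about 60 seconds of visible history. The Python sketch below shows the same fixed-length behavior using `collections.deque(maxlen=...)`. It is an analogue for illustration, not the project's JavaScript, and `MetricHistory` is a hypothetical name.

```python
from collections import deque
from typing import Deque

HISTORY_LENGTH = 30  # mirrors the `> 30` check in charts.js


class MetricHistory:
    """Fixed-length label/value history; the oldest points are discarded first."""

    def __init__(self, maxlen: int = HISTORY_LENGTH) -> None:
        self.labels: Deque[str] = deque(maxlen=maxlen)
        self.values: Deque[float] = deque(maxlen=maxlen)

    def push(self, label: str, value: float) -> None:
        # deque(maxlen=N) drops from the left once N items are stored.
        self.labels.append(label)
        self.values.append(value)
```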

API

HTTP

GET /                    # Dashboard UI
GET /api/gpu-data        # JSON metrics
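A script can poll the JSON endpoint without a browser. The sketch below uses only the standard library. `fetch_gpu_data` is a hypothetical helper, and the default URL assumes the container is published on port 1312 as in Quick Start. Judging by the WebSocket payload, the response probably contains `gpus`, `processes` and `system`, but treat that shape as an assumption.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_gpu_data(base_url: str = "http://localhost:1312", timeout: float = 5.0) -> Dict[str, Any]:
    """GET /api/gpu-data and decode the JSON body."""
    url = base_url.rstrip("/") + "/api/gpu-data"
    # urlopen raises urllib.error.URLError/HTTPError if the server is down or returns an error.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```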

WebSocket

socket.on('gpu_data', (data) => {
  // Real-time updates every 2s
  // data.gpus, data.processes, data.system
});

Extending

Add New Metric

1. Backend (app.py):

def parse_nvidia_smi(self):
    result = subprocess.run([
        'nvidia-smi',
        '--query-gpu=index,name,your.new.metric',
        '--format=csv,noheader,nounits'
    ], ...)

2. Frontend (static/js/gpu-cards.js):

// Add to createGPUCard() template
<div class="metric-value" id="new-metric-${gpuId}">
    ${gpuInfo.new_metric}
</div>

3. Chart (optional static/js/charts.js):

chartConfigs.newMetric = {
    type: 'line',
    data: { ... },
    options: { ... }
};

Project Structure

gpu-hot/
├── app.py                      # Flask + WebSocket server
├── static/js/
│   ├── charts.js               # Chart configuration
│   ├── gpu-cards.js            # UI rendering
│   ├── socket-handlers.js      # WebSocket events
│   ├── ui.js                   # View switching
│   └── app.js                  # Bootstrap
├── templates/index.html        # Dashboard
├── Dockerfile                  # nvidia/cuda:12.1-devel-ubuntu22.04
└── docker-compose.yml

Troubleshooting

GPU not detected:

# Verify drivers
nvidia-smi

# Test Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

# Restart Docker daemon
sudo systemctl restart docker

Debug logging:

# app.py
socketio.run(app, debug=True)

Contributing

Pull requests welcome. For major changes, open an issue first.

git checkout -b feature/NewFeature
git commit -m 'Add NewFeature'
git push origin feature/NewFeature

License

MIT - see LICENSE