staryxchen/NvLinkMonitor


🚀 NVLink Monitor

A comprehensive toolkit for monitoring and testing NVIDIA NVLink bandwidth and status. This project consists of two main components:

  • 📊 NVLink Monitor: A real-time monitoring tool for NVLink bandwidth
  • ⚡ NVLink Bandwidth Test: A performance testing tool for NVLink bandwidth measurement

Project Structure

NvLinkMonitor/
├── monitor/                        # NVLink monitoring tool
│   ├── nvlink_monitor.cpp          # Main monitoring implementation
│   └── nvlink_monitor.h            # Monitoring tool headers
├── example/                        # NVLink bandwidth testing tool
│   └── nvlink_bw_test.cpp          # Bandwidth test implementation
├── build/                          # Build output directory
├── Makefile                        # Main build configuration
├── install-deps.sh                 # Dependency installation script
└── README.md                       # This file

📋 Prerequisites

Before building the project, you need to install the required dependencies.

🔧 Automatic Installation

Run the dependency installation script:

./install-deps.sh

This script will automatically detect your operating system and install the appropriate dependencies.

🛠️ Manual Installation

If the automatic script doesn't work for your system, you can install dependencies manually:

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install -y build-essential libnvidia-ml-dev

CentOS/RHEL/Fedora:

sudo yum groupinstall -y "Development Tools"
sudo yum install -y nvidia-devel

🔨 Building

After installing dependencies, build the project:

# 🏗️ Build both components
make

# 📊 Build only the monitor
make monitor

# ⚡ Build only the example
make example

The executables will be created in the build/ directory:

  • build/nvlink_monitor - NVLink monitoring tool
  • build/nvlink_bw_test - NVLink bandwidth test tool

🧩 Components

1. 📊 NVLink Monitor (monitor/)

A real-time monitoring tool for NVLink bandwidth and status.

🚀 Basic usage (continuous mode):

./build/nvlink_monitor

Continuous monitoring:

./build/nvlink_monitor --continuous true

Single-shot monitoring:

./build/nvlink_monitor --continuous false

Custom interval (e.g., 0.5 seconds):

./build/nvlink_monitor --interval 0.5

Detailed NVLink output:

./build/nvlink_monitor --verbose

Combined options:

./build/nvlink_monitor --continuous false --interval 0.5 --verbose

Output to file:

./build/nvlink_monitor -o output.log
./build/nvlink_monitor -v -o detailed.log

📋 Available options:

  • -c, --continuous [true|false]: Run in continuous mode (default: true)
  • -i, --interval <seconds>: Set custom monitoring interval in seconds (supports decimals, default: 1.0)
  • -v, --verbose: Enable detailed NVLink output (shows individual link bandwidth)
  • -o, --output <filename>: Redirect output to file
  • -h, --help: Show help information

Note: The interval parameter accepts decimal values (e.g., 0.5 for 500 ms, 0.1 for 100 ms). The minimum practical interval is 1 microsecond (0.000001 s), but very small intervals may affect system performance.
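The decimal-interval handling described in the note can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `interval_to_duration` and the upper bound are assumptions made for this example. It rounds a value given in seconds to whole microseconds and rejects anything below the 1 microsecond floor.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <stdexcept>

// Hypothetical helper (not taken from the project source): convert a
// decimal-seconds interval such as 0.5 into a whole number of microseconds.
std::chrono::microseconds interval_to_duration(double seconds) {
    // Reject NaN, infinities, and anything below the 1 microsecond floor.
    if (!std::isfinite(seconds) || seconds < 1e-6) {
        throw std::invalid_argument("interval must be at least 0.000001 s");
    }
    // Arbitrary upper bound for this sketch, so the rounding below cannot overflow.
    if (seconds > 1e9) {
        throw std::invalid_argument("interval is unreasonably large");
    }
    // Round to the nearest microsecond so that 0.1 maps to exactly 100000.
    const long long us = std::llround(seconds * 1e6);
    assert(us >= 1);
    return std::chrono::microseconds(us);
}
```

The returned duration could be passed directly to `std::this_thread::sleep_for` between polling rounds. Rounding rather than truncating avoids losing a microsecond to floating-point representation error.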

2. ⚑ NVLink Bandwidth Test (example/)

A performance testing tool for measuring NVLink bandwidth between GPUs.

Basic usage:

./build/nvlink_bw_test

⚙️ Custom parameters:

./build/nvlink_bw_test -i 200 -b 2000 -s 0 -d 1

📋 Available options:

  • -i, --iterations NUM: Number of iterations (default: 100)
  • -b, --buffer-size NUM: Buffer size in MB (default: 1000)
  • -s, --src-gpu NUM: Source GPU ID (default: 0)
  • -d, --dst-gpu NUM: Destination GPU ID (default: 1)
  • -h, --help: Show help message

✨ Features

πŸ“Š NVLink Monitor Features:

  • Real-time NVLink bandwidth monitoring
  • Individual link bandwidth tracking
  • Continuous and single-shot monitoring modes
  • Configurable monitoring intervals
  • File output support
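One plausible way to derive per-link bandwidth is to sample a cumulative byte counter for each link twice and divide the difference by the elapsed time. The sketch below shows only that arithmetic. The `LinkSnapshot` type and both function names are hypothetical, how the counters are read from the driver is left out, and the project's own implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical snapshot type: cumulative bytes moved on each link of one GPU.
// How these counters are obtained is outside the scope of this sketch.
struct LinkSnapshot {
    std::vector<std::uint64_t> bytes_per_link;
};

// Per-link rate in GB/s (1 GB = 1e9 bytes) between two snapshots
// taken interval_s seconds apart.
std::vector<double> link_rates_gbps(const LinkSnapshot& prev,
                                    const LinkSnapshot& curr,
                                    double interval_s) {
    assert(prev.bytes_per_link.size() == curr.bytes_per_link.size());
    assert(interval_s > 0.0);
    std::vector<double> rates;
    rates.reserve(curr.bytes_per_link.size());
    for (std::size_t i = 0; i < curr.bytes_per_link.size(); ++i) {
        // Unsigned subtraction stays correct if the 64-bit counter wrapped once.
        const std::uint64_t delta = curr.bytes_per_link[i] - prev.bytes_per_link[i];
        rates.push_back(static_cast<double>(delta) / interval_s / 1e9);
    }
    return rates;
}

// Aggregate rate across all links of the GPU.
double total_rate_gbps(const std::vector<double>& rates) {
    double total = 0.0;
    for (double r : rates) {
        total += r;
    }
    return total;
}
```

In a verbose-style report the individual entries of the returned vector would be printed per link, while a summary view would show only the total.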

⚑ NVLink Bandwidth Test Features:

  • Inter-GPU memory copy performance testing
  • Configurable buffer sizes and iteration counts
  • Source and destination GPU selection
  • Performance statistics calculation

🔧 Technical Details

📦 Dependencies:

  • NVML (NVIDIA Management Library): For GPU monitoring and NVLink data access
  • CUDA Runtime: For GPU memory operations and device management
  • C++11: For modern C++ features

⚙️ Supported Operations:

  • NVLink Monitoring: Real-time bandwidth monitoring across all NVLink links
  • Peer-to-Peer Access: Automatic P2P access setup between GPUs
  • Memory Operations: Device-to-device memory copies for bandwidth testing
  • Performance Measurement: High-precision timing and bandwidth calculation
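The timing and bandwidth calculation might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: `time_call` and `copy_bandwidth_gbps` are hypothetical names, and the unit conventions (1 MB = 1e6 bytes, 1 GB = 1e9 bytes) are chosen for the example and may not match what the tool reports.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: measure the wall-clock time of a callable with a
// monotonic clock. For asynchronous GPU work, the callable would need to
// wait for completion before returning, otherwise only the launch is timed.
template <typename Fn>
std::chrono::duration<double> time_call(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    return std::chrono::steady_clock::now() - start;
}

// Hypothetical helper: bandwidth in GB/s for `iterations` copies of a
// `buffer_mb` megabyte buffer completed in `elapsed` seconds.
// Assumes 1 MB = 1e6 bytes and 1 GB = 1e9 bytes.
double copy_bandwidth_gbps(std::uint64_t buffer_mb,
                           std::uint64_t iterations,
                           std::chrono::duration<double> elapsed) {
    assert(elapsed.count() > 0.0);
    const double total_bytes =
        static_cast<double>(buffer_mb) * static_cast<double>(iterations) * 1e6;
    return total_bytes / elapsed.count() / 1e9;
}
```

With the default values of 100 iterations and a 1000 MB buffer, a run that takes 2 seconds would correspond to 50 GB/s under these unit assumptions.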

🧹 Cleaning

To clean build artifacts:

make clean

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

⭐ Star This Project!

If this toolkit helped you squeeze every last bit of bandwidth out of your NVLink connections (or just saved you from pulling your hair out debugging GPU-to-GPU transfers), please give it a star! πŸš€
