A comprehensive toolkit for monitoring and testing NVIDIA NVLink bandwidth and status. This project consists of two main components:
- NVLink Monitor: A real-time monitoring tool for NVLink bandwidth
- NVLink Bandwidth Test: A performance testing tool for NVLink bandwidth measurement
```
NvLinkMonitor/
├── monitor/                  # NVLink monitoring tool
│   ├── nvlink_monitor.cpp    # Main monitoring implementation
│   └── nvlink_monitor.h      # Monitoring tool headers
├── example/                  # NVLink bandwidth testing tool
│   └── nvlink_bw_test.cpp    # Bandwidth test implementation
├── build/                    # Build output directory
├── Makefile                  # Main build configuration
├── install-deps.sh           # Dependency installation script
└── README.md                 # This file
```
Before building the project, you need to install the required dependencies.
Run the dependency installation script:
```bash
./install-deps.sh
```

This script automatically detects your operating system and installs the appropriate dependencies.
If the automatic script doesn't work for your system, you can install dependencies manually:
Ubuntu/Debian:

```bash
sudo apt-get update
sudo apt-get install -y build-essential libnvidia-ml-dev
```

CentOS/RHEL:

```bash
sudo yum groupinstall -y "Development Tools"
sudo yum install -y nvidia-devel
```

After installing dependencies, build the project:
```bash
# Build both components
make

# Build only the monitor
make monitor

# Build only the example
make example
```

The executables will be created in the build/ directory:

- `build/nvlink_monitor` - NVLink monitoring tool
- `build/nvlink_bw_test` - NVLink bandwidth test tool
A real-time monitoring tool for NVLink bandwidth and status.
```bash
# Continuous monitoring (default)
./build/nvlink_monitor

# Explicit continuous mode
./build/nvlink_monitor -continuous true

# Single-shot mode
./build/nvlink_monitor -continuous false

# Custom 0.5-second interval
./build/nvlink_monitor -interval 0.5

# Verbose per-link output
./build/nvlink_monitor -verbose

# Combined options
./build/nvlink_monitor -continuous false -interval 0.5 -verbose

# Redirect output to a file
./build/nvlink_monitor -o output.log
./build/nvlink_monitor -v -o detailed.log
```

Options:

- `-c, --continuous [true|false]`: Run in continuous mode (default: true)
- `-i, --interval <seconds>`: Set a custom monitoring interval in seconds (supports decimals; default: 1.0)
- `-v, --verbose`: Enable detailed NVLink output (shows individual link bandwidth)
- `-o, --output <filename>`: Redirect output to a file
- `-h, --help`: Show help information
Note: The interval parameter supports decimal values (e.g., 0.5 for 500ms, 0.1 for 100ms). The minimum practical interval is 1 microsecond (0.000001s), but very small intervals may affect system performance.
A performance testing tool for measuring NVLink bandwidth between GPUs.
```bash
# Run with default settings
./build/nvlink_bw_test

# 200 iterations, 2000 MB buffer, GPU 0 -> GPU 1
./build/nvlink_bw_test -i 200 -b 2000 -s 0 -d 1
```

Options:

- `-i, --iterations NUM`: Number of iterations (default: 100)
- `-b, --buffer-size NUM`: Buffer size in MB (default: 1000)
- `-s, --src-gpu NUM`: Source GPU ID (default: 0)
- `-d, --dst-gpu NUM`: Destination GPU ID (default: 1)
- `-h, --help`: Show help message

- Real-time NVLink bandwidth monitoring
- Individual link bandwidth tracking
- Continuous and single-shot monitoring modes
- Configurable monitoring intervals
- File output support
- Inter-GPU memory copy performance testing
- Configurable buffer sizes and iteration counts
- Source and destination GPU selection
- Performance statistics calculation
- NVML (NVIDIA Management Library): For GPU monitoring and NVLink data access
- CUDA Runtime: For GPU memory operations and device management
- C++11: For modern C++ features
- NVLink Monitoring: Real-time bandwidth monitoring across all NVLink links
- Peer-to-Peer Access: Automatic P2P access setup between GPUs
- Memory Operations: Device-to-device memory copies for bandwidth testing
- Performance Measurement: High-precision timing and bandwidth calculation
To clean build artifacts:
```bash
make clean
```

This project is licensed under the MIT License - see the LICENSE file for details.
If this toolkit helped you squeeze every last bit of bandwidth out of your NVLink connections (or just saved you from pulling your hair out debugging GPU-to-GPU transfers), please give it a star!