A toolkit for profiling and performance evaluation of heterogeneous CPU-GPU applications across NVIDIA and Intel GPU systems.
XPU-Point provides the necessary tools and scripts to analyze heterogeneous applications through three main components:
- XPU-Profiler - Profiles heterogeneous applications and collects basic block vectors for similarity analysis
- XPU-Timer - Evaluates performance of selected regions
- Performance Extrapolation - Extrapolates performance results and generates visualization plots
This repository includes an example benchmark (GROMACS) to demonstrate the end-to-end methodology.
Hardware Requirements:
- Linux x86 system with NVIDIA GPU
  - CUDA version: >= 8.0 and <= 11.x
  - CUDA driver version: <= 495.xx
- Linux x86 system with Intel GPU
- At least 500 GB of free disk space
Software Dependencies:
- Docker
- NVIDIA drivers
- Intel drivers
- CUDA
- oneAPI
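The disk-space and tooling requirements above can be checked before starting the roughly day-long setup. The following is a minimal sketch, not part of the XPU-Point repository: the helper names `free_gb` and `have_cmd` are hypothetical, and the list of commands checked (`docker`, `make`, `git`) is an assumption based on the commands this README uses.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; not shipped with XPU-Point.

# Print the free space, in whole GiB, of the filesystem holding the given path.
# df -P gives one line per filesystem; -k reports 1024-byte blocks (column 4 = available).
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

required_gb=500   # from the hardware requirements above
available_gb=$(free_gb .)
if [ "$available_gb" -lt "$required_gb" ]; then
    echo "WARNING: only ${available_gb} GB free here; at least ${required_gb} GB is required"
else
    echo "OK: ${available_gb} GB free"
fi

# Assumed set of host tools needed to clone the repository and build the Docker image.
for cmd in docker make git; do
    if have_cmd "$cmd"; then
        echo "OK: $cmd found"
    else
        echo "MISSING: $cmd"
    fi
done
```

Run it from the directory where you plan to clone the repository, since `free_gb .` reports on the filesystem that holds the current directory.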
- Clone the repository:

      git clone https://github.com/nus-comparch/xpupoint && cd xpupoint

- Set up the environment using Docker:

      # Build the docker image
      make docker.build
      # Run the docker image
      make docker.run
      # Compile XPU-Point tools and benchmarks
      make

- Navigate to one of the benchmark directories (for example, GROMACS):

      cd benchmarks/gromacs

- Run XPU-Point analysis:

      # Run all test cases (NOTE: The full test suite can take a long time to complete)
      ./run-xpupoint all
      # Or run individual tests (--help shows the available tests)
      ./run-xpupoint <test_directory_name>

- Generate visualization:

      ./make-graphs

The run-xpupoint script automates the complete analysis workflow:
- XPU-Profiler identifies representative regions in the application
- XPU-Timer measures performance of both the full application and individual representative regions
- Results are processed and can be visualized as tables or graphs
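Because individual runs can take hours, it may help to capture each run's output and elapsed time. The sketch below is an illustration only: `run_logged` is a hypothetical wrapper, not part of XPU-Point, and it relies solely on the `./run-xpupoint <test_directory_name>` invocation shown above.

```shell
#!/bin/sh
# Hypothetical logging wrapper; not shipped with XPU-Point.

# run_logged LOGDIR NAME CMD...
# Runs CMD, saves its stdout and stderr to LOGDIR/NAME.log,
# prints the exit status and elapsed wall-clock seconds, and returns CMD's exit status.
run_logged() {
    rl_logdir=$1
    rl_name=$2
    shift 2
    mkdir -p "$rl_logdir"
    rl_start=$(date +%s)
    "$@" >"$rl_logdir/$rl_name.log" 2>&1
    rl_rc=$?
    rl_end=$(date +%s)
    echo "$rl_name: exit status $rl_rc after $((rl_end - rl_start)) s"
    return "$rl_rc"
}

# Example use from a benchmark directory. The test names are placeholders;
# ./run-xpupoint --help lists the real ones.
# for t in <test_directory_name_1> <test_directory_name_2>; do
#     run_logged logs "$t" ./run-xpupoint "$t"
# done
```

The wrapper measures wall-clock time for bookkeeping only; it is not a substitute for the XPU-Timer measurements.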
| Component | Requirement |
|---|---|
| Programs | C++ programs, Python/Shell scripts |
| Compilation | CUDA, oneAPI, Make, GCC |
| Binaries | Pin 3.30, GTPin 4.5.0, NVBit 1.5.5 |
| Runtime | NVIDIA and Intel GPU Drivers |
| Hardware | NVIDIA GPU systems and Intel GPU systems |
| Metrics | Cycles, RDTSC, Runtime |
| Disk Space | ~500 GB |
| Setup Time | ~1 day |
| Experiment Time | ~1 week |
Before running experiments, verify your GPU platforms are functioning:
    # For NVIDIA GPUs
    nvidia-smi
    # For Intel GPUs
    sycl-ls

Performance Considerations:
- Results are closely tied to the specific execution environment
- XPU-Timer results may be affected by other jobs running on the same machine
- Profiling runs for larger applications can take several hours
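Since timing results can be perturbed by other jobs, one option is to check the machine before starting a timing run. The following is a rough sketch assuming a Linux host with `/proc/loadavg`; the helper names and the load threshold of 1.0 are assumptions to tune for your system, not values recommended by XPU-Point.

```shell
#!/bin/sh
# Hypothetical pre-run check; not shipped with XPU-Point.

# Print the 1-minute load average (Linux: first field of /proc/loadavg).
load_1min() {
    cut -d ' ' -f 1 /proc/loadavg
}

# Succeed if the load ($1) is at or below the threshold ($2).
# awk performs the comparison because sh arithmetic is integer-only.
is_quiet() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 <= max + 0) }'
}

# Report whether the GPU query tools mentioned above are on PATH.
for tool in nvidia-smi sycl-ls; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "OK: $tool found"
    else
        echo "MISSING: $tool (expected only on the matching GPU platform)"
    fi
done

max_load=1.0   # assumed threshold; adjust for your machine
current=$(load_1min)
if is_quiet "$current" "$max_load"; then
    echo "Load $current is at or below $max_load"
else
    echo "Load $current exceeds $max_load: other jobs may affect XPU-Timer results"
fi
```

A low load average does not guarantee an idle GPU, so it is still worth inspecting the `nvidia-smi` output for other processes on NVIDIA systems.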
- GitHub Repository: nus-comparch/xpupoint
- Archived Version: Zenodo DOI: 10.5281/zenodo.16801115
- License: Open source (see repository for specific license details)
This project is publicly available and contributions are welcome. Please check the repository for contribution guidelines.
For issues and questions, please use the GitHub Issues section of this repository.