RATELProf is a powerful and lightweight profiler designed specifically for AMD GPU applications. It enables detailed profiling by tracing GPU activities and runtime API calls from HSA, HIP, and OpenMP. This tool is ideal for developers and researchers aiming to optimize GPU-accelerated applications.
- Features
- Getting Started
- Comparison with ROCprof v3
- Future Work and Improvement
- Project Structure
- Contribution
RATELProf provides a comprehensive suite of tools to profile, analyze, and optimize AMD GPU applications. Key features include:
- Monitor kernel dispatches, barriers and memory transfers to identify bottlenecks and optimize GPU performance.
- Trace calls from HSA, HIP, MPI, and OpenMP runtimes, offering deep insights into application behavior.
RATELProf includes the following core commands to streamline your profiling workflow:
- `profile`
  - Profiles the application and generates a detailed .rprof-rep report.
  - Captures kernel execution, memory transfers, and runtime API details.
- `stats`
  - Computes statistical summaries from a .rprof-rep report created by the `profile` command.
  - Output includes detailed metrics similar to the `stats` command from NVIDIA's Nsight Systems.
- `analyze`
  - Analyzes profiling results from a .rprof-rep report created by the `profile` command.
  - Outputs advice for optimizing your CPU/GPU code.
- `summarize`
  - Outputs global metrics and plots that give an overview of your application.
- `visualize`
  - Generates an interactive HTML timeline report from the .rprof-rep report created by the `profile` command.
  - Well suited to exploring application details visually.
- `inspect`
  - Inspects the application binary and outputs a CSV/JSON report containing detailed kernel information.
  - Use this command to analyze static kernel properties.
- `export`
  - Exports the .rprof-rep report to another report type (json, arg-info, ...).
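The ordering implied by the list above (one `profile` run produces the report, and the other commands consume it) can be sketched as a dry run. This is only an illustration: the launcher name `ratelprof`, the positional-argument syntax, and the file names `./my_app` and `app.rprof-rep` are assumptions, not documented RATELProf syntax. The script prints the planned commands and executes nothing.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a possible profiling session.
# ASSUMPTIONS (not taken from the RATELProf documentation): the launcher is
# called `ratelprof`, each command takes its input as a positional argument,
# and the report file is named app.rprof-rep.

LAUNCHER="ratelprof"      # hypothetical launcher name
APP="./my_app"            # hypothetical application binary
REPORT="app.rprof-rep"    # hypothetical report file name

plan_session() {
    # Step 1: profile the application to produce the report.
    echo "$LAUNCHER profile $APP"
    # Step 2: post-process the same report with the other commands.
    for cmd in stats analyze visualize; do
        echo "$LAUNCHER $cmd $REPORT"
    done
}

# Print the plan; nothing is executed.
plan_session
```

Note that `inspect` is left out of the plan because it works on the application binary rather than on a report.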
Installing RATELProf is simple and requires running the provided sett_install.lua script.
Ensure you have the following installed on your system before proceeding:
- CMake (version 3.10 or later)
- Lua (version 5.1)
- AMD ROCm (download from ROCm's official site)
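A quick way to see which of these prerequisites are visible on your PATH is sketched below. It only checks that the commands exist, not their versions, and using `hipcc` as a stand-in for a ROCm installation is an assumption (ROCm binaries often live under a directory that is not on PATH by default).

```shell
#!/usr/bin/env bash
# Report which prerequisite tools are visible on PATH.
# ASSUMPTION: hipcc is used here as a proxy for "ROCm is installed".

have() {
    # Succeeds if the named command can be found on PATH.
    command -v "$1" >/dev/null 2>&1
}

check_prereqs() {
    # Print one line per tool; return non-zero if any tool is missing.
    local missing=0
    for tool in "$@"; do
        if have "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_prereqs cmake lua hipcc || echo "Install the missing tools before running the installer."
```

If a tool is installed but reported as missing, adding its directory to PATH is usually enough.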
- Clone the repository:

      git clone https://github.com/Kassouley/RATELProf
      cd RATELProf

- Run the installation script:

      ./sett_install.lua

By default, the tool is installed to $HOME/.local. To install it to a custom directory, specify the directory in the sett.config file and pass that file to the script:

    ./sett_install.lua /path/to/sett.config

| Attribute | ROCprof v3 | RATELProf |
|---|---|---|
| GPU Architecture Support | AMD RDNA, CDNA (ROCm-compatible GPUs) | AMD RDNA, CDNA (ROCm-compatible GPUs) |
| HIP Tracing | ✅ | ✅ |
| HSA Tracing | ✅ | ✅ |
| rocBLAS Tracing | ❌ | ❌ But can be easily implemented with GILDA |
| RCCL Tracing | ✅ | ❌ But can be easily implemented with GILDA |
| Marker Tracing | ✅ (ROCTx) | ✅ (ROCTx) |
| OpenMP Routine Tracing | ❌ | ✅ |
| OpenMP Target RTL Tracing | ❌ | ✅ |
| OMPT Integration | ✅ | ✅ |
| MPI Tracing | ❌ | ✅ |
| Scratch Memory Tracing | ✅ | ❌ |
| Memory Transfer Profiling | ✅ | ✅ |
| Kernel Dispatch Profiling | ✅ | ✅ |
| Barrier Dispatch Profiling | ❌ | ✅ |
| Trace Filtering | ❌ | ✅ |
| PC Sampling | ✅ (Beta) | ❌ |
| HW Counter | ✅ | ❌ (WIP) |
| Statistical post-processing | ✅ (basic only) | ✅ |
| Post processing analysis | ❌ | ✅ |
| Output Formats | CSV, JSON | Binary (rprof-rep), CSV, TSV, JSON, TXT |
| Output Size | Large | Small (msgpack binary format) |
| Visualization Tools | External (Perfetto) | Integrated |
| Ease of Use | Medium (requires scripting for deeper analysis) | Easy, run and play |
While the current version provides a functional profiling workflow, there are several areas identified for future enhancement:
- Hardware Counter Support: Hardware performance counters are not yet supported. These metrics are crucial for low-level performance analysis, and support for them is planned.
- Barrier Dispatch Reliability: Certain applications may encounter issues with barrier dispatch tracking. Investigating these edge cases is on the roadmap.
- Documentation: Full, detailed documentation is a work in progress.
Community feedback and contributions are welcome to help guide and accelerate these improvements.
    └── RATELProf/
        ├── CMakeLists.txt
        ├── install.sh
        ├── README.md
        ├── bin/
        │   ├── lua/
        │   └── ratelprof.sh
        ├── share/
        │   ├── modules/
        │   │   ├── lua/
        │   │   └── html/
        │   └── visualize/
        └── src/
            ├── lua/
            ├── tools/
            ├── core/
            ├── ext/
            ├── wrappers/
            ├── common/
            └── plugins/

- bin/
  Contains executable scripts and Lua command scripts:
  - lua/: Lua scripts used for tooling.
  - ratelprof.sh: Main shell script to launch RATELProf.
- share/
  Contains shared assets used by the tool:
  - modules/: Modular components of the tool.
    - lua/: Built-in and user Lua modules for the analyze and stats scripts.
    - html/: Minified and unified HTML report built from the visualize directory.
  - visualize/: HTML code used for the visualization report.
- src/
  Core source code of RATELProf:
  - lua/: C stubs for Lua modules.
  - tools/: Main tool source files.
  - core/: Core logic of the tool (auto-generated by GILDA).
  - ext/: Extension logic of the tool (source of the GPU profiling logic).
  - wrappers/: API wrappers for hooking into applications or libraries (e.g., HIP, HSA, ...).
  - plugins/: Built-in plugins for callback definitions.
  - common/: Source code shared by the Lua stub libraries and RATELProf.
Contributions to RATELProf are welcome and appreciated! Whether you're fixing bugs, improving documentation, adding new features, or optimizing performance, your input helps make this tool better for everyone.
- Follow existing code style and structure.
- Keep changes focused and well-documented.
- For major changes, open an issue to discuss first.
Thanks for contributing!