Hello there!
I'm having a performance issue while running the rocHPL benchmark, which was compiled with the ROCm hipcc CXX compiler.
I have compiled two versions of rocHPL with two different compilers:
one version with the default GNU 7.5.0 C++ compiler,
and the other with hipcc (Clang 15.0.0) taken from the rocm/5.3.3/hip/bin directory.
For the first version (compiled with GNU), I used the unmodified install.sh script with the --with-rocm option pointing to the rocm/5.3.3 directory, which also supplied rocBLAS. The --with-mpi option pointed to my previously installed OpenMPI.
The second version (compiled with Clang) was built with a modified version of the install.sh script, with -DCMAKE_CXX_COMPILER=hipcc added to the cmake_common_options variable.
The other options for the install.sh script were the same as the first version.
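The two build configurations described above can be summarized roughly as follows. This is only a dry-run sketch that prints the commands instead of running them. ROCM_DIR and MPI_DIR are placeholder paths, not the actual install locations, and the `=` form of the --with-rocm and --with-mpi options is an assumption about how install.sh takes its arguments.

```shell
#!/bin/sh
# Placeholder paths (assumptions): replace with the real install locations.
ROCM_DIR="/path/to/rocm/5.3.3"
MPI_DIR="/path/to/openmpi"

# Build 1: unmodified install.sh, default GNU C++ compiler.
gnu_cmd="./install.sh --with-rocm=${ROCM_DIR} --with-mpi=${MPI_DIR}"

# Build 2: same command line, but install.sh was edited so that
# cmake_common_options also carries this CMake definition.
hipcc_extra="-DCMAKE_CXX_COMPILER=hipcc"
hipcc_cmd="${gnu_cmd}"

# Print both configurations for comparison; nothing is built here.
echo "GNU build:   ${gnu_cmd}"
echo "hipcc build: ${hipcc_cmd} (cmake_common_options += ${hipcc_extra})"
```

The only intended difference between the two builds is the C++ compiler passed to CMake. rocBLAS, OpenMPI, and HPL.dat are the same in both cases.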
When I run both builds on an AMD Radeon Instinct MI50 32G with the same HPL.dat configuration, I get different results.
The GFLOPS achieved by the hipcc-compiled rocHPL is always lower than that achieved by the GNU-compiled rocHPL.
For example, here are the performance results for different configurations of the N parameter using the same HPL.dat file:
What causes this to happen?
How can I correctly compile rocHPL using hipcc and avoid performance issues during benchmarking?