Skip to content

release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp #95484

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

illwieckz
Copy link
Contributor

The dynamic_hsa/ include directory is required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.

It should then always be included or the build will fail if only src/rtl.cpp is built.

This also simplifies the way header files from dynamic_hsa/ are included in src/rtl.cpp.

Fixes:

error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope

…/rtl.cpp

The dynamic_hsa/ include directory is required by both
optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.
It should then always be included or the build will fail
if only src/rtl.cpp is built.

This also simplifies the way header files from dynamic_hsa/
are included in src/rtl.cpp.

Fixes:

  error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope
@llvmbot
Copy link
Member

llvmbot commented Jun 13, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Thomas Debesse (illwieckz)

Changes

The dynamic_hsa/ include directory is required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.

It should then always be included or the build will fail if only src/rtl.cpp is built.

This also simplifies the way header files from dynamic_hsa/ are included in src/rtl.cpp.

Fixes:

error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope

Full diff: https://github.com/llvm/llvm-project/pull/95484.diff

2 Files Affected:

  • (modified) openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt (+3-1)
  • (modified) openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp (-10)
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
index 68ce63467a6c8..42cc560c79112 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
@@ -38,13 +38,15 @@ add_definitions(-DDEBUG_PREFIX="TARGET AMDGPU RTL")
 set(LIBOMPTARGET_DLOPEN_LIBHSA OFF)
 option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA})
 
+# Required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.
+include_directories(dynamic_hsa)
+
 if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   libomptarget_say("Building AMDGPU NextGen plugin linked against libhsa")
   set(LIBOMPTARGET_EXTRA_SOURCE)
   set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU NextGen plugin for dlopened libhsa")
-  include_directories(dynamic_hsa)
   set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp)
   set(LIBOMPTARGET_DEP_LIBRARIES)
 endif()
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index 81634ae1edc49..8cedc72d5f63c 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -56,18 +56,8 @@
 #define BIGENDIAN_CPU
 #endif
 
-#if defined(__has_include)
-#if __has_include("hsa/hsa.h")
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#elif __has_include("hsa.h")
 #include "hsa.h"
 #include "hsa_ext_amd.h"
-#endif
-#else
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#endif
 
 namespace llvm {
 namespace omp {

@illwieckz
Copy link
Contributor Author

illwieckz commented Jun 13, 2024

I first noticed the issue when building the chipStar fork of LLVM 17: https://github.com/CHIP-SPV/llvm-project (the chipStar-llvm-17 branch), but the code being the same in LLVM 18, it is expected to fail in LLVM 18 too.

The whole folder disappeared in main so I made this patch to target the most recent release branch having those files: LLVM18.

It would be good to backport it to LLVM 17 too. Should I make another PR targetting release/17.x?

I haven't checked it yet if versions older than LLVM 17 are affected.

@pranav-sivaraman
Copy link

This is different from this file right? I'm trying to fix an issue when building LLVM 14 with a newer ROCm releases which fails to find the newer hsa/hsa.h headers. Not sure if I need to extend the patch to include these changes as well.

@illwieckz
Copy link
Contributor Author

The 14 branch seems to be very old, espially the file you link is in plugins/ directory, while the files I modify are in plugins-nextgen/ directory, witht the plugins/ directory not existing anymore. So I strongly doubt the patch is useful for LLVM 14, but your problem probably needs another but similar solution.

@illwieckz
Copy link
Contributor Author

@pranav-sivaraman try this patch:

diff --git a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
index 92523c23f68b..92bcd94edb7a 100644
--- a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
+++ b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
@@ -56,13 +56,14 @@ include_directories(
 set(LIBOMPTARGET_DLOPEN_LIBHSA OFF)
 option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA})
 
+include_directories(dynamic_hsa)
+
 if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   libomptarget_say("Building AMDGPU plugin linked against libhsa")
   set(LIBOMPTARGET_EXTRA_SOURCE)
   set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU plugin for dlopened libhsa")
-  include_directories(dynamic_hsa)
   set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp)
   set(LIBOMPTARGET_DEP_LIBRARIES)
 endif()

I haven't tested it, but maybe the mistake is similar.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We made a change recently that made the dynamic_hsa version the default. The error you're seeing is from an old HSA, so if you're overriding the default to use an old library that's probably not worth working around.

Actually, this is for the release/18.x? Maybe worth hacking around it in that case since that's older.

@illwieckz
Copy link
Contributor Author

We made a change recently that made the dynamic_hsa version the default. The error you're seeing is from an old HSA, so if you're overriding the default to use an old library that's probably not worth working around.

The error I see comes from the fact there is no old HSA around to workaround an LLVM bug.

There is no hsa/hsa.h in the tree, the default dynamic_hsa is not used.

The hsa/hsa.h file is from ROCm, not from LLVM.

Without such patch, LLVM requires ROCm to be installed and configured to be in default includes for src/rtl.cpp to build if hsa.cpp is not built.

This patch is to make LLVM use dynamic_hsa for building src/rtl.cpp because it is the default.

This patch is needed to build both release/17.x and release/18.x, the main branch changed the code layout so the patch will not work.

I assume a full LLVM build will not trigger the build problem because something else will include dynamic_hsa and will make it findable by src/rtl.cpp by luck. But when building a not-full LLVM, just what's needed by some applications, dynamic_hsa is not added to the include directories while being required by src/rtl.cpp.

@illwieckz
Copy link
Contributor Author

illwieckz commented Jun 14, 2024

$ rg HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY

libc/utils/gpu/loader/amdgpu/Loader.cpp
521:                               HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY),

openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
74:  HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY = 0xA016,

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1892:    if (auto Err = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY,

The openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp file requires the HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY symbol.

This symbol is expected to be provided by openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h, not by third-party external /opt/rocm/include/hsa/hsa_ext_amd.h.

The code in release/17.x and release/18.x is explictely looking for ROCm's hsa/_ext_amd.h and never look for LLVM dynamic_hsa/hsa_ext_amd.h. It tries to look for LLVM-provided hsa_ext_amd.h as a fallback but because of a mistake in CMakeLists.txt, this doesn't work in all cases because dynamic_hsa is not added to include directories in all cases.

@illwieckz
Copy link
Contributor Author

illwieckz commented Jun 14, 2024

Here is a script to reproduce the bug:

#! /usr/bin/env bash

set -x -u -e -o pipefail

version="${1:-18}"

CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)"
export CMAKE_BUILD_PARALLEL_LEVEL="${CMAKE_BUILD_PARALLEL_LEVEL:-4}"

workspace="llvm-bug95484-${version}"

rm -rf "${workspace}"
mkdir "${workspace}"
cd "${workspace}"

git clone --depth 1 \
	--branch "release/${version}.x" \
	'https://github.com/llvm/llvm-project.git' \
	'llvm-project'

git clone --depth 1 \
	'https://github.com/KhronosGroup/SPIRV-Headers.git' \
	'llvm-project/llvm/projects/SPIRV-Headers'

git clone --depth 1 \
	--branch "llvm_release_${version}0" \
	'https://github.com/KhronosGroup/SPIRV-LLVM-Translator.git' \
	'llvm-project/llvm/projects/SPIRV-LLVM-Translator'

cmake \
	-S'llvm-project/llvm' \
	-B'build' \
	-G'Ninja' \
	-D'CMAKE_INSTALL_PREFIX'='install' \
	-D'CMAKE_BUILD_TYPE'='Release' \
	-D'LLVM_ENABLE_PROJECTS'='clang;openmp' \
	-D'LLVM_TARGETS_TO_BUILD'='Native' \
	-D'LLVM_EXPERIMENTAL_TARGETS_TO_BUILD'='SPIRV' \
	-D'LLVM_EXTERNAL_PROJECTS'='SPIRV-Headers'

cmake --build 'build'

cmake --install 'build'

It can be used just by saving it as llvm-bug95484 and running it by doing either:

  • ./llvm-bug95484
    to fetch and attempt a clean build of release/18.x in a way it reproduces the bug;
  • ./llvm-bug95484 17
    to fetch and reproduce the bug with release/17.x.

It will fail this way:

llvm-bug95484-18/llvm-project/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1902:37:
 error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope;
 did you mean ‘HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY’?
 1902 |     if (auto Err = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY,
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                     HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY

Edit: The script probably builds useless things, but this is the use case I know that reproduces the bug.

@illwieckz
Copy link
Contributor Author

I reproduce the bug with both release/18.x and release/17.x.

I don't reproduce the bug with release/16.x.

I cannot test release/15.x because of other unrelated errors happening (like not having getenv defined).

@jhuber6
Copy link
Contributor

jhuber6 commented Jun 14, 2024

The openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp file requires the HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY symbol.

This symbol is expected to be provided by openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h, not by third-party external /opt/rocm/include/hsa/hsa_ext_amd.h.

This was introduced in ROCm-5.3, see https://github.com/ROCm/ROCR-Runtime/blob/rocm-5.3.x/src/inc/hsa_ext_amd.h#L333. The dynamic_hsa/ version is a copy of this header for use when the system version is not provided. If the system fails to find HSA, then it will use the dynamic version. The problem here is that you have HSA, but it's too old. I don't know how much backward compatibility we really provide here, unfortunately the HSA headers really don't give you much versioning to work with, so we can't do ifdef on this stuff.

Using LIBOMPTARGET_FORCE_DLOPEN_LIBHSA=ON should resolve this, though I reworked that option recently so it's different in 19.x.

@jdoerfert
Copy link
Member

If we make being declare variant elide on a user defined compile time condition, we could use the change in the EF_AMDGPU_MACH_AMDGCN_LAST value to determine a minimum version:

EF_AMDGPU_MACH_AMDGCN_LAST <= EF_AMDGPU_MACH_AMDGCN_GFX1013,

It's not possible right now but not hard to do.

@illwieckz
Copy link
Contributor Author

If the system fails to find HSA, then it will use the dynamic version.

Why is it expected for LLVM to build against system HSA first, then to rely on LLVM's HSA only if system one is missing?

I'm building LLVM, not something else, so I expect LLVM to build against LLVM files.

unfortunately the HSA headers really don't give you much versioning to work with, so we can't do ifdef on this stuff.

If LLVM was attempting to use LLVM's HSA first, it would be possible to do #if __has_include("dynamic_hsa/hsa.h"), then #elif __has_include("hsa/hsa.h") for system's one. But since LLVM provides its own HSA, it would always use its own one.

If a two-weeks old ROCm 6.1.2 is already considered too old by both LLVM 17, it is safe to assume that ROCm 6.1.2 will never be young enough for LLVM17… Then I don't see why both LLVM 17 and LLVM 18 would try to use something that will never be young enough…

@illwieckz
Copy link
Contributor Author

Setting -DLIBOMPTARGET_FORCE_DLOPEN_LIBHSA=ON that would always add dynamic_hsa to the include dir doesn't fix the build error.

@illwieckz
Copy link
Contributor Author

And the ROCm HSA also provides hsa_amd_agent_info_s::HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY so if it was using such system HSA, it should not error out.

@illwieckz
Copy link
Contributor Author

illwieckz commented Jun 16, 2024

I discovered something wrong:

ii  libhsa-runtime-dev 5.2.3-5                       amd64        HSA Runtime API and runtime for ROCm - development files
ii  libhsa-runtime64-1 5.2.3-5                       amd64        HSA Runtime API and runtime for ROCm
ii  libhsakmt1:amd64   5.2.3+dfsg-1                  amd64        Thunk library for AMD KFD (shlib)

It looks like I have some HSA 5.2.3 files… The libhsa-runtime-dev provides /usr/include/hsa/hsa.h.

This libhsa-runtime-dev package looks to be provided by the system (Ubuntu), but and if I attempt to uninstall it, it attempts to also uninstall rocm-hip-runtime=6.1.2.60102-119~22.04 and libdrm-amdgpu-dev

So ROCm 6.1.2 actually installs both /opt/rocm-6.1.2/include/hsa/hsa.h and 5.2.3 /usr/include/hsa/hsa.h, and of course only the one in /opt provides HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY

This doesn't make sense, the ROCm packages may be just badly made by AMD.

When I temporarily remove /usr/include/hsa without uninstalling libhsa-runtime-dev AND while using -DLIBOMPTARGET_FORCE_DLOPEN_LIBHSA=ON, I can build LLVM18 without patch.

With -DLIBOMPTARGET_FORCE_DLOPEN_LIBHSA=OFF CMake complains that /usr/include/hsa is missing.

So it looks like I'm tracking a ROCm packaging bug from AMD side.

But I still don't get why LLVM doesn't use its own HSA by default and doesn't use its own headers for enum values used in its own code.

@jhuber6
Copy link
Contributor

jhuber6 commented Jun 16, 2024

This doesn't make sense, the ROCm packages may be just badly made by AMD.

When I temporarily remove /usr/include/hsa without uninstalling libhsa-runtime-dev AND while using -DLIBOMPTARGET_FORCE_DLOPEN_LIBHSA=ON, I can build LLVM18 without patch.

With -DLIBOMPTARGET_FORCE_DLOPEN_LIBHSA=OFF CMake complains that /usr/include/hsa is missing.

So it looks like I'm tracking a ROCm packaging bug from AMD side.

I don't think AMD handles the actual packaging, that's on the maintainers for your distribution. I'm guessing that the HSA runtime is a separate package from ROCm and gets put somewhere else for whatever reason. I don't have any issues like that with my distribution.

But I still don't get why LLVM doesn't use its own HSA by default and doesn't use its own headers for enum values used in its own code.

It does in the main branch, but I think you're right that the logic causes it to look at the system one first since hsa/hsa.h is higher up the search list. Should probably do something about that.

@illwieckz
Copy link
Contributor Author

I don't think AMD handles the actual packaging.

They do: https://repo.radeon.com

It happens that Ubuntu provides an old 5.2.3 libhsa-runtime-dev and it seems to be a dependency of some package provided by the AMD repo.radeon.com own repository providing ROCm 6.1.2…

It does in the main branch, but I think you're right that the logic causes it to look at the system one first since hsa/hsa.h is higher up the search list. Should probably do something about that.

Maybe, instead of adding dynamic_hsa/ directory to include directories and assuming an ambiguous implicit directory for hsa.h, we would not add that directory and do explicitely:

#if defined(__has_include)
#if __has_include("dynamic_hsa/hsa.h")
#include "dynamic_hsa/hsa.h"
#include "dynamic_hsa/hsa_ext_amd.h"
#if __has_include("hsa/hsa.h")
#include "hsa/hsa.h"
#include "hsa/hsa_ext_amd.h"
#elif __has_include("hsa.h")
#include "hsa.h"
#include "hsa_ext_amd.h"
#endif
#else
#include "dynamic_hsa/hsa.h"
#include "dynamic_hsa/hsa_ext_amd.h"
#endif

@illwieckz
Copy link
Contributor Author

The /usr/include/hsa/hsa.h file is provided by the stock Ubuntu libhsa-runtime-dev package.
The /opt/rocm-6.1.2/include/hsa/hsa.h file is provided by the AMD hsa-rocr-dev package.
The AMD rocm-hip-runtime package depends on both hsa-rocr-dev and libhsa-runtime-dev

$ debtree rocm-hip-runtime 2>/dev/null | grep -E -- '-> "hip-runtime-amd|-> "hsa-rocr-dev|-> "hipcc|-> "libamdhip64-dev|-> "libhsa-runtime-dev'
	"rocm-hip-runtime" -> "hip-runtime-amd" [color=blue,label="(= 6.1.40093.60102-119~22.04)"];
	"hip-runtime-amd" -> "hsa-rocr-dev" [color=blue,label="(>= 1.3)"];
	"hip-runtime-amd" -> "hipcc" [color=blue];
	"hipcc" -> "libamdhip64-dev" [color=blue];
	"libamdhip64-dev" -> "libhsa-runtime-dev" [color=blue];
$ apt-get install --reinstall rocm-hip-runtime hip-runtime-amd hsa-rocr-dev hipcc libhsa-runtime-dev
Get:1 http://archive.ubuntu.com/ubuntu mantic/universe amd64 hipcc amd64 5.2.3-12 [25,1 kB]
Get:2 http://archive.ubuntu.com/ubuntu mantic/universe amd64 libhsa-runtime-dev amd64 5.2.3-5 [73,3 kB]
Get:3 https://repo.radeon.com/rocm/apt/6.1.2 jammy/main amd64 hip-runtime-amd amd64 6.1.40093.60102-119~22.04 [27,1 MB]
Get:4 https://repo.radeon.com/rocm/apt/6.1.2 jammy/main amd64 hsa-rocr-dev amd64 1.13.0.60102-119~22.04 [102 kB]
Get:5 https://repo.radeon.com/rocm/apt/6.1.2 jammy/main amd64 rocm-hip-runtime amd64 6.1.2.60102-119~22.04 [2 042 B]

So basically:

─ rocm-hip-runtime 6.1.2.60102-119~22.04 (AMD repository)
  └ hip-runtime-amd 6.1.40093.60102-119~22.04 (AMD repository)
    ├ hsa-rocr-dev 1.13.0.60102-119~22.04 (AMD repository)
    │ └ /opt/rocm-6.1.2/include/hsa/hsa.h
    └ hipcc 1.0.0.60102-119~22.04 (Ubuntu repository)
      └ libhsa-runtime-dev 5.0.0-1 (Ubuntu repository)
        └ /usr/include/hsa/hsa.h

This is definitely an AMD packaging issue.

jhuber6 added a commit to jhuber6/llvm-project that referenced this pull request Jun 17, 2024
Summary:
The HSA headers existed previously in `include/hsa.h` and were moved to
`include/hsa/hsa.h` in a later ROCm version. The include headers here
were originally designed to favor a newer one. However, this
unintentionally prevented the dyanmic HSA's `hsa.h` from being used if
both were present. This patch changes the order so it will be found
first.

Related to llvm#95484.
@jhuber6
Copy link
Contributor

jhuber6 commented Jun 17, 2024

Made #95769, don't know if the window for the 18.x window is still open for a back port.

jhuber6 added a commit that referenced this pull request Jun 17, 2024
Summary:
The HSA headers existed previously in `include/hsa.h` and were moved to
`include/hsa/hsa.h` in a later ROCm version. The include headers here
were originally designed to favor a newer one. However, this
unintentionally prevented the dyanmic HSA's `hsa.h` from being used if
both were present. This patch changes the order so it will be found
first.

Related to #95484.
@jhuber6
Copy link
Contributor

jhuber6 commented Jun 17, 2024

I merged my patch, I think the window to backport to 18.x is passed so you'll need to use the main branch for now. Thanks for bringing this to my attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants