-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[AMDGPU] Use the AMDGPUToolChain when targeting C/C++ directly #99687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang Author: Joseph Huber (jhuber6) ChangesSummary: Unfortunately, there's no good logic for detercting this, so I simply Full diff: https://github.com/llvm/llvm-project/pull/99687.diff 1 Files Affected:
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 8e44d5afa40e0..7bd78846766b8 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -6404,9 +6404,20 @@ const ToolChain &Driver::getToolChain(const ArgList &Args,
case llvm::Triple::CUDA:
TC = std::make_unique<toolchains::NVPTXToolChain>(*this, Target, Args);
break;
- case llvm::Triple::AMDHSA:
- TC = std::make_unique<toolchains::ROCMToolChain>(*this, Target, Args);
+ case llvm::Triple::AMDHSA: {
+ bool IsOpenCL = llvm::any_of(
+ Args.filtered(options::OPT_INPUT, options::OPT_x), [](auto &Arg) {
+ if (Arg->getOption().matches(options::OPT_INPUT))
+ return StringRef(Arg->getValue()).ends_with(".cl");
+ return StringRef(Arg->getValue()).ends_with("cl");
+ });
+ TC =
+ IsOpenCL
+ ? std::make_unique<toolchains::ROCMToolChain>(*this, Target, Args)
+ : std::make_unique<toolchains::AMDGPUToolChain>(*this, Target,
+ Args);
break;
+ }
case llvm::Triple::AMDPAL:
case llvm::Triple::Mesa3D:
TC = std::make_unique<toolchains::AMDGPUToolChain>(*this, Target, Args);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs test. I also would assume you shouldn't have to reinvent language detection code
There's an existing function for getting the inputs, but it takes the |
Also, apparently there's some driver tests that expect If that's the case then I can drop the toolchain detection hacks. |
There's a test that's |
Okay, so the only user of the toolchain is |
That was a mistake from long ago we shouldn't continue. A language is not an environment |
Alright, it works regardless but I should probably keep the check in there just in case. |
Ping |
I feel choosing toolchain based on input files does not solve all the use cases. You may want to handle the object files, bitcodes, or assembly files differently by using different toolchains, e.g. you may want to choose rocm toolchain or amdgpu toolchain or HIPAMD toolchain to hand the object file, even though it has the same extension. Although you could introduce -x for different object type, e.g. cl_obj, cxx_obj, or hip_obj, but that is quite cubbersome. I am wondering whether we should introduce a clang option which allows us to directly choose a toolchain. |
I think the main difference here is in the compile phase, which wants to load the ROCm libraries. They're probably identical after that point since the only differences exist inside the It's an inconvenience for me that targeting C++ is held hostage by OpenCL so if I ever do |
ping |
So the main reason I want this is so that I don't need to keep passing |
clang/lib/Driver/Driver.cpp
Outdated
TC = | ||
llvm::any_of(Inputs, | ||
[](auto &Input) { return types::isOpenCL(Input.first); }) | ||
? std::make_unique<toolchains::ROCMToolChain>(*this, Target, Args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If input files have mixed OpenCL and C/C++ programs, ROCMToolChain will be chosen. Is ROCmToolChain is able to compile C/C++ programs like AMDGPUToolChain, or it will cause errors? Should we emit diagnostic message that mixed OpenCL and C/C++ programs are not supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the mixed case, all of them will be the ROCMToolChain
, which I'm okay with. ROCMToolChain
is what currently is invoked, the main difference is that it includes a lot of stuff in the ROCMToolChain while we want straight C/C++ to be freestanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same C/C++ source file will be compiled differently, depending on whether they are compiled with OpenCL programs. This may cause confusion to the users. Can we let ROCMToolChain behave the same as AMDGPUToolChain for C/C++ programs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Toolchain doesn't have source file information, I could put it in the constructor but it would be pretty much the same thing. I can't disable the ROCm device libs because that would then break OpenCL. I could make it an error instead, but I figured the differences are pretty minor so it's not worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth noting that using the ROCm toolchain is more permissive here. I don't think it's super relevant that we emit an error in this case.
Ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change the behavior with amdgcn-- or amdgcn-mesa-mesa3d?
Those both use the |
Mostly I'm just wondering if it breaks whatever clover is doing. |
Don't think so, should be handled here llvm-project/clang/lib/Driver/Driver.cpp Line 6436 in e8b9e13
|
I skimmed clover and I'm not sure it's even using the clang driver. It seems to be doing its own thing and directly using CompilerInstance |
it's also tough to say it's legal to drop the |
Summary: The getToolChain pass uses the triple to determine which toolchain to create. Currently the amdgcn-amd-amdhsa triple maps to the ROCmToolChain which uses things expected to be provided by ROCm. This is neded for OpenCL, but directly targeting C++ does not want this since it's primarily being used for creating GPU runtime code. As far as I know I'm the only user of this, so this shouldn't change anything. Unfortunately, there's no good logic for detercting this, so I simply checked ahead of time if the input is either foo.cl or -x cl foo.c to choose between the two. This allows us to use the AMDGPU target normally, as otherwise it will error without passing -nogpulib.
Updated this to be simpler. This logic is only run when compiled from This technically does have the effect that if someone does Any chance of this landing before the merge? This primarily allows letting CMake use the AMDGPU as a freestanding target without EVERY single compile option check failing with "Could not find ROCm toolchain". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the right thing. amdgpu is completely usable without rocm libraries and already has out of tree users doing that. Needing to manually opt out of rocm libs when not using any of rocm is definitely annoying, especially in contexts like godbolt where one is just trying to look at the emitted asm.
The code is ugly but that's necessary to preserve opencl's behaviour, agreed that it's not worth trying to change that.
Unfortunately, this breaks SPIR-V JITing, and probably cases where the input is .bc that came from an upstream that has calls to ROCDL symbols (e.g. HIP compiled to .bc). I appreciate the enthusiasm, and the spirit is possibly in the right place, but it would be ideal if we spent a bit more time considering the implications of these types of changes. Is it at all possible to revert and resubmit once this is sorted @jhuber6? |
This PR is six months old. I'm surprised that this breaks uses of the ROCm-DLs because those are normally baked-in per-TU via the |
It is six months old, but it is relatively difficult to keep track of all the concurrent non-communicating swimlanes - I do apologise for not having done so though! It is possible that we are microfocusing on a potential use case, but there are other valid ones. For example, ignoring the SPIR-V angle, one could emit IR from a HIP app compiled with -nogpulib. HIP headers forward various functions to ROCDL symbols, which means that the IR will include calls to ROCDL symbols. It is valid to then take said IR as input, and "finalise" it for AMDGPU, linking the device libs in the process. In this case you cannot disambiguate based on the source language / file extension. An example of an invocation that ends up here would be the following: clang -O3 -x ir -v foo.bc -mcode-object-version=5 --save-temps -fcolor-diagnostics -fexceptions -fvisibility=hidden -fconvergent-functions -ffp-contract=fast-honor-pragmas -fno-experimental-relative-c++-abi-vtables -O3 -fno-autolink -finline-functions -o foo.co -target amdgcn-amd-amdhsa -mcpu=gfx1030
# which expands into
clang-20 -cc1 -mcode-object-version=5 -mllvm --amdhsa-code-object-version=5 -triple amdgcn-amd-amdhsa -Werror=atomic-alignment -S -dumpdir foo.co- -save-temps=cwd -disable-free -clear-ast-before-backend -main-file-name foo.bc -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -mframe-pointer=none -fno-experimental-relative-c++-abi-vtables -ffp-contract=fast-honor-pragmas -fno-rounding-math -mconstructor-aliases -target-cpu gfx1030 -debugger-tuning=gdb -fdebug-compilation-dir=/something/ -v -fconvergent-functions -resource-dir /llvm-project/build/lib/clang/20 -O3 -fno-experimental-relative-c++-abi-vtables -fno-autolink -ferror-limit 19 -fvisibility=hidden -fgnuc-version=4.2.1 -finline-functions -fskip-odr-check-in-gmf -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o foo.s -x ir foo.bc
# then
clang-20 -cc1as -mllvm --amdhsa-code-object-version=5 -triple amdgcn-amd-amdhsa -filetype obj -main-file-name foo.bc -target-cpu gfx1030 -fdebug-compilation-dir=/something/ -dwarf-version=5 -mrelocation-model pic -o foo.s
# finally
ld.lld --no-undefined -shared foo.o -plugin-opt=mcpu=gfx1030 -o foo.co
ld.lld: error: undefined symbol: __ockl_fprintf_stderr_begin
>>> referenced by 3143359427648554286.spv.o:(__assert_fail)
>>> referenced by 3143359427648554286.spv.o:(__assert_fail) In the future, it would also be somewhat helpful, if possible, to advertise what amounts to flipping a long-established default, even when said default is dubious. |
Okay, so this case definitely tries to link the ROCm bitcode via If there's an existing use-case that this breaks, I can revert it. Long term however it would be nice if the ROCm device libraries could be pulled into a more friendly library format. Then you could just pass it to
I need to determine the optimal graph of AMD engineers to include in the reviewers list of patches like these. |
I outlined an existing, as far as we (AMD) are concerned, use case, so if it'd be possible to revert, it'd be amazing! I suspect there's more than this, though, and the "we shouldn't fix X" is extremely uncharitable, even if morally sound. Perhaps this entire topic warrants a bit more discussion and design before we reland? |
I was really hoping that this would make it into the reluctant to revert it since it'll result in clang-20 being considerably less usable for straight C/C++ on AMDGPU. I was mostly wondering if we knew anyone that did this in the wild or if this broke testing. The problem is that if we don't break this behavior then it's pretty much impossible to decouple ROCm's special magic with just using the AMD toolchain normally without passing |
I am explicitly saying that we know of this being done in something that we care about. So please revert, and we can see when and how to break things. |
Many thanks, this is highly appreciated; let's try to coordinate internally. |
…ly (llvm#99687)" Summary: This reverts commit 740e6ae. After discussions it was determined that the behavior for IR inputs needs to be maintained at least for now. In the future we should put a deprecation notice on this behavior. This patch keeps the old beahvior for OpenCL and IR inputs, while others will be standalone. This is good enough for standard compile flows.
…ly (llvm#99687)" Summary: This reverts commit 740e6ae. After discussions it was determined that the behavior for IR inputs needs to be maintained at least for now. In the future we should put a deprecation notice on this behavior. This patch keeps the old beahvior for OpenCL and IR inputs, while others will be standalone. This is good enough for standard compile flows.
…ly (llvm#99687)" Summary: This reverts commit 740e6ae. After discussions it was determined that the behavior for IR inputs needs to be maintained at least for now. In the future we should put a deprecation notice on this behavior. This patch keeps the old beahvior for OpenCL and IR inputs, while others will be standalone. This is good enough for standard compile flows.
…ly (llvm#99687)" Summary: This reverts commit 740e6ae. After discussions it was determined that the behavior for IR inputs needs to be maintained at least for now. In the future we should put a deprecation notice on this behavior. This patch keeps the old beahvior for OpenCL and IR inputs, while others will be standalone. This is good enough for standard compile flows.
Summary:
The
getToolChain
pass uses the triple to determine which toolchain tocreate. Currently the
amdgcn-amd-amdhsa
triple maps to theROCmToolChain
which uses things expected to be provided byROCm
.This is neded for OpenCL, but directly targeting C++ does not want this
since it's primarily being used for creating GPU runtime code. As far as
I know I'm the only user of this, so this shouldn't change anything.
Unfortunately, there's no good logic for detercting this, so I simply
checked ahead of time if the input is either
foo.cl
or-x cl foo.c
to choose between the two. This allows us to use the AMDGPU target
normally, as otherwise it will error without passing
-nogpulib
.