-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Re-enable closed-world assumption as an opt-in feature #115371
[AMDGPU] Re-enable closed-world assumption as an opt-in feature #115371
Conversation
@llvm/pr-subscribers-lto @llvm/pr-subscribers-backend-amdgpu Author: Shilei Tian (shiltian) ChangesAlthough the ABI (if any exists) doesn’t explicitly prohibit cross-device-image Full diff: https://github.com/llvm/llvm-project/pull/115371.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 2ae34636005eac..0a33ff7072be08 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1068,6 +1068,10 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
Attributor A(Functions, InfoCache, AC);
+ LLVM_DEBUG(dbgs() << "[AMDGPUAttributor] Module " << M.getName() << " is "
+ << (AC.IsClosedWorldModule ? "" : "not ")
+ << "assumed to be a closed world.\n");
+
for (auto *F : Functions) {
A.getOrCreateAAFor<AAAMDAttributes>(IRPosition::function(*F));
A.getOrCreateAAFor<AAUniformWorkGroupSize>(IRPosition::function(*F));
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 786baa6820e860..6b93a659debb7b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -449,6 +449,11 @@ static cl::opt<bool>
cl::desc("Enable AMDGPUAttributorPass"),
cl::init(true), cl::Hidden);
+static cl::opt<bool> HasClosedWorldAssumption(
+ "amdgpu-link-time-closed-world",
+ cl::desc("Whether has closed-world assumption at link time"),
+ cl::init(true), cl::Hidden);
+
extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
// Register the target
RegisterTargetMachine<R600TargetMachine> X(getTheR600Target());
@@ -836,8 +841,12 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
PM.addPass(InternalizePass(mustPreserveGV));
PM.addPass(GlobalDCEPass());
}
- if (EnableAMDGPUAttributor)
- PM.addPass(AMDGPUAttributorPass(*this));
+ if (EnableAMDGPUAttributor) {
+ AMDGPUAttributorOptions Opt;
+ if (HasClosedWorldAssumption)
+ Opt.IsClosedWorld = true;
+ PM.addPass(AMDGPUAttributorPass(*this, Opt));
+ }
}
});
diff --git a/llvm/test/LTO/AMDGPU/closed-world-assumption.ll b/llvm/test/LTO/AMDGPU/closed-world-assumption.ll
new file mode 100644
index 00000000000000..cfd3b0db74ccb0
--- /dev/null
+++ b/llvm/test/LTO/AMDGPU/closed-world-assumption.ll
@@ -0,0 +1,12 @@
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -O3 -debug-only=amdgpu-attributor -o - %s 2>&1 | FileCheck %s --check-prefix=NO-CW
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes="lto<O3>" -debug-only=amdgpu-attributor -o - %s 2>&1 | FileCheck %s --check-prefix=CW
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes="lto<O3>" -debug-only=amdgpu-attributor -amdgpu-link-time-closed-world=0 -o - %s 2>&1 | FileCheck %s --check-prefix=NO-CW
+
+; REQUIRES: amdgpu-registered-target
+; REQUIRES: asserts
+
+; NO-CW: Module {{.*}} is not assumed to be a closed world.
+; CW: Module {{.*}} is assumed to be a closed world.
+define hidden noundef i32 @_Z3foov() {
+ ret i32 1
+}
|
The issue was related to any use of global constructors, because those come from an externally initialized array of function pointers given to us by the linker. That affects HIP asan, OpenMP ctors / dtors, and the |
What format are they provided in? IR or binary? If they are binary, it is essentially calling from one device image to another. If they are IR, we should have seen them at this moment. |
They're external symbols provided by the linker. Think of it like this. extern int *foo_begin;
extern int *foo_end; static int foo[] = {1, 3, 4};
int *foo_begin = foo;
int *foo_end = foo + sizeof(foo); ld.lld foo_use.o foo_def.o Except that it's all internal to the linker. https://maskray.me/blog/2021-11-07-init-ctors-init-array |
I'm not sure that is related. Essentially what you are describing is ABI linking, which is not officially supported. That's why we only allow RDC for those programming models we support, and in RDC mode I don't think it's gonna happen. |
"ABI" linking works just fine for globals, it's only an issue for functions and functions that reference LDS. We still define global constructors in "non-RDC" mode, the distinction isn't really important once we're at the backend. |
I mean, whether they work or not doesn't really matter here. I'm not saying it doesn't work. They are just not officially supported. In all the cases we officially support, ABI linking will not be involved. |
I really don't think this should be opt-out, the behavior is not limited to |
We can't ship "miscompile by default". Some people will see their code is faster and not notice any problems and they'll be happy. Other people will see that their code stops working, track it down to this commit and be angry at us. This assumption is a really high value one. It ties into the rdc (or non-rdc, don't remember) cuda thing. It trivialises escape analysis. The attribute propagator will love it. I'd like it for the LDS allocator. But it will break code, and we really can't deliberately break code on the default options. Add the option as an opt-in and ideally set it in that cuda mode and document it as a good idea when the conditions are met. What we could do, if we wanted to be aggressive, is switch it on as part of -Ofast. I think that's reasonable because Ofast is already a break-my-code command and people using it are relatively likely to want this feature, and used to their code stopping working when the compiler changes (and thus likely to be open to adding the disable version if needed). Please add the flag, but default it to "correct" instead of to "fast". I'm not sure how this works in the context of clang either unless doing unity builds containing the language runtimes inline which seems very niche. Maybe the IR passes are the right place to check the flag so we don't have to wait until lld is running. The current target machine location looks reasonable.
That's documented as working but was a segv when I tried it a few years ago. I wouldn't expect the loader to handle references between code objects. |
We have had this conversation many times before. Long story short, you cannot make the argument it "happens to work", hence we have to continue to support it. If there is any other objection to this, please let us know, otherwise, I'll advocate to accept this and update the internal use case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be the default. Frontends need to explicitly opt in. Wrong by default opt out optimizations are unacceptable in any circumstance.
Also needs to be surfaced as a proper user facing flag to enable/disable (and it shouldn't be an amdgpu backend specific flag)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @arsenm to make this opt-in by FE
I think this should also check functions reference in |
[Re opt-in]
I don't know what this is supposed to mean. Who checks llvm.global_ctors? The driver? The former cannot, and the latter will analyzes uses and act on the flag as defined, I don't see how that interacts with llvm.global_ctors. |
The ctors global needs to be counted as a user of the function if it contains it. |
Although the ABI (if any exists) doesn’t explicitly prohibit cross-device-image function calls, especially since our loader can handle them, for all officially supported programming models, this is not actually allowed. Given this, assuming a closed-world model at link time is safe. However, there are certain cases, such as the GPU libc project, that use non-standard approaches which could break this assumption. This PR introduces an option to disable this assumption when needed.
0ef2313
to
80b8e6b
Compare
I updated the PR to make it as an opt-in feature, as well as the description. FWIW, since we "support" cross-code-object global variables, we really can't assume anything by default thus we can't toggle anything in the front end as well. |
|
||
; NO-CW: Module {{.*}} is not assumed to be a closed world. | ||
; CW: Module {{.*}} is assumed to be a closed world. | ||
define hidden noundef i32 @_Z3foov() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not have any testable effects? I would expect to see a test that shows the before and after of the expected changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NO-CW
and CW
, isn't it the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but that's just a debug message, I would expect to see IR changes. Also there's no sense in using C++ mangling in IR tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is tested in other files. this PR is just to toggle the switch on/off. it doesn't introduce this as a new feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still better to check an actual change, and avoid dependent on asserts builds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, we can possibly make this the option passed to the backend job for -fno-rdc
in HIP / CUDA.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/108/builds/6971 Here is the relevant piece of the build log for the reference
|
Although the ABI (if one exists) doesn’t explicitly prohibit cross-code-object function calls—particularly since our loader can handle them—such calls are not actually allowed in any of the officially supported programming models. However, this limitation has some nuances. For instance, the loader can handle cross-code-object global variables, which complicates the situation further.
Given this complexity, assuming a closed-world model at link time isn’t always safe. To address this, this PR introduces an option that enables this assumption, providing end users the flexibility to enable it for improved compiler optimizations. However, it is the user’s responsibility to ensure they do not violate this assumption.