[AMDGPU] Re-enable closed-world assumption as an opt-in feature #115371

shiltian · 2024-11-07T21:25:23Z

Although the ABI (if one exists) doesn’t explicitly prohibit cross-code-object function calls—particularly since our loader can handle them—such calls are not actually allowed in any of the officially supported programming models. However, this limitation has some nuances. For instance, the loader can handle cross-code-object global variables, which complicates the situation further.

Given this complexity, assuming a closed-world model at link time isn’t always safe. To address this, this PR introduces an option that enables this assumption, providing end users the flexibility to enable it for improved compiler optimizations. However, it is the user’s responsibility to ensure they do not violate this assumption.
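
To make the opt-in semantics concrete, here is a minimal standalone sketch. It assumes a toy argument parser rather than LLVM's `cl::opt` machinery; only the option name is taken from this PR's diff, while `AttributorOptions` and `parseOptions` are hypothetical names used for illustration. The point is that the conservative (open-world) behavior is what you get unless the option is passed explicitly.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the options handed to the attributor pass.
struct AttributorOptions {
  bool IsClosedWorld = false; // opt-in: stay conservative unless requested
};

// Toy parser; this is NOT how LLVM's cl::opt is implemented. Only the option
// spelling below comes from the PR, everything else is an assumption.
AttributorOptions parseOptions(const std::vector<std::string> &Args) {
  const std::string Name = "-amdgpu-link-time-closed-world";
  AttributorOptions Opt;
  for (const std::string &Arg : Args) {
    if (Arg == Name || Arg == Name + "=1" || Arg == Name + "=true")
      Opt.IsClosedWorld = true;
    else if (Arg == Name + "=0" || Arg == Name + "=false")
      Opt.IsClosedWorld = false;
  }
  return Opt;
}
```

With no arguments the result keeps `IsClosedWorld == false`; passing the option (bare or with `=1`) turns it on, and `=0` turns it back off.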

shiltian · 2024-11-07T21:25:41Z

[AMDGPU] Re-enable closed-world assumption as an opt-in feature #115371 (base: main)

This stack of pull requests is managed by Graphite.

llvmbot · 2024-11-07T21:26:00Z

@llvm/pr-subscribers-lto

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

Although the ABI (if any exists) doesn’t explicitly prohibit cross-device-image
function calls, especially since our loader can handle them, for all officially
supported programming models, this is not actually allowed. Given this, assuming
a closed-world model at link time is safe. However, there are certain cases,
such as the GPU libc project, that use non-standard approaches which could break
this assumption. This PR introduces an option to disable this assumption when
needed.

Full diff: https://github.com/llvm/llvm-project/pull/115371.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+4)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+11-2)
(added) llvm/test/LTO/AMDGPU/closed-world-assumption.ll (+12)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 2ae34636005eac..0a33ff7072be08 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1068,6 +1068,10 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
 
   Attributor A(Functions, InfoCache, AC);
 
+  LLVM_DEBUG(dbgs() << "[AMDGPUAttributor] Module " << M.getName() << " is "
+                    << (AC.IsClosedWorldModule ? "" : "not ")
+                    << "assumed to be a closed world.\n");
+
   for (auto *F : Functions) {
     A.getOrCreateAAFor<AAAMDAttributes>(IRPosition::function(*F));
     A.getOrCreateAAFor<AAUniformWorkGroupSize>(IRPosition::function(*F));
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 786baa6820e860..6b93a659debb7b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -449,6 +449,11 @@ static cl::opt<bool>
                            cl::desc("Enable AMDGPUAttributorPass"),
                            cl::init(true), cl::Hidden);
 
+static cl::opt<bool> HasClosedWorldAssumption(
+    "amdgpu-link-time-closed-world",
+    cl::desc("Whether has closed-world assumption at link time"),
+    cl::init(true), cl::Hidden);
+
 extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   // Register the target
   RegisterTargetMachine<R600TargetMachine> X(getTheR600Target());
@@ -836,8 +841,12 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
             PM.addPass(InternalizePass(mustPreserveGV));
             PM.addPass(GlobalDCEPass());
           }
-          if (EnableAMDGPUAttributor)
-            PM.addPass(AMDGPUAttributorPass(*this));
+          if (EnableAMDGPUAttributor) {
+            AMDGPUAttributorOptions Opt;
+            if (HasClosedWorldAssumption)
+              Opt.IsClosedWorld = true;
+            PM.addPass(AMDGPUAttributorPass(*this, Opt));
+          }
         }
       });
 
diff --git a/llvm/test/LTO/AMDGPU/closed-world-assumption.ll b/llvm/test/LTO/AMDGPU/closed-world-assumption.ll
new file mode 100644
index 00000000000000..cfd3b0db74ccb0
--- /dev/null
+++ b/llvm/test/LTO/AMDGPU/closed-world-assumption.ll
@@ -0,0 +1,12 @@
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -O3 -debug-only=amdgpu-attributor -o - %s 2>&1 | FileCheck %s --check-prefix=NO-CW
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes="lto<O3>" -debug-only=amdgpu-attributor -o - %s 2>&1 | FileCheck %s --check-prefix=CW
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes="lto<O3>" -debug-only=amdgpu-attributor -amdgpu-link-time-closed-world=0 -o - %s 2>&1 | FileCheck %s --check-prefix=NO-CW
+
+; REQUIRES: amdgpu-registered-target
+; REQUIRES: asserts
+
+; NO-CW: Module {{.*}} is not assumed to be a closed world.
+; CW: Module {{.*}} is assumed to be a closed world.
+define hidden noundef i32 @_Z3foov() {
+  ret i32 1
+}

jhuber6 · 2024-11-07T21:32:18Z

The issue was related to any use of global constructors, because those come from an externally initialized array of function pointers given to us by the linker. That affects HIP asan, OpenMP ctors / dtors, and the libc test suite.

shiltian · 2024-11-07T21:35:49Z

What format are they provided in? IR or binary? If they are binary, it is essentially calling from one device image to another. If they are IR, we should have seen them at this moment.

jhuber6 · 2024-11-07T21:38:42Z

> What format are they provided in? IR or binary? If they are binary, it is essentially calling from one device image to another. If they are IR, we should have seen them at this moment.

They're external symbols provided by the linker. Think of it like this.

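// Compiled into foo_use.o: the user only sees the declarations.
extern int *foo_begin;
extern int *foo_end;

// Compiled into foo_def.o: the definitions live in another object.
static int foo[] = {1, 3, 4};
int *foo_begin = foo;
// One past the last element; sizeof(foo) alone would count bytes, not elements.
int *foo_end = foo + sizeof(foo) / sizeof(foo[0]);

ld.lld foo_use.o foo_def.o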

Except that it's all internal to the linker. https://maskray.me/blog/2021-11-07-init-ctors-init-array
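
As a rough illustration of why this matters for a closed-world assumption, the sketch below walks a table of function pointers between two bounds, the way a runtime walks an init array. It is a standalone approximation: in the real case the table and its bounds are synthesized by the linker, whereas here `InitTable`, `init_array_begin`, and `init_array_end` are hypothetical names defined by hand so that the example runs on its own.

```cpp
#include <cassert>
#include <cstddef>

using InitFn = void (*)();

static int InitCount = 0;
static void initA() { InitCount += 1; }
static void initB() { InitCount += 10; }

// Hand-written stand-in for the table the linker would provide. In a real
// link, nothing in the compiled module names initA or initB as a callee.
static InitFn InitTable[] = {initA, initB};
InitFn *init_array_begin = InitTable;
InitFn *init_array_end = InitTable + sizeof(InitTable) / sizeof(InitTable[0]);

// Calls every entry in [Begin, End). The only "callers" of the constructors
// are these indirect calls through externally provided pointers.
void runInitArray(InitFn *Begin, InitFn *End) {
  assert(Begin <= End && "malformed init array bounds");
  for (InitFn *It = Begin; It != End; ++It)
    (*It)();
}
```

If an optimizer assumed it could see every use of `initA` and `initB` while the table lived outside the module, it could wrongly conclude that neither function is ever called.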

shiltian · 2024-11-07T21:47:21Z

I'm not sure that is related. Essentially what you are describing is ABI linking, which is not officially supported. That's why we only allow RDC for those programming models we support, and in RDC mode I don't think it's gonna happen.

jhuber6 · 2024-11-07T21:51:55Z

> I'm not sure that is related. Essentially what you are describing is ABI linking, which is not officially supported. That's why we only allow RDC for those programming models we support, and in RDC mode I don't think it's gonna happen.

"ABI" linking works just fine for globals, it's only an issue for functions and functions that reference LDS. We still define global constructors in "non-RDC" mode, the distinction isn't really important once we're at the backend.

shiltian · 2024-11-07T22:14:19Z

I mean, whether they work or not doesn't really matter here. I'm not saying it doesn't work. They are just not officially supported. In all the cases we officially support, ABI linking will not be involved.

jhuber6 · 2024-11-08T18:43:27Z

I really don't think this should be opt-out. The behavior is not limited to libc; it's a property of anything exposed to the outside environment, like extern variables or externally visible variables. We can probably make a strong assertion about things in RDC mode, but that's a language feature that's not exposed to LLVM IR. If we emit some module flag, plus an error message stating that any IR modules with RDC mode flags can never be linked, then I could see it working.

JonChesterfield · 2024-11-08T18:49:49Z

We can't ship "miscompile by default". Some people will see their code is faster and not notice any problems and they'll be happy. Other people will see that their code stops working, track it down to this commit and be angry at us.

This assumption is a really high value one. It ties into the rdc (or non-rdc, don't remember) cuda thing. It trivialises escape analysis. The attribute propagator will love it. I'd like it for the LDS allocator. But it will break code, and we really can't deliberately break code on the default options. Add the option as an opt-in and ideally set it in that cuda mode and document it as a good idea when the conditions are met.
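
A toy model of the trade-off described here, under loud assumptions: `Fn`, `CalleeSet`, and `potentialIndirectCallees` are hypothetical names, and this is not the Attributor's actual logic. It only sketches the idea that, in a closed world, the set of possible indirect callees can be enumerated exhaustively, while in an open world any externally visible function may have had its address taken elsewhere and the set is never complete.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of a function in a module.
struct Fn {
  std::string Name;
  bool ExternallyVisible;    // could be referenced from another code object
  bool AddressTakenInModule; // some use inside this module takes its address
};

struct CalleeSet {
  bool Complete;                    // true if Callees is exhaustive
  std::vector<std::string> Callees; // candidates found in this module
};

// In a closed world only functions whose address is taken in the module can
// be reached indirectly. In an open world, externally visible functions are
// candidates too, and unknown external callees may exist as well.
CalleeSet potentialIndirectCallees(const std::vector<Fn> &Module,
                                   bool ClosedWorld) {
  CalleeSet Result{ClosedWorld, {}};
  for (const Fn &F : Module)
    if (F.AddressTakenInModule || (!ClosedWorld && F.ExternallyVisible))
      Result.Callees.push_back(F.Name);
  return Result;
}
```

The same module yields a smaller, exhaustive candidate list under the closed-world setting, which is where the optimization value comes from, and also why the setting is unsafe when the assumption does not hold.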

What we could do, if we wanted to be aggressive, is switch it on as part of -Ofast. I think that's reasonable because Ofast is already a break-my-code command and people using it are relatively likely to want this feature, and used to their code stopping working when the compiler changes (and thus likely to be open to adding the disable version if needed).

Please add the flag, but default it to "correct" instead of to "fast". I'm not sure how this works in the context of clang either unless doing unity builds containing the language runtimes inline which seems very niche. Maybe the IR passes are the right place to check the flag so we don't have to wait until lld is running. The current target machine location looks reasonable.

> especially since our loader can handle them

That's documented as working but was a segv when I tried it a few years ago. I wouldn't expect the loader to handle references between code objects.

jdoerfert · 2024-11-18T19:39:16Z

> > I'm not sure that is related. Essentially what you are describing is ABI linking, which is not officially supported. That's why we only allow RDC for those programming models we support, and in RDC mode I don't think it's gonna happen.
>
> "ABI" linking works just fine for globals, it's only an issue for functions and functions that reference LDS. We still define global constructors in "non-RDC" mode, the distinction isn't really important once we're at the backend.

We have had this conversation many times before.
As said before: please point out the documentation that tells me this is expected to work.

Long story short, you cannot make the argument it "happens to work", hence we have to continue to support it.
There is no reason to assume this breaks code outside of the LLVM-internal use case; at least, I'm not aware of one. The LLVM use case can opt out of this.
There is no alternative to this, and holding this back for the LLVM use case is not helpful to anyone, especially since the feature is not "officially" supported to begin with.

If there is any other objection to this, please let us know, otherwise, I'll advocate to accept this and update the internal use case.

arsenm

This should not be the default. Frontends need to explicitly opt in. Wrong by default opt out optimizations are unacceptable in any circumstance.

Also, this needs to be surfaced as a proper user-facing flag to enable/disable (and it shouldn't be an AMDGPU-backend-specific flag).

scchan

Agree with @arsenm to make this opt-in by FE

jhuber6 · 2024-11-18T20:30:33Z

I think this should also check functions referenced in llvm.global_ctors as used.

jdoerfert · 2024-11-26T00:07:45Z

[Re opt-in]
Sure, let's make this off by default and add a driver/cc1 flag that users and offloading languages that define this as default can set.

> I think this should also check functions referenced in llvm.global_ctors as used.

I don't know what this is supposed to mean. Who checks llvm.global_ctors? The driver? The pass?

The former cannot, and the latter will analyze uses and act on the flag as defined. I don't see how that interacts with llvm.global_ctors.

jhuber6 · 2024-11-26T00:10:01Z

> I don't know what this is supposed to mean. Who checks llvm.global_ctors? The driver? The pass?
>
> The former cannot, and the latter will analyze uses and act on the flag as defined. I don't see how that interacts with llvm.global_ctors.

The ctors global needs to be counted as a user of the function if it contains it.
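
One way to read this request, sketched as a standalone toy model (hypothetical names `Function` and `isReferenced`; this is not the pass's real implementation): when deciding whether a function has any users, an entry in the constructor list counts as a use even if no visible code calls the function directly.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified view of a function for use counting.
struct Function {
  std::string Name;
  bool HasDirectCallers; // called from code visible in this module
};

// Treats membership in the constructor list as a use. GlobalCtors models the
// function names stored in llvm.global_ctors; the representation as a list of
// strings is an assumption made for this sketch.
bool isReferenced(const Function &F,
                  const std::vector<std::string> &GlobalCtors) {
  bool InCtors = std::find(GlobalCtors.begin(), GlobalCtors.end(), F.Name) !=
                 GlobalCtors.end();
  return F.HasDirectCallers || InCtors;
}
```

Under this reading, a constructor with no direct callers is still treated as reachable, so a closed-world analysis would not conclude that it is never called.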


shiltian · 2024-12-10T18:00:24Z

I updated the PR, as well as the description, to make this an opt-in feature. FWIW, since we "support" cross-code-object global variables, we really can't assume anything by default, and thus we can't toggle anything in the front end either.

jhuber6 · 2024-12-10T18:02:38Z

llvm/test/LTO/AMDGPU/closed-world-assumption.ll

+
+; NO-CW: Module {{.*}} is not assumed to be a closed world.
+; CW: Module {{.*}} is assumed to be a closed world.
+define hidden noundef i32 @_Z3foov() {


jhuber6 · 2024-12-10

Does this not have any testable effects? I would expect to see a test that shows the before and after of the expected changes.

shiltian · 2024-12-10

NO-CW and CW, isn't that the difference?

jhuber6 · 2024-12-10

Yeah, but that's just a debug message; I would expect to see IR changes. Also, there's no sense in using C++ mangling in IR tests.

shiltian · 2024-12-10

That is tested in other files. This PR just toggles the switch on/off; it doesn't introduce this as a new feature.

arsenm · 2024-12-12

It's still better to check an actual change and avoid depending on asserts builds.

jhuber6

Okay, we can possibly make this the option passed to the backend job for -fno-rdc in HIP / CUDA.

llvm-ci · 2024-12-10T21:07:51Z

LLVM Buildbot has detected a new failure on builder clang-debian-cpp20 running on clang-debian-cpp20 while building llvm at step 2 "checkout".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/108/builds/6971

Here is the relevant piece of the build log for the reference

Step 2 (checkout) failure: update (failure)
git version 2.43.0
fatal: unable to access 'https://github.com/llvm/llvm-project.git/': Could not resolve host: github.com
fatal: unable to access 'https://github.com/llvm/llvm-project.git/': Could not resolve host: github.com

llvmbot added the backend:AMDGPU and LTO (Link time optimization: regular/full LTO or ThinLTO) labels Nov 7, 2024

shiltian requested review from arsenm, jdoerfert, jhuber6 and gandhi56 November 7, 2024 21:25

shiltian requested a review from scchan November 7, 2024 21:29

arsenm requested changes Nov 18, 2024


scchan requested changes Nov 18, 2024


shiltian force-pushed the users/shiltian/closed-world-assumption-opt-in-by-default branch from 0ef2313 to 80b8e6b December 10, 2024 17:54

shiltian changed the title ~~[AMDGPU] Re-enable closed-world assumption as an opt-out feature~~ [AMDGPU] Re-enable closed-world assumption as an opt-in feature Dec 10, 2024

jhuber6 reviewed Dec 10, 2024


jhuber6 approved these changes Dec 10, 2024


shiltian merged commit 04269ea into main Dec 10, 2024
8 checks passed

shiltian deleted the users/shiltian/closed-world-assumption-opt-in-by-default branch December 10, 2024 20:57
