Skip to content

Global string alignment #142346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dominik-steenken
Copy link
Contributor

When creating global strings, some targets have requirements that need to be taken into account. Previously, the global strings created by IRBuilder::createGlobalString had a hard-coded alignment of 1.

This commit makes it so that the alignment is taken from the data layout instead, giving targets the chance to align global strings according to their preferences.

This PR is motivated by (and should fix) #141491, where the 1-byte alignment in a global string created by a printf optimization led to the resulting assembly in a -fno-PIC compile to unexpectedly reference the GOT based on whether the code contained printf statements of the form printf("foo\n");.

I don't know and would appreciate feedback on whether this is the correct place to fix something like that.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:SystemZ llvm:ir labels Jun 2, 2025
@dominik-steenken
Copy link
Contributor Author

@uweigand FYI

@llvmbot
Copy link
Member

llvmbot commented Jun 2, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-ir

Author: Dominik Steenken (dominik-steenken)

Changes

When creating global strings, some targets have requirements that need to be taken into account. Previously, the global strings created by IRBuilder::createGlobalString had a hard-coded alignment of 1.

This commit makes it so that the alignment is taken from the data layout instead, giving targets the chance to align global strings according to their preferences.

This PR is motivated by (and should fix) #141491, where the 1-byte alignment in a global string created by a printf optimization led to the resulting assembly in a -fno-PIC compile to unexpectedly reference the GOT based on whether the code contained printf statements of the form printf("foo\n");.

I don't know and would appreciate feedback on whether this is the correct place to fix something like that.


Full diff: https://github.com/llvm/llvm-project/pull/142346.diff

2 Files Affected:

  • (added) clang/test/CodeGen/SystemZ/align-systemz-globalstring.c (+14)
  • (modified) llvm/lib/IR/IRBuilder.cpp (+1-1)
diff --git a/clang/test/CodeGen/SystemZ/align-systemz-globalstring.c b/clang/test/CodeGen/SystemZ/align-systemz-globalstring.c
new file mode 100644
index 0000000000000..f5178a079f898
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/align-systemz-globalstring.c
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -O1 -triple s390x-linux-gnu -emit-llvm %s -o - | FileCheck %s
+
+// #include <stdio.h>
+
+// CHECK: @msg1 = local_unnamed_addr constant [13 x i8] c"Hello World\0A\00", align 2
+// CHECK: @str = private unnamed_addr constant [12 x i8] c"Hello World\00", align 2
+
+const char msg1 [] = "Hello World\n";
+
+extern int printf(const char *__restrict __format, ...);
+
+void foo() {
+    printf(msg1);
+}
\ No newline at end of file
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index 580b0af709337..ab06a587a861f 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -52,7 +52,7 @@ GlobalVariable *IRBuilderBase::CreateGlobalString(StringRef Str,
       *M, StrConstant->getType(), true, GlobalValue::PrivateLinkage,
       StrConstant, Name, nullptr, GlobalVariable::NotThreadLocal, AddressSpace);
   GV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
-  GV->setAlignment(Align(1));
+  GV->setAlignment(M->getDataLayout().getPreferredAlign(GV));
   return GV;
 }
 

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right fix for the issue. Just like for allocas, the global variable alignment is a minimum required alignment, which can and should be raised by targets if it improves code generation. (The exception to this is if the global has a section.)

It looks like CGP has a shouldAlignPointerArgs hook for this purpose, but that seems overly specific for what is desired here. I'm surprised we don't seem have something more generic for this...

@@ -52,7 +52,7 @@ GlobalVariable *IRBuilderBase::CreateGlobalString(StringRef Str,
*M, StrConstant->getType(), true, GlobalValue::PrivateLinkage,
StrConstant, Name, nullptr, GlobalVariable::NotThreadLocal, AddressSpace);
GV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
GV->setAlignment(Align(1));
GV->setAlignment(M->getDataLayout().getPreferredAlign(GV));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just equivalent to not setting the alignment at all...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it did change the alignment of the string produced by the PrintFOptimizer from 1 to 2, which is why i thought i might be on the right track.

@nikic nikic requested a review from efriedma-quic June 2, 2025 08:52
@dominik-steenken
Copy link
Contributor Author

dominik-steenken commented Jun 2, 2025

I'm not sure this is the right fix for the issue. Just like for allocas, the global variable alignment is a minimum required alignment, which can and should be raised by targets if it improves code generation. (The exception to this is if the global has a section.)

It looks like CGP has a shouldAlignPointerArgs hook for this purpose, but that seems overly specific for what is desired here. I'm surprised we don't seem have something more generic for this...

Like i said, i'm very open to moving this to someplace different, just lacking the background knowledge of where to put it :)

I'm going to try and take a look at how we handle allocas, and have a look at CGP as well.

@efriedma-quic
Copy link
Collaborator

If there's some general rule that means we want to increase the alignment of globals, I think it makes sense to make CodeGenPrepare or some target-specific pass, instead of trying to fix every frontend and every optimization pass to use increased alignment.

@dominik-steenken
Copy link
Contributor Author

dominik-steenken commented Jun 5, 2025

So i've been reading through the code and the history a bit, and there is this commit, which introduces the concept of a minimum alignment for globals to clang. The clang target gets a MinGlobalAlign attribute that is supposed to be referenced via ASTContext::getAlignOfGlobalVar(InChars) when creating global variables as part of clang's CodeGen. This is why the initial global variable that carries the string before the optimization has the correct alignment.

From what i can tell, this doesn't work here because the optimization for the printf happens in the context of LLVMs IRBuilder, where the ASTContext is (no longer?) available.

I don't know what should happen in general for optimizations that end up adding global variables, but in this particular case i think there is an easier way. The point of this optimization is to convert the printf into at puts, and that requires "taking off" the trailing newline. Other than that, the new string is intended to be a direct replacement for the existing one. Since that is the case, i should be able to set the alignment of the new GlobalVariable to the alignment of the existing one. Would this be ok?

@efriedma-quic
Copy link
Collaborator

The clang patch was written the way it was because it was necessary to comply with the ABI rules. Strings passed to printf don't have any sort of alignment requirement, so you can't really appeal to the ABI rules here, I think?

The problem with copying the alignment is that we don't really have any reason to believe the alignment of the original string is actually appropriate.

For the original patch, I think the comment was that the patch was equivalent to deleting the setAlignment() call. And we really don't want to set the alignment that high; the current getPreferredAlign() prefer higher alignment. Maybe GV->setAlignment(M->getDataLayout().getPrefTypeAlign(getInt8Ty())); would be okay, though.

When creating global strings, some targets have requirements that need to
be taken into account. Previously, the global strings created by
`IRBuilder::createGlobalString` had a hard-coded alignment of `1`.

This commit makes it so that the alignment is taken from the data layout
instead, giving targets the chance to align global strings according to
their preferences.
This commit adds a test to the `clang` test suite for the SystemZ backend
that checks for the correct alignment of global strings created by the
PrintFOptimizer for the case `printf("foo\n")` -> `puts("foo")`.
@dominik-steenken dominik-steenken force-pushed the global-string-alignment branch from f646713 to 59f7f10 Compare June 6, 2025 08:43
@dominik-steenken
Copy link
Contributor Author

dominik-steenken commented Jun 6, 2025

Ok. As far as i can tell,

M->getDataLayout().getPrefTypeAlign(getInt8Ty())

should be functionally equivalent to

M->getDataLayout().getPreferredAlign(GV)

for strings, with the exception that the latter potentially preserves a preexisting alignment on the GV, which is not relevant for this case, right?

In any case, i've reverted my changes to copy the alignment, and changed IRBuilder::createGlobalString in the suggested way.

@uweigand
Copy link
Member

uweigand commented Jun 6, 2025

The clang patch was written the way it was because it was necessary to comply with the ABI rules. Strings passed to printf don't have any sort of alignment requirement, so you can't really appeal to the ABI rules here, I think?

Just to be clear, the ABI requirement is that every symbol has to be at least 2-byte aligned. The problem is not passing the address of the string to printf, the problem is loading that address into a register in the first place. The most efficient way to load an address uses a PC-relative load address instruction, and this requires that the target symbol is 2-byte aligned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:SystemZ clang Clang issues not falling into any other category llvm:ir llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants