Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TBAA] Don't emit pointer-tbaa for void pointers. #122116

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Jan 8, 2025

While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers:

int count_elements(void * values) {
  void **seq = values;
  int count;
  for (count = 0; seq && seq[count]; count++);
  return count;
}

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
#119099

This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now.

@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Florian Hahn (fhahn)

Changes

While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers:

int count_elements(void * values) {
  void **seq = values;
  int count;
  for (count = 0; seq && seq[count]; count++);
  return count;
}

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
#119099

This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now.


Full diff: https://github.com/llvm/llvm-project/pull/122116.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenTBAA.cpp (+8)
  • (modified) clang/test/CodeGen/tbaa-pointers.c (+3-10)
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 75e66bae79afdc..3f1a24791ddd81 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -226,6 +226,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
       PtrDepth++;
       Ty = Ty->getPointeeType()->getBaseElementTypeUnsafe();
     } while (Ty->isPointerType());
+
+    // While there are no special rules in the standards regarding void pointers
+    // and strict aliasing, emitting distinct tags for void pointers break some
+    // common idioms and there is no good alternative to re-write the code
+    // without strict-aliasing violations.
+    if (Ty->isVoidType())
+      return AnyPtr;
+
     assert(!isa<VariableArrayType>(Ty));
     // When the underlying type is a builtin type, we compute the pointee type
     // string recursively, which is implicitly more forgiving than the standards
diff --git a/clang/test/CodeGen/tbaa-pointers.c b/clang/test/CodeGen/tbaa-pointers.c
index 4aae2552f107a3..48adac503357f0 100644
--- a/clang/test/CodeGen/tbaa-pointers.c
+++ b/clang/test/CodeGen/tbaa-pointers.c
@@ -208,12 +208,9 @@ int void_ptrs(void **ptr) {
 // COMMON-LABEL: define i32 @void_ptrs(
 // COMMON-SAME: ptr noundef [[PTRA:%.+]])
 // COMMON:        [[PTR_ADDR:%.+]]  = alloca ptr, align 8
-// DISABLE-NEXT:  store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
-// DISABLE-NEXT:  [[L0:%.+]] = load ptr, ptr  [[PTR_ADDR]], align 8, !tbaa  [[ANYPTR]]
-// DISABLE-NEXT:  [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa  [[ANYPTR]]
-// DEFAULT-NEXT:  store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[P2VOID:!.+]]
-// DEFAULT-NEXT:  [[L0:%.+]] = load ptr, ptr  [[PTR_ADDR]], align 8, !tbaa  [[P2VOID]]
-// DEFAULT-NEXT:  [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa  [[P1VOID:!.+]]
+// COMMON-NEXT:   store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
+// COMMON-NEXT:   [[L0:%.+]] = load ptr, ptr  [[PTR_ADDR]], align 8, !tbaa  [[ANYPTR]]
+// COMMON-NEXT:   [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa  [[ANYPTR]]
 // COMMON-NEXT:   [[BOOL:%.+]] = icmp ne ptr [[L1]], null
 // COMMON-NEXT:   [[BOOL_EXT:%.+]] = zext i1 [[BOOL]] to i64
 // COMMON-NEXT:   [[COND:%.+]] = select i1 [[BOOL]], i32 0, i32 1
@@ -254,7 +251,3 @@ int void_ptrs(void **ptr) {
 // COMMON:  [[INT_TAG]] = !{[[INT_TY:!.+]], [[INT_TY]], i64 0}
 // COMMON:  [[INT_TY]] = !{!"int", [[CHAR]], i64 0}
 // DEFAULT: [[ANYPTR]] = !{[[ANY_POINTER]],  [[ANY_POINTER]], i64 0}
-// DEFAULT: [[P2VOID]] = !{[[P2VOID_TY:!.+]], [[P2VOID_TY]], i64 0}
-// DEFAULT: [[P2VOID_TY]] = !{!"p2 void", [[ANY_POINTER]], i64 0}
-// DEFAULT: [[P1VOID]] = !{[[P1VOID_TY:!.+]], [[P1VOID_TY]], i64 0}
-// DEFAULT: [[P1VOID_TY]] = !{!"p1 void", [[ANY_POINTER]], i64 0}

@pinskia
Copy link

pinskia commented Jan 9, 2025

Note GCC already treats void* as being compatiable with all other pointers and has since GCC 6 (when it started to disambiguates accesses to different pointers). I don't think it is documented though.
Another example where clang/LLVM handles void pointers differently from GCC:

extern void abort(void);
int f(void *a, int **b, void *c)
{
  void **e = a;
  int d = **b;
  *e = c;
  return d + **b;
}
int main()
{
  int d = 1;
  int ff = 0;
  int *e = &d;
  if (f(&e, &e, &ff) != 1)
    abort();
}

@pinskia
Copy link

pinskia commented Jan 9, 2025

Note GCC's testcase testsuite/gcc.dg/alias-14.c is testcase for this extension: "as an extension we consider void * universal. Writes to it should alias."

@fhahn
Copy link
Contributor Author

fhahn commented Jan 9, 2025

@pinskia thanks for sharing the GCC context!

@fhahn fhahn force-pushed the clang-pointer-tbaa-void branch from 0144b06 to 471ed86 Compare January 9, 2025 11:50
Copy link

github-actions bot commented Jan 9, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@rjmccall
Copy link
Contributor

rjmccall commented Jan 9, 2025

I agree that allowing void* l-values to alias arbitrary pointer objects is probably the right pragmatic choice. We should document it, though.

@pinskia, does GCC apply this recursively — e.g. are void** l-values treated specially in any way, or are they basically just char** for aliasing purposes?

@fhahn
Copy link
Contributor Author

fhahn commented Jan 9, 2025

What would be a good place to document this?

@pinskia
Copy link

pinskia commented Jan 9, 2025

@pinskia, does GCC apply this recursively — e.g. are void** l-values treated specially in any way, or are they basically just char** for aliasing purposes?

It looks like GCC is handling this as recusive in that a store to a void** location will conflict with a load from a other* location.

That is:

void f(void ***a, int **b, void **d)
{
  int *c = *b;
  *a = d;
  if (*b != c)
    __builtin_abort();
}

The conditional will not be optimized away.

fhahn added 2 commits January 9, 2025 19:30
While there are no special rules in the standards regarding
void pointers and strict aliasing, emitting distinct tags for void
pointers break some common idioms and there is no good alternative to
re-write the code without strict-aliasing violations. An example is
to count the entries in an array of pointers:

    int count_elements(void * values) {
      void **seq = values;
      int count;
      for (count = 0; seq && seq[count]; count++);
      return count;
    }

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
llvm#119099

This patch avoids emitting distinct tags for void pointers, to avoid
those idioms causing mis-compiles for now.
@fhahn
Copy link
Contributor Author

fhahn commented Jan 9, 2025

@pinskia, does GCC apply this recursively — e.g. are void** l-values treated specially in any way, or are they basically just char** for aliasing purposes?

It looks like GCC is handling this as recusive in that a store to a void** location will conflict with a load from a other* location.

That is:

void f(void ***a, int **b, void **d)
{
  int *c = *b;
  *a = d;
  if (*b != c)
    __builtin_abort();
}

The conditional will not be optimized away.

Same behavior in Clang with this patch

@fhahn fhahn force-pushed the clang-pointer-tbaa-void branch from 471ed86 to 3dcf901 Compare January 9, 2025 19:47
@rjmccall
Copy link
Contributor

rjmccall commented Jan 9, 2025

Okay, so if the ultimate pointee type is void, we're basically treating that as a generic pointer, no matter what the pointer depth is? I guess that makes sense.

What would be a good place to document this?

Hmm, I was hoping that we would have a section in the manual already about aliasing. That seems like a good thing to add, especially now that we've got a small family of options controlling it. Are you interested in working on that? It should probably go on the main manual page right before the section on PGO.

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:codegen clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants