Skip to content

[TBAA] Don't emit pointer tbaa for unnamed structs or unions. #116596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 21, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Nov 18, 2024

For unnamed structs or unions, C's compatible types rule applies. Two compatible types in different compilation units can have different mangled names, meaning the metadata emitted below would incorrectly mark them as no-alias. Use AnyPtr for such types in both C and C++, as C and C++ types may be visible when doing LTO.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Nov 18, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 18, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Florian Hahn (fhahn)

Changes

For unnamed structs or unions, C's compatible types rule applies. Two compatible types in different compilation units can have different mangled names, meaning the metadata emitted below would incorrectly mark them as no-alias. Use AnyPtr for such types in both C and C++, as C and C++ types may be visible when doing LTO.


Full diff: https://github.com/llvm/llvm-project/pull/116596.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenTBAA.cpp (+9)
  • (modified) clang/test/CodeGen/tbaa-pointers.c (+1-4)
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index c31579e8323174..95d9f893a2c7d1 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -230,6 +230,15 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
               ->getString();
       TyName = Name;
     } else {
+      // For unnamed structs or unions C's compatible types rule applies. Two
+      // compatible types in different compilation units can have different
+      // mangled names, meaning the metadata emitted below would incorrectly
+      // mark them as no-alias. Use AnyPtr for such types in both C and C++, as
+      // C and C++ types may be visible when doing LTO.
+      const auto *RT = Ty->getAs<RecordType>();
+      if (RT && !RT->getDecl()->getDeclName())
+        return AnyPtr;
+
       // For non-builtin types use the mangled name of the canonical type.
       llvm::raw_svector_ostream TyOut(TyName);
       MangleCtx->mangleCanonicalTypeName(QualType(Ty, 0), TyOut);
diff --git a/clang/test/CodeGen/tbaa-pointers.c b/clang/test/CodeGen/tbaa-pointers.c
index 9417a0e2f09e8c..068459f4dce118 100644
--- a/clang/test/CodeGen/tbaa-pointers.c
+++ b/clang/test/CodeGen/tbaa-pointers.c
@@ -190,8 +190,6 @@ typedef struct {
   int i1;
 } TypedefS;
 
-// FIXME: The !tbaa tag for unnamed structs doesn't account for compatible
-// types in C.
 void unamed_struct_typedef(TypedefS *ptr) {
 // COMMON-LABEL: define void @unamed_struct_typedef(
 // COMMON-SAME: ptr noundef [[PTRA:%.+]])
@@ -238,5 +236,4 @@ void unamed_struct_typedef(TypedefS *ptr) {
 // DEFAULT: [[S2_TY]]  = !{!"S2", [[ANY_POINTER]], i64 0}
 // COMMON:  [[INT_TAG]] = !{[[INT_TY:!.+]], [[INT_TY]], i64 0}
 // COMMON:  [[INT_TY]] = !{!"int", [[CHAR]], i64 0}
-// ENABLED: [[P1TYPEDEF]] = !{[[P1TYPEDEF_TY:!.+]],  [[P1TYPEDEF_TY]], i64 0}
-// ENABLED: [[P1TYPEDEF_TY]] = !{!"p1 _ZTS8TypedefS", [[ANY_POINTER]], i64 0}
+// ENABLED: [[P1TYPEDEF]] = !{[[ANY_POINTER]],  [[ANY_POINTER]], i64 0}

// compatible types in different compilation units can have different
// mangled names, meaning the metadata emitted below would incorrectly
// mark them as no-alias. Use AnyPtr for such types in both C and C++, as
// C and C++ types may be visible when doing LTO.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like this to note that

  • using AnyPtr is over-conservative here, and we could make this summarize the members as per the C compatibility rule in the future and
  • this also covers anonymous structs and unions, which have a different compatibility rule, but it doesn't matter because you can never have a pointer to an anonymous struct or union.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, added the clarifications as suggested!

@fhahn fhahn force-pushed the tbaa-unnamed-structs branch from 87b0688 to ddd22cf Compare November 20, 2024 19:12
For unnamed structs or unions, C's compatible types rule applies.
Two compatible types in different compilation units can have different
mangled names, meaning the metadata emitted below would incorrectly mark
them as no-alias. Use AnyPtr for such types in both C and C++, as C and
C++ types may be visible when doing LTO.
@fhahn fhahn force-pushed the tbaa-unnamed-structs branch from ddd22cf to fe01d7e Compare November 21, 2024 20:08
// compatibility rule, but it doesn't matter because you can never have a
// pointer to an anonymous struct or union.
const auto *RT = Ty->getAs<RecordType>();
if (RT && !RT->getDecl()->getDeclName())
Copy link
Contributor

@rjmccall rjmccall Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This getAs<> is now redundant with the isRecordType() above; please do the getAs<> above and test RT there, and then you can assume you have a record here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 21, 2024
Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treating accesses of different pointer types as no-alias.

This helps to remove more redundant loads and stores in a number of
workloads.

Some highlights on the impact across llvm-test-suite's MultiSource,
SPEC2006 & SPEC2017 include:
 * +2% more NoAlias results for memory access
 * +4% more loops vectorized
 * +3% more stores removed by DSE.

This closes a relatively big gap to GCC, which has been supporting
disambiguating based on pointer types for a long time.
(https://clang.godbolt.org/z/K7Wbhrz4q)

Pointer-TBAA support for pointers to builtin types has been added in
llvm#76612.

Support for user-defined types has been added in
llvm#110569.

There are 2 pending PRs with bug fixes for special cases uncovered
during some of my testing:
 * llvm#116991
 * llvm#116596
@fhahn fhahn merged commit 41a0c66 into llvm:main Nov 21, 2024
6 of 7 checks passed
@fhahn fhahn deleted the tbaa-unnamed-structs branch November 21, 2024 22:26
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 22, 2024
Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treating accesses of different pointer types as no-alias.

This helps to remove more redundant loads and stores in a number of
workloads.

Some highlights on the impact across llvm-test-suite's MultiSource,
SPEC2006 & SPEC2017 include:
 * +2% more NoAlias results for memory access
 * +4% more loops vectorized
 * +3% more stores removed by DSE.

This closes a relatively big gap to GCC, which has been supporting
disambiguating based on pointer types for a long time.
(https://clang.godbolt.org/z/K7Wbhrz4q)

Pointer-TBAA support for pointers to builtin types has been added in
llvm#76612.

Support for user-defined types has been added in
llvm#110569.

There are 2 pending PRs with bug fixes for special cases uncovered
during some of my testing:
 * llvm#116991
 * llvm#116596
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 3, 2024
Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treating accesses of different pointer types as no-alias.

This helps to remove more redundant loads and stores in a number of
workloads.

Some highlights on the impact across llvm-test-suite's MultiSource,
SPEC2006 & SPEC2017 include:
 * +2% more NoAlias results for memory access
 * +4% more loops vectorized
 * +3% more stores removed by DSE.

This closes a relatively big gap to GCC, which has been supporting
disambiguating based on pointer types for a long time.
(https://clang.godbolt.org/z/K7Wbhrz4q)

Pointer-TBAA support for pointers to builtin types has been added in
llvm#76612.

Support for user-defined types has been added in
llvm#110569.

There are 2 pending PRs with bug fixes for special cases uncovered
during some of my testing:
 * llvm#116991
 * llvm#116596
fhahn added a commit that referenced this pull request Dec 4, 2024
Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treating accesses of different pointer types as no-alias.

This helps to remove more redundant loads and stores in a number of
workloads.

Some highlights on the impact across llvm-test-suite's MultiSource,
SPEC2006 & SPEC2017 include:
 * +2% more NoAlias results for memory accesses
 * +3% more stores removed by DSE,
 * +4% more loops vectorized.

This closes a relatively big gap to GCC, which has been supporting
disambiguating based on pointer types for a long time.
(https://clang.godbolt.org/z/K7Wbhrz4q)

Pointer-TBAA support for pointers to builtin types has been added in
#76612.

Support for user-defined types has been added in
#110569.

There are 2 recent PRs with bug fixes for special cases uncovered during
testing:
 * #116991
 * #116596

PR: #117244
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 9, 2025
…16596)

For unnamed structs or unions, C's compatible types rule applies. Two
compatible types in different compilation units can have different
mangled names, meaning the metadata emitted below would incorrectly mark
them as no-alias. Use AnyPtr for such types in both C and C++, as C and
C++ types may be visible when doing LTO.

PR: llvm#116596
(cherry picked from commit 41a0c66)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants