Skip to content

[Clang] Enable -fpointer-tbaa by default. #117244

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 4, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Nov 21, 2024

Support for more precise TBAA metadata has been added a while ago (behind the -fpointer-tbaa flag). The more precise TBAA metadata allows treating accesses of different pointer types as no-alias.

This helps to remove more redundant loads and stores in a number of workloads.

Some highlights on the impact across llvm-test-suite's MultiSource, SPEC2006 & SPEC2017 include:

  • +2% more NoAlias results for memory accesses
  • +3% more stores removed by DSE,
  • +4% more loops vectorized.

This closes a relatively big gap to GCC, which has been supporting disambiguating based on pointer types for a long time. (https://clang.godbolt.org/z/K7Wbhrz4q)

Pointer-TBAA support for pointers to builtin types has been added in #76612.

Support for user-defined types has been added in
#110569.

There are 2 recent PRs with bug fixes for special cases uncovered during some of my testing:

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:openmp OpenMP related changes to Clang labels Nov 21, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 21, 2024

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Florian Hahn (fhahn)

Changes

Support for more precise TBAA metadata has been added a while ago (behind the -fpointer-tbaa flag). The more precise TBAA metadata allows treating accesses of different pointer types as no-alias.

This helps to remove more redundant loads and stores in a number of workloads.

Some highlights on the impact across llvm-test-suite's MultiSource, SPEC2006 & SPEC2017 include:

  • +2% more NoAlias results for memory accesses
  • +3% more stores removed by DSE,
  • +4% more loops vectorized.

This closes a relatively big gap to GCC, which has been supporting disambiguating based on pointer types for a long time. (https://clang.godbolt.org/z/K7Wbhrz4q)

Pointer-TBAA support for pointers to builtin types has been added in #76612.

Support for user-defined types has been added in
#110569.

There are 2 recent PRs with bug fixes for special cases uncovered during some of my testing:


Patch is 142.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117244.diff

11 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+4)
  • (modified) clang/include/clang/Basic/CodeGenOptions.def (+1-1)
  • (modified) clang/include/clang/Driver/Options.td (+6-3)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+2-2)
  • (modified) clang/test/CodeGen/attr-counted-by.c (+28-28)
  • (modified) clang/test/CodeGen/tbaa-pointers.c (+103-103)
  • (modified) clang/test/CodeGen/tbaa-reference.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/template-instantiation.cpp (+1)
  • (modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+80-73)
  • (modified) clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp (+222-222)
  • (modified) clang/unittests/CodeGen/TBAAMetadataTest.cpp (+9-4)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 8c81de341937ca..f42e970f0e07f6 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -211,6 +211,10 @@ C++ Language Changes
 - The builtin type alias ``__builtin_common_type`` has been added to improve the
   performance of ``std::common_type``.
 
+- Clang now emits distinct type-based alias analysis tags for incompatible
+  pointers by default, enabling more powerful alias analysis when accessing
+  pointer types. The new default behavior can be disabled using ``-fno-pointer-tbaa``.
+
 C++2c Feature Support
 ^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 4cf22c4ee08ce0..0f4ed13d5f3d8c 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -242,7 +242,7 @@ ENUM_CODEGENOPT(StructReturnConvention, StructReturnConventionKind, 2, SRCK_Defa
 
 CODEGENOPT(RelaxAll          , 1, 0) ///< Relax all machine code instructions.
 CODEGENOPT(RelaxedAliasing   , 1, 0) ///< Set when -fno-strict-aliasing is enabled.
-CODEGENOPT(PointerTBAA, 1, 0)        ///< Whether or not to use distinct TBAA tags for pointers.
+CODEGENOPT(PointerTBAA       , 1, 1) ///< Whether or not to use distinct TBAA tags for pointers.
 CODEGENOPT(StructPathTBAA    , 1, 0) ///< Whether or not to use struct-path TBAA.
 CODEGENOPT(NewStructPathTBAA , 1, 0) ///< Whether or not to use enhanced struct-path TBAA.
 CODEGENOPT(SaveTempLabels    , 1, 0) ///< Save temporary labels.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5167c3c39e315a..35cdf789598936 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -7361,9 +7361,12 @@ def fuse_register_sized_bitfield_access: Flag<["-"], "fuse-register-sized-bitfie
 def relaxed_aliasing : Flag<["-"], "relaxed-aliasing">,
   HelpText<"Turn off Type Based Alias Analysis">,
   MarshallingInfoFlag<CodeGenOpts<"RelaxedAliasing">>;
-def pointer_tbaa: Flag<["-"], "pointer-tbaa">,
-  HelpText<"Turn on Type Based Alias Analysis for pointer accesses">,
-  MarshallingInfoFlag<CodeGenOpts<"PointerTBAA">>;
+defm pointer_tbaa: BoolOption<"", "pointer-tbaa", CodeGenOpts<"PointerTBAA">, 
+DefaultTrue,
+  PosFlag<SetTrue, [], [ClangOption], "Enable">,
+  NegFlag<SetFalse, [], [ClangOption], "Disable">,
+  BothFlags<[], [ClangOption], " that single precision floating-point divide and sqrt used in ">>
+  ;
 def no_struct_path_tbaa : Flag<["-"], "no-struct-path-tbaa">,
   HelpText<"Turn off struct-path aware Type Based Alias Analysis">,
   MarshallingInfoNegativeFlag<CodeGenOpts<"StructPathTBAA">>;
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index d3eec9fea0d498..affe63642d2ffa 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5935,9 +5935,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   if (!Args.hasFlag(options::OPT_fstrict_aliasing, StrictAliasingAliasOption,
                     options::OPT_fno_strict_aliasing, !IsWindowsMSVC))
     CmdArgs.push_back("-relaxed-aliasing");
-  if (Args.hasFlag(options::OPT_fpointer_tbaa, options::OPT_fno_pointer_tbaa,
+  if (Args.hasFlag(options::OPT_fno_pointer_tbaa, options::OPT_fpointer_tbaa,
                    false))
-    CmdArgs.push_back("-pointer-tbaa");
+    CmdArgs.push_back("-no-pointer-tbaa");
   if (!Args.hasFlag(options::OPT_fstruct_path_tbaa,
                     options::OPT_fno_struct_path_tbaa, true))
     CmdArgs.push_back("-no-struct-path-tbaa");
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index f70e552bca26ab..d62d1c412f1a31 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -1137,7 +1137,7 @@ struct test13_bar {
 // SANITIZE-WITH-ATTR:       cont5:
 // SANITIZE-WITH-ATTR-NEXT:    [[REVMAP:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP0]], i64 16
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x ptr], ptr [[REVMAP]], i64 0, i64 [[INDEX]]
-// SANITIZE-WITH-ATTR-NEXT:    store ptr null, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA14:![0-9]+]]
+// SANITIZE-WITH-ATTR-NEXT:    store ptr null, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA15:![0-9]+]]
 // SANITIZE-WITH-ATTR-NEXT:    ret i32 0
 //
 // NO-SANITIZE-WITH-ATTR-LABEL: define dso_local noundef i32 @test13(
@@ -1146,7 +1146,7 @@ struct test13_bar {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr @test13_f, align 8, !tbaa [[TBAA8:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[REVMAP:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP0]], i64 16
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x ptr], ptr [[REVMAP]], i64 0, i64 [[INDEX]]
-// NO-SANITIZE-WITH-ATTR-NEXT:    store ptr null, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA11:![0-9]+]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    store ptr null, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA12:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret i32 0
 //
 // SANITIZE-WITHOUT-ATTR-LABEL: define dso_local noundef i32 @test13(
@@ -1164,7 +1164,7 @@ struct test13_bar {
 // SANITIZE-WITHOUT-ATTR:       cont5:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[REVMAP:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP0]], i64 16
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x ptr], ptr [[REVMAP]], i64 0, i64 [[INDEX]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    store ptr null, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA14:![0-9]+]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    store ptr null, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA15:![0-9]+]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret i32 0
 //
 // NO-SANITIZE-WITHOUT-ATTR-LABEL: define dso_local noundef i32 @test13(
@@ -1173,7 +1173,7 @@ struct test13_bar {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr @test13_f, align 8, !tbaa [[TBAA8:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[REVMAP:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP0]], i64 16
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x ptr], ptr [[REVMAP]], i64 0, i64 [[INDEX]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    store ptr null, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA11:![0-9]+]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    store ptr null, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA12:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret i32 0
 //
 int test13(long index) {
@@ -1467,7 +1467,7 @@ int test24(int c, struct tests_foo *var) {
 // SANITIZE-WITH-ATTR-LABEL: define dso_local i32 @test25(
 // SANITIZE-WITH-ATTR-SAME: i32 noundef [[C:%.*]], ptr noundef [[VAR:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITH-ATTR-NEXT:  entry:
-// SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[VAR]], align 8, !tbaa [[TBAA14]]
+// SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[VAR]], align 8, !tbaa [[TBAA17:![0-9]+]]
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTCOUNTED_BY_LOAD:%.*]] = load i32, ptr [[TMP0]], align 4
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = icmp ugt i32 [[DOTCOUNTED_BY_LOAD]], 10
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[TMP1]], label [[CONT5:%.*]], label [[HANDLER_OUT_OF_BOUNDS:%.*]], !prof [[PROF3]], !nosanitize [[META2]]
@@ -1482,7 +1482,7 @@ int test24(int c, struct tests_foo *var) {
 // NO-SANITIZE-WITH-ATTR-LABEL: define dso_local i32 @test25(
 // NO-SANITIZE-WITH-ATTR-SAME: i32 noundef [[C:%.*]], ptr nocapture noundef readonly [[VAR:%.*]]) local_unnamed_addr #[[ATTR8:[0-9]+]] {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[VAR]], align 8, !tbaa [[TBAA11]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[VAR]], align 8, !tbaa [[TBAA14:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 44
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret i32 [[TMP1]]
@@ -1490,7 +1490,7 @@ int test24(int c, struct tests_foo *var) {
 // SANITIZE-WITHOUT-ATTR-LABEL: define dso_local i32 @test25(
 // SANITIZE-WITHOUT-ATTR-SAME: i32 noundef [[C:%.*]], ptr noundef [[VAR:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[VAR]], align 8, !tbaa [[TBAA14]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[VAR]], align 8, !tbaa [[TBAA17:![0-9]+]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 44
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret i32 [[TMP1]]
@@ -1498,7 +1498,7 @@ int test24(int c, struct tests_foo *var) {
 // NO-SANITIZE-WITHOUT-ATTR-LABEL: define dso_local i32 @test25(
 // NO-SANITIZE-WITHOUT-ATTR-SAME: i32 noundef [[C:%.*]], ptr nocapture noundef readonly [[VAR:%.*]]) local_unnamed_addr #[[ATTR7:[0-9]+]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[VAR]], align 8, !tbaa [[TBAA11]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[VAR]], align 8, !tbaa [[TBAA14:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 44
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret i32 [[TMP1]]
@@ -1600,7 +1600,7 @@ struct test27_foo {
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ENTRIES:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 24
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x ptr], ptr [[ENTRIES]], i64 0, i64 [[IDXPROM]]
-// SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA14]]
+// SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA19:![0-9]+]]
 // SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM4:%.*]] = sext i32 [[J]] to i64
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [[STRUCT_TEST27_BAR:%.*]], ptr [[TMP2]], i64 [[IDXPROM4]]
 // SANITIZE-WITH-ATTR-NEXT:    ret ptr [[ARRAYIDX5]]
@@ -1611,7 +1611,7 @@ struct test27_foo {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ENTRIES:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 24
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[I]] to i64
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x ptr], ptr [[ENTRIES]], i64 0, i64 [[IDXPROM]]
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA11]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA16:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM1:%.*]] = sext i32 [[J]] to i64
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds [[STRUCT_TEST27_BAR:%.*]], ptr [[TMP0]], i64 [[IDXPROM1]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret ptr [[ARRAYIDX2]]
@@ -1622,7 +1622,7 @@ struct test27_foo {
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ENTRIES:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 24
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[I]] to i64
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x ptr], ptr [[ENTRIES]], i64 0, i64 [[IDXPROM]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA14]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA19:![0-9]+]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM3:%.*]] = sext i32 [[J]] to i64
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds [[STRUCT_TEST27_BAR:%.*]], ptr [[TMP0]], i64 [[IDXPROM3]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret ptr [[ARRAYIDX4]]
@@ -1633,7 +1633,7 @@ struct test27_foo {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ENTRIES:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 24
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[I]] to i64
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x ptr], ptr [[ENTRIES]], i64 0, i64 [[IDXPROM]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA11]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA16:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM1:%.*]] = sext i32 [[J]] to i64
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds [[STRUCT_TEST27_BAR:%.*]], ptr [[TMP0]], i64 [[IDXPROM1]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret ptr [[ARRAYIDX2]]
@@ -1651,9 +1651,9 @@ struct test28_foo {
 // SANITIZE-WITH-ATTR-LABEL: define dso_local i32 @test28(
 // SANITIZE-WITH-ATTR-SAME: ptr noundef [[P:%.*]], i32 noundef [[I:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITH-ATTR-NEXT:  entry:
-// SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P]], align 8, !tbaa [[TBAA14]]
-// SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8, !tbaa [[TBAA14]]
-// SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[TMP1]], align 8, !tbaa [[TBAA14]]
+// SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P]], align 8, !tbaa [[TBAA21:![0-9]+]]
+// SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8, !tbaa [[TBAA21]]
+// SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[TMP1]], align 8, !tbaa [[TBAA21]]
 // SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[I]] to i64
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTCOUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i64 8
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTCOUNTED_BY_LOAD:%.*]] = load i32, ptr [[DOTCOUNTED_BY_GEP]], align 4
@@ -1672,9 +1672,9 @@ struct test28_foo {
 // NO-SANITIZE-WITH-ATTR-LABEL: define dso_local i32 @test28(
 // NO-SANITIZE-WITH-ATTR-SAME: ptr nocapture noundef readonly [[P:%.*]], i32 noundef [[I:%.*]]) local_unnamed_addr #[[ATTR8]] {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P]], align 8, !tbaa [[TBAA11]]
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8, !tbaa [[TBAA11]]
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[TMP1]], align 8, !tbaa [[TBAA11]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P]], align 8, !tbaa [[TBAA18:![0-9]+]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8, !tbaa [[TBAA18]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[TMP1]], align 8, !tbaa [[TBAA18]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARR:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP2]], i64 12
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[I]] to i64
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARR]], i64 0, i64 [[IDXPROM]]
@@ -1684,9 +1684,9 @@ struct test28_foo {
 // SANITIZE-WITHOUT-ATTR-LABEL: define dso_local i32 @test28(
 // SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i32 noundef [[I:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P]], align 8, !tbaa [[TBAA14]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8, !tbaa [[TBAA14]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[TMP1]], align 8, !tbaa [[TBAA14]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P]], align 8, !tbaa [[TBAA21:![0-9]+]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8, !tbaa [[TBAA21]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[TMP1]], align 8, !tbaa [[TBAA21]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARR:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP2]], i64 12
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[I]] to i64
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARR]], i64 0, i64 [[IDXPROM]]
@@ -1696,9 +1696,9 @@ struct test28_foo {
 // NO-SANITIZE-WITHOUT-ATTR-LABEL: define dso_local i32 @test28(
 // NO-SANITIZE-WITHOUT-ATTR-SAME: ptr nocapture noundef readonly [[P:%.*]], i32 noundef [[I:%.*]]) local_unnamed_addr #[[ATTR7]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P]], align 8, !tbaa [[TBAA11]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8, !tbaa [[TBAA11]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[TMP1]], align 8, !tbaa [[TBAA11]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P]], align 8, !tbaa [[TBAA18:![0-9]+]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8, !tbaa [[TBAA18]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[TMP1]], align 8, !tbaa [[TBAA18]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARR:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP2]], i64 12
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[I]] to i64
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARR]], i64 0, i64 [[IDXPROM]]
@@ -1727,7 +1727,7 @@ struct annotated_struct_array {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [10 x ptr], ptr [[ANN]], i64 0, i64 [[TMP1]]
-// SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA14]]
+// SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA23:![0-9]+]]
 // SANITIZE-WITH-ATTR-NEXT:    [[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i64 8
 // SANITIZE-WITH-ATTR-NEXT:    [[DOT_COUNTED_BY_LOAD:%.*]] = load i32, ptr [[DOT_COUNTED_BY_GEP]], align 4
 // SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM15:%.*]] = sext i32 [[IDX2]] to i64
@@ -1751,7 +1751,7 @@ struct annotated_struct_array {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[IDX1]] to i64
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [10 x ptr], ptr [[ANN]], i64 0, i64 [[IDXPROM]]
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA11]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA20:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 8
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[DOT_COUNTED_BY_LOAD:%.*]] = load i32, ptr [[DOT_COUNTED_BY_GEP]], align 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = shl i32 [[DOT_COUNTED_BY_LOAD]], 2
@@ -1774,7 +1774,7 @@ struct annotated_struct_array {
 // SANITIZE-WITHOUT-ATTR-NEXT:    unreachable, !nosanitize [[META9]]
 // SANITIZE-WITHOUT-ATTR:       cont21:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [10 x ptr], ptr [[ANN]], i64 0, i64 [[TMP1]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA14]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA23:![0-9]+]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP2]], i64 12
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM18:%.*]] = sext i32 [[IDX2]] to i64
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM18]]
@@ -1786,7 +1786,7 @@ struct annotated_struct_array {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[IDX1]] to i64
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [10 x ptr], ptr [[ANN]], i64 0, i64 [[IDXPROM]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA11]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:...
[truncated]

Copy link

github-actions bot commented Nov 21, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 157d847ba737b4136aeb1d92912f549ea1c96d4c 1222bbbd8afe57b870f6b91a40c2324279f91949 --extensions cpp,c -- clang/lib/Driver/ToolChains/Clang.cpp clang/test/CodeGen/attr-counted-by.c clang/test/CodeGen/tbaa-pointers.c clang/test/CodeGen/tbaa-reference.cpp clang/test/CodeGenCXX/template-instantiation.cpp clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp clang/unittests/CodeGen/TBAAMetadataTest.cpp
View the diff from clang-format here.
diff --git a/clang/unittests/CodeGen/TBAAMetadataTest.cpp b/clang/unittests/CodeGen/TBAAMetadataTest.cpp
index cad8783ea7..adc0218fea 100644
--- a/clang/unittests/CodeGen/TBAAMetadataTest.cpp
+++ b/clang/unittests/CodeGen/TBAAMetadataTest.cpp
@@ -39,11 +39,8 @@ auto OmnipotentCharC = MMTuple(
   MConstInt(0, 64)
 );
 
-auto AnyPtr = MMTuple(
-  MMString("any pointer"),
-  OmnipotentCharC,
-  MConstInt(0, 64)
-);
+auto AnyPtr =
+    MMTuple(MMString("any pointer"), OmnipotentCharC, MConstInt(0, 64));
 
 auto OmnipotentCharCXX = MMTuple(
   MMString("omnipotent char"),
@@ -116,28 +113,19 @@ TEST(TBAAMetadataTest, BasicTypes) {
           MConstInt(0))));
   ASSERT_TRUE(I);
 
-  I = matchNext(I,
+  I = matchNext(
+      I,
       MInstruction(Instruction::Store,
-        MValType(PointerType::getUnqual(Compiler.Context)),
-        MMTuple(
-          MMTuple(
-            MMString("p1 void"),
-            AnyPtr,
-            MConstInt(0)),
-          MSameAs(0),
-          MConstInt(0))));
+                   MValType(PointerType::getUnqual(Compiler.Context)),
+                   MMTuple(MMTuple(MMString("p1 void"), AnyPtr, MConstInt(0)),
+                           MSameAs(0), MConstInt(0))));
   ASSERT_TRUE(I);
 
-  I = matchNext(I,
-      MInstruction(Instruction::Store,
-        MValType(PointerType::getUnqual(Compiler.Context)),
-        MMTuple(
-          MMTuple(
-            MMString("p1 int"),
-            AnyPtr,
-            MConstInt(0)),
-          MSameAs(0),
-          MConstInt(0))));
+  I = matchNext(
+      I, MInstruction(Instruction::Store,
+                      MValType(PointerType::getUnqual(Compiler.Context)),
+                      MMTuple(MMTuple(MMString("p1 int"), AnyPtr, MConstInt(0)),
+                              MSameAs(0), MConstInt(0))));
   ASSERT_TRUE(I);
 }
 

Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me. Since it's standards-compliant, has a specific opt-out flag, and is overridden by the general -fno-strict-aliasing flag, I don't see any need for the RFC process. Still, you should definitely wait for more feedback, especially from @AaronBallman.

This is not a prerequisite for landing the patch — or a requirement at all, for that matter — but I think it'd be great if you could make a sort of capstone post on the forums talking about this work and your performance analysis in more detail.

@fhahn fhahn force-pushed the clang-enable-pointer-tbaa branch from 58f3bcf to 1222bbb Compare November 22, 2024 10:19
@fhahn
Copy link
Contributor Author

fhahn commented Nov 22, 2024

This seems reasonable to me. Since it's standards-compliant, has a specific opt-out flag, and is overridden by the general -fno-strict-aliasing flag, I don't see any need for the RFC process. Still, you should definitely wait for more feedback, especially from @AaronBallman.

This is not a prerequisite for landing the patch — or a requirement at all, for that matter — but I think it'd be great if you could make a sort of capstone post on the forums talking about this work and your performance analysis in more detail.

Thanks @rjmccall, I'll try to write something up, although it might take a few weeks.

@AaronBallman
Copy link
Collaborator

What are the chances that this will exploit undefined behavior in existing user code in ways that will make upgrading to the latest Clang more difficult because of the perception of "miscompiles?" If this is a likely scenario for users to hit, do other tools like UBSan (etc) help users to find and fix the issues?

I'm not opposed to the changes, I just want to make sure we are setting users up for success when they upgrade.

@jcranmer-intel
Copy link
Contributor

do other tools like UBSan (etc) help users to find and fix the issues?

We don't yet have a reliable tool for sanitizing TBAA failures. There is a project to do that (the TypeSanitizer, see https://discourse.llvm.org/t/reviving-typesanitizer-a-sanitizer-to-catch-type-based-aliasing-violations/66092), but the last time I tried using the TypeSanitizer, it wasn't ready for production use.

@fhahn
Copy link
Contributor Author

fhahn commented Nov 22, 2024

What are the chances that this will exploit undefined behavior in existing user code in ways that will make upgrading to the latest Clang more difficult because of the perception of "miscompiles?" If this is a likely scenario for users to hit, do other tools like UBSan (etc) help users to find and fix the issues?

As @jcranmer-intel mentioned, there is work on a type sanitizer and there are multiple people interested in making improvements to it. Hopefully we will be able to get an initial version in main soon. At the moment, the instrumentation is very expensive and there are some false positives I believe, which limits the usefulness for now.

However the existing -fno-pointer-tbaa should make it relatively easy for our users to check if a mis-compile is due to the new pointer TBAA metadata (which would imply a strict alias violation in the code or bug in an optimization). It might be worth calling this out in the Potentially Breaking Changes section?

Given that cutting the release branch is still ~2 months away, I am hoping that enabling this now we will get this exposed to a large number of users before going in the release candidate phase, which should give us a good amount of signal if this is going to be disruptive to a larger number of users. If that's the case, there would be plenty of time to restore to original behavior in time for the Clang 20 release.

Another thing to note is that GCC (and I think other commercial compilers) already use this information in their optimizers, so it is probably likely that users of those compilers would already have any existing violations exposed.

@fhahn
Copy link
Contributor Author

fhahn commented Nov 22, 2024

I'd also like to share a bit more info about the expected compile-time impact of the change:

  • stage1-O3 0.01%,
  • stage1-ReleaseThinLTO +0.04%,
  • stage1-ReleaseLTO-g +0.07% ,
  • stage2-O3 -0.07%
  • clang build time +0.13%

Full data: https://llvm-compile-time-tracker.com/compare.php?from=a62e1c8eddcda420abec57976dc48f97669277dc&to=570c7e85ef7f3479969f9f970a0086ff9c04036f&stat=instructions

TLDR: While there are some increase in compile-time, those are caused by a number of additional optimizations. The increase in clang build time yields a slightly faster stage2 clang while doing more optimizations.

Lets take a closer look at some of the increases in compile-time.

The statistics data below is for ARM64 macOS not X86 Linux like the compile-time tracker, but most impact is from IR optimizations, so I don’t expect the data to be vastly different.

For stage1-O3, the biggest increase is Bullet +0.39%. This boils down to being able to apply more transformations, most notably probably +70% more loops being vectorized, which itself adds compile-time, as well as adding extra code to simplify.

Top Statistic Changes for Bullet
loop-simplifycfg.NumLoopBlocksDeleted 6.0 -> 26.0 +333.33%
aarch64-ccmp.NumImmRangeRejs 66.0 -> 131.0 +98.48%
loop-vectorize.LoopsVectorized 94.0 -> 160.0 +70.21%
dse.NumRedundantStores 25.0 -> 39.0 +56.00%
aarch64-ldst-opt.NumPostFolded 228.0 -> 355.0 +55.70%
codegenprepare.NumPHIsElim 17.0 -> 25.0 +47.06%
loop-simplifycfg.NumTerminatorsFolded 9.0 -> 13.0 +44.44%
correlated-value-propagation.NumPhis 306.0 -> 423.0 +38.24%
dagcombine.PostIndexedNodes 343.0 -> 473.0 +37.90%
loop-idiom.NumMemSet 51.0 -> 66.0 +29.41%

For stage1-ReleaseLTO-g the biggest increase is kimwitu++ +0.7%. The input bitcode size after merging all modules before LTO is the same. Looking at the impact on the transformations, we optimize quite a bit more around memory instructions, so likely the main contributors to compile-time being GVN, DSE and LICM.

Top Statistic Changes for kimwitu++
loop-idiom.NumMemSet 1.0 -> 3.0 +200.00%
licm.NumLoadStorePromoted 27.0 -> 75.0 +177.78%
dse.NumFastStores 129.0 -> 285.0 +120.93%
dse.NumGetDomMemoryDefPassed 145.0 -> 301.0 +107.59%
globalsmodref-aa.NumIndirectGlobalVars 1.0 -> 2.0 +100.00%
licm.NumPromotionCandidates 121.0 -> 169.0 +39.67%
capture-tracking.NumNotCapturedBefore 141.0 -> 189.0 +34.04%
instsimplify.NumSimplified 494.0 -> 626.0 +26.72%
gvn.NumGVNSimpl 701.0 -> 833.0 +18.83%
gvn.NumPRELoadMoved2CEPred 13.0 -> 15.0 +15.38%
lcssa.NumLCSSA 2820.0 -> 3252.0 +15.32%
licm.NumMovedLoads 15.0 -> 17.0 +13.33%
early-cse.NumCSE 1224.0 -> 1344.0 +9.80%
gvn.IsValueFullyAvailableInBlockNumSpeculationsMax 114.0 -> 125.0 +9.65%
early-cse.NumCSELoad 3055.0 -> 3291.0 +7.73%
instcombine.NegatorMaxTotalValuesVisited 14.0 -> 15.0 +7.14%
dse.NumDomMemDefChecks 6005.0 -> 6327.0 +5.36%

Another increase to highlight is clang build time (+0.13%). I didn’t collect stats for this one, but it looks like the extra optimizations make the stage2 Clang about 0.08% faster on CTMark (stage1-O3 +0.01% vs stage2-O3 -0.07%), while doing a number of additional transformations.

@fhahn
Copy link
Contributor Author

fhahn commented Dec 3, 2024

ping :) @AaronBallman WDYT re making this clear in the release notes (also that there's an easy way to disable) + the ongoing work in parallel for the type sanitizer?

@AaronBallman
Copy link
Collaborator

ping :) @AaronBallman WDYT re making this clear in the release notes (also that there's an easy way to disable) + the ongoing work in parallel for the type sanitizer?

I think it's a reasonable experiment to try.

I would call it a potentially breaking change because it's possible this will quietly change code behavior, though.

fhahn added 2 commits December 3, 2024 19:10
Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treating accesses of different pointer types as no-alias.

This helps to remove more redundant loads and stores in a number of
workloads.

Some highlights on the impact across llvm-test-suite's MultiSource,
SPEC2006 & SPEC2017 include:
 * +2% more NoAlias results for memory access
 * +4% more loops vectorized
 * +3% more stores removed by DSE.

This closes a relatively big gap to GCC, which has been supporting
disambiguating based on pointer types for a long time.
(https://clang.godbolt.org/z/K7Wbhrz4q)

Pointer-TBAA support for pointers to builtin types has been added in
llvm#76612.

Support for user-defined types has been added in
llvm#110569.

There are 2 pending PRs with bug fixes for special cases uncovered
during some of my testing:
 * llvm#116991
 * llvm#116596
@fhahn fhahn force-pushed the clang-enable-pointer-tbaa branch from 1222bbb to fa436bc Compare December 3, 2024 19:34
@fhahn
Copy link
Contributor Author

fhahn commented Dec 3, 2024

ping :) @AaronBallman WDYT re making this clear in the release notes (also that there's an easy way to disable) + the ongoing work in parallel for the type sanitizer?

I think it's a reasonable experiment to try.

I would call it a potentially breaking change because it's possible this will quietly change code behavior, though.

Thanks, I moved the release note to the 'Potentially Breaking Changes' section and included a sentence that this may silently change code behavior for code that has strict-aliasing violations.

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@mizvekov mizvekov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a strict improvement over the general 'fstrict-aliasing' mode we already have, and I see little value in maintaining separate levels of strict aliasing conformance long term.

I think it would not be a good situation for users to rely on some level of strict aliasing violations, by clang accepting some bugs, rejecting others, without a clear line separating them.

I'd favor not having the flag, with 'fno-strict-aliasing' still being enforced.
I don't have strong objections against the new flag, if it's only temporary, with a clear goalpost for removing it.

@fhahn
Copy link
Contributor Author

fhahn commented Dec 4, 2024

I think this is a strict improvement over the general 'fstrict-aliasing' mode we already have, and I see little value in maintaining separate levels of strict aliasing conformance long term.

I think it would not be a good situation for users to rely on some level of strict aliasing violations, by clang accepting some bugs, rejecting others, without a clear line separating them.

I'd favor not having the flag, with 'fno-strict-aliasing' still being enforced. I don't have strong objections against the new flag, if it's only temporary, with a clear goalpost for removing it.

I agree that long-term we should remove the separate flag, maybe after one or two releases. Initially I expect it will help users during the transition phase (and also us during bug triage to quickly identify if an issue is related to the new behavior)

@fhahn fhahn merged commit 7954a05 into llvm:main Dec 4, 2024
8 of 9 checks passed
@fhahn fhahn deleted the clang-enable-pointer-tbaa branch December 4, 2024 20:55
@bgra8
Copy link
Contributor

bgra8 commented Dec 9, 2024

@fhahn we have identified a bunch of tests at google that fail somewhere in the open source library https://code.videolan.org/videolan/x264. The x264 is built with -fstrict-aliasing. All tests pass when removing -fstrict-aliasing or disabling Pointer-TBAA (-fno-pointer-tbaa).

Is this expected? Are the two mutually exclusive?

@fhahn
Copy link
Contributor Author

fhahn commented Dec 9, 2024

@fhahn we have identified a bunch of tests at google that fail somewhere in the open source library https://code.videolan.org/videolan/x264. The x264 is built with -fstrict-aliasing. All tests pass when removing -fstrict-aliasing or disabling Pointer-TBAA (-fno-pointer-tbaa).

Is this expected? Are the two mutually exclusive?

This is likely caused by strict-aliasing violations in the source, could you share a reproducer or steps on how to reproduce?

@bgra8
Copy link
Contributor

bgra8 commented Dec 9, 2024

No reproducer yet. The dependency to x264 from the tests is several levels deep ...

jsji added a commit to intel/llvm that referenced this pull request Dec 22, 2024
[Clang] Enable -fpointer-tbaa by default.
(llvm/llvm-project#117244)
jsji added a commit to intel/llvm that referenced this pull request Dec 27, 2024
[Clang] Enable -fpointer-tbaa by default.
(llvm/llvm-project#117244)
fhahn pushed a commit to fhahn/llvm-project that referenced this pull request Jan 3, 2025
fhahn pushed a commit to fhahn/llvm-project that referenced this pull request Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants