Skip to content

[lld][macho] Strip .__uniq. and .llvm. hashes in -order_file #140670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

SharonXSharon
Copy link
Contributor

@SharonXSharon SharonXSharon commented May 20, 2025

/// Symbols can be appended with "(.__uniq.xxxx)?.llvm.yyyy" where "xxxx" and
/// "yyyy" are numbers that could change between builds. We need to use the root
/// symbol name before this suffix so these symbols can be matched with profiles
/// which may have different suffixes.

Just like what we are doing in BP, https://github.com/llvm/llvm-project/blob/main/lld/MachO/BPSectionOrderer.cpp#L127

the patch removes the suffixes when parsing the order file and getting the symbol priority to have a better symbol match.

@SharonXSharon SharonXSharon marked this pull request as ready for review May 20, 2025 03:49
@llvmbot
Copy link
Member

llvmbot commented May 20, 2025

@llvm/pr-subscribers-lld-elf
@llvm/pr-subscribers-lld-wasm
@llvm/pr-subscribers-lld-coff

@llvm/pr-subscribers-lld

Author: None (SharonXSharon)

Changes
/// Symbols can be appended with "(.__uniq.xxxx)?.llvm.yyyy" where "xxxx" and
/// "yyyy" are numbers that could change between builds. We need to use the root
/// symbol name before this suffix so these symbols can be matched with profiles
/// which may have different suffixes.

Just like what we are doing in BP, https://github.com/llvm/llvm-project/blob/main/lld/MachO/BPSectionOrderer.cpp#L127

the patch removes the suffixes when parsing the order file and getting the symbol priority to have a better symbol match.


Full diff: https://github.com/llvm/llvm-project/pull/140670.diff

3 Files Affected:

  • (modified) lld/MachO/SectionPriorities.cpp (+8-2)
  • (modified) lld/MachO/SectionPriorities.h (+5)
  • (added) lld/test/MachO/order-file-strip-hashes.s (+101)
diff --git a/lld/MachO/SectionPriorities.cpp b/lld/MachO/SectionPriorities.cpp
index 7a4a5d8465f64..213623b338472 100644
--- a/lld/MachO/SectionPriorities.cpp
+++ b/lld/MachO/SectionPriorities.cpp
@@ -245,12 +245,18 @@ DenseMap<const InputSection *, int> CallGraphSort::run() {
   return orderMap;
 }
 
+StringRef macho::PriorityBuilder::getRootSymbol(StringRef Name) {
+  auto [P0, S0] = Name.rsplit(".llvm.");
+  auto [P1, S1] = P0.rsplit(".__uniq.");
+  return P1;
+}
+
 std::optional<int>
 macho::PriorityBuilder::getSymbolPriority(const Defined *sym) {
   if (sym->isAbsolute())
     return std::nullopt;
 
-  auto it = priorities.find(sym->getName());
+  auto it = priorities.find(getRootSymbol(sym->getName()));
   if (it == priorities.end())
     return std::nullopt;
   const SymbolPriorityEntry &entry = it->second;
@@ -330,7 +336,7 @@ void macho::PriorityBuilder::parseOrderFile(StringRef path) {
         break;
       }
     }
-    symbol = line.trim();
+    symbol = getRootSymbol(line.trim());
 
     if (!symbol.empty()) {
       SymbolPriorityEntry &entry = priorities[symbol];
diff --git a/lld/MachO/SectionPriorities.h b/lld/MachO/SectionPriorities.h
index 44fb101990c51..0bbf238c2c116 100644
--- a/lld/MachO/SectionPriorities.h
+++ b/lld/MachO/SectionPriorities.h
@@ -69,6 +69,11 @@ class PriorityBuilder {
   std::optional<int> getSymbolPriority(const Defined *sym);
   llvm::DenseMap<llvm::StringRef, SymbolPriorityEntry> priorities;
   llvm::MapVector<SectionPair, uint64_t> callGraphProfile;
+  /// Symbols can be appended with "(.__uniq.xxxx)?.llvm.yyyy" where "xxxx" and
+  /// "yyyy" are numbers that could change between builds. We need to use the
+  /// root symbol name before this suffix so these symbols can be matched with
+  /// profiles which may have different suffixes.
+  llvm::StringRef getRootSymbol(llvm::StringRef Name);
 };
 
 extern PriorityBuilder priorityBuilder;
diff --git a/lld/test/MachO/order-file-strip-hashes.s b/lld/test/MachO/order-file-strip-hashes.s
new file mode 100644
index 0000000000000..d7e21371ad9ca
--- /dev/null
+++ b/lld/test/MachO/order-file-strip-hashes.s
@@ -0,0 +1,101 @@
+# RUN: rm -rf %t && split-file %s %t
+# RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %t/a.s -o %t/a.o
+
+# RUN: %lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o -order_file %t/ord-1
+# RUN: llvm-nm --numeric-sort --format=just-symbols %t/a.out | FileCheck %s
+
+
+# .text
+# CHECK: A
+# CHECK: B
+# CHECK: C
+# .section __DATA,__objc_const
+# CHECK: _OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
+# CHECK: _ALPHABETIC_SORT_FIRST
+# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat1
+# CHECK: _OBJC_$_CATEGORY_SOME_$_FOLDED
+# CHECK: _OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2
+# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat2
+# .section __DATA,__objc_data
+# CHECK: _OBJC_CLASS_$_Baz
+# CHECK: _OBJC_CLASS_$_Bar
+# CHECK: _OBJC_CLASS_$_Foo
+# CHECK: _OBJC_CLASS_$_Baz2
+
+	
+#--- a.s
+.text
+.globl _main, A, _B, C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+
+_main:
+  ret
+A:
+  ret
+F:
+  add w0, w0, #3
+  bl C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+  ret
+C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222:
+  add w0, w0, #2
+  bl  A
+  ret
+D:
+  add w0, w0, #2
+  bl B
+  ret
+B:
+  add w0, w0, #1
+  bl  A
+  ret
+E:
+  add w0, w0, #2
+  bl C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+  ret
+
+.section __DATA,__objc_const
+# test multiple symbols at the same address, which will be alphabetic sorted based symbol names
+_OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2:
+  .quad 789
+
+_OBJC_$_CATEGORY_SOME_$_FOLDED:
+_OBJC_$_CATEGORY_Foo_$_Cat1:
+_ALPHABETIC_SORT_FIRST:
+ .quad 123
+
+_OBJC_$_CATEGORY_Foo_$_Cat2:
+ .quad 222
+
+_OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1:
+  .quad 456
+
+.section __DATA,__objc_data
+_OBJC_CLASS_$_Foo:
+ .quad 123
+
+_OBJC_CLASS_$_Bar.llvm.1234:
+ .quad 456
+
+_OBJC_CLASS_$_Baz:
+ .quad 789
+
+_OBJC_CLASS_$_Baz2:
+ .quad 999
+
+.section __DATA,__objc_classrefs
+.quad _OBJC_CLASS_$_Foo
+.quad _OBJC_CLASS_$_Bar.llvm.1234
+.quad _OBJC_CLASS_$_Baz
+
+.subsections_via_symbols
+
+
+#--- ord-1
+# change order, parital covered
+A
+B
+C.__uniq.555555555555555555555555555555555555555.llvm.6666666666666666666
+_OBJC_CLASS_$_Baz
+_OBJC_CLASS_$_Bar.__uniq.12345
+_OBJC_CLASS_$_Foo.__uniq.123.llvm.123456789
+_OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
+_OBJC_$_CATEGORY_Foo_$_Cat1.llvm.1234567

@llvmbot
Copy link
Member

llvmbot commented May 20, 2025

@llvm/pr-subscribers-lld-macho

Author: None (SharonXSharon)

Changes
/// Symbols can be appended with "(.__uniq.xxxx)?.llvm.yyyy" where "xxxx" and
/// "yyyy" are numbers that could change between builds. We need to use the root
/// symbol name before this suffix so these symbols can be matched with profiles
/// which may have different suffixes.

Just like what we are doing in BP, https://github.com/llvm/llvm-project/blob/main/lld/MachO/BPSectionOrderer.cpp#L127

the patch removes the suffixes when parsing the order file and getting the symbol priority to have a better symbol match.


Full diff: https://github.com/llvm/llvm-project/pull/140670.diff

3 Files Affected:

  • (modified) lld/MachO/SectionPriorities.cpp (+8-2)
  • (modified) lld/MachO/SectionPriorities.h (+5)
  • (added) lld/test/MachO/order-file-strip-hashes.s (+101)
diff --git a/lld/MachO/SectionPriorities.cpp b/lld/MachO/SectionPriorities.cpp
index 7a4a5d8465f64..213623b338472 100644
--- a/lld/MachO/SectionPriorities.cpp
+++ b/lld/MachO/SectionPriorities.cpp
@@ -245,12 +245,18 @@ DenseMap<const InputSection *, int> CallGraphSort::run() {
   return orderMap;
 }
 
+StringRef macho::PriorityBuilder::getRootSymbol(StringRef Name) {
+  auto [P0, S0] = Name.rsplit(".llvm.");
+  auto [P1, S1] = P0.rsplit(".__uniq.");
+  return P1;
+}
+
 std::optional<int>
 macho::PriorityBuilder::getSymbolPriority(const Defined *sym) {
   if (sym->isAbsolute())
     return std::nullopt;
 
-  auto it = priorities.find(sym->getName());
+  auto it = priorities.find(getRootSymbol(sym->getName()));
   if (it == priorities.end())
     return std::nullopt;
   const SymbolPriorityEntry &entry = it->second;
@@ -330,7 +336,7 @@ void macho::PriorityBuilder::parseOrderFile(StringRef path) {
         break;
       }
     }
-    symbol = line.trim();
+    symbol = getRootSymbol(line.trim());
 
     if (!symbol.empty()) {
       SymbolPriorityEntry &entry = priorities[symbol];
diff --git a/lld/MachO/SectionPriorities.h b/lld/MachO/SectionPriorities.h
index 44fb101990c51..0bbf238c2c116 100644
--- a/lld/MachO/SectionPriorities.h
+++ b/lld/MachO/SectionPriorities.h
@@ -69,6 +69,11 @@ class PriorityBuilder {
   std::optional<int> getSymbolPriority(const Defined *sym);
   llvm::DenseMap<llvm::StringRef, SymbolPriorityEntry> priorities;
   llvm::MapVector<SectionPair, uint64_t> callGraphProfile;
+  /// Symbols can be appended with "(.__uniq.xxxx)?.llvm.yyyy" where "xxxx" and
+  /// "yyyy" are numbers that could change between builds. We need to use the
+  /// root symbol name before this suffix so these symbols can be matched with
+  /// profiles which may have different suffixes.
+  llvm::StringRef getRootSymbol(llvm::StringRef Name);
 };
 
 extern PriorityBuilder priorityBuilder;
diff --git a/lld/test/MachO/order-file-strip-hashes.s b/lld/test/MachO/order-file-strip-hashes.s
new file mode 100644
index 0000000000000..d7e21371ad9ca
--- /dev/null
+++ b/lld/test/MachO/order-file-strip-hashes.s
@@ -0,0 +1,101 @@
+# RUN: rm -rf %t && split-file %s %t
+# RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %t/a.s -o %t/a.o
+
+# RUN: %lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o -order_file %t/ord-1
+# RUN: llvm-nm --numeric-sort --format=just-symbols %t/a.out | FileCheck %s
+
+
+# .text
+# CHECK: A
+# CHECK: B
+# CHECK: C
+# .section __DATA,__objc_const
+# CHECK: _OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
+# CHECK: _ALPHABETIC_SORT_FIRST
+# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat1
+# CHECK: _OBJC_$_CATEGORY_SOME_$_FOLDED
+# CHECK: _OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2
+# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat2
+# .section __DATA,__objc_data
+# CHECK: _OBJC_CLASS_$_Baz
+# CHECK: _OBJC_CLASS_$_Bar
+# CHECK: _OBJC_CLASS_$_Foo
+# CHECK: _OBJC_CLASS_$_Baz2
+
+	
+#--- a.s
+.text
+.globl _main, A, _B, C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+
+_main:
+  ret
+A:
+  ret
+F:
+  add w0, w0, #3
+  bl C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+  ret
+C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222:
+  add w0, w0, #2
+  bl  A
+  ret
+D:
+  add w0, w0, #2
+  bl B
+  ret
+B:
+  add w0, w0, #1
+  bl  A
+  ret
+E:
+  add w0, w0, #2
+  bl C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+  ret
+
+.section __DATA,__objc_const
+# test multiple symbols at the same address, which will be alphabetic sorted based symbol names
+_OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2:
+  .quad 789
+
+_OBJC_$_CATEGORY_SOME_$_FOLDED:
+_OBJC_$_CATEGORY_Foo_$_Cat1:
+_ALPHABETIC_SORT_FIRST:
+ .quad 123
+
+_OBJC_$_CATEGORY_Foo_$_Cat2:
+ .quad 222
+
+_OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1:
+  .quad 456
+
+.section __DATA,__objc_data
+_OBJC_CLASS_$_Foo:
+ .quad 123
+
+_OBJC_CLASS_$_Bar.llvm.1234:
+ .quad 456
+
+_OBJC_CLASS_$_Baz:
+ .quad 789
+
+_OBJC_CLASS_$_Baz2:
+ .quad 999
+
+.section __DATA,__objc_classrefs
+.quad _OBJC_CLASS_$_Foo
+.quad _OBJC_CLASS_$_Bar.llvm.1234
+.quad _OBJC_CLASS_$_Baz
+
+.subsections_via_symbols
+
+
+#--- ord-1
+# change order, parital covered
+A
+B
+C.__uniq.555555555555555555555555555555555555555.llvm.6666666666666666666
+_OBJC_CLASS_$_Baz
+_OBJC_CLASS_$_Bar.__uniq.12345
+_OBJC_CLASS_$_Foo.__uniq.123.llvm.123456789
+_OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
+_OBJC_$_CATEGORY_Foo_$_Cat1.llvm.1234567

@SharonXSharon
Copy link
Contributor Author

Tagging @ellishg @alx32 @int3 @MaskRay for review, thanks!

@MaskRay
Copy link
Member

MaskRay commented May 20, 2025

This looks strange and we will lose the ability to order the suffixed symbols. But I am ok for the change if @ellishg is ok

@SharonXSharon
Copy link
Contributor Author

This looks strange and we will lose the ability to order the suffixed symbols. But I am ok for the change if @ellishg is ok

This is only going to drop the "(.__uniq.xxxx)?.llvm.yyyy" suffix where "xxxx" and "yyyy" are numbers that aren't stable, and will change between one compilation to another, so we actually will be able to order the real root symbols with this patch. BP is doing the same

Copy link
Contributor

@ellishg ellishg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaskRay is saying that if we want to order a.llvm.0123 before a.llvm.5678, we can't guarantee that after this patch. Although in practice this relies on knowing the internal implementation of how these suffixes are generated. I think it's fine to remove these suffixes

Comment on lines 248 to 253
StringRef macho::PriorityBuilder::getRootSymbol(StringRef Name) {
auto [P0, S0] = Name.rsplit(".llvm.");
auto [P1, S1] = P0.rsplit(".__uniq.");
return P1;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's reuse the implementation for BP. Although I'm not sure where it should live. Maybe https://github.com/llvm/llvm-project/blob/main/lld/include/lld/Common/LLVM.h? @MaskRay what do you think?

/// Symbols can be appended with "(.__uniq.xxxx)?(.llvm.yyyy)?(.Tgm)?" where
/// "xxxx" and "yyyy" are numbers that could change between builds, and .Tgm is
/// the global merge functions suffix
/// (see GlobalMergeFunc::MergingInstanceSuffix). We need to use the root symbol
/// name before this suffix so these symbols can be matched with profiles which
/// may have different suffixes.
inline StringRef getRootSymbol(StringRef name) {
name.consume_back(".Tgm");
auto [P0, S0] = name.rsplit(".llvm.");
auto [P1, S1] = P0.rsplit(".__uniq.");
return P1;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lld/Common/LLVM.h is for forward declarations. Perhaps a new header in lld/Common?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaskRay @ellishg I created a new header lld/Common/Utils.h
please review again, thanks!

Comment on lines 8 to 23
# .text
# CHECK: A
# CHECK: B
# CHECK: C
# .section __DATA,__objc_const
# CHECK: _OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
# CHECK: _ALPHABETIC_SORT_FIRST
# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat1
# CHECK: _OBJC_$_CATEGORY_SOME_$_FOLDED
# CHECK: _OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2
# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat2
# .section __DATA,__objc_data
# CHECK: _OBJC_CLASS_$_Baz
# CHECK: _OBJC_CLASS_$_Bar
# CHECK: _OBJC_CLASS_$_Foo
# CHECK: _OBJC_CLASS_$_Baz2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these check lines down next to where the orderfile is defined? That will make it easier to see how they relate.

# CHECK: C
# .section __DATA,__objc_const
# CHECK: _OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
# CHECK: _ALPHABETIC_SORT_FIRST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these symbols are not found in the orderfile, so I don't think we should be checking their orders as it could change.

@ellishg ellishg requested review from MaskRay and alx32 May 20, 2025 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants