-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[lld][macho] Strip .__uniq. and .llvm. hashes in -order_file #140670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ing in -order_file
@llvm/pr-subscribers-lld-elf @llvm/pr-subscribers-lld Author: None (SharonXSharon) Changes
Just like what we are doing in BP, https://github.com/llvm/llvm-project/blob/main/lld/MachO/BPSectionOrderer.cpp#L127 the patch removes the suffixes when parsing the order file and getting the symbol priority to have a better symbol match. Full diff: https://github.com/llvm/llvm-project/pull/140670.diff 3 Files Affected:
diff --git a/lld/MachO/SectionPriorities.cpp b/lld/MachO/SectionPriorities.cpp
index 7a4a5d8465f64..213623b338472 100644
--- a/lld/MachO/SectionPriorities.cpp
+++ b/lld/MachO/SectionPriorities.cpp
@@ -245,12 +245,18 @@ DenseMap<const InputSection *, int> CallGraphSort::run() {
return orderMap;
}
+StringRef macho::PriorityBuilder::getRootSymbol(StringRef Name) {
+ auto [P0, S0] = Name.rsplit(".llvm.");
+ auto [P1, S1] = P0.rsplit(".__uniq.");
+ return P1;
+}
+
std::optional<int>
macho::PriorityBuilder::getSymbolPriority(const Defined *sym) {
if (sym->isAbsolute())
return std::nullopt;
- auto it = priorities.find(sym->getName());
+ auto it = priorities.find(getRootSymbol(sym->getName()));
if (it == priorities.end())
return std::nullopt;
const SymbolPriorityEntry &entry = it->second;
@@ -330,7 +336,7 @@ void macho::PriorityBuilder::parseOrderFile(StringRef path) {
break;
}
}
- symbol = line.trim();
+ symbol = getRootSymbol(line.trim());
if (!symbol.empty()) {
SymbolPriorityEntry &entry = priorities[symbol];
diff --git a/lld/MachO/SectionPriorities.h b/lld/MachO/SectionPriorities.h
index 44fb101990c51..0bbf238c2c116 100644
--- a/lld/MachO/SectionPriorities.h
+++ b/lld/MachO/SectionPriorities.h
@@ -69,6 +69,11 @@ class PriorityBuilder {
std::optional<int> getSymbolPriority(const Defined *sym);
llvm::DenseMap<llvm::StringRef, SymbolPriorityEntry> priorities;
llvm::MapVector<SectionPair, uint64_t> callGraphProfile;
+ /// Symbols can be appended with "(.__uniq.xxxx)?.llvm.yyyy" where "xxxx" and
+ /// "yyyy" are numbers that could change between builds. We need to use the
+ /// root symbol name before this suffix so these symbols can be matched with
+ /// profiles which may have different suffixes.
+ llvm::StringRef getRootSymbol(llvm::StringRef Name);
};
extern PriorityBuilder priorityBuilder;
diff --git a/lld/test/MachO/order-file-strip-hashes.s b/lld/test/MachO/order-file-strip-hashes.s
new file mode 100644
index 0000000000000..d7e21371ad9ca
--- /dev/null
+++ b/lld/test/MachO/order-file-strip-hashes.s
@@ -0,0 +1,101 @@
+# RUN: rm -rf %t && split-file %s %t
+# RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %t/a.s -o %t/a.o
+
+# RUN: %lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o -order_file %t/ord-1
+# RUN: llvm-nm --numeric-sort --format=just-symbols %t/a.out | FileCheck %s
+
+
+# .text
+# CHECK: A
+# CHECK: B
+# CHECK: C
+# .section __DATA,__objc_const
+# CHECK: _OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
+# CHECK: _ALPHABETIC_SORT_FIRST
+# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat1
+# CHECK: _OBJC_$_CATEGORY_SOME_$_FOLDED
+# CHECK: _OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2
+# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat2
+# .section __DATA,__objc_data
+# CHECK: _OBJC_CLASS_$_Baz
+# CHECK: _OBJC_CLASS_$_Bar
+# CHECK: _OBJC_CLASS_$_Foo
+# CHECK: _OBJC_CLASS_$_Baz2
+
+
+#--- a.s
+.text
+.globl _main, A, _B, C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+
+_main:
+ ret
+A:
+ ret
+F:
+ add w0, w0, #3
+ bl C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+ ret
+C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222:
+ add w0, w0, #2
+ bl A
+ ret
+D:
+ add w0, w0, #2
+ bl B
+ ret
+B:
+ add w0, w0, #1
+ bl A
+ ret
+E:
+ add w0, w0, #2
+ bl C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+ ret
+
+.section __DATA,__objc_const
+# test multiple symbols at the same address, which will be alphabetic sorted based symbol names
+_OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2:
+ .quad 789
+
+_OBJC_$_CATEGORY_SOME_$_FOLDED:
+_OBJC_$_CATEGORY_Foo_$_Cat1:
+_ALPHABETIC_SORT_FIRST:
+ .quad 123
+
+_OBJC_$_CATEGORY_Foo_$_Cat2:
+ .quad 222
+
+_OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1:
+ .quad 456
+
+.section __DATA,__objc_data
+_OBJC_CLASS_$_Foo:
+ .quad 123
+
+_OBJC_CLASS_$_Bar.llvm.1234:
+ .quad 456
+
+_OBJC_CLASS_$_Baz:
+ .quad 789
+
+_OBJC_CLASS_$_Baz2:
+ .quad 999
+
+.section __DATA,__objc_classrefs
+.quad _OBJC_CLASS_$_Foo
+.quad _OBJC_CLASS_$_Bar.llvm.1234
+.quad _OBJC_CLASS_$_Baz
+
+.subsections_via_symbols
+
+
+#--- ord-1
+# change order, parital covered
+A
+B
+C.__uniq.555555555555555555555555555555555555555.llvm.6666666666666666666
+_OBJC_CLASS_$_Baz
+_OBJC_CLASS_$_Bar.__uniq.12345
+_OBJC_CLASS_$_Foo.__uniq.123.llvm.123456789
+_OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
+_OBJC_$_CATEGORY_Foo_$_Cat1.llvm.1234567
|
@llvm/pr-subscribers-lld-macho Author: None (SharonXSharon) Changes
Just like what we are doing in BP, https://github.com/llvm/llvm-project/blob/main/lld/MachO/BPSectionOrderer.cpp#L127 the patch removes the suffixes when parsing the order file and getting the symbol priority to have a better symbol match. Full diff: https://github.com/llvm/llvm-project/pull/140670.diff 3 Files Affected:
diff --git a/lld/MachO/SectionPriorities.cpp b/lld/MachO/SectionPriorities.cpp
index 7a4a5d8465f64..213623b338472 100644
--- a/lld/MachO/SectionPriorities.cpp
+++ b/lld/MachO/SectionPriorities.cpp
@@ -245,12 +245,18 @@ DenseMap<const InputSection *, int> CallGraphSort::run() {
return orderMap;
}
+StringRef macho::PriorityBuilder::getRootSymbol(StringRef Name) {
+ auto [P0, S0] = Name.rsplit(".llvm.");
+ auto [P1, S1] = P0.rsplit(".__uniq.");
+ return P1;
+}
+
std::optional<int>
macho::PriorityBuilder::getSymbolPriority(const Defined *sym) {
if (sym->isAbsolute())
return std::nullopt;
- auto it = priorities.find(sym->getName());
+ auto it = priorities.find(getRootSymbol(sym->getName()));
if (it == priorities.end())
return std::nullopt;
const SymbolPriorityEntry &entry = it->second;
@@ -330,7 +336,7 @@ void macho::PriorityBuilder::parseOrderFile(StringRef path) {
break;
}
}
- symbol = line.trim();
+ symbol = getRootSymbol(line.trim());
if (!symbol.empty()) {
SymbolPriorityEntry &entry = priorities[symbol];
diff --git a/lld/MachO/SectionPriorities.h b/lld/MachO/SectionPriorities.h
index 44fb101990c51..0bbf238c2c116 100644
--- a/lld/MachO/SectionPriorities.h
+++ b/lld/MachO/SectionPriorities.h
@@ -69,6 +69,11 @@ class PriorityBuilder {
std::optional<int> getSymbolPriority(const Defined *sym);
llvm::DenseMap<llvm::StringRef, SymbolPriorityEntry> priorities;
llvm::MapVector<SectionPair, uint64_t> callGraphProfile;
+ /// Symbols can be appended with "(.__uniq.xxxx)?.llvm.yyyy" where "xxxx" and
+ /// "yyyy" are numbers that could change between builds. We need to use the
+ /// root symbol name before this suffix so these symbols can be matched with
+ /// profiles which may have different suffixes.
+ llvm::StringRef getRootSymbol(llvm::StringRef Name);
};
extern PriorityBuilder priorityBuilder;
diff --git a/lld/test/MachO/order-file-strip-hashes.s b/lld/test/MachO/order-file-strip-hashes.s
new file mode 100644
index 0000000000000..d7e21371ad9ca
--- /dev/null
+++ b/lld/test/MachO/order-file-strip-hashes.s
@@ -0,0 +1,101 @@
+# RUN: rm -rf %t && split-file %s %t
+# RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %t/a.s -o %t/a.o
+
+# RUN: %lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o -order_file %t/ord-1
+# RUN: llvm-nm --numeric-sort --format=just-symbols %t/a.out | FileCheck %s
+
+
+# .text
+# CHECK: A
+# CHECK: B
+# CHECK: C
+# .section __DATA,__objc_const
+# CHECK: _OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
+# CHECK: _ALPHABETIC_SORT_FIRST
+# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat1
+# CHECK: _OBJC_$_CATEGORY_SOME_$_FOLDED
+# CHECK: _OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2
+# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat2
+# .section __DATA,__objc_data
+# CHECK: _OBJC_CLASS_$_Baz
+# CHECK: _OBJC_CLASS_$_Bar
+# CHECK: _OBJC_CLASS_$_Foo
+# CHECK: _OBJC_CLASS_$_Baz2
+
+
+#--- a.s
+.text
+.globl _main, A, _B, C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+
+_main:
+ ret
+A:
+ ret
+F:
+ add w0, w0, #3
+ bl C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+ ret
+C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222:
+ add w0, w0, #2
+ bl A
+ ret
+D:
+ add w0, w0, #2
+ bl B
+ ret
+B:
+ add w0, w0, #1
+ bl A
+ ret
+E:
+ add w0, w0, #2
+ bl C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
+ ret
+
+.section __DATA,__objc_const
+# test multiple symbols at the same address, which will be alphabetic sorted based symbol names
+_OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2:
+ .quad 789
+
+_OBJC_$_CATEGORY_SOME_$_FOLDED:
+_OBJC_$_CATEGORY_Foo_$_Cat1:
+_ALPHABETIC_SORT_FIRST:
+ .quad 123
+
+_OBJC_$_CATEGORY_Foo_$_Cat2:
+ .quad 222
+
+_OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1:
+ .quad 456
+
+.section __DATA,__objc_data
+_OBJC_CLASS_$_Foo:
+ .quad 123
+
+_OBJC_CLASS_$_Bar.llvm.1234:
+ .quad 456
+
+_OBJC_CLASS_$_Baz:
+ .quad 789
+
+_OBJC_CLASS_$_Baz2:
+ .quad 999
+
+.section __DATA,__objc_classrefs
+.quad _OBJC_CLASS_$_Foo
+.quad _OBJC_CLASS_$_Bar.llvm.1234
+.quad _OBJC_CLASS_$_Baz
+
+.subsections_via_symbols
+
+
+#--- ord-1
+# change order, parital covered
+A
+B
+C.__uniq.555555555555555555555555555555555555555.llvm.6666666666666666666
+_OBJC_CLASS_$_Baz
+_OBJC_CLASS_$_Bar.__uniq.12345
+_OBJC_CLASS_$_Foo.__uniq.123.llvm.123456789
+_OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1
+_OBJC_$_CATEGORY_Foo_$_Cat1.llvm.1234567
|
This looks strange and we will lose the ability to order the suffixed symbols. But I am ok for the change if @ellishg is ok |
This is only going to drop the "(.__uniq.xxxx)?.llvm.yyyy" suffix where "xxxx" and "yyyy" are numbers that aren't stable, and will change between one compilation to another, so we actually will be able to order the real root symbols with this patch. BP is doing the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaskRay is saying that if we want to order a.llvm.0123
before a.llvm.5678
, we can't guarantee that after this patch. Although in practice this relies on knowing the internal implementation of how these suffixes are generated. I think it's fine to remove these suffixes
lld/MachO/SectionPriorities.cpp
Outdated
StringRef macho::PriorityBuilder::getRootSymbol(StringRef Name) { | ||
auto [P0, S0] = Name.rsplit(".llvm."); | ||
auto [P1, S1] = P0.rsplit(".__uniq."); | ||
return P1; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's reuse the implementation for BP
. Although I'm not sure where it should live. Maybe https://github.com/llvm/llvm-project/blob/main/lld/include/lld/Common/LLVM.h
? @MaskRay what do you think?
llvm-project/lld/include/lld/Common/BPSectionOrdererBase.inc
Lines 150 to 161 in cbcfe66
/// Symbols can be appended with "(.__uniq.xxxx)?(.llvm.yyyy)?(.Tgm)?" where | |
/// "xxxx" and "yyyy" are numbers that could change between builds, and .Tgm is | |
/// the global merge functions suffix | |
/// (see GlobalMergeFunc::MergingInstanceSuffix). We need to use the root symbol | |
/// name before this suffix so these symbols can be matched with profiles which | |
/// may have different suffixes. | |
inline StringRef getRootSymbol(StringRef name) { | |
name.consume_back(".Tgm"); | |
auto [P0, S0] = name.rsplit(".llvm."); | |
auto [P1, S1] = P0.rsplit(".__uniq."); | |
return P1; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lld/Common/LLVM.h is for forward declarations. Perhaps a new header in lld/Common?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# .text | ||
# CHECK: A | ||
# CHECK: B | ||
# CHECK: C | ||
# .section __DATA,__objc_const | ||
# CHECK: _OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1 | ||
# CHECK: _ALPHABETIC_SORT_FIRST | ||
# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat1 | ||
# CHECK: _OBJC_$_CATEGORY_SOME_$_FOLDED | ||
# CHECK: _OBJC_$_CATEGORY_CLASS_METHODS_Foo_$_Cat2 | ||
# CHECK: _OBJC_$_CATEGORY_Foo_$_Cat2 | ||
# .section __DATA,__objc_data | ||
# CHECK: _OBJC_CLASS_$_Baz | ||
# CHECK: _OBJC_CLASS_$_Bar | ||
# CHECK: _OBJC_CLASS_$_Foo | ||
# CHECK: _OBJC_CLASS_$_Baz2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move these check lines down next to where the orderfile is defined? That will make it easier to see how they relate.
# CHECK: C | ||
# .section __DATA,__objc_const | ||
# CHECK: _OBJC_$_CATEGORY_INSTANCE_METHODS_Foo_$_Cat1 | ||
# CHECK: _ALPHABETIC_SORT_FIRST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these symbols are not found in the orderfile, so I don't think we should be checking their orders as it could change.
Just like what we are doing in BP, https://github.com/llvm/llvm-project/blob/main/lld/MachO/BPSectionOrderer.cpp#L127
the patch removes the suffixes when parsing the order file and getting the symbol priority to have a better symbol match.