Skip to content

[lld][ELF] Merge equivalent symbols found during ICF #139493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
99 changes: 96 additions & 3 deletions lld/ELF/ICF.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,8 @@
#include "SymbolTable.h"
#include "Symbols.h"
#include "SyntheticSections.h"
#include "Target.h"
#include "llvm/ADT/EquivalenceClasses.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/Object/ELF.h"
#include "llvm/Support/Parallel.h"
Expand Down Expand Up @@ -116,6 +118,10 @@ template <class ELFT> class ICF {

size_t findBoundary(size_t begin, size_t end);

// A relocation with side-effects is considered non-trivial. Eg: relocation
// creates GOT entry or TLS slot.
template <class RelTy> bool isTrivialRelocation(Symbol &s, RelTy reloc);

void forEachClassRange(size_t begin, size_t end,
llvm::function_ref<void(size_t, size_t)> fn);

Expand Down Expand Up @@ -234,6 +240,28 @@ void ICF<ELFT>::segregate(size_t begin, size_t end, uint32_t eqClassBase,
}
}

template <class ELFT>
template <class RelTy>
bool ICF<ELFT>::isTrivialRelocation(Symbol &s, RelTy reloc) {
// For our use cases, we can get by without calculating exact location within
// the section, and just use fake location array. We need to ensure validity
// for loc[-1] to loc[3] as various targets' getRelExpr() reference them.
std::array<uint8_t, 5> fakeLocArray;
uint8_t *fakeLoc = fakeLocArray.data() + 1;
RelExpr expr =
ctx.target->getRelExpr(reloc.getType(ctx.arg.isMips64EL), s, fakeLoc);

if (needsGot(expr) || needsTls(s, expr))
return false;
return true;
}

// Two symbols referenced by relocations can be merged together safely
// when their addends are same.
static bool canMergeSymbols(uint64_t addA, uint64_t addB) {
return addA == addB;
}

// Compare two lists of relocations.
template <class ELFT>
template <class RelTy>
Expand Down Expand Up @@ -286,9 +314,13 @@ bool ICF<ELFT>::constantEq(const InputSection *secA, Relocs<RelTy> ra,
// Relocations referring to InputSections are constant-equal if their
// section offsets are equal.
if (isa<InputSection>(da->section)) {
if (da->value + addA == db->value + addB)
if (da->value + addA == db->value + addB) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was not in #134342 . I have tried to understand why the logic is so complex.
Looks like https://reviews.llvm.org/D34094 , while supporting MergeInputSection, also introduced the untested condition for InputSection

    if (isa<InputSection>(DA->Section))
      return DA->Value + AddA == DB->Value + AddB;

I feel that we should switch to the safer da->value == db->value && addA == addB, which likely doesn't lose anything (I believe it's difficult for clang emitted code to utilize the relaxed condition.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think da->value + addA == db->value + addB is more readable. I can read it as if the address resolves to the same place, sections are considered same. It also takes care of section folding for more cases. Changing it to da->value == db->value && addA == addB would exclude some cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it to da->value == db->value && addA == addB would exclude some cases.

The excluded cases are likely negligible. This condition is for InputSection (non merge string), where symbols defined not at offset 0 in -ffunction-sections -fdata-sections code are extremely rare. I don't know a case LLVM codegen would do this.

Copy link
Contributor Author

@pranavk pranavk May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chromium build that made us revert the original change did have many sections that were being merged due to this property and @zmodem tried my second commit in this PR (e4041b7) separately and noticed a binary size regression that was unacceptable to them. This comment on that Chromium bug is relevant: ~~ https://b.corp.google.com/issues/415810137#comment44
(https://issues.chromium.org/issues/415810137#comment44)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you insist, I propose to do that in a separate PR so that these changes are not affected by any potential revert.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps break disagreement, I lean towards @MaskRay 's suggestion to use da->value == db->value && addA == addB as the condition (same value, same addend). Maybe it misses something, but it's conceptually simpler even if it misses some case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I corrected the link in my previous comment to a public one. Apologies I didn't notice that I pasted an internal link to the Chromium bug.

Using da->value == db->value && addA == addB may seem trivial but it misses cases that increases the size of chromium builds to unacceptable levels. As I mentioned in my previous comment, @zmodem had tried it already and noticed a large size regression.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misremembering, but IIRC the only size impact we complained about was from the suggested workaround of using --icf=safe.

I don't remember evaluating the size impact of this PR. We just tried it and confirmed it works (https://crbug.com/415810137#comment50), waited for it to get through review, and reverted the original change as the fix seemed to be taking a while.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Thanks for clarification. I was under the impression that after my comment here (https://issues.chromium.org/issues/415810137#comment50), you tried it and confirmed it works but the size regression was not acceptable to you.

I am okay simplifying the condition here after this clarification.

// For non-trivial relocations, if we cannot merge symbols together,
// we must not merge sections either.
if (!isTrivialRelocation(sa, *rai) && !canMergeSymbols(addA, addB))
return false;
continue;
return false;
}
}

// Relocations referring to MergeInputSections are constant-equal if their
Expand Down Expand Up @@ -333,6 +365,29 @@ bool ICF<ELFT>::equalsConstant(const InputSection *a, const InputSection *b) {
: constantEq(a, ra.relas, b, rb.relas);
}

template <class ELFT, class RelTy>
static SmallVector<std::pair<Symbol *, uint64_t>>
getReloc(const InputSection *sec, Relocs<RelTy> relocs) {
SmallVector<std::pair<Symbol *, uint64_t>> syms;
for (auto ri = relocs.begin(), re = relocs.end(); ri != re; ++ri) {
Symbol &sym = sec->file->getRelocTargetSym(*ri);
syms.emplace_back(&sym, getAddend<ELFT>(*ri));
}
return syms;
}

template <class ELFT>
static SmallVector<std::pair<Symbol *, uint64_t>>
getRelocTargetSyms(const InputSection *sec) {
const RelsOrRelas<ELFT> rel = sec->template relsOrRelas<ELFT>();
if (rel.areRelocsCrel())
return getReloc<ELFT>(sec, rel.crels);
if (rel.areRelocsRel())
return getReloc<ELFT>(sec, rel.rels);

return getReloc<ELFT>(sec, rel.relas);
}

// Compare two lists of relocations. Returns true if all pairs of
// relocations point to the same section in terms of ICF.
template <class ELFT>
Expand All @@ -352,6 +407,14 @@ bool ICF<ELFT>::variableEq(const InputSection *secA, Relocs<RelTy> ra,
auto *da = cast<Defined>(&sa);
auto *db = cast<Defined>(&sb);

// Prevent sections containing local symbols from merging into sections with
// global symbols, or vice-versa. This is to prevent local-global symbols
// getting merged into each other (done later in ICF). We do this as
// post-ICF passes cannot handle duplicates when iterating over local
// symbols. There are also assertions that prevent this.
if ((da->isLocal() || db->isLocal()) && !isTrivialRelocation(sa, *rai))
return false;

// We already dealt with absolute and non-InputSection symbols in
// constantEq, and for InputSections we have already checked everything
// except the equivalence class.
Expand Down Expand Up @@ -537,14 +600,29 @@ template <class ELFT> void ICF<ELFT>::run() {
auto print = [&ctx = ctx]() -> ELFSyncStream {
return {ctx, ctx.arg.printIcfSections ? DiagLevel::Msg : DiagLevel::None};
};

EquivalenceClasses<Symbol *> symbolEquivalence;
// Merge sections by the equivalence class.
// Merge symbols identified as equivalent during ICF.
forEachClassRange(0, sections.size(), [&](size_t begin, size_t end) {
if (end - begin == 1)
return;
print() << "selected section " << sections[begin];
SmallVector<std::pair<Symbol *, uint64_t>> syms =
getRelocTargetSyms<ELFT>(sections[begin]);
for (size_t i = begin + 1; i < end; ++i) {
print() << " removing identical section " << sections[i];
sections[begin]->replace(sections[i]);
SmallVector<std::pair<Symbol *, uint64_t>> replacedSyms =
getRelocTargetSyms<ELFT>(sections[i]);
assert(syms.size() == replacedSyms.size() &&
"Should have same number of syms!");
for (size_t j = 0; j < syms.size(); j++) {
if (syms[j].first->isLocal() || replacedSyms[j].first->isLocal() ||
!canMergeSymbols(syms[j].second, replacedSyms[j].second))
continue;
symbolEquivalence.unionSets(syms[j].first, replacedSyms[j].first);
}

// At this point we know sections merged are fully identical and hence
// we want to remove duplicate implicit dependencies such as link order
Expand All @@ -563,11 +641,26 @@ template <class ELFT> void ICF<ELFT>::run() {
d->folded = true;
}
};
for (Symbol *sym : ctx.symtab->getSymbols())
for (Symbol *sym : ctx.symtab->getSymbols()) {
fold(sym);
auto it = symbolEquivalence.findLeader(sym);
if (it != symbolEquivalence.member_end() && *it != sym) {
print() << "redirecting '" << sym->getName() << "' in symtab to '"
<< (*it)->getName() << "'";
ctx.symtab->redirect(sym, *it);
}
}
parallelForEach(ctx.objectFiles, [&](ELFFileBase *file) {
for (Symbol *sym : file->getLocalSymbols())
fold(sym);
for (Symbol *&sym : file->getMutableGlobalSymbols()) {
auto it = symbolEquivalence.findLeader(sym);
if (it != symbolEquivalence.member_end() && *it != sym) {
print() << "redirecting '" << sym->getName() << "' to '"
<< (*it)->getName() << "'";
sym = *it;
}
}
});

// InputSectionDescription::sections is populated by processSectionCommands().
Expand Down
34 changes: 1 addition & 33 deletions lld/ELF/Relocations.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -127,29 +127,6 @@ void elf::reportRangeError(Ctx &ctx, uint8_t *loc, int64_t v, int n,
}
}

// Build a bitmask with one bit set for each 64 subset of RelExpr.
static constexpr uint64_t buildMask() { return 0; }

template <typename... Tails>
static constexpr uint64_t buildMask(int head, Tails... tails) {
return (0 <= head && head < 64 ? uint64_t(1) << head : 0) |
buildMask(tails...);
}

// Return true if `Expr` is one of `Exprs`.
// There are more than 64 but less than 128 RelExprs, so we divide the set of
// exprs into [0, 64) and [64, 128) and represent each range as a constant
// 64-bit mask. Then we decide which mask to test depending on the value of
// expr and use a simple shift and bitwise-and to test for membership.
template <RelExpr... Exprs> static bool oneof(RelExpr expr) {
assert(0 <= expr && (int)expr < 128 &&
"RelExpr is too large for 128-bit mask!");

if (expr >= 64)
return (uint64_t(1) << (expr - 64)) & buildMask((Exprs - 64)...);
return (uint64_t(1) << expr) & buildMask(Exprs...);
}

static RelType getMipsPairType(RelType type, bool isLocal) {
switch (type) {
case R_MIPS_HI16:
Expand Down Expand Up @@ -196,15 +173,6 @@ static bool needsPlt(RelExpr expr) {
RE_PPC64_CALL_PLT>(expr);
}

bool lld::elf::needsGot(RelExpr expr) {
return oneof<R_GOT, RE_AARCH64_AUTH_GOT, RE_AARCH64_AUTH_GOT_PC, R_GOT_OFF,
RE_MIPS_GOT_LOCAL_PAGE, RE_MIPS_GOT_OFF, RE_MIPS_GOT_OFF32,
RE_AARCH64_GOT_PAGE_PC, RE_AARCH64_AUTH_GOT_PAGE_PC,
RE_AARCH64_AUTH_GOT_PAGE_PC, R_GOT_PC, R_GOTPLT,
RE_AARCH64_GOT_PAGE, RE_LOONGARCH_GOT, RE_LOONGARCH_GOT_PAGE_PC>(
expr);
}

// True if this expression is of the form Sym - X, where X is a position in the
// file (PC, or GOT for example).
static bool isRelExpr(RelExpr expr) {
Expand Down Expand Up @@ -1593,7 +1561,7 @@ void RelocationScanner::scanOne(typename Relocs<RelTy>::const_iterator &i) {
//
// Some RISCV TLSDESC relocations reference a local NOTYPE symbol,
// but we need to process them in handleTlsRelocation.
if (sym.isTls() || oneof<R_TLSDESC_PC, R_TLSDESC_CALL>(expr)) {
if (needsTls(sym, expr)) {
if (unsigned processed =
handleTlsRelocation(expr, type, offset, sym, addend)) {
i += processed - 1;
Expand Down
39 changes: 37 additions & 2 deletions lld/ELF/Relocations.h
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#ifndef LLD_ELF_RELOCATIONS_H
#define LLD_ELF_RELOCATIONS_H

#include "Symbols.h"
#include "lld/Common/LLVM.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/STLExtras.h"
Expand All @@ -18,7 +19,6 @@
namespace lld::elf {
struct Ctx;
class Defined;
class Symbol;
class InputSection;
class InputSectionBase;
class OutputSection;
Expand Down Expand Up @@ -355,6 +355,29 @@ inline Relocs<RelTy> sortRels(Relocs<RelTy> rels,
return rels;
}

// Build a bitmask with one bit set for each 64 subset of RelExpr.
constexpr uint64_t buildMask() { return 0; }

template <typename... Tails>
constexpr uint64_t buildMask(int head, Tails... tails) {
return (0 <= head && head < 64 ? uint64_t(1) << head : 0) |
buildMask(tails...);
}

// Return true if `Expr` is one of `Exprs`.
// There are more than 64 but less than 128 RelExprs, so we divide the set of
// exprs into [0, 64) and [64, 128) and represent each range as a constant
// 64-bit mask. Then we decide which mask to test depending on the value of
// expr and use a simple shift and bitwise-and to test for membership.
template <RelExpr... Exprs> static bool oneof(RelExpr expr) {
assert(0 <= expr && (int)expr < 128 &&
"RelExpr is too large for 128-bit mask!");

if (expr >= 64)
return (uint64_t(1) << (expr - 64)) & buildMask((Exprs - 64)...);
return (uint64_t(1) << expr) & buildMask(Exprs...);
}

template <bool is64>
inline Relocs<llvm::object::Elf_Crel_Impl<is64>>
sortRels(Relocs<llvm::object::Elf_Crel_Impl<is64>> rels,
Expand All @@ -367,7 +390,19 @@ RelocationBaseSection &getIRelativeSection(Ctx &ctx);
// Returns true if Expr refers a GOT entry. Note that this function returns
// false for TLS variables even though they need GOT, because TLS variables uses
// GOT differently than the regular variables.
bool needsGot(RelExpr expr);
inline bool needsGot(RelExpr expr) {
return oneof<R_GOT, RE_AARCH64_AUTH_GOT, RE_AARCH64_AUTH_GOT_PC, R_GOT_OFF,
RE_MIPS_GOT_LOCAL_PAGE, RE_MIPS_GOT_OFF, RE_MIPS_GOT_OFF32,
RE_AARCH64_GOT_PAGE_PC, RE_AARCH64_AUTH_GOT_PAGE_PC,
RE_AARCH64_AUTH_GOT_PAGE_PC, R_GOT_PC, R_GOTPLT,
RE_AARCH64_GOT_PAGE, RE_LOONGARCH_GOT, RE_LOONGARCH_GOT_PAGE_PC>(
expr);
}

inline bool needsTls(Symbol &s, RelExpr expr) {
return s.isTls() || oneof<R_TLSDESC_PC, R_TLSDESC_CALL>(expr);
}

} // namespace lld::elf

#endif
6 changes: 6 additions & 0 deletions lld/ELF/SymbolTable.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,12 @@ using namespace llvm::ELF;
using namespace lld;
using namespace lld::elf;

void SymbolTable::redirect(Symbol *from, Symbol *to) {
int &fromIdx = symMap[CachedHashStringRef(from->getName())];
const int toIdx = symMap[CachedHashStringRef(to->getName())];
fromIdx = toIdx;
}

void SymbolTable::wrap(Symbol *sym, Symbol *real, Symbol *wrap) {
// Redirect __real_foo to the original foo and foo to the original __wrap_foo.
int &idx1 = symMap[CachedHashStringRef(sym->getName())];
Expand Down
1 change: 1 addition & 0 deletions lld/ELF/SymbolTable.h
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ class SymbolTable {
SymbolTable(Ctx &ctx) : ctx(ctx) {}
ArrayRef<Symbol *> getSymbols() const { return symVector; }

void redirect(Symbol *from, Symbol *to);
void wrap(Symbol *sym, Symbol *real, Symbol *wrap);

Symbol *insert(StringRef name);
Expand Down
Loading
Loading