-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[LLD] [COFF] Fix linking import libraries with -wholearchive: #122806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-coff Author: Martin Storsjö (mstorsjo) ChangesWhen LLD links against an import library (for the regular, short import libraries), it doesn't actually link in the header/trailer object files at all, but synthesizes new corresponding data structures into the right sections. If the whole of such an import library is forced to be linked, e.g. with the -wholearchive: option, we actually end up linking in those header/trailer objects. The header objects contain a construct which LLD fails to handle; previously we'd error out with the error ".idata$4 should not refer to special section 0". Within the import library header object, in the import directory we have relocations towards the IAT (.idata$4 and .idata$5), but the header object itself doesn't contain any data for those sections. In the case of GNU generated import libraries, the header objects contain zero length sections .idata$4 and .idata$5, with relocations against them. However in the case of LLVM generated import libraries, the sections .idata$4 and .idata$5 are not included in the list of sections. The symbol table does contain section symbols for these sections, but without any actual associated section. This can probably be seen as a declaration of an empty section. If the header/trailer objects of a short import library are linked forcibly and we also reference other functions in the library, we end up with two import directory entries for this DLL, one that gets synthesized by LLD, and one from the actual header object file. This is inelegant, but should be acceptable. While it would seem unusual to link import libraries with the -wholearchive: option, this can happen in certain scenarios. Rust builds libraries that contain relevant import libraries bundled along with compiled Rust code as regular object files, all within one single archive. Such an archive can then end up linked with the -wholarchive: option, if build systems decide to use such an option for including static libraries. This should fix msys2/MINGW-packages#21017. This works for the header/trailer object files in import libraries generated by LLVM; import libraries generated by MSVC are vaguely different. ecb5ea6 did an attempt at fixing the issue for MSVC generated libraries, but it's not entirely correct, and isn't enough for making things work for that case. Full diff: https://github.com/llvm/llvm-project/pull/122806.diff 3 Files Affected:
diff --git a/lld/COFF/InputFiles.cpp b/lld/COFF/InputFiles.cpp
index a94c984cfd4870..86284f042fe7a3 100644
--- a/lld/COFF/InputFiles.cpp
+++ b/lld/COFF/InputFiles.cpp
@@ -746,6 +746,26 @@ std::optional<Symbol *> ObjFile::createDefined(
if (sectionNumber == llvm::COFF::IMAGE_SYM_DEBUG)
return nullptr;
+ if (sym.isEmptySectionDeclaration()) {
+ // As there is no coff_section in the object file for these, make a
+ // new virtual one, with everything zeroed out (i.e. an empty section),
+ // with only the name and characteristics set.
+ StringRef name = getName();
+ auto *hdr = make<coff_section>();
+ memset(hdr, 0, sizeof(*hdr));
+ strncpy(hdr->Name, name.data(),
+ std::min(name.size(), (size_t)COFF::NameSize));
+ // We have no idea what characteristics should be assumed here; pick
+ // a default. This matches what is used for .idata sections in the regular
+ // object files in import libraries.
+ hdr->Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ |
+ IMAGE_SCN_MEM_WRITE;
+ auto *sc = make<SectionChunk>(this, hdr);
+ chunks.push_back(sc);
+ return make<DefinedRegular>(this, /*Name*/ "", /*IsCOMDAT*/ false,
+ /*IsExternal*/ false, sym.getGeneric(), sc);
+ }
+
if (llvm::COFF::isReservedSectionNumber(sectionNumber))
Fatal(ctx) << toString(this) << ": " << getName()
<< " should not refer to special section "
diff --git a/lld/test/COFF/wholearchive-implib.s b/lld/test/COFF/wholearchive-implib.s
new file mode 100644
index 00000000000000..64b94ab730229e
--- /dev/null
+++ b/lld/test/COFF/wholearchive-implib.s
@@ -0,0 +1,35 @@
+// REQUIRES: x86
+// RUN: split-file %s %t.dir
+// RUN: llvm-lib -machine:amd64 -out:%t.lib -def:%t.dir/lib.def
+// RUN: llvm-mc -filetype=obj -triple=x86_64-windows %t.dir/main.s -o %t.main.obj
+
+// RUN: lld-link -out:%t.exe %t.main.obj -wholearchive:%t.lib -entry:entry -subsystem:console
+// RUN: llvm-readobj --coff-imports %t.exe | FileCheck %s
+
+// As LLD usually doesn't use the header/trailer object files from import
+// libraries, but instead synthesizes those structures, we end up with two
+// import directory entries if we force those objects to be included.
+
+// CHECK: Import {
+// CHECK-NEXT: Name: lib.dll
+// CHECK-NEXT: ImportLookupTableRVA: 0x2050
+// CHECK-NEXT: ImportAddressTableRVA: 0x2070
+// CHECK-NEXT: }
+// CHECK-NEXT: Import {
+// CHECK-NEXT: Name: lib.dll
+// CHECK-NEXT: ImportLookupTableRVA: 0x2058
+// CHECK-NEXT: ImportAddressTableRVA: 0x2078
+// CHECK-NEXT: Symbol: func (0)
+// CHECK-NEXT: }
+
+
+#--- main.s
+.global entry
+entry:
+ call func
+ ret
+
+#--- lib.def
+LIBRARY lib.dll
+EXPORTS
+func
diff --git a/llvm/include/llvm/Object/COFF.h b/llvm/include/llvm/Object/COFF.h
index 05b3587224c296..4de2c680f57b1a 100644
--- a/llvm/include/llvm/Object/COFF.h
+++ b/llvm/include/llvm/Object/COFF.h
@@ -392,6 +392,11 @@ class COFFSymbolRef {
getValue() == 0;
}
+ bool isEmptySectionDeclaration() const {
+ return isSection() && getSectionNumber() == COFF::IMAGE_SYM_UNDEFINED &&
+ getValue() == 0;
+ }
+
bool isWeakExternal() const {
return getStorageClass() == COFF::IMAGE_SYM_CLASS_WEAK_EXTERNAL;
}
|
To make the situation a bit clearer - the header object file for an LLVM generated object file looks like this: --- !COFF
header:
Machine: IMAGE_FILE_MACHINE_AMD64
Characteristics: [ ]
sections:
- Name: '.idata$2'
Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
Alignment: 4
SectionData: '0000000000000000000000000000000000000000'
SizeOfRawData: 20
Relocations:
- VirtualAddress: 12
SymbolName: '.idata$6'
Type: IMAGE_REL_AMD64_ADDR32NB
- VirtualAddress: 0
SymbolName: '.idata$4'
Type: IMAGE_REL_AMD64_ADDR32NB
- VirtualAddress: 16
SymbolName: '.idata$5'
Type: IMAGE_REL_AMD64_ADDR32NB
- Name: '.idata$6'
Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
Alignment: 2
SectionData: 666F6F2E646C6C00
SizeOfRawData: 8
symbols:
- Name: __IMPORT_DESCRIPTOR_foo
Value: 0
SectionNumber: 1
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_EXTERNAL
- Name: '.idata$2'
Value: 0
SectionNumber: 1
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_SECTION
- Name: '.idata$6'
Value: 0
SectionNumber: 2
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
- Name: '.idata$4'
Value: 0
SectionNumber: 0
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_SECTION
- Name: '.idata$5'
Value: 0
SectionNumber: 0
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_SECTION
- Name: __NULL_IMPORT_DESCRIPTOR
Value: 0
SectionNumber: 0
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_EXTERNAL
- Name: "^?foo_NULL_THUNK_DATA"
Value: 0
SectionNumber: 0
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_EXTERNAL
... And for comparison, this is what it looks like in the header object of a GNU generated import library: --- !COFF
header:
Machine: IMAGE_FILE_MACHINE_AMD64
Characteristics: [ IMAGE_FILE_LINE_NUMS_STRIPPED ]
sections:
- Name: .text
Characteristics: [ IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_EXECUTE, IMAGE_SCN_MEM_READ ]
Alignment: 16
SectionData: ''
- Name: .data
Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
Alignment: 16
SectionData: ''
- Name: .bss
Characteristics: [ IMAGE_SCN_CNT_UNINITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
Alignment: 16
SectionData: ''
- Name: '.idata$2'
Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
Alignment: 4
SectionData: '0000000000000000000000000000000000000000'
SizeOfRawData: 20
Relocations:
- VirtualAddress: 0
SymbolName: '.idata$4'
Type: IMAGE_REL_AMD64_ADDR32NB
- VirtualAddress: 12
SymbolName: __test_long_lib_iname
Type: IMAGE_REL_AMD64_ADDR32NB
- VirtualAddress: 16
SymbolName: '.idata$5'
Type: IMAGE_REL_AMD64_ADDR32NB
- Name: '.idata$5'
Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
Alignment: 4
SectionData: ''
- Name: '.idata$4'
Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
Alignment: 4
SectionData: ''
symbols:
- Name: .file
Value: 0
SectionNumber: -2
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_FILE
File: fake
- Name: hname
Value: 0
SectionNumber: 6
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
- Name: fthunk
Value: 0
SectionNumber: 5
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
- Name: .text
Value: 0
SectionNumber: 1
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
SectionDefinition:
Length: 0
NumberOfRelocations: 0
NumberOfLinenumbers: 0
CheckSum: 0
Number: 0
- Name: .data
Value: 0
SectionNumber: 2
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
SectionDefinition:
Length: 0
NumberOfRelocations: 0
NumberOfLinenumbers: 0
CheckSum: 0
Number: 0
- Name: .bss
Value: 0
SectionNumber: 3
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
SectionDefinition:
Length: 0
NumberOfRelocations: 0
NumberOfLinenumbers: 0
CheckSum: 0
Number: 0
- Name: '.idata$2'
Value: 0
SectionNumber: 4
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
SectionDefinition:
Length: 20
NumberOfRelocations: 3
NumberOfLinenumbers: 0
CheckSum: 0
Number: 0
- Name: '.idata$4'
Value: 0
SectionNumber: 6
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
- Name: '.idata$5'
Value: 0
SectionNumber: 5
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
- Name: _head_test_long_lib
Value: 0
SectionNumber: 4
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_EXTERNAL
- Name: __test_long_lib_iname
Value: 0
SectionNumber: 0
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_EXTERNAL
... In the latter case, we have actual sections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks overall reasonable to me.
I was curious about the behavior of other linkers. The MSVC linker doesn't seem to allow linking entire import libraries this way. Either I made a mistake during testing, Rust has some specific handling for this, or it’s simply not expected to work there.
On the other hand, ld.bfd does permit linking this way.
I also experimented with a simplified version of your yaml file to isolate the issue from import libraries (and archives in general):
--- !COFF
header:
Machine: IMAGE_FILE_MACHINE_AMD64
Characteristics: [ ]
sections:
- Name: '.itest$2'
Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
Alignment: 4
SectionData: '000000000000000000000000'
SizeOfRawData: 8
Relocations:
- VirtualAddress: 0
SymbolName: '.itest$4'
Type: IMAGE_REL_AMD64_ADDR64
- Name: '.itest$6'
Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
Alignment: 2
SectionData: 01000000
SizeOfRawData: 4
symbols:
- Name: '.itest$2'
Value: 0
SectionNumber: 1
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_SECTION
- Name: '.itest$6'
Value: 0
SectionNumber: 2
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_STATIC
- Name: '.itest$4'
Value: 0
SectionNumber: 0
SimpleType: IMAGE_SYM_TYPE_NULL
ComplexType: IMAGE_SYM_DTYPE_NULL
StorageClass: IMAGE_SYM_CLASS_SECTION
...
It might be worth adding something similar, paired with checks on the map file or by dumping section content as an additional test (that’s how I noticed the missing alignment).
lld/COFF/InputFiles.cpp
Outdated
// As there is no coff_section in the object file for these, make a | ||
// new virtual one, with everything zeroed out (i.e. an empty section), | ||
// with only the name and characteristics set. | ||
StringRef name = getName(); | ||
auto *hdr = make<coff_section>(); | ||
memset(hdr, 0, sizeof(*hdr)); | ||
strncpy(hdr->Name, name.data(), | ||
std::min(name.size(), (size_t)COFF::NameSize)); | ||
// We have no idea what characteristics should be assumed here; pick | ||
// a default. This matches what is used for .idata sections in the regular | ||
// object files in import libraries. | ||
hdr->Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ | | ||
IMAGE_SCN_MEM_WRITE; | ||
auto *sc = make<SectionChunk>(this, hdr); | ||
chunks.push_back(sc); | ||
return make<DefinedRegular>(this, /*Name*/ "", /*IsCOMDAT*/ false, | ||
/*IsExternal*/ false, sym.getGeneric(), sc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial thought was to use EmptyChunk
, but it likely doesn't propagate partial section information, so this approach seems reasonable. We should ensure alignment is set explicitly to avoid defaulting to 16 (at least that's what ld.bfd seems to do).
It only doesn't work due to the symbol name |
Hmm, at least the testcase I'm adding here seems to work if I swap in link.exe instead of lld-link. (There one also sees the ideal behaviour where the import directory doesn't contain a duplicate entry for this DLL.)
Yes, a standalone test like that might be good to have as well. As the real use case is kinda entangled with with how LLD handles import libraries (and the curiosity where LLD normally entirely ignores these object files), I think it's good to test that case as well, but let's add such a standalone case too. |
aa4da95
to
ed06ce5
Compare
Your example yaml has got this small contradiction:
That's 12 bytes of zeros. To make the testcase more interesting, I tweaked this to contain 10 bytes, while setting the empty section to align to 4 bytes. When testing this synthetic case with MS link.exe, I'm hitting this error:
I guess this is what you hit and refer to? (Side note: Inspecting the short import libraries is quite messy, as all members have the same name; I have an old longstanding custom patch for It seems that link.exe gives this error if there's no other object file defining the empty section. If we do add a separate object that adds a concrete definition of the section, even if it is entirely empty, linking succeeds - and we can observe that link.exe doesn't give these empty sections any extra alignment at all. I guess that shows a small discrepancy to how link.exe handles it - it defers the section chunk definition entirely to another object file, inferring the alignment from there. While with the way I implement it here, we have to invent a section and make up alignment. I chose to set 4 byte alignment here, as that's what is needed for |
However, while I can get MS link.exe to link this example by adding such an extra object file, I note curiously that the value in the relocation ends up incorrect - it seems to set the address of the start of the whole |
Yes, a few variations of that combined with your findings about requiring it in a separate object file defining the section explains the behavior. Thanks! One variant I encountered with an import library happened because I used an overly simplified test case, passing only an import library with
I don’t think this is an issue in practice, though, since there would typically be actual imports present. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
lld/COFF/InputFiles.cpp
Outdated
IMAGE_SCN_MEM_WRITE | IMAGE_SCN_ALIGN_4BYTES; | ||
auto *sc = make<SectionChunk>(this, hdr); | ||
chunks.push_back(sc); | ||
return make<DefinedRegular>(this, /*Name*/ "", /*IsCOMDAT*/ false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's /*Name=*/""
according to https://llvm.org/docs/CodingStandards.html#comment-formatting
A custom option sounds good to me. Based on the description, it seems like you had to detect duplicate names. Perhaps a simpler approach would be to prepend the member index to each file unconditionally (when the new option is used). |
When LLD links against an import library (for the regular, short import libraries), it doesn't actually link in the header/trailer object files at all, but synthesizes new corresponding data structures into the right sections. If the whole of such an import library is forced to be linked, e.g. with the -wholearchive: option, we actually end up linking in those header/trailer objects. The header objects contain a construct which LLD fails to handle; previously we'd error out with the error ".idata$4 should not refer to special section 0". Within the import library header object, in the import directory we have relocations towards the IAT (.idata$4 and .idata$5), but the header object itself doesn't contain any data for those sections. In the case of GNU generated import libraries, the header objects contain zero length sections .idata$4 and .idata$5, with relocations against them. However in the case of LLVM generated import libraries, the sections .idata$4 and .idata$5 are not included in the list of sections. The symbol table does contain section symbols for these sections, but without any actual associated section. This can probably be seen as a declaration of an empty section. If the header/trailer objects of a short import library are linked forcibly and we also reference other functions in the library, we end up with two import directory entries for this DLL, one that gets synthesized by LLD, and one from the actual header object file. This is inelegant, but should be acceptable. While it would seem unusual to link import libraries with the -wholearchive: option, this can happen in certain scenarios. Rust builds libraries that contain relevant import libraries bundled along with compiled Rust code as regular object files, all within one single archive. Such an archive can then end up linked with the -wholarchive: option, if build systems decide to use such an option for including static libraries. This should fix msys2/MINGW-packages#21017. This works for the header/trailer object files in import libraries generated by LLVM; import libraries generated by MSVC are vaguely different. ecb5ea6 did an attempt at fixing the issue for MSVC generated libraries, but it's not entirely correct, and isn't enough for making things work for that case.
ed06ce5
to
812ba2f
Compare
When LLD links against an import library (for the regular, short import libraries), it doesn't actually link in the header/trailer object files at all, but synthesizes new corresponding data structures into the right sections.
If the whole of such an import library is forced to be linked, e.g. with the -wholearchive: option, we actually end up linking in those header/trailer objects. The header objects contain a construct which LLD fails to handle; previously we'd error out with the error ".idata$4 should not refer to special section 0".
Within the import library header object, in the import directory we have relocations towards the IAT (.idata$4 and .idata$5), but the header object itself doesn't contain any data for those sections.
In the case of GNU generated import libraries, the header objects contain zero length sections .idata$4 and .idata$5, with relocations against them. However in the case of LLVM generated import libraries, the sections .idata$4 and .idata$5 are not included in the list of sections. The symbol table does contain section symbols for these sections, but without any actual associated section. This can probably be seen as a declaration of an empty section.
If the header/trailer objects of a short import library are linked forcibly and we also reference other functions in the library, we end up with two import directory entries for this DLL, one that gets synthesized by LLD, and one from the actual header object file. This is inelegant, but should be acceptable.
While it would seem unusual to link import libraries with the -wholearchive: option, this can happen in certain scenarios.
Rust builds libraries that contain relevant import libraries bundled along with compiled Rust code as regular object files, all within one single archive. Such an archive can then end up linked with the -wholarchive: option, if build systems decide to use such an option for including static libraries.
This should fix msys2/MINGW-packages#21017.
This works for the header/trailer object files in import libraries generated by LLVM; import libraries generated by MSVC are vaguely different. ecb5ea6 did an attempt at fixing the issue for MSVC generated libraries, but it's not entirely correct, and isn't enough for making things work for that case.