Skip to content

[LLD] [COFF] Fix linking import libraries with -wholearchive: #122806

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 15, 2025

Conversation

mstorsjo
Copy link
Member

When LLD links against an import library (for the regular, short import libraries), it doesn't actually link in the header/trailer object files at all, but synthesizes new corresponding data structures into the right sections.

If the whole of such an import library is forced to be linked, e.g. with the -wholearchive: option, we actually end up linking in those header/trailer objects. The header objects contain a construct which LLD fails to handle; previously we'd error out with the error ".idata$4 should not refer to special section 0".

Within the import library header object, in the import directory we have relocations towards the IAT (.idata$4 and .idata$5), but the header object itself doesn't contain any data for those sections.

In the case of GNU generated import libraries, the header objects contain zero length sections .idata$4 and .idata$5, with relocations against them. However in the case of LLVM generated import libraries, the sections .idata$4 and .idata$5 are not included in the list of sections. The symbol table does contain section symbols for these sections, but without any actual associated section. This can probably be seen as a declaration of an empty section.

If the header/trailer objects of a short import library are linked forcibly and we also reference other functions in the library, we end up with two import directory entries for this DLL, one that gets synthesized by LLD, and one from the actual header object file. This is inelegant, but should be acceptable.

While it would seem unusual to link import libraries with the -wholearchive: option, this can happen in certain scenarios.

Rust builds libraries that contain relevant import libraries bundled along with compiled Rust code as regular object files, all within one single archive. Such an archive can then end up linked with the -wholarchive: option, if build systems decide to use such an option for including static libraries.

This should fix msys2/MINGW-packages#21017.

This works for the header/trailer object files in import libraries generated by LLVM; import libraries generated by MSVC are vaguely different. ecb5ea6 did an attempt at fixing the issue for MSVC generated libraries, but it's not entirely correct, and isn't enough for making things work for that case.

@llvmbot
Copy link
Member

llvmbot commented Jan 13, 2025

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-lld-coff

Author: Martin Storsjö (mstorsjo)

Changes

When LLD links against an import library (for the regular, short import libraries), it doesn't actually link in the header/trailer object files at all, but synthesizes new corresponding data structures into the right sections.

If the whole of such an import library is forced to be linked, e.g. with the -wholearchive: option, we actually end up linking in those header/trailer objects. The header objects contain a construct which LLD fails to handle; previously we'd error out with the error ".idata$4 should not refer to special section 0".

Within the import library header object, in the import directory we have relocations towards the IAT (.idata$4 and .idata$5), but the header object itself doesn't contain any data for those sections.

In the case of GNU generated import libraries, the header objects contain zero length sections .idata$4 and .idata$5, with relocations against them. However in the case of LLVM generated import libraries, the sections .idata$4 and .idata$5 are not included in the list of sections. The symbol table does contain section symbols for these sections, but without any actual associated section. This can probably be seen as a declaration of an empty section.

If the header/trailer objects of a short import library are linked forcibly and we also reference other functions in the library, we end up with two import directory entries for this DLL, one that gets synthesized by LLD, and one from the actual header object file. This is inelegant, but should be acceptable.

While it would seem unusual to link import libraries with the -wholearchive: option, this can happen in certain scenarios.

Rust builds libraries that contain relevant import libraries bundled along with compiled Rust code as regular object files, all within one single archive. Such an archive can then end up linked with the -wholarchive: option, if build systems decide to use such an option for including static libraries.

This should fix msys2/MINGW-packages#21017.

This works for the header/trailer object files in import libraries generated by LLVM; import libraries generated by MSVC are vaguely different. ecb5ea6 did an attempt at fixing the issue for MSVC generated libraries, but it's not entirely correct, and isn't enough for making things work for that case.


Full diff: https://github.com/llvm/llvm-project/pull/122806.diff

3 Files Affected:

  • (modified) lld/COFF/InputFiles.cpp (+20)
  • (added) lld/test/COFF/wholearchive-implib.s (+35)
  • (modified) llvm/include/llvm/Object/COFF.h (+5)
diff --git a/lld/COFF/InputFiles.cpp b/lld/COFF/InputFiles.cpp
index a94c984cfd4870..86284f042fe7a3 100644
--- a/lld/COFF/InputFiles.cpp
+++ b/lld/COFF/InputFiles.cpp
@@ -746,6 +746,26 @@ std::optional<Symbol *> ObjFile::createDefined(
   if (sectionNumber == llvm::COFF::IMAGE_SYM_DEBUG)
     return nullptr;
 
+  if (sym.isEmptySectionDeclaration()) {
+    // As there is no coff_section in the object file for these, make a
+    // new virtual one, with everything zeroed out (i.e. an empty section),
+    // with only the name and characteristics set.
+    StringRef name = getName();
+    auto *hdr = make<coff_section>();
+    memset(hdr, 0, sizeof(*hdr));
+    strncpy(hdr->Name, name.data(),
+            std::min(name.size(), (size_t)COFF::NameSize));
+    // We have no idea what characteristics should be assumed here; pick
+    // a default. This matches what is used for .idata sections in the regular
+    // object files in import libraries.
+    hdr->Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ |
+                           IMAGE_SCN_MEM_WRITE;
+    auto *sc = make<SectionChunk>(this, hdr);
+    chunks.push_back(sc);
+    return make<DefinedRegular>(this, /*Name*/ "", /*IsCOMDAT*/ false,
+                                /*IsExternal*/ false, sym.getGeneric(), sc);
+  }
+
   if (llvm::COFF::isReservedSectionNumber(sectionNumber))
     Fatal(ctx) << toString(this) << ": " << getName()
                << " should not refer to special section "
diff --git a/lld/test/COFF/wholearchive-implib.s b/lld/test/COFF/wholearchive-implib.s
new file mode 100644
index 00000000000000..64b94ab730229e
--- /dev/null
+++ b/lld/test/COFF/wholearchive-implib.s
@@ -0,0 +1,35 @@
+// REQUIRES: x86
+// RUN: split-file %s %t.dir
+// RUN: llvm-lib -machine:amd64 -out:%t.lib -def:%t.dir/lib.def
+// RUN: llvm-mc -filetype=obj -triple=x86_64-windows %t.dir/main.s -o %t.main.obj
+
+// RUN: lld-link -out:%t.exe %t.main.obj -wholearchive:%t.lib -entry:entry -subsystem:console
+// RUN: llvm-readobj --coff-imports %t.exe | FileCheck %s
+
+// As LLD usually doesn't use the header/trailer object files from import
+// libraries, but instead synthesizes those structures, we end up with two
+// import directory entries if we force those objects to be included.
+
+// CHECK:      Import {
+// CHECK-NEXT:   Name: lib.dll
+// CHECK-NEXT:   ImportLookupTableRVA: 0x2050
+// CHECK-NEXT:   ImportAddressTableRVA: 0x2070
+// CHECK-NEXT: }
+// CHECK-NEXT: Import {
+// CHECK-NEXT:   Name: lib.dll
+// CHECK-NEXT:   ImportLookupTableRVA: 0x2058
+// CHECK-NEXT:   ImportAddressTableRVA: 0x2078
+// CHECK-NEXT:   Symbol: func (0)
+// CHECK-NEXT: }
+
+
+#--- main.s
+.global entry
+entry:
+  call func
+  ret
+
+#--- lib.def
+LIBRARY lib.dll
+EXPORTS
+func
diff --git a/llvm/include/llvm/Object/COFF.h b/llvm/include/llvm/Object/COFF.h
index 05b3587224c296..4de2c680f57b1a 100644
--- a/llvm/include/llvm/Object/COFF.h
+++ b/llvm/include/llvm/Object/COFF.h
@@ -392,6 +392,11 @@ class COFFSymbolRef {
            getValue() == 0;
   }
 
+  bool isEmptySectionDeclaration() const {
+    return isSection() && getSectionNumber() == COFF::IMAGE_SYM_UNDEFINED &&
+           getValue() == 0;
+  }
+
   bool isWeakExternal() const {
     return getStorageClass() == COFF::IMAGE_SYM_CLASS_WEAK_EXTERNAL;
   }

@mstorsjo
Copy link
Member Author

To make the situation a bit clearer - the header object file for an LLVM generated object file looks like this:

--- !COFF
header:
  Machine:         IMAGE_FILE_MACHINE_AMD64
  Characteristics: [  ]
sections:
  - Name:            '.idata$2'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       4
    SectionData:     '0000000000000000000000000000000000000000'
    SizeOfRawData:   20
    Relocations:
      - VirtualAddress:  12
        SymbolName:      '.idata$6'
        Type:            IMAGE_REL_AMD64_ADDR32NB
      - VirtualAddress:  0
        SymbolName:      '.idata$4'
        Type:            IMAGE_REL_AMD64_ADDR32NB
      - VirtualAddress:  16
        SymbolName:      '.idata$5'
        Type:            IMAGE_REL_AMD64_ADDR32NB
  - Name:            '.idata$6'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       2
    SectionData:     666F6F2E646C6C00
    SizeOfRawData:   8
symbols:
  - Name:            __IMPORT_DESCRIPTOR_foo
    Value:           0
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
  - Name:            '.idata$2'
    Value:           0
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_SECTION
  - Name:            '.idata$6'
    Value:           0
    SectionNumber:   2
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
  - Name:            '.idata$4'
    Value:           0
    SectionNumber:   0
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_SECTION
  - Name:            '.idata$5'
    Value:           0
    SectionNumber:   0
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_SECTION
  - Name:            __NULL_IMPORT_DESCRIPTOR
    Value:           0
    SectionNumber:   0
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
  - Name:            "^?foo_NULL_THUNK_DATA"
    Value:           0
    SectionNumber:   0
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
...

And for comparison, this is what it looks like in the header object of a GNU generated import library:

--- !COFF            
header:
  Machine:         IMAGE_FILE_MACHINE_AMD64
  Characteristics: [ IMAGE_FILE_LINE_NUMS_STRIPPED ]
sections:
  - Name:            .text
    Characteristics: [ IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_EXECUTE, IMAGE_SCN_MEM_READ ]
    Alignment:       16
    SectionData:     ''
  - Name:            .data
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       16
    SectionData:     ''
  - Name:            .bss
    Characteristics: [ IMAGE_SCN_CNT_UNINITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]   
    Alignment:       16
    SectionData:     ''
  - Name:            '.idata$2'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       4 
    SectionData:     '0000000000000000000000000000000000000000'
    SizeOfRawData:   20
    Relocations:     
      - VirtualAddress:  0
        SymbolName:      '.idata$4'
        Type:            IMAGE_REL_AMD64_ADDR32NB
      - VirtualAddress:  12
        SymbolName:      __test_long_lib_iname
        Type:            IMAGE_REL_AMD64_ADDR32NB
      - VirtualAddress:  16
        SymbolName:      '.idata$5'
        Type:            IMAGE_REL_AMD64_ADDR32NB
  - Name:            '.idata$5'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]     
    Alignment:       4
    SectionData:     ''
  - Name:            '.idata$4'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       4 
    SectionData:     ''
symbols:
  - Name:            .file
    Value:           0 
    SectionNumber:   -2
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_FILE
    File:            fake
  - Name:            hname
    Value:           0
    SectionNumber:   6
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
  - Name:            fthunk
    Value:           0
    SectionNumber:   5
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
  - Name:            .text
    Value:           0
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
    SectionDefinition:
      Length:          0
      NumberOfRelocations: 0
      NumberOfLinenumbers: 0
      CheckSum:        0
      Number:          0
  - Name:            .data
    Value:           0
    SectionNumber:   2
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
    SectionDefinition:
      Length:          0
      NumberOfRelocations: 0
      NumberOfLinenumbers: 0
      CheckSum:        0
      Number:          0
  - Name:            .bss
    Value:           0
    SectionNumber:   3
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
    SectionDefinition:
      Length:          0
      NumberOfRelocations: 0
      NumberOfLinenumbers: 0
      CheckSum:        0
      Number:          0
  - Name:            '.idata$2'
    Value:           0
    SectionNumber:   4
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
    SectionDefinition:
      Length:          20
      NumberOfRelocations: 3
      NumberOfLinenumbers: 0
      CheckSum:        0
      Number:          0
  - Name:            '.idata$4'
    Value:           0
    SectionNumber:   6
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
  - Name:            '.idata$5'
    Value:           0
    SectionNumber:   5
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
  - Name:            _head_test_long_lib
    Value:           0
    SectionNumber:   4
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
  - Name:            __test_long_lib_iname
    Value:           0
    SectionNumber:   0
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
...

In the latter case, we have actual sections .idata$4 and .idata$5, while they are empty, but in the former case, we have IMAGE_SYM_CLASS_STATIC symbols with SectionNumber: 0.

Copy link
Contributor

@cjacek cjacek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks overall reasonable to me.

I was curious about the behavior of other linkers. The MSVC linker doesn't seem to allow linking entire import libraries this way. Either I made a mistake during testing, Rust has some specific handling for this, or it’s simply not expected to work there.

On the other hand, ld.bfd does permit linking this way.

I also experimented with a simplified version of your yaml file to isolate the issue from import libraries (and archives in general):

--- !COFF
header:
  Machine:         IMAGE_FILE_MACHINE_AMD64
  Characteristics: [  ]
sections:
  - Name:            '.itest$2'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       4
    SectionData:     '000000000000000000000000'
    SizeOfRawData:   8
    Relocations:
      - VirtualAddress:  0
        SymbolName:      '.itest$4'
        Type:            IMAGE_REL_AMD64_ADDR64
  - Name:            '.itest$6'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       2
    SectionData:     01000000
    SizeOfRawData:   4
symbols:
  - Name:            '.itest$2'
    Value:           0
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_SECTION
  - Name:            '.itest$6'
    Value:           0
    SectionNumber:   2
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
  - Name:            '.itest$4'
    Value:           0
    SectionNumber:   0
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_SECTION
...

It might be worth adding something similar, paired with checks on the map file or by dumping section content as an additional test (that’s how I noticed the missing alignment).

Comment on lines 750 to 766
// As there is no coff_section in the object file for these, make a
// new virtual one, with everything zeroed out (i.e. an empty section),
// with only the name and characteristics set.
StringRef name = getName();
auto *hdr = make<coff_section>();
memset(hdr, 0, sizeof(*hdr));
strncpy(hdr->Name, name.data(),
std::min(name.size(), (size_t)COFF::NameSize));
// We have no idea what characteristics should be assumed here; pick
// a default. This matches what is used for .idata sections in the regular
// object files in import libraries.
hdr->Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ |
IMAGE_SCN_MEM_WRITE;
auto *sc = make<SectionChunk>(this, hdr);
chunks.push_back(sc);
return make<DefinedRegular>(this, /*Name*/ "", /*IsCOMDAT*/ false,
/*IsExternal*/ false, sym.getGeneric(), sc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was to use EmptyChunk, but it likely doesn't propagate partial section information, so this approach seems reasonable. We should ensure alignment is set explicitly to avoid defaulting to 16 (at least that's what ld.bfd seems to do).

@ChrisDenton
Copy link

The MSVC linker doesn't seem to allow linking entire import libraries this way.

It only doesn't work due to the symbol name __NULL_IMPORT_DESCRIPTOR not being unique (because every import library has at least one of them). That conflict can be worked around by renaming it to something that's more unique.

@mstorsjo
Copy link
Member Author

I was curious about the behavior of other linkers. The MSVC linker doesn't seem to allow linking entire import libraries this way. Either I made a mistake during testing, Rust has some specific handling for this, or it’s simply not expected to work there.

Hmm, at least the testcase I'm adding here seems to work if I swap in link.exe instead of lld-link. (There one also sees the ideal behaviour where the import directory doesn't contain a duplicate entry for this DLL.)

I also experimented with a simplified version of your yaml file to isolate the issue from import libraries (and archives in general):

--- !COFF
header:
  Machine:         IMAGE_FILE_MACHINE_AMD64
  Characteristics: [  ]
sections:
  - Name:            '.itest$2'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       4
    SectionData:     '000000000000000000000000'
    SizeOfRawData:   8
    Relocations:
      - VirtualAddress:  0
        SymbolName:      '.itest$4'
        Type:            IMAGE_REL_AMD64_ADDR64
  - Name:            '.itest$6'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       2
    SectionData:     01000000
    SizeOfRawData:   4
symbols:
  - Name:            '.itest$2'
    Value:           0
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_SECTION
  - Name:            '.itest$6'
    Value:           0
    SectionNumber:   2
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
  - Name:            '.itest$4'
    Value:           0
    SectionNumber:   0
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_SECTION
...

It might be worth adding something similar, paired with checks on the map file or by dumping section content as an additional test (that’s how I noticed the missing alignment).

Yes, a standalone test like that might be good to have as well. As the real use case is kinda entangled with with how LLD handles import libraries (and the curiosity where LLD normally entirely ignores these object files), I think it's good to test that case as well, but let's add such a standalone case too.

@mstorsjo mstorsjo force-pushed the lld-empty-section-ref branch from aa4da95 to ed06ce5 Compare January 15, 2025 09:36
@mstorsjo
Copy link
Member Author

Your example yaml has got this small contradiction:

    SectionData:     '000000000000000000000000'
    SizeOfRawData:   8

That's 12 bytes of zeros. To make the testcase more interesting, I tweaked this to contain 10 bytes, while setting the empty section to align to 4 bytes.

When testing this synthetic case with MS link.exe, I'm hitting this error:

LINK : fatal error LNK1000: unknown error at 0000000140102A09; consult documentation for technical support options

I guess this is what you hit and refer to?

(Side note: Inspecting the short import libraries is quite messy, as all members have the same name; I have an old longstanding custom patch for llvm-ar that adds an extra option, that allows extracting them as foo.dll, foo.dll-2, foo.dll-3 etc, for inspecting import libraries. If you have any suggestion on a clean way to achieve the same with upstream functionality, or a way to handle this, I'm all ears.)

It seems that link.exe gives this error if there's no other object file defining the empty section. If we do add a separate object that adds a concrete definition of the section, even if it is entirely empty, linking succeeds - and we can observe that link.exe doesn't give these empty sections any extra alignment at all.

I guess that shows a small discrepancy to how link.exe handles it - it defers the section chunk definition entirely to another object file, inferring the alignment from there. While with the way I implement it here, we have to invent a section and make up alignment. I chose to set 4 byte alignment here, as that's what is needed for .idata$4 anyway; if we'd choose something less than what the individual IAT entries use, we'd end up with a gap between the header's reference to .idata$4 and the actual contents.

@mstorsjo
Copy link
Member Author

It seems that link.exe gives this error if there's no other object file defining the empty section. If we do add a separate object that adds a concrete definition of the section, even if it is entirely empty, linking succeeds - and we can observe that link.exe doesn't give these empty sections any extra alignment at all.

However, while I can get MS link.exe to link this example by adding such an extra object file, I note curiously that the value in the relocation ends up incorrect - it seems to set the address of the start of the whole .idata section, not the start of the .idata$4 section chunk as intended. This happens both with the IMAGE_REL_AMD64_ADDR64 relocation you used in your example yaml, and with IMAGE_REL_AMD64_ADDR32NB as the real import directory uses. That's weird...

@cjacek
Copy link
Contributor

cjacek commented Jan 15, 2025

I guess this is what you hit and refer to?

Yes, a few variations of that combined with your findings about requiring it in a separate object file defining the section explains the behavior. Thanks! One variant I encountered with an import library happened because I used an overly simplified test case, passing only an import library with -dll -noentry and no accompanying object file. As a result, no actual imports were made, and .idata$4 wasn’t created through other means. This led to the error:

wholearchive-implib.s.tmp.lib : fatal error LNK1127: library is corrupt

I don’t think this is an issue in practice, though, since there would typically be actual imports present.

Copy link
Contributor

@cjacek cjacek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

IMAGE_SCN_MEM_WRITE | IMAGE_SCN_ALIGN_4BYTES;
auto *sc = make<SectionChunk>(this, hdr);
chunks.push_back(sc);
return make<DefinedRegular>(this, /*Name*/ "", /*IsCOMDAT*/ false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjacek
Copy link
Contributor

cjacek commented Jan 15, 2025

Side note: Inspecting the short import libraries is quite messy, as all members have the same name; I have an old longstanding custom patch for llvm-ar that adds an extra option, that allows extracting them as foo.dll, foo.dll-2, foo.dll-3 etc, for inspecting import libraries. If you have any suggestion on a clean way to achieve the same with upstream functionality, or a way to handle this, I'm all ears.

A custom option sounds good to me. Based on the description, it seems like you had to detect duplicate names. Perhaps a simpler approach would be to prepend the member index to each file unconditionally (when the new option is used).

When LLD links against an import library (for the regular, short
import libraries), it doesn't actually link in the header/trailer
object files at all, but synthesizes new corresponding data
structures into the right sections.

If the whole of such an import library is forced to be linked,
e.g. with the -wholearchive: option, we actually end up linking
in those header/trailer objects. The header objects contain a
construct which LLD fails to handle; previously we'd error out
with the error ".idata$4 should not refer to special section 0".

Within the import library header object, in the import directory
we have relocations towards the IAT (.idata$4 and .idata$5),
but the header object itself doesn't contain any data for those
sections.

In the case of GNU generated import libraries, the header objects
contain zero length sections .idata$4 and .idata$5, with
relocations against them. However in the case of LLVM generated
import libraries, the sections .idata$4 and .idata$5 are not
included in the list of sections. The symbol table does contain
section symbols for these sections, but without any actual
associated section. This can probably be seen as a declaration of
an empty section.

If the header/trailer objects of a short import library are
linked forcibly and we also reference other functions in the
library, we end up with two import directory entries for this
DLL, one that gets synthesized by LLD, and one from the actual
header object file. This is inelegant, but should be acceptable.

While it would seem unusual to link import libraries with the
-wholearchive: option, this can happen in certain scenarios.

Rust builds libraries that contain relevant import libraries bundled
along with compiled Rust code as regular object files, all within
one single archive. Such an archive can then end up linked with the
-wholarchive: option, if build systems decide to use such an option
for including static libraries.

This should fix msys2/MINGW-packages#21017.

This works for the header/trailer object files in import libraries
generated by LLVM; import libraries generated by MSVC are vaguely
different. ecb5ea6 did an attempt
at fixing the issue for MSVC generated libraries, but it's not
entirely correct, and isn't enough for making things work for
that case.
@mstorsjo mstorsjo force-pushed the lld-empty-section-ref branch from ed06ce5 to 812ba2f Compare January 15, 2025 12:00
@mstorsjo mstorsjo merged commit 4a4a8a1 into llvm:main Jan 15, 2025
8 checks passed
@mstorsjo mstorsjo deleted the lld-empty-section-ref branch January 15, 2025 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

librsvg problem in clang
4 participants