-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WebAssembly] Support multiple .init_array
fragments when writing Wasm objects
#111008
base: main
Are you sure you want to change the base?
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-webassembly @llvm/pr-subscribers-mc Author: George Stagg (georgestagg) ChangesFor WebAssembly objects, only functions listed in the first symbol in the Consider the C program: #import <stdio.h>
void init1() { puts("init1"); }
void init2() { puts("init2"); }
void init3() { puts("init3"); }
typedef void (*f)(void);
__attribute__((section(".init_array"))) void (*p_init1)(void) = &init1;
__attribute__((section(".init_array"))) f p_init2[2] = { &init2, &init3 };
int main() {
puts("main");
return 0;
} Compiling for Linux results in each function running before
In previous LLVM versions (<=18, I think?) this also used to throw an error message: This makes things difficult for certain tools that rely on writing multiple objects in the This PR makes two changes to
This picks up any functions listed in later objects in the However, a related issue is revealed by changing the functions in the example above to read: void init1() { puts("1"); }
void init2() { puts("2"); }
void init3() { puts("3"); } Then, with our patched LLVM,
The symbols in the Is this a mistake? AFAICT I don't think there's an appropriate kind for these symbols to be written to the Interestingly, you can also reveal this problem in an unpatched LLVM by writing a Wasm object file that has no data section but does supply an void init1() {}
__attribute__((section(".init_array"))) void (*p_init1)(void) = &init1;
int main() { return 0; }
So, this leads me to the related second change:
With the two changes above, the modified test program above compiles with Emscripten and runs on Wasm with the expected behaviour:
Finally, I have tried to include a unit test. Hopefully it looks reasonable -- it's based on copying similar test files in that directory. Full diff: https://github.com/llvm/llvm-project/pull/111008.diff 2 Files Affected:
diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index f25dc92fa235a2..034e058690744a 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1769,6 +1769,11 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
WS.setIndex(InvalidIndex);
continue;
}
+ // Contents of .init_array sections are handled elsewhere.
+ if (WS.isDefined() &&
+ WS.getSection().getName().starts_with(".init_array")) {
+ continue;
+ }
LLVM_DEBUG(dbgs() << "adding to symtab: " << WS << "\n");
uint32_t Flags = 0;
@@ -1853,49 +1858,54 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
if (EmptyFrag.getKind() != MCFragment::FT_Data)
report_fatal_error(".init_array section should be aligned");
- const MCFragment &AlignFrag = *EmptyFrag.getNext();
- if (AlignFrag.getKind() != MCFragment::FT_Align)
- report_fatal_error(".init_array section should be aligned");
- if (cast<MCAlignFragment>(AlignFrag).getAlignment() !=
- Align(is64Bit() ? 8 : 4))
- report_fatal_error(".init_array section should be aligned for pointers");
-
- const MCFragment &Frag = *AlignFrag.getNext();
- if (Frag.hasInstructions() || Frag.getKind() != MCFragment::FT_Data)
- report_fatal_error("only data supported in .init_array section");
-
- uint16_t Priority = UINT16_MAX;
- unsigned PrefixLength = strlen(".init_array");
- if (WS.getName().size() > PrefixLength) {
- if (WS.getName()[PrefixLength] != '.')
+ const MCFragment *nextFrag = EmptyFrag.getNext();
+ while (nextFrag != nullptr) {
+ const MCFragment &AlignFrag = *nextFrag;
+ if (AlignFrag.getKind() != MCFragment::FT_Align)
+ report_fatal_error(".init_array section should be aligned");
+ if (cast<MCAlignFragment>(AlignFrag).getAlignment() !=
+ Align(is64Bit() ? 8 : 4))
report_fatal_error(
- ".init_array section priority should start with '.'");
- if (WS.getName().substr(PrefixLength + 1).getAsInteger(10, Priority))
- report_fatal_error("invalid .init_array section priority");
- }
- const auto &DataFrag = cast<MCDataFragment>(Frag);
- const SmallVectorImpl<char> &Contents = DataFrag.getContents();
- for (const uint8_t *
- P = (const uint8_t *)Contents.data(),
- *End = (const uint8_t *)Contents.data() + Contents.size();
- P != End; ++P) {
- if (*P != 0)
- report_fatal_error("non-symbolic data in .init_array section");
- }
- for (const MCFixup &Fixup : DataFrag.getFixups()) {
- assert(Fixup.getKind() ==
- MCFixup::getKindForSize(is64Bit() ? 8 : 4, false));
- const MCExpr *Expr = Fixup.getValue();
- auto *SymRef = dyn_cast<MCSymbolRefExpr>(Expr);
- if (!SymRef)
- report_fatal_error("fixups in .init_array should be symbol references");
- const auto &TargetSym = cast<const MCSymbolWasm>(SymRef->getSymbol());
- if (TargetSym.getIndex() == InvalidIndex)
- report_fatal_error("symbols in .init_array should exist in symtab");
- if (!TargetSym.isFunction())
- report_fatal_error("symbols in .init_array should be for functions");
- InitFuncs.push_back(
- std::make_pair(Priority, TargetSym.getIndex()));
+ ".init_array section should be aligned for pointers");
+
+ const MCFragment &Frag = *AlignFrag.getNext();
+ nextFrag = Frag.getNext();
+ if (Frag.hasInstructions() || Frag.getKind() != MCFragment::FT_Data)
+ report_fatal_error("only data supported in .init_array section");
+
+ uint16_t Priority = UINT16_MAX;
+ unsigned PrefixLength = strlen(".init_array");
+ if (WS.getName().size() > PrefixLength) {
+ if (WS.getName()[PrefixLength] != '.')
+ report_fatal_error(
+ ".init_array section priority should start with '.'");
+ if (WS.getName().substr(PrefixLength + 1).getAsInteger(10, Priority))
+ report_fatal_error("invalid .init_array section priority");
+ }
+ const auto &DataFrag = cast<MCDataFragment>(Frag);
+ const SmallVectorImpl<char> &Contents = DataFrag.getContents();
+ for (const uint8_t *
+ P = (const uint8_t *)Contents.data(),
+ *End = (const uint8_t *)Contents.data() + Contents.size();
+ P != End; ++P) {
+ if (*P != 0)
+ report_fatal_error("non-symbolic data in .init_array section");
+ }
+ for (const MCFixup &Fixup : DataFrag.getFixups()) {
+ assert(Fixup.getKind() ==
+ MCFixup::getKindForSize(is64Bit() ? 8 : 4, false));
+ const MCExpr *Expr = Fixup.getValue();
+ auto *SymRef = dyn_cast<MCSymbolRefExpr>(Expr);
+ if (!SymRef)
+ report_fatal_error(
+ "fixups in .init_array should be symbol references");
+ const auto &TargetSym = cast<const MCSymbolWasm>(SymRef->getSymbol());
+ if (TargetSym.getIndex() == InvalidIndex)
+ report_fatal_error("symbols in .init_array should exist in symtab");
+ if (!TargetSym.isFunction())
+ report_fatal_error("symbols in .init_array should be for functions");
+ InitFuncs.push_back(std::make_pair(Priority, TargetSym.getIndex()));
+ }
}
}
diff --git a/llvm/test/MC/WebAssembly/init-array.ll b/llvm/test/MC/WebAssembly/init-array.ll
new file mode 100644
index 00000000000000..d19ddcf10d5d9c
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/init-array.ll
@@ -0,0 +1,56 @@
+; RUN: llc -mcpu=mvp -filetype=obj %s -o - | obj2yaml | FileCheck %s
+
+target triple = "wasm32-unknown-unknown"
+
+@p_init1 = hidden global ptr @init1, section ".init_array", align 4
+@p_init2 = hidden global ptr @init2, section ".init_array", align 4
+
+define hidden void @init1() #0 { ret void }
+define hidden void @init2() #0 { ret void }
+
+
+; CHECK: - Type: IMPORT
+; CHECK-NEXT: Imports:
+; CHECK-NEXT: - Module: env
+; CHECK-NEXT: Field: __linear_memory
+; CHECK-NEXT: Kind: MEMORY
+; CHECK-NEXT: Memory:
+; CHECK-NEXT: Minimum: 0x0
+; CHECK-NEXT: - Module: env
+; CHECK-NEXT: Field: __indirect_function_table
+; CHECK-NEXT: Kind: TABLE
+; CHECK-NEXT: Table:
+; CHECK-NEXT: Index: 0
+; CHECK-NEXT: ElemType: FUNCREF
+; CHECK-NEXT: Limits:
+; CHECK-NEXT: Minimum: 0x0
+; CHECK-NEXT: - Type: FUNCTION
+; CHECK-NEXT: FunctionTypes: [ 0, 0 ]
+; CHECK-NEXT: - Type: CODE
+; CHECK-NEXT: Functions:
+; CHECK-NEXT: - Index: 0
+; CHECK-NEXT: Locals: []
+; CHECK-NEXT: Body: 0B
+; CHECK-NEXT: - Index: 1
+; CHECK-NEXT: Locals: []
+; CHECK-NEXT: Body: 0B
+; CHECK-NEXT: - Type: CUSTOM
+; CHECK-NEXT: Name: linking
+; CHECK-NEXT: Version: 2
+; CHECK-NEXT: SymbolTable:
+; CHECK-NEXT: - Index: 0
+; CHECK-NEXT: Kind: FUNCTION
+; CHECK-NEXT: Name: init1
+; CHECK-NEXT: Flags: [ VISIBILITY_HIDDEN ]
+; CHECK-NEXT: Function: 0
+; CHECK-NEXT: - Index: 1
+; CHECK-NEXT: Kind: FUNCTION
+; CHECK-NEXT: Name: init2
+; CHECK-NEXT: Flags: [ VISIBILITY_HIDDEN ]
+; CHECK-NEXT: Function: 1
+; CHECK-NEXT: InitFunctions:
+; CHECK-NEXT: - Priority: 65535
+; CHECK-NEXT: Symbol: 0
+; CHECK-NEXT: - Priority: 65535
+; CHECK-NEXT: Symbol: 1
+; CHECK-NEXT: ...
|
@sbc100 who should we ask to review this? |
From git blame looks like @sbc100 would be the most appropriate reviewer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable. I suppose that C++ and rust always generate just a single fragment which is why we haven't seen issues with this before?
@@ -1769,6 +1769,11 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm, | |||
WS.setIndex(InvalidIndex); | |||
continue; | |||
} | |||
// Contents of .init_array sections are handled elsewhere. | |||
if (WS.isDefined() && | |||
WS.getSection().getName().starts_with(".init_array")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it possible to define a symbol within this section? Can you give and example? It seems like it should be just a sequence of pointers, but I guess somehow you can have symbol that points into that sequence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the following C code does so:
#import <stdio.h>
void init1() { puts("init1"); }
__attribute__((section(".init_array"))) void (*p_init[3])(void) = { &init1, &init1, &init1 };
int main() { return 0; }
Under (e.g.) Linux, a list of pointers is written to the .init_array
section and a symbol p_init
points to the sequence.
$ clang -c symbol.c -o symbol.o
$ obj2yaml symbol.o
[...]
Symbols:
[...]
- Name: p_init
Type: STT_OBJECT
Section: .init_array
Binding: STB_GLOBAL
Size: 0x18
However, with a WebAssembly object this does not make as much sense. WasmObjectWriter.cpp
converts the sequence of pointers in .init_array
section into entries in the WASM_INIT_FUNC
subsection of the custom linking
section instead. There they are no longer written as a list of pointers, but indexes into a table.
So, this particular change makes it so that the p_init
symbol is not written into the WebAssembly object file. In the current version a symbol is actually written, but it is broken in any case since it points to the incorrect section (leading to the invalid data segment index: 0
bug shown in the OP).
BTW viewing this diff with "ignore whitespace" helped a lot to see what was going on. Regarding the test case, can it be written in assembly instead? |
I guess so, let me throw together a quick C++ example and see how it currently handles the constructors...
First, compile and test this C++ code:
#include <iostream>
using namespace std;
class Init1 {
public: Init1() { cout << "Init1" << endl; }
};
class Init2 {
public: Init2() { cout << "Init2" << endl; }
};
Init1 init1a;
Init1 init1b;
Init2 init2;
int main() {
cout << "main" << endl;
return 0;
}
Let's make sure the
Yes, but just with a single function. I suppose the constructor functions have been coalesced into a single function. Let's take a look in
Indeed, the global constructors are combined into a single function. Note that LLVM generates code using Yes, at least for simple C++.
Yes, apologies. The diff is smaller than it looks due to the change in indentation!
Unfortunately, I'm not sure how to express this in WebAssembly text format directly. There is an open issue about defining custom sections (WebAssembly/design#1153) in the text format and I'm not entirely sure how to express the test, since the bug occurs in the code that converts the I would be willing to change the test, but would need specific help in how to proceed. |
For WebAssembly objects, only functions listed in the first symbol in the
.init_array
section are invoked.Consider the C program:
Compiling for Linux results in each function running before
main()
. But for WebAssembly (here via Emscripten) only the first function is invoked:In previous LLVM versions (<=18, I think?) this also used to throw an error message:
fatal error: error in backend: only one .init_array section fragment supported
.This makes things difficult for certain tools that rely on writing multiple objects in the
.init_array
section for life-before-main initialisation functions.This PR makes two changes to
WasmObjectWriter
:.init_array
section into start functions, (adding them to anInitFuncs
vector for writing out later), now loops through all the subsequent fragments of the section until there are none remaining.This picks up any functions listed in later objects in the
.init_array
section, and they are now successfully invoked before main.However, a related issue is revealed by changing the functions in the example above to read:
Then, with our patched LLVM,
The symbols in the
.init_array
section are written to theWASM_SYMBOL_TABLE
in the customlinking
section, with kindWASM_SYMBOL_TYPE_DATA
. But, these symbols do not reference content in the DATA section, and so the written offset makes no sense.Is this a mistake? AFAICT I don't think there's an appropriate kind for these symbols to be written to the
WASM_SYMBOL_TABLE
subsection with. In fact, symbols for the functions referenced by the.init_array
sections are independently written to a differentWASM_INIT_FUNC
subsection in any case.Interestingly, you can also reveal this problem in an unpatched LLVM by writing a Wasm object file that has no data section but does supply an
.init_array
function:So, this leads me to the related second change:
.init_array
section to theWASM_SYMBOL_TABLE
subsection of thelinking
custom section, because these are handled elsewhere and written to theWASM_INIT_FUNCS
subsection instead.With the two changes above, the modified test program above compiles with Emscripten and runs on Wasm with the expected behaviour:
Finally, I have tried to include a unit test. Hopefully it looks reasonable -- it's based on copying similar test files in that directory.