- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15k
          [WebAssembly] Support multiple .init_array fragments when writing Wasm objects
          #111008
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using  If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. | 
| @llvm/pr-subscribers-backend-webassembly @llvm/pr-subscribers-mc Author: George Stagg (georgestagg) ChangesFor WebAssembly objects, only functions listed in the first symbol in the  Consider the C program: #import <stdio.h>
void init1() { puts("init1"); }
void init2() { puts("init2"); }
void init3() { puts("init3"); }
typedef void (*f)(void);
__attribute__((section(".init_array"))) void (*p_init1)(void) = &init1;
__attribute__((section(".init_array"))) f p_init2[2] = { &init2, &init3 };
int main() {
	puts("main");
	return 0;
}Compiling for Linux results in each function running before  In previous LLVM versions (<=18, I think?) this also used to throw an error message:  This makes things difficult for certain tools that rely on writing multiple objects in the  This PR makes two changes to  
 This picks up any functions listed in later objects in the  However, a related issue is revealed by changing the functions in the example above to read: void init1() { puts("1"); }
void init2() { puts("2"); }
void init3() { puts("3"); }Then, with our patched LLVM, The symbols in the  Is this a mistake? AFAICT I don't think there's an appropriate kind for these symbols to be written to the  Interestingly, you can also reveal this problem in an unpatched LLVM by writing a Wasm object file that has no data section but does supply an  void init1() {}
__attribute__((section(".init_array"))) void (*p_init1)(void) = &init1;
int main() { return 0; }So, this leads me to the related second change: 
 With the two changes above, the modified test program above compiles with Emscripten and runs on Wasm with the expected behaviour: Finally, I have tried to include a unit test. Hopefully it looks reasonable -- it's based on copying similar test files in that directory. Full diff: https://github.com/llvm/llvm-project/pull/111008.diff 2 Files Affected: 
 diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index f25dc92fa235a2..034e058690744a 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1769,6 +1769,11 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
       WS.setIndex(InvalidIndex);
       continue;
     }
+    // Contents of .init_array sections are handled elsewhere.
+    if (WS.isDefined() &&
+        WS.getSection().getName().starts_with(".init_array")) {
+      continue;
+    }
     LLVM_DEBUG(dbgs() << "adding to symtab: " << WS << "\n");
 
     uint32_t Flags = 0;
@@ -1853,49 +1858,54 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
     if (EmptyFrag.getKind() != MCFragment::FT_Data)
       report_fatal_error(".init_array section should be aligned");
 
-    const MCFragment &AlignFrag = *EmptyFrag.getNext();
-    if (AlignFrag.getKind() != MCFragment::FT_Align)
-      report_fatal_error(".init_array section should be aligned");
-    if (cast<MCAlignFragment>(AlignFrag).getAlignment() !=
-        Align(is64Bit() ? 8 : 4))
-      report_fatal_error(".init_array section should be aligned for pointers");
-
-    const MCFragment &Frag = *AlignFrag.getNext();
-    if (Frag.hasInstructions() || Frag.getKind() != MCFragment::FT_Data)
-      report_fatal_error("only data supported in .init_array section");
-
-    uint16_t Priority = UINT16_MAX;
-    unsigned PrefixLength = strlen(".init_array");
-    if (WS.getName().size() > PrefixLength) {
-      if (WS.getName()[PrefixLength] != '.')
+    const MCFragment *nextFrag = EmptyFrag.getNext();
+    while (nextFrag != nullptr) {
+      const MCFragment &AlignFrag = *nextFrag;
+      if (AlignFrag.getKind() != MCFragment::FT_Align)
+        report_fatal_error(".init_array section should be aligned");
+      if (cast<MCAlignFragment>(AlignFrag).getAlignment() !=
+          Align(is64Bit() ? 8 : 4))
         report_fatal_error(
-            ".init_array section priority should start with '.'");
-      if (WS.getName().substr(PrefixLength + 1).getAsInteger(10, Priority))
-        report_fatal_error("invalid .init_array section priority");
-    }
-    const auto &DataFrag = cast<MCDataFragment>(Frag);
-    const SmallVectorImpl<char> &Contents = DataFrag.getContents();
-    for (const uint8_t *
-             P = (const uint8_t *)Contents.data(),
-            *End = (const uint8_t *)Contents.data() + Contents.size();
-         P != End; ++P) {
-      if (*P != 0)
-        report_fatal_error("non-symbolic data in .init_array section");
-    }
-    for (const MCFixup &Fixup : DataFrag.getFixups()) {
-      assert(Fixup.getKind() ==
-             MCFixup::getKindForSize(is64Bit() ? 8 : 4, false));
-      const MCExpr *Expr = Fixup.getValue();
-      auto *SymRef = dyn_cast<MCSymbolRefExpr>(Expr);
-      if (!SymRef)
-        report_fatal_error("fixups in .init_array should be symbol references");
-      const auto &TargetSym = cast<const MCSymbolWasm>(SymRef->getSymbol());
-      if (TargetSym.getIndex() == InvalidIndex)
-        report_fatal_error("symbols in .init_array should exist in symtab");
-      if (!TargetSym.isFunction())
-        report_fatal_error("symbols in .init_array should be for functions");
-      InitFuncs.push_back(
-          std::make_pair(Priority, TargetSym.getIndex()));
+            ".init_array section should be aligned for pointers");
+
+      const MCFragment &Frag = *AlignFrag.getNext();
+      nextFrag = Frag.getNext();
+      if (Frag.hasInstructions() || Frag.getKind() != MCFragment::FT_Data)
+        report_fatal_error("only data supported in .init_array section");
+
+      uint16_t Priority = UINT16_MAX;
+      unsigned PrefixLength = strlen(".init_array");
+      if (WS.getName().size() > PrefixLength) {
+        if (WS.getName()[PrefixLength] != '.')
+          report_fatal_error(
+              ".init_array section priority should start with '.'");
+        if (WS.getName().substr(PrefixLength + 1).getAsInteger(10, Priority))
+          report_fatal_error("invalid .init_array section priority");
+      }
+      const auto &DataFrag = cast<MCDataFragment>(Frag);
+      const SmallVectorImpl<char> &Contents = DataFrag.getContents();
+      for (const uint8_t *
+               P = (const uint8_t *)Contents.data(),
+              *End = (const uint8_t *)Contents.data() + Contents.size();
+           P != End; ++P) {
+        if (*P != 0)
+          report_fatal_error("non-symbolic data in .init_array section");
+      }
+      for (const MCFixup &Fixup : DataFrag.getFixups()) {
+        assert(Fixup.getKind() ==
+               MCFixup::getKindForSize(is64Bit() ? 8 : 4, false));
+        const MCExpr *Expr = Fixup.getValue();
+        auto *SymRef = dyn_cast<MCSymbolRefExpr>(Expr);
+        if (!SymRef)
+          report_fatal_error(
+              "fixups in .init_array should be symbol references");
+        const auto &TargetSym = cast<const MCSymbolWasm>(SymRef->getSymbol());
+        if (TargetSym.getIndex() == InvalidIndex)
+          report_fatal_error("symbols in .init_array should exist in symtab");
+        if (!TargetSym.isFunction())
+          report_fatal_error("symbols in .init_array should be for functions");
+        InitFuncs.push_back(std::make_pair(Priority, TargetSym.getIndex()));
+      }
     }
   }
 
diff --git a/llvm/test/MC/WebAssembly/init-array.ll b/llvm/test/MC/WebAssembly/init-array.ll
new file mode 100644
index 00000000000000..d19ddcf10d5d9c
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/init-array.ll
@@ -0,0 +1,56 @@
+; RUN: llc -mcpu=mvp -filetype=obj %s -o - | obj2yaml | FileCheck %s
+
+target triple = "wasm32-unknown-unknown"
+
+@p_init1 = hidden global ptr @init1, section ".init_array", align 4
+@p_init2 = hidden global ptr @init2, section ".init_array", align 4
+
+define hidden void @init1() #0 { ret void }
+define hidden void @init2() #0 { ret void }
+
+
+; CHECK:        - Type:            IMPORT
+; CHECK-NEXT:     Imports:
+; CHECK-NEXT:       - Module:          env
+; CHECK-NEXT:         Field:           __linear_memory
+; CHECK-NEXT:         Kind:            MEMORY
+; CHECK-NEXT:         Memory:
+; CHECK-NEXT:           Minimum:         0x0
+; CHECK-NEXT:       - Module:          env
+; CHECK-NEXT:         Field:           __indirect_function_table
+; CHECK-NEXT:         Kind:            TABLE
+; CHECK-NEXT:         Table:
+; CHECK-NEXT:           Index:           0
+; CHECK-NEXT:           ElemType:        FUNCREF
+; CHECK-NEXT:           Limits:
+; CHECK-NEXT:             Minimum:         0x0
+; CHECK-NEXT:   - Type:            FUNCTION
+; CHECK-NEXT:     FunctionTypes:   [ 0, 0 ]
+; CHECK-NEXT:   - Type:            CODE
+; CHECK-NEXT:     Functions:
+; CHECK-NEXT:       - Index:           0
+; CHECK-NEXT:         Locals:          []
+; CHECK-NEXT:         Body:            0B
+; CHECK-NEXT:       - Index:           1
+; CHECK-NEXT:         Locals:          []
+; CHECK-NEXT:         Body:            0B
+; CHECK-NEXT:    - Type:            CUSTOM
+; CHECK-NEXT:      Name:            linking
+; CHECK-NEXT:      Version:         2
+; CHECK-NEXT:      SymbolTable:
+; CHECK-NEXT:        - Index:           0
+; CHECK-NEXT:          Kind:            FUNCTION
+; CHECK-NEXT:          Name:            init1
+; CHECK-NEXT:          Flags:           [ VISIBILITY_HIDDEN ]
+; CHECK-NEXT:          Function:        0
+; CHECK-NEXT:        - Index:           1
+; CHECK-NEXT:          Kind:            FUNCTION
+; CHECK-NEXT:          Name:            init2
+; CHECK-NEXT:          Flags:           [ VISIBILITY_HIDDEN ]
+; CHECK-NEXT:          Function:        1
+; CHECK-NEXT:      InitFunctions:
+; CHECK-NEXT:        - Priority:        65535
+; CHECK-NEXT:          Symbol:          0
+; CHECK-NEXT:        - Priority:        65535
+; CHECK-NEXT:          Symbol:          1
+; CHECK-NEXT: ...
 | 
| @sbc100 who should we ask to review this? | 
| From git blame looks like @sbc100 would be the most appropriate reviewer. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable. I suppose that C++ and rust always generate just a single fragment which is why we haven't seen issues with this before?
        
          
                llvm/lib/MC/WasmObjectWriter.cpp
              
                Outdated
          
        
      | } | ||
| // Contents of .init_array sections are handled elsewhere. | ||
| if (WS.isDefined() && | ||
| WS.getSection().getName().starts_with(".init_array")) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it possible to define a symbol within this section? Can you give and example? It seems like it should be just a sequence of pointers, but I guess somehow you can have symbol that points into that sequence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the following C code does so:
#import <stdio.h>
void init1() { puts("init1"); }
__attribute__((section(".init_array"))) void (*p_init[3])(void) = { &init1, &init1, &init1 };
int main() { return 0; }
Under (e.g.) Linux, a list of pointers is written to the .init_array section and a symbol p_init points to the sequence.
$ clang -c symbol.c -o symbol.o
$ obj2yaml symbol.o
[...]
Symbols:
[...]
  - Name:            p_init
    Type:            STT_OBJECT
    Section:         .init_array
    Binding:         STB_GLOBAL
    Size:            0x18
However, with a WebAssembly object this does not make as much sense. WasmObjectWriter.cpp converts the sequence of pointers in .init_array section into entries in the WASM_INIT_FUNC subsection of the custom linking section instead. There they are no longer written as a list of pointers, but indexes into a table.
So, this particular change makes it so that the p_init symbol is not written into the WebAssembly object file. In the current version a symbol is actually written, but it is broken in any case since it points to the incorrect section (leading to the invalid data segment index: 0 bug shown in the OP).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we error out int this maybe?   It seems like if somebody had a usage of p_init somewhere it would simply not work.  How about something like  "symbols within .init_array sections are not supported"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we error out int this maybe? It seems like if somebody had a usage of
p_initsomewhere it would simply not work. How about something like "symbols within .init_array sections are not supported"?
A reasonable argument, but I think it would get in the way.
It seems to me the p_init symbol is just a side-effect of this pattern in C, where the real intention is just to add the list of function pointers to the .init_array section. We don't actually need to reference p_init anywhere, but if we throw an error here we unfortunately break the pattern.
Ultimately what I am interested in is some Rust code that uses a similar pattern. Essentially:
unsafe extern "C" fn __ctor() {
    [...]
}
#[cfg_attr(link_section = ".init_array")]
static __CTOR: unsafe extern "C" fn() = __ctor;So I would be grateful if we could continue to allow for this use case and not error out, even if the side-effect symbol cannot be referenced.
Another option is to keep the current situation, but it means somehow otherwise handling the invalid data segment index error when using obj2yaml in the unit test for this PR. I think by avoiding emitting the symbols we are doing no worse than before, and perhaps a little better in that regard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen though if there is a reference to the p_init symbol though, either within the object itself, or in some other object.   Would this result in a relocation with against a non-existent symbol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about landing this PR without this part of the patch and then following up with another PR for e.g. "Handle symbols in init_array sections".
IIUC the test in this PR doesn't contain such symbols anyway so we would likely want a separate test anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsafe extern "C" fn __ctor() { [...] } #[cfg_attr(link_section = ".init_array")] static __CTOR: unsafe extern "C" fn() = __ctor;
In this example which is the symbol that lives in the .init_array section?  I guess it would be __CTOR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen though if there is a reference to the p_init symbol though, either within the object itself, or in some other object. Would this result in a relocation with against a non-existent symbol?
Looks like in both cases you get an error at link time:
wasm-ld: error: symbol.o: invalid data relocation: init1
I see, that's probably not ideal.
In this example which is the symbol that lives in the
.init_arraysection? I guess it would be __CTOR?
Yes. Though Rust mangles the name.
What do you think about landing this PR without this part of the patch and then following up with another PR for e.g. "Handle symbols in init_array sections".
I think I'm OK with that, a fresh set of eyes on it in a new PR might be good in any case. I'll push a commit to remove this part of the PR.
| BTW viewing this diff with "ignore whitespace" helped a lot to see what was going on. Regarding the test case, can it be written in assembly instead? | 
| 
 I guess so, let me throw together a quick C++ example and see how it currently handles the constructors... 
First, compile and test this C++ code:
 #include <iostream>
using namespace std;
class Init1 {
  public: Init1() { cout << "Init1" << endl; }
};
class Init2 {
  public: Init2() { cout << "Init2" << endl; }
};
Init1 init1a;
Init1 init1b;
Init2 init2;
int main() {
  cout << "main" << endl;
  return 0;
}Let's make sure the  Yes, but just with a single function. I suppose the constructor functions have been coalesced into a single function. Let's take a look in  Indeed, the global constructors are combined into a single function. Note that LLVM generates code using  Yes, at least for simple C++. 
 Yes, apologies. The diff is smaller than it looks due to the change in indentation! 
 Unfortunately, I'm not sure how to express this in WebAssembly text format directly. There is an open issue suggesting it is not currently possible to define custom section content (WebAssembly/design#1153) in the text format. I'm also not entirely sure how I would express the test, since the bug occurs in the code that converts the  I would be willing to change the test if it is possible, but would need specific help in how to proceed. | 
| Ping | 
| 
 I think thinking you could convert the  I think you can get a starting point using  | 
| 
 Ahhh, gotcha! Yes, done and pushed. | 
783024e    to
    d5495ac      
    Compare
  
    | Ping. Hi, just bumping this. I've removed the unsure change that we discussed in the inline thread above and will return to it in a future PR once this one is merged. | 
| @georgestagg Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! | 
| LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/11585 Here is the relevant piece of the build log for the reference | 
…asm objects (llvm#111008) (cherry picked from commit ac5dd45)
Follow on from llvm#111008. (cherry picked from commit ed91843)
For WebAssembly objects, only functions listed in the first symbol in the
.init_arraysection are invoked.Consider the C program:
Compiling for Linux results in each function running before
main(). But for WebAssembly (here via Emscripten) only the first function is invoked:In previous LLVM versions (<=18, I think?) this also used to throw an error message:
fatal error: error in backend: only one .init_array section fragment supported.This makes things difficult for certain tools that rely on writing multiple objects in the
.init_arraysection for life-before-main initialisation functions.This PR makes two changes to
WasmObjectWriter:.init_arraysection into start functions, (adding them to anInitFuncsvector for writing out later), now loops through all the subsequent fragments of the section until there are none remaining.This picks up any functions listed in later objects in the
.init_arraysection, and they are now successfully invoked before main.However, a related issue is revealed by changing the functions in the example above to read:
Then, with our patched LLVM,
The symbols in the
.init_arraysection are written to theWASM_SYMBOL_TABLEin the customlinkingsection, with kindWASM_SYMBOL_TYPE_DATA. But, these symbols do not reference content in the DATA section, and so the written offset makes no sense.Is this a mistake? AFAICT I don't think there's an appropriate kind for these symbols to be written to the
WASM_SYMBOL_TABLEsubsection with. In fact, symbols for the functions referenced by the.init_arraysections are independently written to a differentWASM_INIT_FUNCsubsection in any case.Interestingly, you can also reveal this problem in an unpatched LLVM by writing a Wasm object file that has no data section but does supply an
.init_arrayfunction:So, this leads me to the related second change:
.init_arraysection to theWASM_SYMBOL_TABLEsubsection of thelinkingcustom section, because these are handled elsewhere and written to theWASM_INIT_FUNCSsubsection instead.With the two changes above, the modified test program above compiles with Emscripten and runs on Wasm with the expected behaviour:
Finally, I have tried to include a unit test. Hopefully it looks reasonable -- it's based on copying similar test files in that directory.