Mutli-Memories Support in IR #4811

ashleynh · 2022-07-18T16:29:06Z

This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction.

It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format.

There remain quite a few places where a hardcoded reference to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.

juntao · 2022-07-26T09:44:35Z

Would love to see this feature! We can really use it at WasmEdge. 😂 Please let us know how we can be of assistance.

tlively · 2022-07-29T20:38:46Z

🎉 🎉 🎉

tlively

Here are some comments on everything before src/passes in GitHub's review order. Taking a look at everything under src/passes next.

tlively · 2022-07-29T20:42:50Z

src/binaryen-c.cpp

@@ -1036,29 +1036,41 @@ BinaryenExpressionRef BinaryenLoad(BinaryenModuleRef module,
                                   uint32_t offset,
                                   uint32_t align,
                                   BinaryenType type,
-                                   BinaryenExpressionRef ptr) {
+                                   BinaryenExpressionRef ptr,
+                                   const char* name) {


Maybe memName instead of name?

tlively · 2022-07-29T20:48:03Z

src/binaryen-c.cpp

+  // Maintaining compatibility for instructions with a single memory
+  if (name == nullptr && module->memories.size() == 1) {
+    name = module->memories[0]->name.c_str();
+  }


Since this is repeated so often, it might make sense to factor out into a function so we can make this a one-liner like name = getDefaultName(name, module);

tlively · 2022-07-29T20:50:25Z

src/binaryen-c.cpp

                       BinaryenExpressionRef* segmentOffsets,
                       BinaryenIndex* segmentSizes,
                       BinaryenIndex numSegments,


In a follow-up we'll want to refactor this API so that segments are not so closely tied to memories. Looking at the behavior below, it looks like this function blows away any existing segments and memories, so this API isn't really safe to use in a multi-memory context at all. Given that, I think it would make sense to assume this will only be used with non-multi-memory code and have it unconditionally use name "0" rather than taking an explicit name.

tlively · 2022-07-29T20:58:11Z

src/binaryen-c.cpp

+  auto memory = std::make_unique<Memory>();
+  memory->name = name ? name : "0";
+  memory->initial = initial;
+  memory->max = int32_t(maximum); // Make sure -1 extends.


It might be clearer to make this more explicit by doing something like memory->max = maximum == -1 ? Address::kUnlimitedSize : maximum; (although I see this was preexisting code).

tlively · 2022-07-29T21:03:57Z

src/binaryen-c.cpp

    memoryExport->kind = ExternalKind::Memory;
-    wasm->addExport(memoryExport.release());
+    ((Module*)module)->addExport(memoryExport.release());


It's more efficient to use the existing unique ptr rather than releasing it just for a new one to be created inside addExport.

Suggested change

((Module*)module)->addExport(memoryExport.release());

((Module*)module)->addExport(std::move(memoryExport));

tlively · 2022-07-29T21:12:05Z

src/binaryen-c.cpp

  }
+  ((Module*)module)->removeMemories([&](Memory* curr) { return true; });


We should also do something with any existing memory exports. Probably deleting them makes the most sense, but that would be a functional change, since previously they would have implicitly switched to exporting the new memory.

tlively · 2022-07-29T21:14:26Z

src/binaryen-c.cpp

+  auto* memory = ((Module*)module)->getMemoryOrNull(name);
+  if (memory == nullptr) {
+    Fatal() << "invalid memory '" << name << "'.";
+  }


Since we are producing a fatal error anyway, might as well call getMemory instead of getMemoryOrNull and skip the error checking here.

src/ir/memory-utils.cpp

tlively · 2022-07-29T21:32:08Z

src/ir/memory-utils.h

+  if (module.memories.size() > 1) {
+    return false;
+  }


I wonder if we should produce a fatal error here (and in similar places) so we don't forget to come back and fix it up later. @kripken, wdyt?

I agree that fatal errors are better in places like this.

tlively · 2022-07-29T21:43:05Z

src/js/binaryen.js-post.js

-    'size'() {
-      return Module['_BinaryenMemorySize'](module);
+    'size'(name) {
+      return Module['_BinaryenMemorySize'](module, strToStack(name));


I see we have existing bugs here, but every strToStack call needs to be wrapped in a preserveStack call to avoid permanently leaking stack space.

kripken

Really nice work! Relative to the massive amount of code here I have very few comments. (So far I read the source files from src/shell-interface.h - about the middle - in the github order here, to try to avoid overlapping with @tlively 's comments.)

Also none of my comments are really serious. So I think perhaps this can land without addressing all of them, and the rest could be fixed in followups. On a massive PR like this that might be the most practical thing. @tlively what do you think?

One critical thing before landing would be fuzzing, though - we shouldn't break that.

kripken · 2022-07-29T20:55:32Z

src/abi/stack.h

@@ -50,7 +50,8 @@ getStackSpace(Index local, Function* func, Index size, Module& wasm) {
  }
  // align the size
  size = stackAlign(size);
-  auto pointerType = wasm.memory.indexType;
+  auto pointerType =
+    !wasm.memories.empty() ? wasm.memories[0]->indexType : Type::i32;


This pass can probably halt with an error (like on line 49 above) if there are no memories. Please also add a comment that we are assuming that memory 0 is the "main" memory here.

kripken · 2022-07-29T20:57:00Z

src/binaryen-c.cpp

+                                   const char* name) {
+  // Maintaining compatibility for instructions with a single memory
+  if (name == nullptr && module->memories.size() == 1) {
+    name = module->memories[0]->name.c_str();


Perhaps a helper function in this file, static Name getMemoryName(module, name)?

kripken · 2022-07-29T20:58:05Z

src/binaryen-c.h

@@ -736,7 +736,8 @@ BINARYEN_API BinaryenExpressionRef BinaryenLoad(BinaryenModuleRef module,
                                                uint32_t offset,
                                                uint32_t align,
                                                BinaryenType type,
-                                                BinaryenExpressionRef ptr);
+                                                BinaryenExpressionRef ptr,
+                                                const char* name);


I'm not sure name is clear enough here. Perhaps memoryName?

kripken · 2022-07-29T21:11:15Z

src/shell-interface.h

+  int8_t load8s(Address addr, Name memoryName) override {
+    auto it = memories.find(memoryName);
+    if (it == memories.end()) {
+      trap("load8s on non-existing memory");


Please replace this (and below) with an assertion. A trap is not needed here since this should be impossible in a valid wasm module, and we can assume it is valid basically anywhere (except for the validator itself).

kripken · 2022-07-29T21:14:04Z

src/tools/wasm-ctor-eval.cpp

+  template<typename T> T* getMemory(Address address, Name memoryName) {
+    auto it = memories.find(memoryName);
+    if (it == memories.end()) {
+      Fatal() << "memory not found: " << memoryName;


(can also be an assertion)

kripken · 2022-07-29T22:19:20Z

src/wasm/wasm-debug.cpp

@@ -1065,7 +1065,8 @@ void writeDWARFSections(Module& wasm, const BinaryLocations& newLocations) {

  updateDebugLines(data, locationUpdater);

-  updateCompileUnits(info, data, locationUpdater, wasm.memory.is64());
+  bool is64 = wasm.memories.size() > 0 ? wasm.memories[0]->is64() : false;


Worth a comment here that we assume all memories have the same type (all 32 or all 64).

kripken · 2022-07-29T22:21:58Z

src/wasm/wasm-s-parser.cpp

+  if (hasMemoryIdx(s, 2, i)) {
+    memory = getMemoryName(*s[i++]);
+  } else {
+    memory = getMemoryNameAtIdx(0);
+  }


These 5 lines seem to repeat a lot - maybe a helper function in this file?

kripken · 2022-07-29T22:34:30Z

src/wasm/wasm-validator.cpp

+  if (module.memories.empty()) {
+    return;
+  }
+  auto& curr = module.memories[0];


maybe a TODO comment for looping over the memories?

kripken · 2022-07-29T22:40:05Z

src/wasm/wasm.cpp

+  for (auto& curr : dataSegments) {
+    dataSegmentsMap[curr->name] = curr.get();
+  }
+}


We don't seem to have this for ElementSegments? Why is this not symmetric to that?

It's a bug that I never updated the maps when writing the PR for First Class Data Segments. In the future, I'd make this a PR that comes before multi-memories, and it still can!

This is not symmetric to ElementSegments, because we don't do vector swaps for ElementSegments like we do for DataSegments: Ex1 Ex2

I thought it might be wasteful to call updateMaps() just for DataSegments, but there is precedent for doing so with Functions. Happy to remove this function and just call updateMaps(), wdyt?

Hmm, I get 404s on both the first 2 links?

Makes sense, the multi-memories branch is deleted now. Fixed the links to point to main now. Ex1 Ex2

I see, thanks! Yeah, maybe it's worth having a function for it, if we have multiple places that don't need to update all the maps. lgtm as it is then.

test/atomics-unshared.wast.from-wast

tlively

Finished looking at src/passes

tlively · 2022-08-04T20:18:33Z

src/passes/Asyncify.cpp


  Expression* makeGetStackPos() {
    return makeLoad(4,
                    false,
                    int32_t(DataOffset::BStackPos),
                    4,
                    makeGlobalGet(ASYNCIFY_DATA, Type::i32),
-                    Type::i32);
+                    Type::i32,
+                    wasm.memories[0]->name);


I wonder if it's worth having a method like wasm.getOnlyMemory that asserts that there is exactly one memory and returns is. Then finding all the places that need to be updated to handle multi-memory could be found by searching for uses of that method.

This sounds like a good idea to me. I think getSingleMemory would be perhaps a better name given existing getSingle* stuff we have. Also the method could be on MemoryUtils.

tlively · 2022-08-04T20:30:05Z

src/passes/MemoryPacking.cpp

@@ -123,7 +125,8 @@ struct MemoryPacking : public Pass {
 };

 void MemoryPacking::run(PassRunner* runner, Module* module) {
-  if (!canOptimize(module->memory, module->dataSegments, runner->options)) {
+  // Does not have multi-memories support


Maybe we should put an assertion here so we don't forget about this.

src/passes/Print.cpp

tlively · 2022-08-04T20:33:48Z

src/passes/RemoveUnusedModuleElements.cpp

@@ -38,6 +38,7 @@ typedef std::pair<ModuleElementKind, Name> ModuleElement;

 // Finds reachabilities
 // TODO: use Effects to determine if a memory is used
+// This pass does not have multi-memories support


Should we add an assertion?

tlively · 2022-08-04T21:31:24Z

I would be ok with landing this as-is and applying fixups in follow-on PRs after the fuzzer passes 👍

Edit: It might be useful to first update printing to elide names where possible, though, to minimize test churn.

kripken

Looks like there are some new test files here, test/multi-memories*? It would be good to make those wast files instead of wasm, unless I'm missing a reason not to? As text they are easier to review, and look at if a test fails in the future.

kripken · 2022-08-16T15:30:47Z

test/lit/wasm-split/instrument-in-memory.wast


 ;; Check that the profiling function is correct.

-;; CHECK:      (func $__write_profile (param $addr i32) (param $size i32) (result i32)


Why is this function no longer shown in the test output? Maybe the new first line here needs to be updated (there are some flags like --all-items I think, but I don't remember them offhand).

I reverted this test file back to main, because it was manually created and just added the memory line I needed for the test to pass. We're all good now 👍🏾

test/lit/passes/O4_disable-bulk-memory.wast

kripken · 2022-08-16T15:33:46Z

test/lit/multi-memories-simd.wast

+ ;; CHECK-NEXT:  )
+ ;; CHECK-NEXT: )
+ (func $v128.load (param $0 i32) (result v128)
+  (v128.load offset=0 align=16


Comments in these tests would be good, like without an explicit name, the first memory is used in this one, etc.

kripken · 2022-08-16T15:35:50Z

test/example/relooper-fuzz2.c

@@ -23,7 +23,8 @@ int main() {
                   0,
                   0,
                   BinaryenTypeInt32(),
-                   BinaryenConst(module, BinaryenLiteralInt32(4))),
+                   BinaryenConst(module, BinaryenLiteralInt32(4)),
+                   "0"),


Could these changes in these files be avoided? I think we wanted to automatically use the single memory when there is a single memory.

For the JS API, null is appended for the memory name and this is handled inside the different calls (BinaryenLoad, BinaryenStore, etc.). For C calls, we can set a default parameter of null for the memName to avoid updating these tests.

I see, right... a null is the best we could do, since we must still change the C API to add the new param either way. Sounds ok to leave null for later then.

kripken · 2022-08-16T15:45:18Z

src/ir/memory-utils.h

+  if (module.memories.size() > 1) {
+    return false;
+  }


I agree that fatal errors are better in places like this.

kripken · 2022-08-16T15:47:48Z

src/passes/Asyncify.cpp


  Expression* makeGetStackPos() {
    return makeLoad(4,
                    false,
                    int32_t(DataOffset::BStackPos),
                    4,
                    makeGlobalGet(ASYNCIFY_DATA, Type::i32),
-                    Type::i32);
+                    Type::i32,
+                    wasm.memories[0]->name);


This sounds like a good idea to me. I think getSingleMemory would be perhaps a better name given existing getSingle* stuff we have. Also the method could be on MemoryUtils.

ashleynh · 2022-08-17T18:39:10Z

Looks like there are some new test files here, test/multi-memories*? It would be good to make those wast files instead of wasm, unless I'm missing a reason not to? As text they are easier to review, and look at if a test fails in the future.

@kripken I added the multi-memories wasm test files to verify correct parsing of binary format and printing to text format. There are additional ".wasm.fromBinary" text files added for each binary to verify the wast is as expected. Happy to make any change needed here.

kripken · 2022-08-17T19:51:07Z

@ashleynh

I added the multi-memories wasm test files to verify correct parsing of binary format and printing to text format.

Makes sense, we do want that coverage. But adding a wast file in the same directory should get the same test coverage (it will also emit .fromBinary outputs etc. - it will write to binary, read, then print that). And text tests are generally easier to handle.

We do have some actual .wasm files in the test suite, but mainly to handle code that can't be directly represented in the text format for some reason.

Due to missing test coverage, we missed in #4811 that some memory operations needed to get make64() called on them.

This is a pretty subtle point that was missed in #4811 - we need to first visit the child, then compute the size, as the child may alter that size. Found by the fuzzer.

ashleynh closed this Jul 18, 2022

ashleynh reopened this Jul 18, 2022

ashleynh marked this pull request as draft July 18, 2022 18:04

ashleynh changed the title ~~Running CI, do not review~~ Mutli-Memories, Running CI, do not review Jul 18, 2022

ashleynh force-pushed the multi-memories branch 2 times, most recently from 847551d to 4eff0f5 Compare July 22, 2022 00:02

ashleynh changed the title ~~Mutli-Memories, Running CI, do not review~~ Mutli-Memories Jul 25, 2022

ashleynh changed the title ~~Mutli-Memories~~ Mutli-Memories Support in IR Jul 25, 2022

ashleynh force-pushed the multi-memories branch from 165d555 to 1d5fc9a Compare July 25, 2022 22:59

ashleynh force-pushed the multi-memories branch 4 times, most recently from b9e3d67 to c81a4f6 Compare July 29, 2022 19:50

ashleynh marked this pull request as ready for review July 29, 2022 20:24

ashleynh requested review from tlively and kripken July 29, 2022 20:25

tlively reviewed Jul 29, 2022

View reviewed changes

kripken reviewed Jul 29, 2022

View reviewed changes

tlively reviewed Aug 4, 2022

View reviewed changes

ashleynh force-pushed the multi-memories branch 3 times, most recently from 6cd391e to 669fb89 Compare August 10, 2022 17:52

ashleynh force-pushed the multi-memories branch 2 times, most recently from a3b4063 to 19c4e92 Compare August 15, 2022 21:58

kripken reviewed Aug 16, 2022

View reviewed changes

ashleynh force-pushed the multi-memories branch from 2985924 to e535ad7 Compare August 17, 2022 23:38

ashleynh mentioned this pull request Aug 17, 2022

Convert multi-memories .wasm test files to .wast #4914

Closed

clang-format

b849e74

updating test with memory

94d52a5

ashleynh merged commit 3aff4c6 into main Aug 18, 2022

ashleynh deleted the multi-memories branch August 18, 2022 01:44

kripken mentioned this pull request Aug 18, 2022

Fix Memory64 binary parsing after #4811 #4933

Merged

kripken added a commit that referenced this pull request Aug 18, 2022

Fix Memory64 binary parsing after #4811 (#4933)

296a63e

Due to missing test coverage, we missed in #4811 that some memory operations needed to get make64() called on them.

MaxGraey mentioned this pull request Aug 24, 2022

[C-API / JS-API] Binaryen crash if recent "memoryName" is null for memory ops #4962

Open

ashleynh mentioned this pull request Aug 27, 2022

Changing Fatal() to assert() #4982

Merged

MaxGraey mentioned this pull request Aug 29, 2022

Import memory caused to Fatal: Module::addMemory: 0 already exists #4988

Closed

kripken mentioned this pull request Oct 3, 2022

Fix ordering of visit() in MemoryGrow interpretation #5108

Merged

	((Module*)module)->addExport(memoryExport.release());
	((Module*)module)->addExport(std::move(memoryExport));

		}
		((Module)module)->removeMemories([&](Memory curr) { return true; });


		;; Check that the profiling function is correct.

		;; CHECK: (func $__write_profile (param $addr i32) (param $size i32) (result i32)

Mutli-Memories Support in IR #4811

Mutli-Memories Support in IR #4811

Uh oh!

Conversation

ashleynh commented Jul 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

juntao commented Jul 26, 2022

Uh oh!

tlively commented Jul 29, 2022

Uh oh!

tlively left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kripken left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tlively left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlively commented Aug 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ashleynh commented Jul 18, 2022 •

edited

Loading

tlively commented Aug 4, 2022 •

edited

Loading