Skip to content

Mutli-Memories Support in IR #4811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 78 commits into from
Aug 18, 2022
Merged

Mutli-Memories Support in IR #4811

merged 78 commits into from
Aug 18, 2022

Conversation

ashleynh
Copy link
Collaborator

@ashleynh ashleynh commented Jul 18, 2022

Multiple Memories for Wasm Proposal

This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction.

It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format.

There remain quite a few places where a hardcoded reference to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.

@ashleynh ashleynh closed this Jul 18, 2022
@ashleynh ashleynh reopened this Jul 18, 2022
@ashleynh ashleynh marked this pull request as draft July 18, 2022 18:04
@ashleynh ashleynh changed the title Running CI, do not review Mutli-Memories, Running CI, do not review Jul 18, 2022
@ashleynh ashleynh force-pushed the multi-memories branch 2 times, most recently from 847551d to 4eff0f5 Compare July 22, 2022 00:02
@ashleynh ashleynh changed the title Mutli-Memories, Running CI, do not review Mutli-Memories Jul 25, 2022
@ashleynh ashleynh changed the title Mutli-Memories Mutli-Memories Support in IR Jul 25, 2022
@juntao
Copy link

juntao commented Jul 26, 2022

Would love to see this feature! We can really use it at WasmEdge. 😂 Please let us know how we can be of assistance.

@ashleynh ashleynh force-pushed the multi-memories branch 4 times, most recently from b9e3d67 to c81a4f6 Compare July 29, 2022 19:50
@ashleynh ashleynh marked this pull request as ready for review July 29, 2022 20:24
@ashleynh ashleynh requested review from tlively and kripken July 29, 2022 20:25
@tlively
Copy link
Member

tlively commented Jul 29, 2022

🎉 🎉 🎉

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some comments on everything before src/passes in GitHub's review order. Taking a look at everything under src/passes next.

@@ -1036,29 +1036,41 @@ BinaryenExpressionRef BinaryenLoad(BinaryenModuleRef module,
uint32_t offset,
uint32_t align,
BinaryenType type,
BinaryenExpressionRef ptr) {
BinaryenExpressionRef ptr,
const char* name) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe memName instead of name?

Comment on lines +1240 to +1275
// Maintaining compatibility for instructions with a single memory
if (name == nullptr && module->memories.size() == 1) {
name = module->memories[0]->name.c_str();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is repeated so often, it might make sense to factor out into a function so we can make this a one-liner like name = getDefaultName(name, module);

Comment on lines 3944 to 3976
BinaryenExpressionRef* segmentOffsets,
BinaryenIndex* segmentSizes,
BinaryenIndex numSegments,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a follow-up we'll want to refactor this API so that segments are not so closely tied to memories. Looking at the behavior below, it looks like this function blows away any existing segments and memories, so this API isn't really safe to use in a multi-memory context at all. Given that, I think it would make sense to assume this will only be used with non-multi-memory code and have it unconditionally use name "0" rather than taking an explicit name.

auto memory = std::make_unique<Memory>();
memory->name = name ? name : "0";
memory->initial = initial;
memory->max = int32_t(maximum); // Make sure -1 extends.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be clearer to make this more explicit by doing something like memory->max = maximum == -1 ? Address::kUnlimitedSize : maximum; (although I see this was preexisting code).

memoryExport->kind = ExternalKind::Memory;
wasm->addExport(memoryExport.release());
((Module*)module)->addExport(memoryExport.release());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more efficient to use the existing unique ptr rather than releasing it just for a new one to be created inside addExport.

Suggested change
((Module*)module)->addExport(memoryExport.release());
((Module*)module)->addExport(std::move(memoryExport));

}
((Module*)module)->removeMemories([&](Memory* curr) { return true; });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also do something with any existing memory exports. Probably deleting them makes the most sense, but that would be a functional change, since previously they would have implicitly switched to exporting the new memory.

Comment on lines +4024 to +4057
auto* memory = ((Module*)module)->getMemoryOrNull(name);
if (memory == nullptr) {
Fatal() << "invalid memory '" << name << "'.";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are producing a fatal error anyway, might as well call getMemory instead of getMemoryOrNull and skip the error checking here.

Comment on lines +49 to +51
if (module.memories.size() > 1) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should produce a fatal error here (and in similar places) so we don't forget to come back and fix it up later. @kripken, wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that fatal errors are better in places like this.

'size'() {
return Module['_BinaryenMemorySize'](module);
'size'(name) {
return Module['_BinaryenMemorySize'](module, strToStack(name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we have existing bugs here, but every strToStack call needs to be wrapped in a preserveStack call to avoid permanently leaking stack space.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work! Relative to the massive amount of code here I have very few comments. (So far I read the source files from src/shell-interface.h - about the middle - in the github order here, to try to avoid overlapping with @tlively 's comments.)

Also none of my comments are really serious. So I think perhaps this can land without addressing all of them, and the rest could be fixed in followups. On a massive PR like this that might be the most practical thing. @tlively what do you think?

One critical thing before landing would be fuzzing, though - we shouldn't break that.

@@ -50,7 +50,8 @@ getStackSpace(Index local, Function* func, Index size, Module& wasm) {
}
// align the size
size = stackAlign(size);
auto pointerType = wasm.memory.indexType;
auto pointerType =
!wasm.memories.empty() ? wasm.memories[0]->indexType : Type::i32;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass can probably halt with an error (like on line 49 above) if there are no memories. Please also add a comment that we are assuming that memory 0 is the "main" memory here.

const char* name) {
// Maintaining compatibility for instructions with a single memory
if (name == nullptr && module->memories.size() == 1) {
name = module->memories[0]->name.c_str();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a helper function in this file, static Name getMemoryName(module, name)?

@@ -736,7 +736,8 @@ BINARYEN_API BinaryenExpressionRef BinaryenLoad(BinaryenModuleRef module,
uint32_t offset,
uint32_t align,
BinaryenType type,
BinaryenExpressionRef ptr);
BinaryenExpressionRef ptr,
const char* name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure name is clear enough here. Perhaps memoryName?

int8_t load8s(Address addr, Name memoryName) override {
auto it = memories.find(memoryName);
if (it == memories.end()) {
trap("load8s on non-existing memory");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace this (and below) with an assertion. A trap is not needed here since this should be impossible in a valid wasm module, and we can assume it is valid basically anywhere (except for the validator itself).

template<typename T> T* getMemory(Address address, Name memoryName) {
auto it = memories.find(memoryName);
if (it == memories.end()) {
Fatal() << "memory not found: " << memoryName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(can also be an assertion)

@@ -1065,7 +1065,8 @@ void writeDWARFSections(Module& wasm, const BinaryLocations& newLocations) {

updateDebugLines(data, locationUpdater);

updateCompileUnits(info, data, locationUpdater, wasm.memory.is64());
bool is64 = wasm.memories.size() > 0 ? wasm.memories[0]->is64() : false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a comment here that we assume all memories have the same type (all 32 or all 64).

Comment on lines +2006 to +1963
if (hasMemoryIdx(s, 2, i)) {
memory = getMemoryName(*s[i++]);
} else {
memory = getMemoryNameAtIdx(0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 5 lines seem to repeat a lot - maybe a helper function in this file?

if (module.memories.empty()) {
return;
}
auto& curr = module.memories[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a TODO comment for looping over the memories?

for (auto& curr : dataSegments) {
dataSegmentsMap[curr->name] = curr.get();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to have this for ElementSegments? Why is this not symmetric to that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bug that I never updated the maps when writing the PR for First Class Data Segments. In the future, I'd make this a PR that comes before multi-memories, and it still can!

This is not symmetric to ElementSegments, because we don't do vector swaps for ElementSegments like we do for DataSegments: Ex1 Ex2

I thought it might be wasteful to call updateMaps() just for DataSegments, but there is precedent for doing so with Functions. Happy to remove this function and just call updateMaps(), wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I get 404s on both the first 2 links?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, the multi-memories branch is deleted now. Fixed the links to point to main now. Ex1 Ex2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks! Yeah, maybe it's worth having a function for it, if we have multiple places that don't need to update all the maps. lgtm as it is then.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished looking at src/passes


Expression* makeGetStackPos() {
return makeLoad(4,
false,
int32_t(DataOffset::BStackPos),
4,
makeGlobalGet(ASYNCIFY_DATA, Type::i32),
Type::i32);
Type::i32,
wasm.memories[0]->name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worth having a method like wasm.getOnlyMemory that asserts that there is exactly one memory and returns is. Then finding all the places that need to be updated to handle multi-memory could be found by searching for uses of that method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a good idea to me. I think getSingleMemory would be perhaps a better name given existing getSingle* stuff we have. Also the method could be on MemoryUtils.

@@ -123,7 +125,8 @@ struct MemoryPacking : public Pass {
};

void MemoryPacking::run(PassRunner* runner, Module* module) {
if (!canOptimize(module->memory, module->dataSegments, runner->options)) {
// Does not have multi-memories support
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should put an assertion here so we don't forget about this.

@@ -38,6 +38,7 @@ typedef std::pair<ModuleElementKind, Name> ModuleElement;

// Finds reachabilities
// TODO: use Effects to determine if a memory is used
// This pass does not have multi-memories support
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add an assertion?

@tlively
Copy link
Member

tlively commented Aug 4, 2022

I would be ok with landing this as-is and applying fixups in follow-on PRs after the fuzzer passes 👍

Edit: It might be useful to first update printing to elide names where possible, though, to minimize test churn.

@ashleynh ashleynh force-pushed the multi-memories branch 3 times, most recently from 6cd391e to 669fb89 Compare August 10, 2022 17:52
@ashleynh ashleynh force-pushed the multi-memories branch 2 times, most recently from a3b4063 to 19c4e92 Compare August 15, 2022 21:58
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are some new test files here, test/multi-memories*? It would be good to make those wast files instead of wasm, unless I'm missing a reason not to? As text they are easier to review, and look at if a test fails in the future.


;; Check that the profiling function is correct.

;; CHECK: (func $__write_profile (param $addr i32) (param $size i32) (result i32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function no longer shown in the test output? Maybe the new first line here needs to be updated (there are some flags like --all-items I think, but I don't remember them offhand).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this test file back to main, because it was manually created and just added the memory line I needed for the test to pass. We're all good now 👍🏾

;; CHECK-NEXT: )
;; CHECK-NEXT: )
(func $v128.load (param $0 i32) (result v128)
(v128.load offset=0 align=16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments in these tests would be good, like without an explicit name, the first memory is used in this one, etc.

@@ -23,7 +23,8 @@ int main() {
0,
0,
BinaryenTypeInt32(),
BinaryenConst(module, BinaryenLiteralInt32(4))),
BinaryenConst(module, BinaryenLiteralInt32(4)),
"0"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these changes in these files be avoided? I think we wanted to automatically use the single memory when there is a single memory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the JS API, null is appended for the memory name and this is handled inside the different calls (BinaryenLoad, BinaryenStore, etc.). For C calls, we can set a default parameter of null for the memName to avoid updating these tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, right... a null is the best we could do, since we must still change the C API to add the new param either way. Sounds ok to leave null for later then.

Comment on lines +49 to +51
if (module.memories.size() > 1) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that fatal errors are better in places like this.


Expression* makeGetStackPos() {
return makeLoad(4,
false,
int32_t(DataOffset::BStackPos),
4,
makeGlobalGet(ASYNCIFY_DATA, Type::i32),
Type::i32);
Type::i32,
wasm.memories[0]->name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a good idea to me. I think getSingleMemory would be perhaps a better name given existing getSingle* stuff we have. Also the method could be on MemoryUtils.

@ashleynh
Copy link
Collaborator Author

Looks like there are some new test files here, test/multi-memories*? It would be good to make those wast files instead of wasm, unless I'm missing a reason not to? As text they are easier to review, and look at if a test fails in the future.

@kripken I added the multi-memories wasm test files to verify correct parsing of binary format and printing to text format. There are additional ".wasm.fromBinary" text files added for each binary to verify the wast is as expected. Happy to make any change needed here.

@kripken
Copy link
Member

kripken commented Aug 17, 2022

@ashleynh

I added the multi-memories wasm test files to verify correct parsing of binary format and printing to text format.

Makes sense, we do want that coverage. But adding a wast file in the same directory should get the same test coverage (it will also emit .fromBinary outputs etc. - it will write to binary, read, then print that). And text tests are generally easier to handle.

We do have some actual .wasm files in the test suite, but mainly to handle code that can't be directly represented in the text format for some reason.

@ashleynh ashleynh merged commit 3aff4c6 into main Aug 18, 2022
@ashleynh ashleynh deleted the multi-memories branch August 18, 2022 01:44
kripken added a commit that referenced this pull request Aug 18, 2022
Due to missing test coverage, we missed in #4811 that some memory operations
needed to get make64() called on them.
kripken added a commit that referenced this pull request Oct 3, 2022

This is a pretty subtle point that was missed in #4811 - we need to first visit the
child, then compute the size, as the child may alter that size.

Found by the fuzzer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants