-
Notifications
You must be signed in to change notification settings - Fork 787
Mutli-Memories Support in IR #4811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
847551d
to
4eff0f5
Compare
Would love to see this feature! We can really use it at WasmEdge. 😂 Please let us know how we can be of assistance. |
b9e3d67
to
c81a4f6
Compare
🎉 🎉 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some comments on everything before src/passes
in GitHub's review order. Taking a look at everything under src/passes
next.
@@ -1036,29 +1036,41 @@ BinaryenExpressionRef BinaryenLoad(BinaryenModuleRef module, | |||
uint32_t offset, | |||
uint32_t align, | |||
BinaryenType type, | |||
BinaryenExpressionRef ptr) { | |||
BinaryenExpressionRef ptr, | |||
const char* name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe memName
instead of name
?
// Maintaining compatibility for instructions with a single memory | ||
if (name == nullptr && module->memories.size() == 1) { | ||
name = module->memories[0]->name.c_str(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is repeated so often, it might make sense to factor out into a function so we can make this a one-liner like name = getDefaultName(name, module);
BinaryenExpressionRef* segmentOffsets, | ||
BinaryenIndex* segmentSizes, | ||
BinaryenIndex numSegments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a follow-up we'll want to refactor this API so that segments are not so closely tied to memories. Looking at the behavior below, it looks like this function blows away any existing segments and memories, so this API isn't really safe to use in a multi-memory context at all. Given that, I think it would make sense to assume this will only be used with non-multi-memory code and have it unconditionally use name "0"
rather than taking an explicit name.
auto memory = std::make_unique<Memory>(); | ||
memory->name = name ? name : "0"; | ||
memory->initial = initial; | ||
memory->max = int32_t(maximum); // Make sure -1 extends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be clearer to make this more explicit by doing something like memory->max = maximum == -1 ? Address::kUnlimitedSize : maximum;
(although I see this was preexisting code).
memoryExport->kind = ExternalKind::Memory; | ||
wasm->addExport(memoryExport.release()); | ||
((Module*)module)->addExport(memoryExport.release()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more efficient to use the existing unique ptr rather than releasing it just for a new one to be created inside addExport
.
((Module*)module)->addExport(memoryExport.release()); | |
((Module*)module)->addExport(std::move(memoryExport)); |
} | ||
((Module*)module)->removeMemories([&](Memory* curr) { return true; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also do something with any existing memory exports. Probably deleting them makes the most sense, but that would be a functional change, since previously they would have implicitly switched to exporting the new memory.
auto* memory = ((Module*)module)->getMemoryOrNull(name); | ||
if (memory == nullptr) { | ||
Fatal() << "invalid memory '" << name << "'."; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are producing a fatal error anyway, might as well call getMemory
instead of getMemoryOrNull
and skip the error checking here.
if (module.memories.size() > 1) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should produce a fatal error here (and in similar places) so we don't forget to come back and fix it up later. @kripken, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that fatal errors are better in places like this.
'size'() { | ||
return Module['_BinaryenMemorySize'](module); | ||
'size'(name) { | ||
return Module['_BinaryenMemorySize'](module, strToStack(name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we have existing bugs here, but every strToStack
call needs to be wrapped in a preserveStack
call to avoid permanently leaking stack space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice work! Relative to the massive amount of code here I have very few comments. (So far I read the source files from src/shell-interface.h
- about the middle - in the github order here, to try to avoid overlapping with @tlively 's comments.)
Also none of my comments are really serious. So I think perhaps this can land without addressing all of them, and the rest could be fixed in followups. On a massive PR like this that might be the most practical thing. @tlively what do you think?
One critical thing before landing would be fuzzing, though - we shouldn't break that.
@@ -50,7 +50,8 @@ getStackSpace(Index local, Function* func, Index size, Module& wasm) { | |||
} | |||
// align the size | |||
size = stackAlign(size); | |||
auto pointerType = wasm.memory.indexType; | |||
auto pointerType = | |||
!wasm.memories.empty() ? wasm.memories[0]->indexType : Type::i32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pass can probably halt with an error (like on line 49 above) if there are no memories. Please also add a comment that we are assuming that memory 0 is the "main" memory here.
const char* name) { | ||
// Maintaining compatibility for instructions with a single memory | ||
if (name == nullptr && module->memories.size() == 1) { | ||
name = module->memories[0]->name.c_str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a helper function in this file, static Name getMemoryName(module, name)
?
@@ -736,7 +736,8 @@ BINARYEN_API BinaryenExpressionRef BinaryenLoad(BinaryenModuleRef module, | |||
uint32_t offset, | |||
uint32_t align, | |||
BinaryenType type, | |||
BinaryenExpressionRef ptr); | |||
BinaryenExpressionRef ptr, | |||
const char* name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure name is clear enough here. Perhaps memoryName
?
int8_t load8s(Address addr, Name memoryName) override { | ||
auto it = memories.find(memoryName); | ||
if (it == memories.end()) { | ||
trap("load8s on non-existing memory"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace this (and below) with an assertion. A trap is not needed here since this should be impossible in a valid wasm module, and we can assume it is valid basically anywhere (except for the validator itself).
template<typename T> T* getMemory(Address address, Name memoryName) { | ||
auto it = memories.find(memoryName); | ||
if (it == memories.end()) { | ||
Fatal() << "memory not found: " << memoryName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(can also be an assertion)
@@ -1065,7 +1065,8 @@ void writeDWARFSections(Module& wasm, const BinaryLocations& newLocations) { | |||
|
|||
updateDebugLines(data, locationUpdater); | |||
|
|||
updateCompileUnits(info, data, locationUpdater, wasm.memory.is64()); | |||
bool is64 = wasm.memories.size() > 0 ? wasm.memories[0]->is64() : false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth a comment here that we assume all memories have the same type (all 32 or all 64).
if (hasMemoryIdx(s, 2, i)) { | ||
memory = getMemoryName(*s[i++]); | ||
} else { | ||
memory = getMemoryNameAtIdx(0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 5 lines seem to repeat a lot - maybe a helper function in this file?
if (module.memories.empty()) { | ||
return; | ||
} | ||
auto& curr = module.memories[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a TODO comment for looping over the memories?
for (auto& curr : dataSegments) { | ||
dataSegmentsMap[curr->name] = curr.get(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to have this for ElementSegments? Why is this not symmetric to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bug that I never updated the maps when writing the PR for First Class Data Segments. In the future, I'd make this a PR that comes before multi-memories, and it still can!
This is not symmetric to ElementSegments, because we don't do vector swaps for ElementSegments like we do for DataSegments: Ex1 Ex2
I thought it might be wasteful to call updateMaps() just for DataSegments, but there is precedent for doing so with Functions. Happy to remove this function and just call updateMaps(), wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I get 404s on both the first 2 links?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks! Yeah, maybe it's worth having a function for it, if we have multiple places that don't need to update all the maps. lgtm as it is then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished looking at src/passes
|
||
Expression* makeGetStackPos() { | ||
return makeLoad(4, | ||
false, | ||
int32_t(DataOffset::BStackPos), | ||
4, | ||
makeGlobalGet(ASYNCIFY_DATA, Type::i32), | ||
Type::i32); | ||
Type::i32, | ||
wasm.memories[0]->name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's worth having a method like wasm.getOnlyMemory
that asserts that there is exactly one memory and returns is. Then finding all the places that need to be updated to handle multi-memory could be found by searching for uses of that method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a good idea to me. I think getSingleMemory
would be perhaps a better name given existing getSingle*
stuff we have. Also the method could be on MemoryUtils
.
@@ -123,7 +125,8 @@ struct MemoryPacking : public Pass { | |||
}; | |||
|
|||
void MemoryPacking::run(PassRunner* runner, Module* module) { | |||
if (!canOptimize(module->memory, module->dataSegments, runner->options)) { | |||
// Does not have multi-memories support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should put an assertion here so we don't forget about this.
@@ -38,6 +38,7 @@ typedef std::pair<ModuleElementKind, Name> ModuleElement; | |||
|
|||
// Finds reachabilities | |||
// TODO: use Effects to determine if a memory is used | |||
// This pass does not have multi-memories support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an assertion?
I would be ok with landing this as-is and applying fixups in follow-on PRs after the fuzzer passes 👍 Edit: It might be useful to first update printing to elide names where possible, though, to minimize test churn. |
6cd391e
to
669fb89
Compare
a3b4063
to
19c4e92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there are some new test files here, test/multi-memories*
? It would be good to make those wast
files instead of wasm
, unless I'm missing a reason not to? As text they are easier to review, and look at if a test fails in the future.
|
||
;; Check that the profiling function is correct. | ||
|
||
;; CHECK: (func $__write_profile (param $addr i32) (param $size i32) (result i32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this function no longer shown in the test output? Maybe the new first line here needs to be updated (there are some flags like --all-items
I think, but I don't remember them offhand).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted this test file back to main, because it was manually created and just added the memory line I needed for the test to pass. We're all good now 👍🏾
;; CHECK-NEXT: ) | ||
;; CHECK-NEXT: ) | ||
(func $v128.load (param $0 i32) (result v128) | ||
(v128.load offset=0 align=16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments in these tests would be good, like without an explicit name, the first memory is used
in this one, etc.
@@ -23,7 +23,8 @@ int main() { | |||
0, | |||
0, | |||
BinaryenTypeInt32(), | |||
BinaryenConst(module, BinaryenLiteralInt32(4))), | |||
BinaryenConst(module, BinaryenLiteralInt32(4)), | |||
"0"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these changes in these files be avoided? I think we wanted to automatically use the single memory when there is a single memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the JS API, null is appended for the memory name and this is handled inside the different calls (BinaryenLoad, BinaryenStore, etc.). For C calls, we can set a default parameter of null for the memName to avoid updating these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, right... a null is the best we could do, since we must still change the C API to add the new param either way. Sounds ok to leave null for later then.
if (module.memories.size() > 1) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that fatal errors are better in places like this.
|
||
Expression* makeGetStackPos() { | ||
return makeLoad(4, | ||
false, | ||
int32_t(DataOffset::BStackPos), | ||
4, | ||
makeGlobalGet(ASYNCIFY_DATA, Type::i32), | ||
Type::i32); | ||
Type::i32, | ||
wasm.memories[0]->name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a good idea to me. I think getSingleMemory
would be perhaps a better name given existing getSingle*
stuff we have. Also the method could be on MemoryUtils
.
@kripken I added the multi-memories wasm test files to verify correct parsing of binary format and printing to text format. There are additional ".wasm.fromBinary" text files added for each binary to verify the wast is as expected. Happy to make any change needed here. |
Makes sense, we do want that coverage. But adding a We do have some actual |
2985924
to
e535ad7
Compare
This is a pretty subtle point that was missed in #4811 - we need to first visit the child, then compute the size, as the child may alter that size. Found by the fuzzer.
Multiple Memories for Wasm Proposal
This PR removes the single memory restriction in IR, adding support for a single module to reference multiple memories. To support this change, a new memory name field was added to 13 memory instructions in order to identify the memory for the instruction.
It is a goal of this PR to maintain backwards compatibility with existing text and binary wasm modules, so memory indexes remain optional for memory instructions. Similarly, the JS API makes assumptions about which memory is intended when only one memory is present in the module. Another goal of this PR is that existing tests behavior be unaffected. That said, tests must now explicitly define a memory before invoking memory instructions or exporting a memory, and memory names are now printed for each memory instruction in the text format.
There remain quite a few places where a hardcoded reference to the first memory persist (memory flattening, for example, will return early if more than one memory is present in the module). Many of these call-sites, particularly within passes, will require us to rethink how the optimization works in a multi-memories world. Other call-sites may necessitate more invasive code restructuring to fully convert away from relying on a globally available, single memory pointer.