-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow not copying the wasm binary into the module when using WASM C API #3389
Allow not copying the wasm binary into the module when using WASM C API #3389
Conversation
not very sure about it. the binary content(in wasm_module_t) will be used during execution. It's on purpose to release the binary when destroying store. |
I think we can have a try, it really increases the memory usage if wasm-c-api clones another copy, and if it doesn't do that, the content of input buffer from developer may be modified and the buffer will be referred by runtime after loading, the difference is that developer can not use the input buffer to new a module again. |
Is that the case? When opening this PR, I had it tested only in interpreter mode and I thought it wouldn't work with AOT (that's why I created it as a draft and didn't bother fixing the CI errors). |
d1bb328
to
293cfcd
Compare
if this means IIUC, the big concern is function bytecodes. Classic-interp always needs the binary content. fast-interp doesn't because of recompilation. xxx_jit needs the content only during bytecodes to IRs translation. aot is there ⬆️. XIP always depends on the binary content. Then, next problem will be const strings. Luck for us, some function, like |
iirc, it isn't by luck. |
Done that, it works fine. I wasn't precise there, you can't release right after module instantiation, you have to wait until wasm execution (e.g.
Just tried with classic interpreter (I had only tried fast interpreter and AOT successfully) and as you say this PR doesn't work there.
What about those functions? Why do they need special treatment? During testing I didn't run into any problems with those. So changes in this PR only work for fast interpreter and AOT (without XIP). Possibly JIT too, but I haven't tested it yet. |
This PR is not about stream-loading. It tries to save memory when using WAMR by allowing to release the wasm binary buffer before starting the wasm execution. |
293cfcd
to
627cff4
Compare
the "make a copy instead of keeping references" aspect is basically same. |
iirc, one of implications of is_load_from_file_buf=false is to make a copy. i guess you can somehow use it. |
I see that a copy is done anyway regardless of that flag
That flag seems to be used to shift the string
But again, if we avoid freeing the module buffer until the wasm execution that shouldn't be needed. |
627cff4
to
7971818
Compare
in the latter case, the user-given buffer is used.
i don't understand what you mean. can you explain a bit? |
I was wrong, I wasn't considering cases like this one #3389 (comment) |
24cfd0b
to
56a4f5d
Compare
56a4f5d
to
20416a9
Compare
doc/memory_tune.md
Outdated
@@ -30,3 +30,4 @@ Normally there are some methods to tune the memory usage: | |||
- set the app heap size with `wasm_runtime_instantiate` | |||
- use `nostdlib` mode, add `-Wl,--strip-all`: refer to [How to reduce the footprint](./build_wasm_app.md#2-how-to-reduce-the-footprint) of building wasm app for more details | |||
- use XIP mode, refer to [WAMR XIP (Execution In Place) feature introduction](./xip.md) for more details | |||
- when using the Wasm C API, set `clone_wasm_binary=false` in `LoadArgs` and free the wasm binary buffer (with `wasm_byte_vec_delete`) after module loading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed with @lum1n0us, per our understanding, currently it is only available for fast interpreter mode and AOT mode. For classic interpreter and all JIT modes, the bytecode is used by the interpreter or by the JIT compiler after loading, we may try to clone the bytecode to fix it, but we can do it in another PR.
So could you mention this here to avoid misunderstanding?
And could you also add another item for wasm/AOT loader, since now we can set wasm_binary_freeable=true
in LoadArgs
and free the wasm binary buffer after module loading. And similar, it's only available for fast interpreter mode and AOT mode now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed with @lum1n0us, per our understanding, currently it is only available for fast interpreter mode and AOT mode
What is supposed to break with classic interpreter? Because I gave it a quick try and seemed to run fine.
Anyway, I updated the the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bytecodes in code section is used in classic interpreter after loading:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/interpreter/wasm_loader.c#L3700-L3710
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3389 (comment) So, yes, I actually free the wasm binary buffer after module instantiation, not loading. In that way, it works with classic interpreter too.
In my case it doesn't make much of a difference if the binary is freed after loading or after instantiation. The main goal is to free it before wasm execution, because the execution will start using additional memory and we don't want to overhead of the binary buffer.
20416a9
to
5abaac0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Please correct me if I am wrong. Your examples will free/release/pollute binary content after module loading (before instantiation). And it works well in classic-interp, fast-interp, jit and aot. |
my impression is that this requires users too much implementation knowledge. how about providing an api to query if it's safe to free the buffer? eg. bool has_reference_to_underlying_byte_vec(const wasm_module_t *module); |
ca604e6
to
afaef9b
Compare
afaef9b
to
7e94c60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
*/ | ||
WASM_RUNTIME_API_EXTERN bool | ||
wasm_runtime_is_underlying_binary_freeable( | ||
const wasm_module_inst_t module_inst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be to clone the data segments in wasm file but not to clone the data_dropped bitmap. Anyway, let's do it in another PR.
i still think the load_arg->wasm_binary_freeable api is not a good idea. btw, can you stop marking comments "resolved" so eagerly? |
@yamt, do you mean that developer doesn't need to use wasm_runtime_is_underlying_binary_freeable/wasm_runtime_is_underlying_binary_freeable API to check again? For example, if he wants to save memory, he can just set option in LoadArgs to express the intent, and runtime does all the things, and then he can just free the input binary after loading. I strongly agree that since it simplifies the operations and makes less confusing, and we may clone data segments for wasm module, and even more, clone bytecode if needed, e.g. for classic interpreter and lazy jit. How do you think? |
I'm not deleting them, you can still check them if needed. If I already addressed the comment, I prefer to resolved it so that I can better track what still needs to be discussed or addressed.
I guess it's more about avoiding the extra arguments in |
with the current api, a user needs something complex to decide wasm_binary_freeable. if (get_package_type(buffer) == aot) {
loadarg.wasm_binary_freeable = true;
} else {
#if WASM_ENABLE_FAST_INTERP != 0
loadarg.wasm_binary_freeable = true;
#else
loadarg.wasm_binary_freeable = false;
#endif
} i meant it can be loadarg.want_early_free = true; /* just a hint to the runtime */ |
Would it make sense to assume it to always be true? So that we don't need |
Is that to allow the developer to free the wasm binary after wasm/aot loading? And do you mean to use the same option for both wasm-c-api and wamr api? And what should the behavior be for wasm/aot loader and wamr-c-api loader? |
I think sometimes developer cannot free the input buffer, seems we had better add an option. |
it would have a negative effect to increase memory usage unless a user actually tries to free the buffer early with wasm_runtime_is_underlying_binary_freeable. |
it tells the runtime to perform something which possibly allows early free (typically making extra copies)
maybe. i haven't thought much about wasm-c-api yet.
i'm not sure if i understand this sentence. |
Yes, how about
By default wasm-c-api will clone the binary and already allows to early free, so using the option doesn't make sense for it. Maybe we can also keep
I mean the wamr API wasm_runtime_load, wasm_runtime_load_ex and wasm-c-api wasm_module_new, wasm_module_new_ex. Suggest to keep the behavior of wasm_runtime_load and wasm_module_new unchanged, and change the behavior of wasm_runtime_load_ex and wasm_module_new_ex according to the option. |
Been trying to use some c sample with threads (https://github.com/bytecodealliance/wasm-micro-runtime/tree/591a20b91741c24cdb9c031d3c5eec1d80bd72dc/core/iwasm/libraries/lib-wasi-threads/test) compiled with wasi-sdk. I see that in such case passive data segments are used and not all are dropped after
Both memory.init and memory.drop are called during instantiation in the wasm start function.
Cloning the data segments into the data module would be the only way to free the wasm binary buffer in that case. |
the test only makes sense after instantiation. i'm not sure what you are trying to test.
it's expected for the data segment for shared memory initialization.
what's "the data module"? |
either names are fine for me. my point is to make it a hint so that the runtime can make the decision by itself.
i guess it has pros and cons.
i agree the user-visible semantics of old/standard api should not be changed. |
for (i = 0; i < module->data_seg_count; i++) | ||
if (!bh_bitmap_get_bit( | ||
((WASMModuleInstance *)module_inst)->e->common.data_dropped, | ||
i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you assuming this module instance is the only one which is based on the underlying buffer?
it seems wrong because our instantiation api allows to instantiate multiple instances from a single wasm_module_t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that was a simplification, but as you say it doesn't work if there are multiple instances created from the same module. We should be able to remove that check if we clone the data segments into the module, if I understand correctly.
Yes, I was calling
Yes, I was reading the specs and that's expected when using
Sorry, I just meant the module. Since passive data segments can be referred to after module instantiation, I think we need to clone them into the module if we want to free the wasm binary buffer. |
Follow-up on #3389, specifically: #3389 (comment) If we want to free the wasm binary buffer early, we need to clone the data segments into the module. That's because, in case of [passive data segments](https://webassembly.github.io/threads/core/syntax/modules.html#syntax-data), they can be referred during wasm execution.
@yamt since I want to merge branch main into dev/checkpoint_and_restore for @victoryang00 to have a rebase for PR #3289, I merged this PR. I think it basically implemented the ideas we discussed above, (maybe except change the option name loadarg.wasm_binary_freeable to loadard.allow_early_free?). If you prefer to use |
i still think it should be runtime's decision, not user's. |
Not sure what runtime decision means, do you mean that we don't provide |
with the current api, the decision to use the loagarg option is a bit too complex for users to make. as the runtime itself should have a better knowledge, it's better to make users just express the intention |
OK, IIUC, I think you mean that developer sets a flag in loadarg to express that he wants to reduce the memory usage, and runtime tries to achieve it, there might be two options for runtime, one is to clone the data and allow the developer to free the input wasm buffer, the other is to let wasm module refer to the input buffer and not to allow freeing it. But I wonder it may be very complex for runtime, what the strategy should be? @eloparco any idea from you? Thanks. |
Isn't that similar to what we do already? We set |
the problem in the current api is that it's difficult for a generic embedder (eg. iwasm) to decide optimal load_args parameters. |
The
wasm_module_new
function copies the WASM binary passed as an argument into the module.This PR allows passing an additional flag to
wasm_module_new_ex
, to avoid that copy. In that way, the binary can be manually released after module loading (instead of having to wait for the store to be destroyed).