[Interface Types] Avoid memory copy #88
Comments
How would you access the memory at the returned pointer from WebAssembly if it is not part of WebAssembly's linear memory? WebAssembly, by design, cannot access memory outside of linear memory, and removing that restriction would have a severe impact on performance and security. |
T * data() (a pointer to the data) will return a pointer into linear memory, because List in all languages has a property: Okay, maybe T * data() is not the best example for all languages; consider instead the following interface for List:
Anyway, this example is for List, but the same could clearly be done for any type. Consider Dictionary, with a common API:
Using this technique we will have better performance, because we delegate the real data access to the underlying ABI (the ABI of the programming language the application was compiled from) ;) |
I am not sure what you mean. If you create a list in Java, JavaScript, python etc., it will be allocated on the process heap, not in WebAssembly linear memory. And in many languages, a list is not a "continuous array of bytes". For example, a list of strings would only contain pointers/references to strings, which are, again, not in linear memory, but instead somewhere on the process heap.
What would
What would |
Even in your example, when the list stores pointers, it stores them in a linear memory layout ;)
In the case of a string, T would also represent the interface type string, which could be accessed through the following API:
You will get the interface type string and apply the same rule to it as above ;) |
No, not necessarily. That would be inefficient for languages that support sparse arrays, e.g., JavaScript. In other languages such as Java, it is up to the virtual machine implementation to choose an appropriate memory layout. Even if the host stores the pointers in a "linear memory layout", that is not the same as WebAssembly linear memory. In simple terms, WebAssembly cannot access memory that it did not allocate itself. There is no way for WebAssembly to access memory outside of its own address range. So let's assume you create a list in some high-level programming language. The list is somewhere on the process heap, outside of the WebAssembly address range. We can copy the list into WebAssembly linear memory, so now we have an array of pointers in WebAssembly linear memory. But now WebAssembly cannot access the memory behind the pointers, because these pointers point outside of WebAssembly's address range. So in order to access the values, we have to copy them to WebAssembly's address range, too.
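The copying argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: `LinearMemory`, `copy_list_into` and the (pointer, length) table layout are hypothetical names and choices made for illustration, not part of any WebAssembly API. A bounds-checked byte buffer stands in for linear memory, and a host-side list has to be copied in, element by element, before the sandboxed side can read it.

```python
import struct


class LinearMemory:
    """Toy stand-in for a WebAssembly linear memory: a bounds-checked byte buffer."""

    def __init__(self, size):
        self.buf = bytearray(size)

    def load_u32(self, addr):
        # Any access outside [0, size) fails, mimicking a wasm trap.
        if addr < 0 or addr + 4 > len(self.buf):
            raise IndexError("out-of-bounds access traps")
        return struct.unpack_from("<I", self.buf, addr)[0]

    def store_bytes(self, addr, data):
        if addr < 0 or addr + len(data) > len(self.buf):
            raise IndexError("out-of-bounds access traps")
        self.buf[addr:addr + len(data)] = data


def copy_list_into(memory, items, base):
    """Copy each string into linear memory and write a (ptr, len) table at `base`.

    The table layout is an assumption for this sketch. Returns the first free address.
    """
    cursor = base + 8 * len(items)  # string bytes start right after the table
    for i, s in enumerate(items):
        data = s.encode("utf-8")
        memory.store_bytes(cursor, data)
        memory.store_bytes(base + 8 * i, struct.pack("<II", cursor, len(data)))
        cursor += len(data)
    return cursor


# The host list lives on the host heap, outside the toy linear memory.
host_list = ["alpha", "beta"]

memory = LinearMemory(64)
end = copy_list_into(memory, host_list, 0)

# The sandboxed side can only follow pointers that land inside its own memory.
ptr = memory.load_u32(0)
length = memory.load_u32(4)
first = bytes(memory.buf[ptr:ptr + length]).decode("utf-8")
```

Copying only the pointer table would not be enough here: a host address used as `ptr` would fall outside the buffer and fail the bounds check, which is why the string bytes are copied as well.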
Would |
Actually, according to the JavaDoc, ArrayList uses an array underneath, which is contiguous memory:
Do not get stuck on the Java Virtual Machine; my example is generic!!
A string from whatever programming language will provide, via memaddr, the address of its underlying linear memory and its size, without requiring copying from one ABI to another ... |
Ah, I might have misunderstood you. Are you talking about WebAssembly code that was compiled from multiple programming languages, but that shares the same WebAssembly memory, meaning that all data exchange happens within WebAssembly, and not between WebAssembly and the host system?
You do realize that programming languages use different representations of strings in memory?
So if one programming language uses one encoding, and passes the address to the string to a different language that uses a different encoding, how would the second language read the string? |
Yes, you understood me correctly !!
Yes, of course I understand that, but it is not an issue from my point of view ) Anyway, we will have a better solution than just copying memory from one place to another ... As I said previously, in the case of ten million elements we will take a performance hit, but with interface methods on containers (string, vector, list, dictionary and ...) we would have the same performance as in a module written in a single language ;) |
Oh, okay, I don't know if that is a primary concern of this proposal. I was under the impression that this proposal is supposed to help with host-to-WebAssembly data transfer (and the other way around), not within WebAssembly. Isn't that already possible? You essentially only want additional functions, but couldn't you implement them without any extensions of the WebAssembly specification?
But in most programming languages (Java, JavaScript, Python, C++, C, ...), there is only one internal representation of strings that is universally understood. And in most languages, you would have to convert strings that use a different encoding to the internal encoding. For example, if you want to read a UTF-8 string in Java/JavaScript/Python, it will be converted to a string in the internal representation. And that conversion essentially copies the string, while re-encoding it. So either way you end up copying the entire string. |
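A small sketch of the re-encoding point, using Python only because its standard library makes the byte counts easy to see. The sample string is an arbitrary choice. Decoding UTF-8 bytes produces a new string object, and re-encoding for a consumer that expects UTF-16 produces yet another buffer; a zero-copy view is possible, but it only exposes the bytes in the producer's original encoding.

```python
# Bytes as a UTF-8 producer would hand them over.
utf8_bytes = "h\u00e9llo w\u00f6rld".encode("utf-8")

# Decoding builds a new string in the runtime's internal representation.
text = utf8_bytes.decode("utf-8")

# A consumer that wants UTF-16 needs a second, differently sized buffer.
utf16_bytes = text.encode("utf-16-le")

# A memoryview avoids the copy, but it still exposes UTF-8 bytes only.
view = memoryview(utf8_bytes)
```

Here the view shares storage with `utf8_bytes`, while `text` and `utf16_bytes` are separate objects with their own storage.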
When I read @linclark's interface types article (https://hacks.mozilla.org/2019/08/webassembly-interface-types/), I understood that it would be done by copying a type from one ABI to another ABI |
I am afraid you may both be missing a key aspect of the problem. We are targeting interface types at scenarios involving limited trust. This is true both for accessing host functionality and for ‘shared nothing linking’ between wasm modules. |
I don't think I am missing that aspect. That is why, from the very beginning, I suspected that this approach might not work because the modules don't have any way of addressing each other's memory. I only abandoned that point after @redradist insisted that both modules have access to the same memory. |
Actually, it is not an issue either!! Please do not mix two scenarios under one solution; you're breaking the SOLID principles, even the first one, S: single responsibility ;) |
(I probably should have replied sooner than a month later, sorry!)
There are more trust boundaries than just between host and wasm module. Shared-nothing linking is a scheme by which two wasm modules that do not trust each other can still call each other's APIs. If they agree on interface types as their ABI, they can 1) have their own separate data formats under the hood, and 2) make more guarantees about how their data gets exfiltrated. In particular, if module A doesn't export its memory, it can guarantee that module B cannot read it (modulo VM bugs). If two wasm modules are ok with sharing memory, they don't need ITs and can just share an ABI. If they aren't, there's very little they can do to interact today, and ITs can expand the set of capabilities we permit while not expanding the permissions we need to expose. So that's a reason to not special-case the Host when thinking about IT. |
Guys, will there be another meeting regarding this issue? |
Sure, I'll add it to the issue-which-I-haven't-made-yet. Looking at this again with fresh eyes, I think there's a blended approach that kind-of-works in a way you describe. In particular: it should be possible to expose an interface to an object without needing to have access to the underlying memory. Now, this won't be efficient either, because you'll need to interact with the object exclusively through cross-module function calls, which should be low-ish overhead, but not no-overhead. Consider a module that exports:
(T can refer to either generics, or anyref, or one hardcoded type) In this scheme, we only copy on a per-element basis, but the array is managed completely by the owning module. So we don't need to share memory, but still have access to an interface published by a module, and can pass around handles to these Array objects. Note that this doesn't replace the need for array-like object support in Interface Types itself. There are some APIs where you do indeed want to copy the memory across an ownership boundary. In theory we could defer deciding what to do there until after MVP... but given the number of browser APIs that expect and return arrays, that puts a serious damper on the "viable" part of that, so I think we do still need a first-class memory-copying array primitive sooner rather than later. |
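The handle-based scheme described in this comment might look roughly like the following. This is a minimal sketch, not the listing from the original comment: `OwnerModule` and the `array_*` function names are hypothetical, and a Python object stands in for a wasm module whose memory is not exported. The consumer only ever holds an opaque integer handle and pays one call per element.

```python
class OwnerModule:
    """Stands in for the module that owns the arrays; its storage is never exposed."""

    def __init__(self):
        self._arrays = {}      # handle -> backing list, private to the owner
        self._next_handle = 1

    def array_new(self, values):
        handle = self._next_handle
        self._next_handle += 1
        self._arrays[handle] = list(values)  # owner keeps its own copy
        return handle

    def array_len(self, handle):
        return len(self._arrays[handle])

    def array_get(self, handle, index):
        # Only this single element crosses the boundary.
        return self._arrays[handle][index]

    def array_set(self, handle, index, value):
        self._arrays[handle][index] = value

    def array_drop(self, handle):
        del self._arrays[handle]


def sum_array(owner, handle):
    """Consumer-side code: works purely through the exported functions."""
    return sum(owner.array_get(handle, i) for i in range(owner.array_len(handle)))


owner = OwnerModule()
h = owner.array_new([1, 2, 3])
total_before = sum_array(owner, h)
owner.array_set(h, 0, 10)
total_after = sum_array(owner, h)
```

The trade-off is the one named above: no bulk copy and no shared memory, at the cost of a cross-module call for every element access.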
This is exactly the approach I was trying to introduce )
I agree with this; for some small structures we may want to copy the type completely across an ownership boundary. It could depend on the size of the structure: if the size is above some threshold, access it through the interface, otherwise copy it |
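One way to read the size-based suggestion is as a simple dispatch at the boundary. The sketch below is an assumption-laden illustration: `COPY_THRESHOLD`, `ElementAccessor` and `pass_across_boundary` are hypothetical names, and the cutoff of 4 elements is arbitrary. Small values are copied outright; larger ones are handed over as an accessor that reads one element per call.

```python
COPY_THRESHOLD = 4  # hypothetical cutoff, in elements


class ElementAccessor:
    """Interface-style access: the data stays with the owner, reads go through calls."""

    def __init__(self, values):
        self._values = values
        self.calls = 0  # counts boundary crossings, for illustration only

    def size(self):
        return len(self._values)

    def get(self, index):
        self.calls += 1
        return self._values[index]


def pass_across_boundary(values, threshold=COPY_THRESHOLD):
    """Copy small values; expose large ones through an accessor instead."""
    if len(values) <= threshold:
        return ("copy", list(values))
    return ("handle", ElementAccessor(values))


small = [1, 2, 3]
large = list(range(1000))

small_kind, small_payload = pass_across_boundary(small)
large_kind, large_payload = pass_across_boundary(large)
last = large_payload.get(999)
```

Where the threshold should sit would depend on the relative cost of a bulk copy versus a per-element call, which this sketch does not measure.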
I cannot see why strings must be copied from linear memory, can we just have a Wasm interface string_view type that would create a view onto memory or another string, without a copy? Why do strings even need to be immutable again? Solely because JavaScript requires it? |
Interface-typed values aren't normal first-class values; they are lazy expressions which, when evaluated, enable copying from some concrete string representation (in wasm or the host) to some concrete string representation (in wasm or the host). Moreover, since interface values are affine, there's not even really a notion of them being "immutable", since they are observed only the one time. If what you want is to pass around views of memory, that would be a separate proposal. There are a number of challenges with view-based approaches, though. One is that languages compiled to linear memory don't have a good way to access views without first copying them into linear memory. E.g., in C, a random |
The affinity of interface types seems only necessary because of destructors, yet, if I were not to associate a destructor with an interface value, why would it need to be affine?
This couldn't happen if the type never used a destructor in the first place; how often is it expected that one would associate extraneous data with an interface value? As for viewing other memory, that might be better solved by mmapping or something else. |
There is also the issue of lazy lifting being potentially effectful (in the extreme: executing user code, e.g., as part of a generator).
Having considered these options over the years, I think there is a place for mmap and memory sharing, but for more-advanced situations and not as the universal basis for module composition. |
@00ff0000red This is exactly what I want to lobby for here ))) |
Hi all,
From the interface types proposal, I understand that adapter functions would copy types from one ABI to another ABI?
If so, then it can hurt performance; consider the case of one million elements in some data type ...
... and I would suggest a better approach ...
What if, instead of copying, we just use an interface, like in Java, C#, C++ (abstract type) or Rust (traits)?
Let's imagine that a list (a vector or some other type in the language) is presented through a bunch of functions; it will provide additional wrapper functions that should be called instead of the native API of the language
Let's consider List ...
In the following languages it has the following types:
From all these types we can see a common semantics, and instead of copying types between platforms, let's add additional functions ...
I think it will be better if these functions are just an interface and provide a mechanism for reading and writing data in the native ABI of the wasm module
For List we can add the following common API:
This approach will increase overall performance, because it does not require copying data between language ABIs, but just provides a mechanism for accessing data from the native ABI
I wrote this idea to @linclark on Twitter, but it seems she has lots of messages and did not see mine, which is why I decided to write here ;)