Support WebAssembly memory instructions on big-endian platforms

#### Feature


Currently, wasmtime maps the WebAssembly memory instructions (t.load, t.store etc.) directly to Cranelift IR memory instructions (load, store, uloadN, etc.).

This causes problems on big-endian platforms, because the Cranelift IR instructions are implemented as native load and store instructions using the machine byte order, while the WebAssembly memory instructions are specified to always use little-endian byte order.
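To make the mismatch concrete, here is a minimal Rust sketch. It is not wasmtime code, and the function names are made up for illustration. It only contrasts a load that is little-endian by definition with one that follows the host byte order:

```rust
/// Interpret four bytes the way a WebAssembly `i32.load` is specified:
/// always little-endian, regardless of the host.
fn wasm_style_load(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

/// Interpret four bytes the way a plain machine load does:
/// in whatever byte order the host uses.
fn native_style_load(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}
```

On a little-endian host both functions agree. On a big-endian host they return different values for the same bytes, which is exactly the case where a direct mapping to native loads gives the wrong result.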

Now, I initially thought that one way to solve this problem could be to treat Cranelift IR memory instructions also as always little-endian by specification.  However, that does not work, because there are many other uses of these instructions that do require native byte order.

Some examples of those include:

- Memory accesses added by platform ABI code (implicit pointers for argument or return values), in particular if this needs to be compatible with native code.

- Memory accesses to values prepared by trampoline code at the boundaries of VM native code and JITted code.

- Memory accesses to parts of the VMContext that are also accessed by VM native code.

In addition, there are cases where, while not strictly necessary for correctness, it is preferable for performance reasons to use native byte order, e.g. for spill code, for accessing variables on the stack, or when implementing code such as inlined copies for small memcpy.

So, I believe we need some way of representing *both* always-little-endian memory operations (used to translate the WebAssembly instructions), and native memory operations (used for everything else).

#### Benefit



Enabling support for Wasmtime on big-endian platforms like IBM Z.

#### Implementation


My current implementation of this approach simply duplicates all Cranelift IR memory instructions to create always-LE versions.  So in addition to "load" there is "load_le" etc. The full list is:
- load_le
- load_le_complex
- store_le
- store_le_complex
- uload16_le
- uload16_le_complex
- sload16_le
- sload16_le_complex
- istore16_le
- istore16_le_complex
- uload32_le
- uload32_le_complex
- sload32_le
- sload32_le_complex
- istore32_le
- istore32_le_complex

Advantages of this approach include:
- Most code that creates load/store instructions can remain unchanged; the WebAssembly translator simply always uses the new instructions.
- It's already implemented and working :-)

But there are disadvantages:
- All back-ends must implement all the new instructions (usually by just mapping them back to normal loads/stores), or else the target will stop working.
- Middle-end code that operates on loads/stores (e.g. the code that recognizes and creates _complex operations) should be adapted to handle the new instructions as well, or else we can get performance regressions.
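As a rough illustration of what "mapping them back to normal loads/stores" could look like in a back-end, here is a small Rust sketch. The `Opcode` and `Lowering` enums and the `legalize` function are hypothetical stand-ins, not the real Cranelift types, and only a subset of the instructions is modeled:

```rust
/// Hypothetical subset of opcodes; not the real Cranelift `Opcode` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Opcode {
    Load,
    LoadLe,
    Store,
    StoreLe,
}

/// Hypothetical result of lowering a memory opcode for one target.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lowering {
    /// Emit the plain native access.
    Native(Opcode),
    /// Emit a byte-reversed access (or a native access plus a byte swap).
    ByteReversed(Opcode),
}

/// Map an always-LE opcode back to its native counterpart, and record
/// whether the target needs a byte reversal to honor little-endian order.
fn legalize(op: Opcode, target_is_big_endian: bool) -> Lowering {
    let (native, is_le) = match op {
        Opcode::LoadLe => (Opcode::Load, true),
        Opcode::StoreLe => (Opcode::Store, true),
        other => (other, false),
    };
    if is_le && target_is_big_endian {
        Lowering::ByteReversed(native)
    } else {
        Lowering::Native(native)
    }
}
```

On little-endian targets every `_le` opcode collapses to its native form, so the extra work for those back-ends is mostly boilerplate.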


#### Alternatives


There are various alternative ways this could be implemented:

A) Add an additional flag argument to load/store instructions that specifies the requested byte order.  A detail question is whether the flag is
- little-endian vs. native
- little-endian vs. big-endian
- little-endian vs. big-endian vs. native

Advantages:
- no new IR instructions required
- existing back-ends could simply ignore the flag

Disadvantages:
- it's still an IR change as that flag must be considered part of the IR (e.g. parsing IR, serialization ...)
- All creators of loads/stores (including outside of cranelift, and possibly even outside of wasmtime!) must be updated.  If there is no "native" flag setting, all those updates must include finding out the native byte order somehow.
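A sketch of the three-way variant of such a flag, in Rust. The type and function names here are assumptions for illustration only and do not correspond to existing Cranelift APIs:

```rust
/// Concrete byte order of a target (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ByteOrder {
    Little,
    Big,
}

/// Byte order requested by the creator of a load/store (hypothetical type).
/// `Native` lets callers avoid looking up the target's byte order themselves.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RequestedOrder {
    Native,
    Little,
    Big,
}

/// Resolve the request against the target's actual byte order.
fn resolve(requested: RequestedOrder, target: ByteOrder) -> ByteOrder {
    match requested {
        RequestedOrder::Native => target,
        RequestedOrder::Little => ByteOrder::Little,
        RequestedOrder::Big => ByteOrder::Big,
    }
}

/// A back-end only needs extra work when the resolved order differs
/// from what its plain load/store instructions do.
fn needs_swap(requested: RequestedOrder, target: ByteOrder) -> bool {
    resolve(requested, target) != target
}
```

Without the `Native` variant, every caller would have to perform the equivalent of `resolve` itself, which is the update burden described in the last disadvantage.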

B) Add an additional flag bit to the existing MemFlags

Advantages:
- no new IR (should be covered by existing serialization ...)
- can be ignored by existing back-ends
- no change to (most) creators of loads/stores necessary

Disadvantages:
- MemFlags can no longer be dropped; it becomes required for correctness to always preserve it in the middle end
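The disadvantage can be illustrated with a toy flags type in Rust. `ToyMemFlags` is a hypothetical stand-in, not the real Cranelift `MemFlags`, and the bit names and positions are assumptions. The point is that hint bits can be discarded safely, while an endianness bit changes the meaning of the access:

```rust
/// Toy stand-in for a memory-flags bit set (hypothetical layout).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct ToyMemFlags {
    bits: u8,
}

impl ToyMemFlags {
    /// Hint: the access is known not to trap.
    const NOTRAP: u8 = 1 << 0;
    /// Hint: the access is known to be aligned.
    const ALIGNED: u8 = 1 << 1;
    /// Semantic: the access must use little-endian byte order.
    const LITTLE_ENDIAN: u8 = 1 << 2;

    fn with(self, bit: u8) -> Self {
        Self { bits: self.bits | bit }
    }

    fn has(self, bit: u8) -> bool {
        self.bits & bit != 0
    }

    /// What a middle-end pass may do when it cannot prove the hints still
    /// hold: drop the hint bits, but keep the endianness bit, because
    /// dropping it would silently change the result of the access.
    fn strip_hints(self) -> Self {
        Self { bits: self.bits & Self::LITTLE_ENDIAN }
    }
}
```

Any existing code that resets the flags to a default value instead of doing something like `strip_hints` would become a correctness bug under this alternative.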

C) Open-code the conversion in the WebAssembly translator

Only emit a plain "load" if the target is little-endian.  Otherwise, emit a load followed by a byte-swap (possibly followed by an extension), and vice versa for stores: a byte-swap followed by a store.  This would probably require the addition of a new "bswap" Cranelift IR instruction, unless we want to open-code bswap itself as well (possible, but a bit tedious).

Advantages:
- The only new IR element is "bswap", which can be ignored by back-ends for little-endian architectures and by everything else.

Disadvantages:
- No major ones I can see - this would be my preferred approach.
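The load-then-swap and swap-then-store sequences can be sketched in Rust as a simulation. None of this is actual translator code: `Target`, `native_load32`, `native_store32` and the `wasm_i32_*` helpers are hypothetical names, and the "native" access of each target is simulated on the host with `from_le_bytes`/`from_be_bytes`. `bswap32` is the open-coded byte swap mentioned above:

```rust
/// Simulated target byte order (hypothetical type).
#[derive(Debug, Clone, Copy)]
enum Target {
    LittleEndian,
    BigEndian,
}

/// Open-coded 32-bit byte swap using only shifts, masks and ors.
fn bswap32(x: u32) -> u32 {
    ((x & 0x0000_00ff) << 24)
        | ((x & 0x0000_ff00) << 8)
        | ((x & 0x00ff_0000) >> 8)
        | ((x & 0xff00_0000) >> 24)
}

/// Simulate what a plain native 32-bit load does on the given target.
fn native_load32(target: Target, mem: &[u8], offset: usize) -> u32 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&mem[offset..offset + 4]);
    match target {
        Target::LittleEndian => u32::from_le_bytes(bytes),
        Target::BigEndian => u32::from_be_bytes(bytes),
    }
}

/// Simulate what a plain native 32-bit store does on the given target.
fn native_store32(target: Target, mem: &mut [u8], offset: usize, value: u32) {
    let bytes = match target {
        Target::LittleEndian => value.to_le_bytes(),
        Target::BigEndian => value.to_be_bytes(),
    };
    mem[offset..offset + 4].copy_from_slice(&bytes);
}

/// `i32.load`: plain load on little-endian, load followed by bswap otherwise.
fn wasm_i32_load(target: Target, mem: &[u8], offset: usize) -> u32 {
    let raw = native_load32(target, mem, offset);
    match target {
        Target::LittleEndian => raw,
        Target::BigEndian => bswap32(raw),
    }
}

/// `i32.store`: plain store on little-endian, bswap followed by store otherwise.
fn wasm_i32_store(target: Target, mem: &mut [u8], offset: usize, value: u32) {
    let raw = match target {
        Target::LittleEndian => value,
        Target::BigEndian => bswap32(value),
    };
    native_store32(target, mem, offset, raw);
}
```

With either simulated target, the bytes that end up in memory are in little-endian order, so linear memory has the same layout on every host.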

Support WebAssembly memory instructions on big-endian platforms #2124