Optimize `array.fill` with more libcalls, optimize `array.copy` for more collectors

In https://github.com/bytecodealliance/wasmtime/pull/13382 I'm applying an optimization where `array.fill` for `i8`-element arrays is lowered to a `memset` on the host. This is relatively easy to do because `memory.fill` already has the infrastructure for this on the host and `array.fill` just reuses it. The intended benefit is that we get to use the host's vectorized routines for `array.fill` as opposed to a per-byte loop within CLIF. This benefit is theoretically applicable to elements of other sizes too (e.g. all the way up to 128 bits). Implementing that, however, would require new libcalls on the host, for example `memory.fill{16,32,64,128}`.

This is doable without too much effort, but it was left out of #13382 because it's not clear whether it is worth it. It'd likely be useful to investigate peer compilers to see what they do for `array.fill`, or similar operations, on larger-than-8-bit types.
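To make the idea concrete, here is a minimal sketch of what host-side fill helpers for wider elements could look like. All names here (`fill_elems`, `fill16`, `fill32`, `fill64`, `fill128`) and their signatures are hypothetical and are not Wasmtime's actual libcall interface; the sketch operates on plain Rust slices rather than GC heap memory. It leans on `slice::fill`, which may or may not compile down to a vectorized loop depending on the target and optimization level.

```rust
/// Hypothetical helper (not a Wasmtime API): fill `len` elements of `dst`
/// starting at element index `start` with `value`, with bounds checking.
pub fn fill_elems<T: Copy>(
    dst: &mut [T],
    start: usize,
    len: usize,
    value: T,
) -> Result<(), &'static str> {
    // Guard against `start + len` overflowing before slicing.
    let end = start.checked_add(len).ok_or("length overflow")?;
    // `get_mut` returns `None` when the range is out of bounds.
    let range = dst.get_mut(start..end).ok_or("out of bounds")?;
    // The standard library's slice fill; the compiler is free to vectorize it.
    range.fill(value);
    Ok(())
}

// Hypothetical per-width entry points, one per element size, mirroring the
// `memory.fill{16,32,64,128}` naming floated in the issue text.
pub fn fill16(dst: &mut [u16], start: usize, len: usize, v: u16) -> Result<(), &'static str> {
    fill_elems(dst, start, len, v)
}

pub fn fill32(dst: &mut [u32], start: usize, len: usize, v: u32) -> Result<(), &'static str> {
    fill_elems(dst, start, len, v)
}

pub fn fill64(dst: &mut [u64], start: usize, len: usize, v: u64) -> Result<(), &'static str> {
    fill_elems(dst, start, len, v)
}

pub fn fill128(dst: &mut [u128], start: usize, len: usize, v: u128) -> Result<(), &'static str> {
    fill_elems(dst, start, len, v)
}
```

Whether a call out to something like this beats an inline CLIF loop for 16- to 128-bit elements is exactly the open question; the sketch only shows that the host side itself is small.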

Optimize `array.fill` with more libcalls, optimize `array.copy` for more collectors #13386

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Optimize array.fill with more libcalls, optimize array.copy for more collectors #13386

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Optimize `array.fill` with more libcalls, optimize `array.copy` for more collectors #13386