
wasmi_core: add support for the Wasm simd proposal #1395

Merged · 87 commits merged into main from rf-implement-simd-proposal on Mar 20, 2025

Conversation

@Robbepop (Member) commented Mar 16, 2025

Implements the wasmi_core part of #1364.

This PR implements the simd submodule in wasmi_core, which provides the basic types and functionality for Wasm simd proposal support in Wasmi. This includes the V128 type, several lane types, and the entire Wasm simd proposal API, which can then be used by Wasmi for const-evaluation, execution, and initializer expressions.

  • Removes the value128 crate feature from wasmi_core.
  • Adds the simd crate feature to wasmi_core.
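
For illustration, here is a minimal sketch of what a `V128` value type with a splat constructor and a lane accessor could look like. The names, layout, and error handling below are assumptions for this sketch and may differ from the actual wasmi_core API:

```rust
/// 128-bit SIMD value stored as 16 little-endian bytes.
/// Hypothetical sketch; the real `wasmi_core` type may differ.
#[derive(Copy, Clone, PartialEq, Eq, Debug, Default)]
pub struct V128([u8; 16]);

impl V128 {
    /// Creates a `V128` from its raw little-endian bytes (`v128.const`).
    pub fn from_bytes(bytes: [u8; 16]) -> Self {
        Self(bytes)
    }

    /// Broadcasts `x` into all four `i32` lanes (`i32x4.splat`).
    pub fn i32x4_splat(x: i32) -> Self {
        let lane = x.to_le_bytes();
        let mut bytes = [0u8; 16];
        for chunk in bytes.chunks_exact_mut(4) {
            chunk.copy_from_slice(&lane);
        }
        Self(bytes)
    }

    /// Extracts the `i32` lane at index `lane` (`i32x4.extract_lane`).
    /// Panics if `lane >= 4`, mirroring the static bound of `ImmLaneIdx4`.
    pub fn i32x4_extract_lane(self, lane: usize) -> i32 {
        assert!(lane < 4);
        let start = lane * 4;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.0[start..start + 4]);
        i32::from_le_bytes(buf)
    }
}
```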

ToDo: Instructions

Wasm simd instructions:
v128.const(imm: ImmByte[16]) -> v128
i8x16.splat(x: i32) -> v128
i16x8.splat(x: i32) -> v128
i32x4.splat(x: i32) -> v128
i64x2.splat(x: i64) -> v128
f32x4.splat(x: f32) -> v128
f64x2.splat(x: f64) -> v128
i8x16.extract_lane_s(a: v128, imm: ImmLaneIdx16) -> i32
i8x16.extract_lane_u(a: v128, imm: ImmLaneIdx16) -> i32
i16x8.extract_lane_s(a: v128, imm: ImmLaneIdx8) -> i32
i16x8.extract_lane_u(a: v128, imm: ImmLaneIdx8) -> i32
i32x4.extract_lane(a: v128, imm: ImmLaneIdx4) -> i32
i64x2.extract_lane(a: v128, imm: ImmLaneIdx2) -> i64
f32x4.extract_lane(a: v128, imm: ImmLaneIdx4) -> f32
f64x2.extract_lane(a: v128, imm: ImmLaneIdx2) -> f64
i8x16.replace_lane(a: v128, imm: ImmLaneIdx16, x: i32) -> v128
i16x8.replace_lane(a: v128, imm: ImmLaneIdx8, x: i32) -> v128
i32x4.replace_lane(a: v128, imm: ImmLaneIdx4, x: i32) -> v128
i64x2.replace_lane(a: v128, imm: ImmLaneIdx2, x: i64) -> v128
f32x4.replace_lane(a: v128, imm: ImmLaneIdx4, x: f32) -> v128
f64x2.replace_lane(a: v128, imm: ImmLaneIdx2, x: f64) -> v128
i8x16.shuffle(a: v128, b: v128, imm: ImmLaneIdx32[16]) -> v128
i8x16.swizzle(a: v128, s: v128) -> v128
i8x16.add(a: v128, b: v128) -> v128
i16x8.add(a: v128, b: v128) -> v128
i32x4.add(a: v128, b: v128) -> v128
i64x2.add(a: v128, b: v128) -> v128
i8x16.sub(a: v128, b: v128) -> v128
i16x8.sub(a: v128, b: v128) -> v128
i32x4.sub(a: v128, b: v128) -> v128
i64x2.sub(a: v128, b: v128) -> v128
i16x8.mul(a: v128, b: v128) -> v128
i32x4.mul(a: v128, b: v128) -> v128
i64x2.mul(a: v128, b: v128) -> v128
i32x4.dot_i16x8_s(a: v128, b: v128) -> v128
i8x16.neg(a: v128) -> v128
i16x8.neg(a: v128) -> v128
i32x4.neg(a: v128) -> v128
i64x2.neg(a: v128) -> v128
i16x8.extmul_low_i8x16_s(a: v128, b: v128) -> v128
i16x8.extmul_high_i8x16_s(a: v128, b: v128) -> v128
i16x8.extmul_low_i8x16_u(a: v128, b: v128) -> v128
i16x8.extmul_high_i8x16_u(a: v128, b: v128) -> v128
i32x4.extmul_low_i16x8_s(a: v128, b: v128) -> v128
i32x4.extmul_high_i16x8_s(a: v128, b: v128) -> v128
i32x4.extmul_low_i16x8_u(a: v128, b: v128) -> v128
i32x4.extmul_high_i16x8_u(a: v128, b: v128) -> v128
i64x2.extmul_low_i32x4_s(a: v128, b: v128) -> v128
i64x2.extmul_high_i32x4_s(a: v128, b: v128) -> v128
i64x2.extmul_low_i32x4_u(a: v128, b: v128) -> v128
i64x2.extmul_high_i32x4_u(a: v128, b: v128) -> v128
i16x8.extadd_pairwise_i8x16_s(a: v128) -> v128
i16x8.extadd_pairwise_i8x16_u(a: v128) -> v128
i32x4.extadd_pairwise_i16x8_s(a: v128) -> v128
i32x4.extadd_pairwise_i16x8_u(a: v128) -> v128
i8x16.add_sat_s(a: v128, b: v128) -> v128
i8x16.add_sat_u(a: v128, b: v128) -> v128
i16x8.add_sat_s(a: v128, b: v128) -> v128
i16x8.add_sat_u(a: v128, b: v128) -> v128
i8x16.sub_sat_s(a: v128, b: v128) -> v128
i8x16.sub_sat_u(a: v128, b: v128) -> v128
i16x8.sub_sat_s(a: v128, b: v128) -> v128
i16x8.sub_sat_u(a: v128, b: v128) -> v128
i16x8.q15mulr_sat_s(a: v128, b: v128) -> v128
i8x16.min_s(a: v128, b: v128) -> v128
i8x16.min_u(a: v128, b: v128) -> v128
i16x8.min_s(a: v128, b: v128) -> v128
i16x8.min_u(a: v128, b: v128) -> v128
i32x4.min_s(a: v128, b: v128) -> v128
i32x4.min_u(a: v128, b: v128) -> v128
i8x16.max_s(a: v128, b: v128) -> v128
i8x16.max_u(a: v128, b: v128) -> v128
i16x8.max_s(a: v128, b: v128) -> v128
i16x8.max_u(a: v128, b: v128) -> v128
i32x4.max_s(a: v128, b: v128) -> v128
i32x4.max_u(a: v128, b: v128) -> v128
i8x16.avgr_u(a: v128, b: v128) -> v128
i16x8.avgr_u(a: v128, b: v128) -> v128
i8x16.abs(a: v128) -> v128
i16x8.abs(a: v128) -> v128
i32x4.abs(a: v128) -> v128
i64x2.abs(a: v128) -> v128
i8x16.shl(a: v128, y: i32) -> v128
i16x8.shl(a: v128, y: i32) -> v128
i32x4.shl(a: v128, y: i32) -> v128
i64x2.shl(a: v128, y: i32) -> v128
i8x16.shr_s(a: v128, y: i32) -> v128
i8x16.shr_u(a: v128, y: i32) -> v128
i16x8.shr_s(a: v128, y: i32) -> v128
i16x8.shr_u(a: v128, y: i32) -> v128
i32x4.shr_s(a: v128, y: i32) -> v128
i32x4.shr_u(a: v128, y: i32) -> v128
i64x2.shr_s(a: v128, y: i32) -> v128
i64x2.shr_u(a: v128, y: i32) -> v128
v128.and(a: v128, b: v128) -> v128
v128.or(a: v128, b: v128) -> v128
v128.xor(a: v128, b: v128) -> v128
v128.not(a: v128) -> v128
v128.andnot(a: v128, b: v128) -> v128
v128.bitselect(v1: v128, v2: v128, c: v128) -> v128
i8x16.popcnt(v: v128) -> v128
v128.any_true(a: v128) -> i32
i8x16.all_true(a: v128) -> i32
i16x8.all_true(a: v128) -> i32
i32x4.all_true(a: v128) -> i32
i64x2.all_true(a: v128) -> i32
i8x16.bitmask(a: v128) -> i32
i16x8.bitmask(a: v128) -> i32
i32x4.bitmask(a: v128) -> i32
i64x2.bitmask(a: v128) -> i32
i8x16.eq(a: v128, b: v128) -> v128
i16x8.eq(a: v128, b: v128) -> v128
i32x4.eq(a: v128, b: v128) -> v128
i64x2.eq(a: v128, b: v128) -> v128
f32x4.eq(a: v128, b: v128) -> v128
f64x2.eq(a: v128, b: v128) -> v128
i8x16.ne(a: v128, b: v128) -> v128
i16x8.ne(a: v128, b: v128) -> v128
i32x4.ne(a: v128, b: v128) -> v128
i64x2.ne(a: v128, b: v128) -> v128
f32x4.ne(a: v128, b: v128) -> v128
f64x2.ne(a: v128, b: v128) -> v128
i8x16.lt_s(a: v128, b: v128) -> v128
i8x16.lt_u(a: v128, b: v128) -> v128
i16x8.lt_s(a: v128, b: v128) -> v128
i16x8.lt_u(a: v128, b: v128) -> v128
i32x4.lt_s(a: v128, b: v128) -> v128
i32x4.lt_u(a: v128, b: v128) -> v128
i64x2.lt_s(a: v128, b: v128) -> v128
f32x4.lt(a: v128, b: v128) -> v128
f64x2.lt(a: v128, b: v128) -> v128
i8x16.le_s(a: v128, b: v128) -> v128
i8x16.le_u(a: v128, b: v128) -> v128
i16x8.le_s(a: v128, b: v128) -> v128
i16x8.le_u(a: v128, b: v128) -> v128
i32x4.le_s(a: v128, b: v128) -> v128
i32x4.le_u(a: v128, b: v128) -> v128
i64x2.le_s(a: v128, b: v128) -> v128
f32x4.le(a: v128, b: v128) -> v128
f64x2.le(a: v128, b: v128) -> v128
i8x16.gt_s(a: v128, b: v128) -> v128
i8x16.gt_u(a: v128, b: v128) -> v128
i16x8.gt_s(a: v128, b: v128) -> v128
i16x8.gt_u(a: v128, b: v128) -> v128
i32x4.gt_s(a: v128, b: v128) -> v128
i32x4.gt_u(a: v128, b: v128) -> v128
i64x2.gt_s(a: v128, b: v128) -> v128
f32x4.gt(a: v128, b: v128) -> v128
f64x2.gt(a: v128, b: v128) -> v128
i8x16.ge_s(a: v128, b: v128) -> v128
i8x16.ge_u(a: v128, b: v128) -> v128
i16x8.ge_s(a: v128, b: v128) -> v128
i16x8.ge_u(a: v128, b: v128) -> v128
i32x4.ge_s(a: v128, b: v128) -> v128
i32x4.ge_u(a: v128, b: v128) -> v128
i64x2.ge_s(a: v128, b: v128) -> v128
f32x4.ge(a: v128, b: v128) -> v128
f64x2.ge(a: v128, b: v128) -> v128
v128.load(m: memarg) -> v128
v128.load32_zero(m: memarg) -> v128
v128.load64_zero(m: memarg) -> v128
v128.load8_splat(m: memarg) -> v128
v128.load16_splat(m: memarg) -> v128
v128.load32_splat(m: memarg) -> v128
v128.load64_splat(m: memarg) -> v128
v128.load8_lane(m: memarg, x: v128, imm: ImmLaneIdx16) -> v128
v128.load16_lane(m: memarg, x: v128, imm: ImmLaneIdx8) -> v128
v128.load32_lane(m: memarg, x: v128, imm: ImmLaneIdx4) -> v128
v128.load64_lane(m: memarg, x: v128, imm: ImmLaneIdx2) -> v128
v128.load8x8_s(m: memarg) -> v128
v128.load8x8_u(m: memarg) -> v128
v128.load16x4_s(m: memarg) -> v128
v128.load16x4_u(m: memarg) -> v128
v128.load32x2_s(m: memarg) -> v128
v128.load32x2_u(m: memarg) -> v128
v128.store(m: memarg, data: v128)
v128.store8_lane(m: memarg, data: v128, imm: ImmLaneIdx16)
v128.store16_lane(m: memarg, data: v128, imm: ImmLaneIdx8)
v128.store32_lane(m: memarg, data: v128, imm: ImmLaneIdx4)
v128.store64_lane(m: memarg, data: v128, imm: ImmLaneIdx2)
f32x4.neg(a: v128) -> v128
f64x2.neg(a: v128) -> v128
f32x4.abs(a: v128) -> v128
f64x2.abs(a: v128) -> v128
f32x4.min(a: v128, b: v128) -> v128
f64x2.min(a: v128, b: v128) -> v128
f32x4.max(a: v128, b: v128) -> v128
f64x2.max(a: v128, b: v128) -> v128
f32x4.pmin(a: v128, b: v128) -> v128
f64x2.pmin(a: v128, b: v128) -> v128
f32x4.pmax(a: v128, b: v128) -> v128
f64x2.pmax(a: v128, b: v128) -> v128
f32x4.add(a: v128, b: v128) -> v128
f64x2.add(a: v128, b: v128) -> v128
f32x4.sub(a: v128, b: v128) -> v128
f64x2.sub(a: v128, b: v128) -> v128
f32x4.div(a: v128, b: v128) -> v128
f64x2.div(a: v128, b: v128) -> v128
f32x4.mul(a: v128, b: v128) -> v128
f64x2.mul(a: v128, b: v128) -> v128
f32x4.sqrt(a: v128) -> v128
f64x2.sqrt(a: v128) -> v128
f32x4.ceil(a: v128) -> v128
f64x2.ceil(a: v128) -> v128
f32x4.floor(a: v128) -> v128
f64x2.floor(a: v128) -> v128
f32x4.trunc(a: v128) -> v128
f64x2.trunc(a: v128) -> v128
f32x4.nearest(a: v128) -> v128
f64x2.nearest(a: v128) -> v128
f32x4.convert_i32x4_s(a: v128) -> v128
f32x4.convert_i32x4_u(a: v128) -> v128
f64x2.convert_low_i32x4_s(a: v128) -> v128
f64x2.convert_low_i32x4_u(a: v128) -> v128
i32x4.trunc_sat_f32x4_s(a: v128) -> v128
i32x4.trunc_sat_f32x4_u(a: v128) -> v128
i32x4.trunc_sat_f64x2_s_zero(a: v128) -> v128
i32x4.trunc_sat_f64x2_u_zero(a: v128) -> v128
f32x4.demote_f64x2_zero(a: v128) -> v128
f64x2.promote_low_f32x4(a: v128) -> v128
i8x16.narrow_i16x8_s(a: v128, b: v128) -> v128
i8x16.narrow_i16x8_u(a: v128, b: v128) -> v128
i16x8.narrow_i32x4_s(a: v128, b: v128) -> v128
i16x8.narrow_i32x4_u(a: v128, b: v128) -> v128
i16x8.extend_low_i8x16_s(a: v128) -> v128
i16x8.extend_high_i8x16_s(a: v128) -> v128
i16x8.extend_low_i8x16_u(a: v128) -> v128
i16x8.extend_high_i8x16_u(a: v128) -> v128
i32x4.extend_low_i16x8_s(a: v128) -> v128
i32x4.extend_high_i16x8_s(a: v128) -> v128
i32x4.extend_low_i16x8_u(a: v128) -> v128
i32x4.extend_high_i16x8_u(a: v128) -> v128
i64x2.extend_low_i32x4_s(a: v128) -> v128
i64x2.extend_high_i32x4_s(a: v128) -> v128
i64x2.extend_low_i32x4_u(a: v128) -> v128
i64x2.extend_high_i32x4_u(a: v128) -> v128

codecov bot commented Mar 16, 2025

Codecov Report

Attention: Patch coverage is 9.64286% with 253 lines in your changes missing coverage. Please review.

Project coverage is 69.35%. Comparing base (6735bf9) to head (b9c5446).
Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
crates/core/src/simd.rs  | 9.64%   | 253 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1395      +/-   ##
==========================================
- Coverage   70.52%   69.35%   -1.18%     
==========================================
  Files         157      158       +1     
  Lines       14414    14695     +281     
==========================================
+ Hits        10165    10191      +26     
- Misses       4249     4504     +255     

This is the skeleton with which it is going to be possible to implement most of the V128 API in an efficient way.
@Robbepop force-pushed the rf-implement-simd-proposal branch from 0095b78 to 53e526f on March 16, 2025 20:33
This now exposes `load`, `load_at`, `store`, and `store_at` functions in addition to the load_extend and store_wrap ones. This way we no longer require the awkward `WrapInto` and `ExtendInto` impls for `T -> T`.
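
For illustration only, a hedged sketch of what plain (non-extending) `v128` load and store helpers over a byte buffer could look like; the actual wasmi_core functions, their names, and their error handling (e.g. returning a trap instead of `Option`) may differ:

```rust
/// Loads 16 bytes at `addr` from linear memory, or `None` on out-of-bounds.
fn v128_load(memory: &[u8], addr: usize) -> Option<[u8; 16]> {
    let bytes = memory.get(addr..addr.checked_add(16)?)?;
    let mut out = [0u8; 16];
    out.copy_from_slice(bytes);
    Some(out)
}

/// Stores 16 bytes at `addr` into linear memory, or `None` on out-of-bounds.
fn v128_store(memory: &mut [u8], addr: usize, value: [u8; 16]) -> Option<()> {
    let dst = memory.get_mut(addr..addr.checked_add(16)?)?;
    dst.copy_from_slice(&value);
    Some(())
}
```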
@Robbepop (Member, Author) commented:
The wasmi_core features required for Wasm simd proposal support have all been implemented by now.

The next step is to add Wasm simd instruction variants to wasmi_ir's Instruction type.
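
As a purely hypothetical illustration (the real operand encodings and names are decided in that follow-up work), such variants might look roughly like this in a register-based IR; `Reg` here is a stand-in operand type:

```rust
/// Hypothetical register index; wasmi_ir's actual operand types may differ.
struct Reg(u16);

/// Purely illustrative SIMD variants; not the actual `wasmi_ir::Instruction` encoding.
enum Instruction {
    /// `i32x4.add`: lane-wise addition of two `v128` registers.
    I32x4Add { result: Reg, lhs: Reg, rhs: Reg },
    /// `i8x16.splat`: broadcast a scalar register into all 16 lanes.
    I8x16Splat { result: Reg, value: Reg },
}
```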

The new `simd` crate feature is a full replacement for the removed `value128` feature.
- Fixes an overflow issue in the avgr_u SIMD instructions (see the sketch after this list).
- Now uses div_ceil as suggested by clippy.
- Deduplicated logic via macro.
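
The avgr_u issue can be illustrated with a small standalone sketch (not the actual Wasmi code): the rounding average `(a + b + 1) / 2` can overflow when computed in the lane type itself, and one way to avoid that, presumably along the lines of this fix, is to widen the operands and divide with `div_ceil`:

```rust
/// Rounding average as specified for `i8x16.avgr_u`, computed without
/// overflow by widening to `u16`; `div_ceil` performs the round-up division.
fn avgr_u8(a: u8, b: u8) -> u8 {
    (u16::from(a) + u16::from(b)).div_ceil(2) as u8
}

fn main() {
    assert_eq!(avgr_u8(255, 255), 255); // the naive u8 sum would overflow here
    assert_eq!(avgr_u8(0, 1), 1);       // the average rounds up
}
```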
@Robbepop changed the title from "Add support for the Wasm simd proposal" to "wasmi_core: add support for the Wasm simd proposal" on Mar 20, 2025
@Robbepop merged commit fc0d538 into main on Mar 20, 2025 (17 of 19 checks passed)
@Robbepop deleted the rf-implement-simd-proposal branch on March 20, 2025 17:14