Description
- What are the instructions being proposed?
Relaxed versions of:
i32x4.trunc_sat_f32x4_s
i32x4.trunc_sat_f32x4_u
i32x4.trunc_sat_f64x2_s_zero
i32x4.trunc_sat_f64x2_u_zero
from Simd128. (Names undecided)
- What are the semantics of these instructions?
Convert f32x4/f64x2 to i32x4 with truncation (signed/unsigned). If an input lane is out of range or NaN, the result for that lane is implementation-defined.
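A minimal sketch of this contract for the signed case, written in Python purely as a modelling aid. The function name `relaxed_trunc_s` and the `out_of_range` callback are hypothetical, introduced here only to make the "implementation-defined" part explicit; neither is part of the proposal.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def relaxed_trunc_s(lanes, out_of_range):
    """Model a relaxed signed truncation over a list of float lanes.

    `out_of_range` is a hypothetical callback standing in for whatever the
    host does with NaN or out-of-range lanes.
    """
    result = []
    for x in lanes:
        if math.isfinite(x) and INT32_MIN <= math.trunc(x) <= INT32_MAX:
            # In-range lanes truncate toward zero on every host.
            result.append(math.trunc(x))
        else:
            # NaN, infinity, or out-of-range: the host decides.
            result.append(out_of_range(x))
    return result
```

Only the lanes that reach the `else` branch can differ between implementations; everything else is fully specified.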
- How will these instructions be implemented? Give examples for at least
x86-64 and ARM64. Also provide reference implementation in terms of 128-bit
Wasm SIMD.
x86/64
relaxed i32x4.trunc_sat_f32x4_s
= CVTTPS2DQ
relaxed i32x4.trunc_sat_f32x4_u
= VCVTTPS2UDQ (AVX512), Simd128 i32x4.trunc_sat_f32x4_u otherwise (can be slightly optimized to ignore NaNs)
relaxed i32x4.trunc_sat_f64x2_s_zero
= CVTTPD2DQ
relaxed i32x4.trunc_sat_f64x2_u_zero
= VCVTTPD2UDQ (AVX512), Simd128 i32x4.trunc_sat_f64x2_u_zero otherwise
ARM64
relaxed i32x4.trunc_sat_f32x4_s
= FCVTZS
relaxed i32x4.trunc_sat_f32x4_u
= FCVTZU
relaxed i32x4.trunc_sat_f64x2_s_zero
= FCVTZS + SQXTN
relaxed i32x4.trunc_sat_f64x2_u_zero
= FCVTZU + UQXTN
ARM NEON
relaxed i32x4.trunc_sat_f32x4_s
= vcvt.S32.F32
relaxed i32x4.trunc_sat_f32x4_u
= vcvt.U32.F32
relaxed i32x4.trunc_sat_f64x2_s_zero
= vcvt.S32.F64 + vcvt.S32.F64 + vmov
relaxed i32x4.trunc_sat_f64x2_u_zero
= vcvt.U32.F64 + vcvt.U32.F64 + vmov
Note: On ARM MVE, double-precision conversions require the Armv8-M Floating-point Extension (FPv5); MVE can be implemented with or without that extension.
simd128
The respective non-relaxed versions: i32x4.trunc_sat_f32x4_s, i32x4.trunc_sat_f32x4_u, i32x4.trunc_sat_f64x2_s_zero, i32x4.trunc_sat_f64x2_u_zero.
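For readers who want the fallback behavior spelled out, here is a small Python model of the saturating Simd128 conversions: NaN becomes 0, out-of-range values clamp to the nearest representable integer, and the `_zero` variants fill the upper two lanes with 0. The Python function names simply mirror the instruction names and the helper `trunc_sat` is an illustrative name; lanes are modelled as plain Python floats rather than real f32/f64 vectors.

```python
import math

def trunc_sat(x, lo, hi):
    """Saturating conversion of one lane: NaN -> 0, otherwise clamp to [lo, hi]."""
    if math.isnan(x):
        return 0
    if x <= lo:
        return lo
    if x >= hi:
        return hi
    return math.trunc(x)  # truncate toward zero

def i32x4_trunc_sat_f32x4_s(lanes):
    return [trunc_sat(x, -2**31, 2**31 - 1) for x in lanes]

def i32x4_trunc_sat_f32x4_u(lanes):
    return [trunc_sat(x, 0, 2**32 - 1) for x in lanes]

def i32x4_trunc_sat_f64x2_s_zero(lanes):
    # Two f64 lanes in, four i32 lanes out; the upper two are zero.
    return [trunc_sat(x, -2**31, 2**31 - 1) for x in lanes] + [0, 0]

def i32x4_trunc_sat_f64x2_u_zero(lanes):
    return [trunc_sat(x, 0, 2**32 - 1) for x in lanes] + [0, 0]
```

A relaxed instruction that lowers to these is always valid, since the saturating result is one of the permitted implementation-defined outcomes.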
- How does behavior differ across processors? What new fingerprinting surfaces will be exposed?
For i32x4.trunc_sat_f32x4_s:
- x86/64 will return 0x80000000 in lanes that are out of range or NaN
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range lanes
For i32x4.trunc_sat_f32x4_u:
- x86/64 will return 0xFFFFFFFF in lanes that are out of range or NaN if AVX512 is available, 0 otherwise (but requiring more instructions)
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range lanes
For i32x4.trunc_sat_f64x2_s_zero:
- x86/64 will return 0x80000000 in lanes that are out of range or NaN
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range lanes
For i32x4.trunc_sat_f64x2_u_zero:
- x86/64 will return 0xFFFFFFFF in lanes that are out of range or NaN if AVX512 is available, 0 otherwise
- ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range lanes
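To make the fingerprinting surface concrete, the sketch below models the signed f32x4 case as described above and shows how a single NaN lane could tell the two families apart. It is a Python model, not real SIMD code: `x86_like`, `arm_like`, and `guess_host` are hypothetical names, and results are shown as unsigned 32-bit lane patterns.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def _in_range(x):
    return math.isfinite(x) and INT32_MIN <= math.trunc(x) <= INT32_MAX

def x86_like(x):
    # CVTTPS2DQ-style: every NaN or out-of-range lane becomes 0x80000000.
    return math.trunc(x) & 0xFFFFFFFF if _in_range(x) else 0x80000000

def arm_like(x):
    # FCVTZS-style: NaN becomes 0, out-of-range lanes saturate.
    if _in_range(x):
        return math.trunc(x) & 0xFFFFFFFF
    if math.isnan(x):
        return 0
    return (INT32_MAX if x > 0 else INT32_MIN) & 0xFFFFFFFF

def guess_host(trunc):
    """Hypothetical probe: one NaN lane is enough to separate the two models."""
    return "x86-like" if trunc(float("nan")) == 0x80000000 else "arm-like"
```

Both models agree on every in-range lane, so the observable difference is confined to inputs the application was not supposed to pass in the first place.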
- What use cases are there?
Conversion instructions are common. If the application can guarantee the input range, we can get good performance on all architectures.
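One way an application might establish that guarantee is to clamp values to a known range (for example, pixel intensities in 0..255) before converting. The Python sketch below illustrates the idea under that assumption; `clamp_lanes` and `in_relaxed_safe_range_s` are hypothetical helper names, and mapping NaN to the lower bound is an arbitrary choice made for the example.

```python
import math

def clamp_lanes(lanes, lo, hi):
    """Clamp each lane to [lo, hi]; NaN lanes are replaced by lo (arbitrary choice)."""
    return [lo if math.isnan(x) else min(max(x, lo), hi) for x in lanes]

def in_relaxed_safe_range_s(lanes):
    """True if every lane truncates to a valid signed 32-bit integer."""
    return all(
        math.isfinite(x) and -2**31 <= math.trunc(x) <= 2**31 - 1
        for x in lanes
    )
```

Once every lane passes a check like `in_relaxed_safe_range_s`, the relaxed and non-relaxed conversions produce the same result, so the cheaper lowering is safe to use.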