
relaxed i32x4.trunc_sat_f32x4_{s,u} i32x4.trunc_sat_f64x2_{s,u}_zero #21

Open
@ngzhian

  1. What are the instructions being proposed?

Relaxed versions of:

  • i32x4.trunc_sat_f32x4_s
  • i32x4.trunc_sat_f32x4_u
  • i32x4.trunc_sat_f64x2_s_zero
  • i32x4.trunc_sat_f64x2_u_zero

from Simd128 (names undecided).

  2. What are the semantics of these instructions?

Convert f32x4/f64x2 lanes to i32x4 with truncation toward zero (signed or unsigned). If an input lane is NaN, or its truncated value is out of range for the destination type, the result in that lane is implementation-defined.
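To make "out of range" concrete, here is a small Python sketch of which lanes every implementation must still agree on (plain truncation). The helper name `lane_is_deterministic` is hypothetical, and Python floats stand in for f32/f64 lane values:

```python
import math

# Bounds of the i32 and u32 destination lanes.
I32_MIN, I32_MAX = -(2**31), 2**31 - 1
U32_MAX = 2**32 - 1


def lane_is_deterministic(x: float, signed: bool) -> bool:
    """True if every conforming implementation must return trunc(x) for this lane.

    The relaxed result is implementation-defined only for NaNs and for inputs
    whose truncated value does not fit in the destination lane.
    """
    if math.isnan(x) or math.isinf(x):
        return False
    t = math.trunc(x)  # truncation toward zero
    lo, hi = (I32_MIN, I32_MAX) if signed else (0, U32_MAX)
    return lo <= t <= hi
```

For example, 2147483648.0 is out of range for the signed variants but in range for the unsigned ones.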

  3. How will these instructions be implemented? Give examples for at least
    x86-64 and ARM64. Also provide a reference implementation in terms of 128-bit
    Wasm SIMD.

x86/64

relaxed i32x4.trunc_sat_f32x4_s = CVTTPS2DQ
relaxed i32x4.trunc_sat_f32x4_u = VCVTTPS2UDQ (AVX-512F + AVX-512VL); otherwise Simd128 i32x4.trunc_sat_f32x4_u (which can be slightly optimized to ignore NaNs)
relaxed i32x4.trunc_sat_f64x2_s_zero = CVTTPD2DQ
relaxed i32x4.trunc_sat_f64x2_u_zero = VCVTTPD2UDQ (AVX-512F + AVX-512VL); otherwise Simd128 i32x4.trunc_sat_f64x2_u_zero
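The x86 failure values can be modelled per lane as below. This is a sketch, not a spec: the function names are hypothetical, results are returned as 32-bit patterns, and it assumes the "integer indefinite" behaviour (0x80000000 for the signed forms, 0xFFFFFFFF for the AVX-512 unsigned forms) whenever the input is NaN or its truncated value does not fit:

```python
import math

INT_INDEFINITE = 0x80000000   # CVTTPS2DQ / CVTTPD2DQ failure value
UINT_INDEFINITE = 0xFFFFFFFF  # VCVTTPS2UDQ / VCVTTPD2UDQ failure value


def x86_cvtt_s32(x: float) -> int:
    """Model one lane of CVTTPS2DQ/CVTTPD2DQ, returned as a u32 bit pattern."""
    if math.isnan(x) or math.isinf(x):
        return INT_INDEFINITE
    t = math.trunc(x)
    if not -(2**31) <= t <= 2**31 - 1:
        return INT_INDEFINITE
    return t & 0xFFFFFFFF  # two's-complement bit pattern of the i32 result


def x86_cvtt_u32(x: float) -> int:
    """Model one lane of the AVX-512 VCVTTPS2UDQ/VCVTTPD2UDQ."""
    if math.isnan(x) or math.isinf(x):
        return UINT_INDEFINITE
    t = math.trunc(x)
    if not 0 <= t <= 2**32 - 1:
        return UINT_INDEFINITE
    return t
```

Note the ambiguity in the signed case: a valid input of -2147483648.0 also produces 0x80000000, so the bit pattern alone cannot distinguish a failure from a real result.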

ARM64

relaxed i32x4.trunc_sat_f32x4_s = FCVTZS
relaxed i32x4.trunc_sat_f32x4_u = FCVTZU
relaxed i32x4.trunc_sat_f64x2_s_zero = FCVTZS + SQXTN
relaxed i32x4.trunc_sat_f64x2_u_zero = FCVTZU + UQXTN
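Why FCVTZS + SQXTN suffices for the `_zero` variant: FCVTZS on `.2D` truncates each double and saturates it to i64 (NaN becomes 0), then SQXTN saturates each i64 to i32 into the low two lanes; writing the 64-bit destination clears the upper half of the register. A Python sketch of that composition (the function names are hypothetical; FCVTZU + UQXTN is the unsigned analogue):

```python
import math


def fcvtzs_i64(x: float) -> int:
    """One 64-bit lane of FCVTZS Vd.2D: truncate, saturate to i64, NaN -> 0."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return 2**63 - 1 if x > 0 else -(2**63)
    return max(-(2**63), min(2**63 - 1, math.trunc(x)))


def sqxtn_i32(v: int) -> int:
    """One lane of SQXTN: signed saturating narrow from i64 to i32."""
    return max(-(2**31), min(2**31 - 1, v))


def arm64_trunc_sat_f64x2_s_zero(a: tuple) -> list:
    """FCVTZS + SQXTN on two f64 lanes; the upper two i32 lanes end up zero."""
    return [sqxtn_i32(fcvtzs_i64(a[0])), sqxtn_i32(fcvtzs_i64(a[1])), 0, 0]
```

Because both steps saturate and NaN maps to 0, the two-instruction sequence produces the same lanes as the non-relaxed i32x4.trunc_sat_f64x2_s_zero.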

ARM NEON

relaxed i32x4.trunc_sat_f32x4_s = vcvt.S32.F32
relaxed i32x4.trunc_sat_f32x4_u = vcvt.U32.F32
relaxed i32x4.trunc_sat_f64x2_s_zero = vcvt.S32.F64 + vcvt.S32.F64 + vmov
relaxed i32x4.trunc_sat_f64x2_u_zero = vcvt.U32.F64 + vcvt.U32.F64 + vmov

Note: on Arm MVE, double-precision conversions require the Armv8-M Floating-point Extension (FPv5); MVE can be implemented with or without this extension.

Simd128

The respective non-relaxed versions: i32x4.trunc_sat_f32x4_s, i32x4.trunc_sat_f32x4_u, i32x4.trunc_sat_f64x2_s_zero, i32x4.trunc_sat_f64x2_u_zero.
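For reference, the non-relaxed Simd128 semantics can be modelled per lane in Python. This is a sketch: the function names mirror the instruction names but are otherwise hypothetical, and lanes are returned as Python ints (signed for `_s`, unsigned for `_u`) rather than bit patterns:

```python
import math

I32 = (-(2**31), 2**31 - 1)
U32 = (0, 2**32 - 1)


def trunc_sat(x: float, lo: int, hi: int) -> int:
    """One lane of non-relaxed trunc_sat: NaN -> 0, otherwise truncate and clamp."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return hi if x > 0 else lo
    return max(lo, min(hi, math.trunc(x)))


def i32x4_trunc_sat_f32x4_s(a):
    return [trunc_sat(x, *I32) for x in a]


def i32x4_trunc_sat_f32x4_u(a):
    return [trunc_sat(x, *U32) for x in a]


def i32x4_trunc_sat_f64x2_s_zero(a):
    # Two f64 inputs fill the low lanes; the upper two lanes are zero.
    return [trunc_sat(x, *I32) for x in a] + [0, 0]


def i32x4_trunc_sat_f64x2_u_zero(a):
    return [trunc_sat(x, *U32) for x in a] + [0, 0]
```

On in-range, non-NaN lanes, any relaxed implementation must agree with these functions.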

  4. How does behavior differ across processors? What new fingerprinting surfaces will be exposed?

For i32x4.trunc_sat_f32x4_s:

  • x86/64 returns 0x80000000 in lanes that are out of range or NaN
  • ARM/ARM64 returns 0 for NaNs and saturated results for out-of-range inputs

For i32x4.trunc_sat_f32x4_u:

  • x86/64 returns 0xFFFFFFFF in lanes that are out of range or NaN if AVX-512 is available; otherwise the Simd128 fallback (more instructions) returns 0 for NaNs and saturated results for out-of-range inputs
  • ARM/ARM64 returns 0 for NaNs and saturated results for out-of-range inputs

For i32x4.trunc_sat_f64x2_s_zero:

  • x86/64 returns 0x80000000 in lanes that are out of range or NaN
  • ARM/ARM64 returns 0 for NaNs and saturated results for out-of-range inputs

For i32x4.trunc_sat_f64x2_u_zero:

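  • x86/64 returns 0xFFFFFFFF in lanes that are out of range or NaN if AVX-512 is available; otherwise the Simd128 fallback returns 0 for NaNs and saturated results for out-of-range inputs
  • ARM/ARM64 returns 0 for NaNs and saturated results for out-of-range inputs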
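The fingerprinting surface is easy to see: a single NaN lane is enough to tell the two families apart. A Python sketch using simplified models of CVTTPS2DQ and FCVTZS for i32x4.trunc_sat_f32x4_s (all names, including `probe`, are hypothetical and illustrative only):

```python
import math


def x86_lane(x: float) -> int:
    """CVTTPS2DQ model (bit pattern): 0x80000000 for NaN or out of range."""
    if math.isnan(x) or math.isinf(x) or not -(2**31) <= math.trunc(x) < 2**31:
        return 0x80000000
    return math.trunc(x) & 0xFFFFFFFF


def arm_lane(x: float) -> int:
    """FCVTZS model (bit pattern): 0 for NaN, saturated when out of range."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        t = 2**31 - 1 if x > 0 else -(2**31)
    else:
        t = max(-(2**31), min(2**31 - 1, math.trunc(x)))
    return t & 0xFFFFFFFF


def probe(lane_op) -> str:
    """Guess the platform family from the result of one NaN lane."""
    r = lane_op(float("nan"))
    return {0x80000000: "x86-like", 0: "ARM-like"}.get(r, "unknown")
```

In-range lanes agree on both models; only NaN and out-of-range lanes leak which family is running.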

  5. What use cases are there?

Conversion instructions are common. If the application can guarantee that inputs are in range, the relaxed versions give good performance on all architectures.
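As one illustration of guaranteeing the input range, an application could clamp f32 lanes before the relaxed signed conversion. A Python sketch (the names are hypothetical; `struct` is used only to round to binary32). The upper bound is 2147483520.0, the largest f32 below 2^31, because 2147483647.0 is not representable in f32 and rounds up to 2^31, which is out of range:

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Both bounds are exactly representable in f32.
F32_I32_MAX = 2147483520.0  # 2**31 - 128
F32_I32_MIN = -2147483648.0  # -(2**31)


def clamp_for_relaxed_s(x: float) -> float:
    """Hypothetical pre-pass making an f32 lane safe for relaxed trunc_sat_f32x4_s."""
    if math.isnan(x):
        return 0.0  # map NaN to a value every implementation converts identically
    return min(max(x, F32_I32_MIN), F32_I32_MAX)
```

After this pre-pass every lane is in range, so the relaxed and non-relaxed conversions return the same result on every platform.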
