This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Inefficient x64 codegen for conversion instructions #190

Open
@abrown

Description


Certain SIMD conversions seem to have inefficient lowerings on x64. V8 lowers f32x4.convert_i32x4_u to 8 instructions, whereas the signed version, f32x4.convert_i32x4_s, is lowered to a single instruction (CVTDQ2PS).
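To illustrate why the unsigned case is longer, here is a scalar model of one lane of a common SSE2-style emulation: split each u32 into its low and high 16 bits, convert both halves with the signed converter (exact, because each half has at most 16 significant bits), and combine them with a single rounding add. This is a sketch under the assumption that the 8-instruction sequence has roughly this shape (zero register, blend, subtract, two converts, shift, two adds); it is not a copy of V8's code, and the function name `u32_to_f32_split` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of an SSE2-style lowering of
   f32x4.convert_i32x4_u. Comments name the packed instruction each step
   roughly corresponds to (the zeroed register comes from a pxor). */
float u32_to_f32_split(uint32_t x) {
    uint32_t lo = x & 0xFFFFu;            /* low 16 bits (pblendw with zero) */
    uint32_t hi = x - lo;                 /* high 16 bits, still in place (psubd) */
    float lo_f = (float)(int32_t)lo;      /* signed convert, exact since lo < 2^16 (cvtdq2ps) */
    int32_t hi_half = (int32_t)(hi >> 1); /* halve so the value fits a signed i32 (psrld 1) */
    float hi_f = (float)hi_half;          /* exact: at most 16 significant bits (cvtdq2ps) */
    hi_f = hi_f + hi_f;                   /* undo the halving, still exact (addps) */
    return hi_f + lo_f;                   /* the only rounding step (addps) */
}
```

Because both partial results are exact, the final add rounds only once, so each lane should match a direct, correctly rounded `(float)x` conversion.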

I can't find the v8 implementation for i32x4.trunc_sat_f32x4_s and i32x4.trunc_sat_f32x4_u but I think the situation is the same: the signed version should have a single instruction lowering to CVTTPS2DQ and the unsigned version will require some longer sequence. [edit: this is incorrect, see #173 for a more correct discussion of this inefficiency]
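One reason the signed case is not a single CVTTPS2DQ: that instruction returns the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, while the Wasm saturating semantics map NaN to 0 and clamp out-of-range values to INT32_MIN/INT32_MAX. The scalar models below compare the two for one lane; both function names are hypothetical, and this is only a sketch of the semantics, not a proposed lowering.

```c
#include <math.h>
#include <stdint.h>

/* Model of one lane of CVTTPS2DQ: truncate toward zero, or return the
   "integer indefinite" value 0x80000000 for NaN and out-of-range inputs. */
int32_t cvttps2dq_lane(float f) {
    if (isnan(f) || f >= 2147483648.0f || f < -2147483648.0f)
        return INT32_MIN;
    return (int32_t)f;
}

/* Model of one lane of Wasm i32x4.trunc_sat_f32x4_s: NaN becomes 0 and
   out-of-range values saturate to INT32_MIN / INT32_MAX. */
int32_t trunc_sat_s_lane(float f) {
    if (isnan(f)) return 0;
    if (f >= 2147483648.0f) return INT32_MAX;
    if (f < -2147483648.0f) return INT32_MIN;
    return (int32_t)f;
}
```

The two agree everywhere except for NaN and positive overflow, so a correct signed lowering needs extra instructions to patch those lanes after the CVTTPS2DQ.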

The 64x2 versions of these instructions were dropped in #178. For similar reasons (@ngzhian: "because it is uncommon for such instructions to be used, and hardware support is not widespread"), should we remove the unsigned versions?
