This repository was archived by the owner on Dec 22, 2021. It is now read-only.
Inefficient x64 codegen for float->int truncation #173
Closed
Description
Unfortunately, the x64 codegen for i32x4.trunc_sat_f32x4_s, at least as emitted by V8, is really elaborate:
This is 7 instructions for what could be 1 instruction in x64 if NaN handling or overflow behavior didn't have to match the specified one.
Is there any way this can be improved? I don't have a specific suggestion, but this costs ~10% of instructions (not sure how to measure cycle impact accurately) on one of my functions.
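To make the mismatch concrete, here is a rough scalar sketch in C of what one lane has to compute in each case. It assumes the single native instruction meant above is CVTTPS2DQ, which returns 0x80000000 for NaN and for any out-of-range input, whereas trunc_sat maps NaN to 0 and clamps to INT32_MAX / INT32_MIN. The function names are made up for illustration and are not taken from V8 or the spec text.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of i32x4.trunc_sat_f32x4_s:
 * NaN becomes 0, out-of-range values saturate. */
static int32_t trunc_sat_lane(float x) {
    if (isnan(x)) return 0;
    if (x >= 2147483648.0f) return INT32_MAX;   /* x >= 2^31 */
    if (x <= -2147483648.0f) return INT32_MIN;  /* x <= -2^31 */
    return (int32_t)x;                          /* in range: truncate toward zero */
}

/* Hypothetical scalar model of one lane of x64 CVTTPS2DQ (assumed to be
 * the single-instruction alternative): NaN and out-of-range inputs all
 * yield the "integer indefinite" value 0x80000000. */
static int32_t cvttps2dq_lane(float x) {
    if (isnan(x) || x >= 2147483648.0f || x < -2147483648.0f)
        return INT32_MIN;
    return (int32_t)x;
}
```

The two models agree for every in-range input and for negative overflow. They differ only for NaN and for positive overflow, and those two cases are what the extra fix-up instructions are for.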