This repository was archived by the owner on Dec 22, 2021. It is now read-only.
Inefficient x64 codegen for 8x16 shifts #117
Open
Description
While attempting to lower shl and shr (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#bit-shifts) in cranelift, I observed that the following instructions would involve a non-optimal lowering to x86:
i8x16.shl
i8x16.shr_s
i8x16.shr_u
i64x2.shr_s
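For context on the last entry: SSE2 through AVX2 have no packed 64-bit arithmetic right shift, so i64x2.shr_s has to be composed from other operations. The following is a minimal scalar sketch in Rust of one well-known way to do that, using a logical shift followed by an xor/subtract sign fix-up. The function names are hypothetical, and this is not necessarily the sequence cranelift or v8 emits.

```rust
/// Scalar model of one 64-bit lane: an arithmetic right shift built from a
/// logical right shift plus a sign fix-up. Hypothetical helper for illustration.
fn sar64_via_logical(x: i64, shift: u32) -> i64 {
    // Wasm SIMD takes the shift count modulo the lane width.
    let n = shift % 64;
    // Logical shift: zeros are shifted in from the left.
    let logical = (x as u64) >> n;
    // Position where the original sign bit ends up after the logical shift.
    let sign_bit = 1u64 << (63 - n);
    // If the sign bit was set, xor clears it and the subtraction borrows
    // through all higher bits, filling them with ones. If it was clear,
    // xor sets it and the subtraction clears it again.
    (logical ^ sign_bit).wrapping_sub(sign_bit) as i64
}

/// Applies the lane model to both lanes of an i64x2 value.
fn i64x2_shr_s(v: [i64; 2], shift: u32) -> [i64; 2] {
    [sar64_via_logical(v[0], shift), sar64_via_logical(v[1], shift)]
}
```

In vector form each step maps onto an operation that does exist for 64-bit lanes (logical shift, xor, subtract), but the mask has to be materialized from the shift count first, which is where the extra instructions come from.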
I see that, e.g., v8 lowers i8x16.shl to 10 instructions (https://github.com/v8/v8/blob/5097dcb706b4438cf2ba3da5dfacfbc36643759c/src/compiler/backend/x64/code-generator-x64.cc#L3358-L3398) and am concerned that unsuspecting users may be hit with performance cliffs on x86. Are there better ideas out there on how to lower this? Are there workloads out there that would use these instructions?
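To make the cost concrete: x86 has no packed 8-bit shift, so a common approach is to shift the data as 16-bit lanes and then mask off the bits that crossed from the low byte into the high byte of each pair. Below is a minimal scalar sketch of that idea in Rust, checked against a per-byte reference. The function names are hypothetical, and this illustrates the general technique rather than the exact sequence in the linked v8 code.

```rust
/// Models i8x16.shl done as a 16-bit lane shift followed by a byte mask.
/// Hypothetical helper for illustration only.
fn i8x16_shl_via_16bit_lanes(v: [u8; 16], shift: u32) -> [u8; 16] {
    // Wasm SIMD takes the shift count modulo the lane width.
    let n = shift % 8;
    // After the shift, the low n bits of every byte must be zero.
    let mask = 0xFFu8 << n;
    let mut out = [0u8; 16];
    for i in 0..8 {
        // View two adjacent bytes as one little-endian 16-bit lane.
        let lane = u16::from_le_bytes([v[2 * i], v[2 * i + 1]]);
        // 16-bit shift: bits from the low byte leak into the high byte.
        let [lo, hi] = (lane << n).to_le_bytes();
        // Masking every byte removes the leaked bits.
        out[2 * i] = lo & mask;
        out[2 * i + 1] = hi & mask;
    }
    out
}

/// Per-byte reference: what i8x16.shl is specified to compute.
fn i8x16_shl_reference(v: [u8; 16], shift: u32) -> [u8; 16] {
    let n = shift % 8;
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = v[i] << n;
    }
    out
}
```

With a shift count that is only known at run time, the mask itself has to be computed and broadcast to all bytes, so the real instruction count is higher than the two logical steps shown here.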