Inefficient x64 codegen for 8x16 shifts #117

While attempting to lower `shl` and `shr` (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#bit-shifts) in Cranelift, I observed that the following instructions would require a non-optimal lowering to x86:
 - `i8x16.shl`
 - `i8x16.shr_s`
 - `i8x16.shr_u`
 - `i64x2.shr_s`
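For context: SSE2 provides lane-wise shifts only for 16-, 32- and 64-bit lanes (`psllw`/`psrlw`/`psraw`, `pslld`/`psrld`/`psrad`, `psllq`/`psrlq`). There is no 8-bit lane shift at all, and a 64-bit arithmetic right shift (`vpsraq`) only arrives with AVX-512. For `i64x2.shr_s`, one commonly used workaround builds the arithmetic shift from a logical shift, an XOR and a subtraction, all of which SSE2 offers for 64-bit lanes (`psrlq`, `pxor`, `psubq`). The sketch below is a scalar model of that identity, not Cranelift's or V8's lowering; the function names are hypothetical, and it assumes the Wasm SIMD rule that the shift count is taken modulo the lane width.

```rust
/// Arithmetic right shift of one 64-bit lane, built only from operations
/// SSE2 has for 64-bit lanes: logical right shift, XOR and subtraction.
fn sra64_via_srl(x: i64, count: u32) -> i64 {
    // Wasm SIMD takes the shift count modulo the lane width (64 here).
    let s = count & 63;
    let t = (x as u64) >> s; // logical shift: sign bit moves to bit 63 - s
    let m = (1u64 << 63) >> s; // mask with only that moved sign bit set
    // If the moved sign bit is 0, (t ^ m) - m == t.
    // If it is 1, (t ^ m) - m == t - 2m, which sets every bit above it,
    // i.e. performs the sign extension a real arithmetic shift would do.
    (t ^ m).wrapping_sub(m) as i64
}

/// Hypothetical lane-wise model of `i64x2.shr_s`.
fn i64x2_shr_s(v: [i64; 2], count: u32) -> [i64; 2] {
    [sra64_via_srl(v[0], count), sra64_via_srl(v[1], count)]
}
```

On a vector unit, `m` is the same for both lanes, so it can be produced once by shifting a splatted `0x8000_0000_0000_0000` constant with the same count.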

I see that V8, for example, lowers `i8x16.shl` to 10 instructions (https://github.com/v8/v8/blob/5097dcb706b4438cf2ba3da5dfacfbc36643759c/src/compiler/backend/x64/code-generator-x64.cc#L3358-L3398), and I am concerned that unsuspecting users may hit performance cliffs on x86. Are there better ideas for how to lower these instructions? Are there workloads that would use them?
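One frequently discussed alternative for the logical 8-bit shifts is to shift 16-bit lanes with `psllw`/`psrlw` and then clear, with a byte mask, the bits that leaked across the byte boundary inside each 16-bit lane. The sketch below models this in scalar Rust, treating a 128-bit register as 16 little-endian bytes. The helper names `psllw`/`psrlw` only mimic the SSE2 instructions of the same name, all names are hypothetical, and this is not a claim about what V8 or Cranelift emits. It covers `i8x16.shl` and `i8x16.shr_u` only; `i8x16.shr_s` needs a different approach (for example, widening to 16-bit lanes).

```rust
/// A 128-bit register modeled as 16 little-endian bytes.
type V128 = [u8; 16];

/// Models a 16-bit-lane logical left shift (what `psllw` does).
fn psllw(v: V128, s: u32) -> V128 {
    let mut out = [0u8; 16];
    for i in 0..8 {
        let w = u16::from_le_bytes([v[2 * i], v[2 * i + 1]]);
        let r = if s > 15 { 0 } else { w << s };
        out[2 * i..2 * i + 2].copy_from_slice(&r.to_le_bytes());
    }
    out
}

/// Models a 16-bit-lane logical right shift (what `psrlw` does).
fn psrlw(v: V128, s: u32) -> V128 {
    let mut out = [0u8; 16];
    for i in 0..8 {
        let w = u16::from_le_bytes([v[2 * i], v[2 * i + 1]]);
        let r = if s > 15 { 0 } else { w >> s };
        out[2 * i..2 * i + 2].copy_from_slice(&r.to_le_bytes());
    }
    out
}

/// Applies the same byte mask to every lane (what `pand` with a splatted
/// constant would do).
fn and_bytes(v: V128, mask: u8) -> V128 {
    let mut out = v;
    for b in out.iter_mut() {
        *b &= mask;
    }
    out
}

/// i8x16.shl: the count is taken modulo 8. After the 16-bit shift, the low
/// `s` bits of each high byte hold bits carried out of the low byte;
/// masking with 0xFF << s clears them.
fn i8x16_shl(v: V128, count: u32) -> V128 {
    let s = count & 7;
    and_bytes(psllw(v, s), 0xFFu8 << s)
}

/// i8x16.shr_u: after the 16-bit shift, the top `s` bits of each low byte
/// hold bits carried in from the high byte; masking with 0xFF >> s clears them.
fn i8x16_shr_u(v: V128, count: u32) -> V128 {
    let s = count & 7;
    and_bytes(psrlw(v, s), 0xFFu8 >> s)
}
```

On real hardware the mask depends on the shift count, so a non-constant count needs extra instructions to build it; that cost is part of why these operations are harder to lower than the 16-, 32- and 64-bit shifts.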
