This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Inefficient x64 codegen for 8x16 shifts #117

Open
@abrown

While attempting to lower shl and shr (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#bit-shifts) in cranelift, I observed that the following instructions would require a non-optimal lowering on x86:

  • i8x16.shl
  • i8x16.shr_s
  • i8x16.shr_u
  • i64x2.shr_s

I see that, e.g., v8 lowers i8x16.shl to 10 instructions (https://github.com/v8/v8/blob/5097dcb706b4438cf2ba3da5dfacfbc36643759c/src/compiler/backend/x64/code-generator-x64.cc#L3358-L3398) and am concerned that unsuspecting users may be hit with performance cliffs on x86. Are there better ideas out there on how to lower this? Are there workloads out there that would use these instructions?
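
To illustrate why these lowerings get long: x86 SSE provides packed shifts for 16-, 32-, and 64-bit lanes but none for 8-bit lanes, so an 8-bit shift has to be built from a wider shift plus cleanup. The following is a minimal scalar sketch of that general idea for i8x16.shl, not the actual v8 or cranelift sequence. The function names are hypothetical, and it assumes the shift count is taken modulo the lane width, as the SIMD proposal specifies.

```rust
/// Reference semantics: shift each 8-bit lane left by `count mod 8`.
fn shl_i8x16_reference(v: [u8; 16], count: u32) -> [u8; 16] {
    let s = count % 8;
    v.map(|b| b << s)
}

/// Emulation that only uses 16-bit lane shifts plus a per-byte mask,
/// modelling a wide packed shift followed by a packed AND.
fn shl_i8x16_via_16bit_lanes(v: [u8; 16], count: u32) -> [u8; 16] {
    let s = count % 8;
    // Bits that moved from the low byte into the high byte of a 16-bit
    // lane land in the low `s` bits of the high byte; this mask clears them.
    let mask = 0xffu8 << s;
    let mut out = [0u8; 16];
    for i in 0..8 {
        // View two adjacent bytes as one little-endian 16-bit lane.
        let lane = u16::from_le_bytes([v[2 * i], v[2 * i + 1]]);
        let [lo, hi] = (lane << s).to_le_bytes();
        out[2 * i] = lo & mask;
        out[2 * i + 1] = hi & mask;
    }
    out
}
```

Even in this simplified model the shift needs three steps (reduce the count, shift wide lanes, mask), and a real lowering additionally has to move the count into a vector register and materialize the mask, which is where the extra instructions come from.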
