[wasm][aot] Optimize 64 bit const shuffles. Otherwise prefer vector swizzle. #115351

Open: 5 commits to merge into base: main

@lewing (Member) commented May 7, 2025

Fall back to element access for 64x2 shuffles with constant indices; otherwise prefer the vectorized swizzle.

Tagging subscribers to this area: @steveisok, @vitek-karas
See info in area-owners.md if you want to be subscribed.

@lewing lewing requested a review from kg May 7, 2025 03:09
@lewing lewing marked this pull request as ready for review May 7, 2025 03:09
@Copilot Copilot AI review requested due to automatic review settings May 7, 2025 03:09
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR optimizes the implementation of the OP_WASM_SIMD_SWIZZLE operation for constant indices. Key changes include a revised handling of constant versus non‐constant swizzle index vectors, the removal of an early bitcast of rhs, and an updated combination of computed index vectors using a bitwise OR instead of addition.
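
The switch from addition to bitwise OR can be checked with a small sketch. It assumes the swizzle turns each element index into a byte index by shifting left by log2(element size) and then combining the result with the byte offsets 0..size-1, which is what the disassembly later in this thread suggests. The helper name `byte_index` is hypothetical and does not come from the PR.

```python
def byte_index(elem_index: int, elem_size: int, byte_in_elem: int, use_or: bool) -> int:
    """Byte position of one byte of element `elem_index` (hypothetical helper)."""
    shift = elem_size.bit_length() - 1   # log2 of a power-of-two element size
    base = elem_index << shift           # the low `shift` bits are now zero
    return (base | byte_in_elem) if use_or else (base + byte_in_elem)


# The offset only occupies the bits that the shift cleared, so OR and ADD agree.
for size in (1, 2, 4, 8):
    for idx in range(16 // size + 1):    # includes the one-past-the-end index
        for off in range(size):
            assert byte_index(idx, size, off, True) == byte_index(idx, size, off, False)
```

Because the shifted index and the offset never share a set bit, no carry can occur, so the two operations are interchangeable for in-range and clamped indices.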

@lewing lewing requested a review from radekdoulik May 7, 2025 03:53
@lewing lewing added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label May 8, 2025
@lewing (Member, Author) commented May 8, 2025

The codegen for the const case is not pretty, but it is roughly equivalent to the old codegen.

Out of curiosity, I checked what the codegen looks like for the non-const i64x2 case, where LLVM has to synthesize the unsigned min:

 local $4 v128
 local.get $0
 local.get $1
 v128.load align:4    [SIMD]
 local.get $2
 v128.load align:4    [SIMD]
 local.tee $4
 v128.const 0x00000000000000020000000000000002    [SIMD]
 v128.const 0xffffffffffffffffffffffffffffffff    [SIMD]
 v128.const 0x00000000000000000000000000000000    [SIMD]
 local.get $4
 i64x2.extract.lane 0    [SIMD]
 i64.const 2
 i64.lt.u
 select
 i64.const -1
 i64.const 0
 local.get $4
 i64x2.extract.lane 1    [SIMD]
 i64.const 2
 i64.lt.u
 select
 i64x2.replace.lane 1    [SIMD]
 v128.bitselect    [SIMD]
 i32.const 3
 i8x16.shl    [SIMD]
 v128.const 0x08080808080808080000000000000000    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.const 0x07060504030201000706050403020100    [SIMD]
 v128.or    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.store    [SIMD]

When the min intrinsic exists (the i32x4 case):

 local.get $0
 local.get $1
 v128.load align:4    [SIMD]
 local.get $2
 v128.load align:4    [SIMD]
 v128.const 0x00000004000000040000000400000004    [SIMD]
 i32x4.min.u    [SIMD]
 i32.const 2
 i8x16.shl    [SIMD]
 v128.const 0x0c0c0c0c080808080404040400000000    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.const 0x03020100030201000302010003020100    [SIMD]
 v128.or    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.store    [SIMD]
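
Both listings appear to follow the same recipe: clamp each element index to the lane count, shift it into a byte index, broadcast it across the element's bytes, OR in the per-byte offsets, and finish with a byte swizzle. The Python sketch below emulates that recipe so it can be compared against plain element access. It is a reading of the disassembly, not the Mono implementation. All function names are hypothetical, and it assumes the printed `v128.const` values put lane 0 in the least significant byte. The clamp uses the mask-and-bitselect form from the i64x2 listing for every element size, whereas the i32x4 listing uses `i32x4.min.u` directly.

```python
def i8x16_swizzle(data: bytes, idx: bytes) -> bytes:
    """Byte swizzle: byte indices >= 16 select zero, as Wasm i8x16.swizzle does."""
    return bytes(data[i] if i < 16 else 0 for i in idx)


def min_u_via_bitselect(indices, limit: int, bits: int):
    """Unsigned min built from a per-lane compare mask and a bitselect."""
    all_ones = (1 << bits) - 1
    out = []
    for i in indices:
        mask = all_ones if i < limit else 0            # lane-wise i < limit
        out.append((i & mask) | (limit & ~mask & all_ones))
    return out


def swizzle_elements(data: bytes, indices, elem_size: int) -> bytes:
    """Element-wise swizzle of a 16-byte vector, expressed as a byte swizzle."""
    lanes = 16 // elem_size
    shift = elem_size.bit_length() - 1                 # 2 for i32x4, 3 for i64x2
    clamped = min_u_via_bitselect(indices, lanes, elem_size * 8)
    # Pack the clamped indices little-endian; i8x16.shl then shifts every byte.
    packed = b"".join(i.to_bytes(elem_size, "little") for i in clamped)
    shifted = bytes((b << shift) & 0xFF for b in packed)
    # Broadcast the low byte of each element to all bytes of that element.
    pattern = bytes((b // elem_size) * elem_size for b in range(16))
    broadcast = i8x16_swizzle(shifted, pattern)
    # OR in the byte offset within each element (0..elem_size-1).
    offsets = bytes(b % elem_size for b in range(16))
    byte_idx = bytes(x | y for x, y in zip(broadcast, offsets))
    return i8x16_swizzle(data, byte_idx)


def swizzle_by_element(data: bytes, indices, elem_size: int) -> bytes:
    """Reference: per-element access, with out-of-range indices yielding zero."""
    lanes = 16 // elem_size
    out = b""
    for i in indices:
        out += data[i * elem_size:(i + 1) * elem_size] if i < lanes else bytes(elem_size)
    return out
```

Clamping to the lane count (4 for i32x4, 2 for i64x2) rather than to the last valid lane means an out-of-range index becomes byte index 16 or higher after the shift, which the final byte swizzle maps to zero.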

@lewing lewing changed the title from "[wasm][aot] Optimize OP_WASM_SIMD_SWIZZLE for constant indices" to "[wasm][aot] Optimize 64 bit const shuffles. Otherwise prefer vector swizzle." May 8, 2025
@lewing lewing removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label May 8, 2025
@lewing (Member, Author) commented May 8, 2025

I made it fall back to the old code only for 64x2, but that case should really just be written by hand.
