[wasm][aot] Optimize 64 bit const shuffles. Otherwise prefer vector swizzle. #115351

lewing · 2025-05-07T00:53:51Z

Fall back to element access for 64x2 const elements; otherwise, prefer the vectorized version.
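For readers unfamiliar with the element-access fallback, here is a minimal scalar sketch in C of what a two-lane 64-bit shuffle with constant indices amounts to. The type `v64x2_model` and the function `shuffle_64x2_by_element` are hypothetical names for illustration only, not identifiers from mini-llvm.c, and the zero result for an out-of-range index is an assumption chosen to match the behavior of the swizzle-based path.

```c
#include <stdint.h>

/* Hypothetical model of a two-lane 64-bit vector; not a type from mini-llvm.c. */
typedef struct { uint64_t lane[2]; } v64x2_model;

/* Element-access shuffle: one extract and one insert per result lane.
 * Assumption: an out-of-range index leaves the result lane at zero. */
static v64x2_model shuffle_64x2_by_element(v64x2_model src, const uint64_t idx[2])
{
    v64x2_model result = { { 0, 0 } };
    for (int i = 0; i < 2; i++) {
        if (idx[i] < 2)
            result.lane[i] = src.lane[idx[i]]; /* extract lane idx[i], insert at lane i */
    }
    return result;
}
```

With only two lanes and indices known at compile time, this is at most two extracts and two inserts, with no index vector to build.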

dotnet-policy-service · 2025-05-07T00:54:35Z

Tagging subscribers to this area: @steveisok, @vitek-karas
See info in area-owners.md if you want to be subscribed.

Copilot

Pull Request Overview

This PR optimizes the implementation of the OP_WASM_SIMD_SWIZZLE operation for constant indices. Key changes include a revised handling of constant versus non‐constant swizzle index vectors, the removal of an early bitcast of rhs, and an updated combination of computed index vectors using a bitwise OR instead of addition.
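The switch from addition to bitwise OR is safe because the scaled lane index is always a multiple of the lane size (a power of two), so its low bits are zero and the per-byte offset fits entirely in those bits. The following C sketch illustrates this under that assumption; `byte_indices_or` and `byte_indices_add` are hypothetical helper names, not functions from the PR.

```c
#include <stdint.h>

/* Build 16 byte-level swizzle indices from per-lane indices.
 * lane_bytes must be a power of two (1, 2, 4 or 8). */
static void byte_indices_or(const uint8_t *lane_idx, unsigned lane_bytes, uint8_t out[16])
{
    for (unsigned i = 0; i < 16; i++) {
        uint8_t base = (uint8_t)(lane_idx[i / lane_bytes] * lane_bytes); /* low bits are zero */
        uint8_t offset = (uint8_t)(i % lane_bytes);                      /* fits in the low bits */
        out[i] = (uint8_t)(base | offset);
    }
}

/* Same computation, combining base and offset with addition instead. */
static void byte_indices_add(const uint8_t *lane_idx, unsigned lane_bytes, uint8_t out[16])
{
    for (unsigned i = 0; i < 16; i++) {
        uint8_t base = (uint8_t)(lane_idx[i / lane_bytes] * lane_bytes);
        uint8_t offset = (uint8_t)(i % lane_bytes);
        out[i] = (uint8_t)(base + offset);
    }
}
```

Because the base and offset never share a set bit, no carry can occur and both functions produce identical bytes.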

src/mono/mono/mini/mini-llvm.c

lewing · 2025-05-08T21:30:06Z

The codegen for the const case is not pretty but it is roughly equivalent to the old codegen.

Out of curiosity, I checked what the codegen looks like for the i64x2 non-const case, where LLVM synthesizes min:

 local $4 v128
 local.get $0
 local.get $1
 v128.load align:4    [SIMD]
 local.get $2
 v128.load align:4    [SIMD]
 local.tee $4
 v128.const 0x00000000000000020000000000000002    [SIMD]
 v128.const 0xffffffffffffffffffffffffffffffff    [SIMD]
 v128.const 0x00000000000000000000000000000000    [SIMD]
 local.get $4
 i64x2.extract.lane 0    [SIMD]
 i64.const 2
 i64.lt.u
 select
 i64.const -1
 i64.const 0
 local.get $4
 i64x2.extract.lane 1    [SIMD]
 i64.const 2
 i64.lt.u
 select
 i64x2.replace.lane 1    [SIMD]
 v128.bitselect    [SIMD]
 i32.const 3
 i8x16.shl    [SIMD]
 v128.const 0x08080808080808080000000000000000    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.const 0x07060504030201000706050403020100    [SIMD]
 v128.or    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.store    [SIMD]
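The compare, select, and `v128.bitselect` sequence in the dump above appears to be an unsigned minimum against 2, computed lane by lane. A small C sketch of one lane, under that reading of the dump, may make this easier to follow; the function names are hypothetical.

```c
#include <stdint.h>

/* One 64-bit lane of v128.bitselect: bits of a where mask is set, bits of b elsewhere. */
static uint64_t bitselect64(uint64_t a, uint64_t b, uint64_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* Synthesized form: build an all-ones or all-zeros mask from idx < 2,
 * then pick idx (in range) or the constant 2 (out of range). */
static uint64_t clamp_lane_synthesized(uint64_t idx)
{
    uint64_t mask = idx < 2 ? UINT64_MAX : 0;
    return bitselect64(idx, 2, mask);
}

/* Direct form: an unsigned minimum against 2. */
static uint64_t clamp_lane_min(uint64_t idx)
{
    return idx < 2 ? idx : 2;
}
```

After the clamp, shifting left by 3 maps lane 2 to byte index 16, which `i8x16.swizzle` treats as out of range and turns into zero bytes.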

When the min intrinsic exists (i32x4.min.u in this dump):

 local.get $0
 local.get $1
 v128.load align:4    [SIMD]
 local.get $2
 v128.load align:4    [SIMD]
 v128.const 0x00000004000000040000000400000004    [SIMD]
 i32x4.min.u    [SIMD]
 i32.const 2
 i8x16.shl    [SIMD]
 v128.const 0x0c0c0c0c080808080404040400000000    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.const 0x03020100030201000302010003020100    [SIMD]
 v128.or    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.store    [SIMD]
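As a rough scalar model of the i32x4 sequence above, the following C sketch walks through the same steps: clamp, shift, broadcast the low byte of each lane, OR in the byte offsets, and do the final byte swizzle. It assumes the `v128.const` values are printed with the lowest-order byte last, and all function names are hypothetical.

```c
#include <stdint.h>

/* i8x16.swizzle semantics: an index of 16 or more yields a zero byte. */
static void model_swizzle(const uint8_t src[16], const uint8_t idx[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = idx[i] < 16 ? src[idx[i]] : 0;
}

/* Lay out four 32-bit lanes as 16 little-endian bytes, as wasm does. */
static void model_to_bytes(const uint32_t lanes[4], uint8_t bytes[16])
{
    for (int i = 0; i < 16; i++)
        bytes[i] = (uint8_t)(lanes[i / 4] >> (8 * (i % 4)));
}

static void model_shuffle_i32x4(const uint32_t data[4], const uint32_t lanes[4], uint32_t out[4])
{
    /* 0x0c0c0c0c080808080404040400000000: copy each lane's low byte to all 4 bytes. */
    static const uint8_t broadcast[16] = { 0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12 };
    /* 0x03020100 repeated: byte offset within each lane. */
    static const uint8_t offsets[16] = { 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3 };
    uint32_t clamped[4];
    uint8_t idx[16], byte_idx[16], src[16], dst[16];

    for (int i = 0; i < 4; i++)
        clamped[i] = lanes[i] < 4 ? lanes[i] : 4;    /* unsigned min against 4 */
    model_to_bytes(clamped, idx);
    for (int i = 0; i < 16; i++)
        idx[i] = (uint8_t)(idx[i] << 2);             /* i8x16.shl by 2: lane index to byte index */
    model_swizzle(idx, broadcast, byte_idx);         /* first swizzle: spread the low byte */
    for (int i = 0; i < 16; i++)
        byte_idx[i] |= offsets[i];                   /* v128.or with the byte offsets */
    model_to_bytes(data, src);
    model_swizzle(src, byte_idx, dst);               /* final swizzle picks the data bytes */
    for (int i = 0; i < 4; i++)
        out[i] = (uint32_t)dst[4 * i] | ((uint32_t)dst[4 * i + 1] << 8) |
                 ((uint32_t)dst[4 * i + 2] << 16) | ((uint32_t)dst[4 * i + 3] << 24);
}
```

In this model an out-of-range lane index clamps to 4, becomes byte index 16 after the shift, and therefore produces a zero lane in the result.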

lewing · 2025-05-08T21:57:50Z

I made it fall back to the old code only for 64x2, but that case should really just be written by hand.

Optimize OP_WASM_SIMD_SWIZZLE for constant indices

b206e6e

github-actions bot added the area-Codegen-LLVM-mono label May 7, 2025

dotnet-policy-service bot assigned lewing May 7, 2025

Fix invalid handling

fb38dca

lewing requested a review from kg May 7, 2025 03:09

lewing marked this pull request as ready for review May 7, 2025 03:09

Copilot AI review requested due to automatic review settings May 7, 2025 03:09

lewing requested review from steveisok and vitek-karas as code owners May 7, 2025 03:09

Copilot AI reviewed May 7, 2025


lewing requested a review from radekdoulik May 7, 2025 03:53

kg reviewed May 7, 2025


This was referenced May 8, 2025

[mono][wasm] intrinsics improvements #114295

Open

[Perf] Linux/x64: 3 Regressions on 5/5/2025 10:55:09 PM +00:00 dotnet/perf-autofiling-issues#55187

Open

[Perf] Linux/x64: 1 Regression on 5/5/2025 5:34:36 PM +00:00 dotnet/perf-autofiling-issues#55186

Open

Add invalid indices improvements to the non-const case

1a0f2df

lewing added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label May 8, 2025

Only extract/insert for 64x2

a11ba1f

lewing changed the title from "[wasm][aot] Optimize OP_WASM_SIMD_SWIZZLE for constant indices" to "[wasm][aot] Optimize 64 bit const shuffles. Otherwise prefer vector swizzle." May 8, 2025

reformat

e1eb5d9

lewing removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label May 8, 2025

This was referenced May 9, 2025

[Mono/linux-arm64] int32_t CryptoNative_EvpPKeyBits(EVP_PKEY *): Assertion `pkey != NULL' failed. #110952

Open

android-arm Release AllSubsets_Mono crashes in System.Runtime.Tests with missing opcode #115404

Closed

Merge branch 'main' into aot-shuffle-const

11af10b

build-analysis bot mentioned this pull request Jun 2, 2025

ExplicitConversion_FromSingle failing due to NaN != NaN #103347

Open
