Skip to content

Commit 96ec4b9

Browse files
hamarb123tannergoodingEgorBo
authored
Implement ShuffleNative methods and optimise Shuffle for non-constant indices (#99596)
* Squash into 1 commit * Remove internal dependency on ShuffleUnsafe's behaviour wrt high bit * Optimise some codegen - Optimise comparison in `gtNewSimdShuffleNodeVariable` for xarch - Optimise for constant vector in Vector256.Shuffle{Unsafe} when have AVX2 only * jit format * jit format * Simplify logic for using Shuffle for ShuffleUnsafe * Move `ShuffleUnsafeModified` out of `Base64Helper` - This was requested feedback via Discord * Remove unnecessary `CompExactlyDependsOn` and `using`s * Update SearchValues.cs * Support AVX-512/AVX-10.1 acceleration of Shuffle V128<ulong/long/double> * Additional optimisation for V512 constant index shuffle - When 128-bit lanes are not crossed, emit vpshufb instead of vperm* * jit format & typo * Fix operand order * Changes to `IsValidForShuffle` & jit format - Ensure we use IsValidForShuffle correctly (i.e., ensure all cases that could be emittet at some point, are able to be done) - jit format changes * jit format * Update hwintrinsicxarch.cpp * Update hwintrinsicxarch.cpp * Update gentree.cpp - Implement preferring `vshufpd`/`vpshufd`/`vshufps` (when same size & possible & no zeroing) over `vpshufb` (when no crossing lane) over `vpermd`/`vpermps`/`vpermpd`/the ushort & byte equivalents in constant index case Update gentree.cpp - Continuation of previous commit, for the variable index shuffle stuff Update gentree.cpp Update gentree.cpp Fix typo * Make `op2DupSafe` be consistently ordered - Adjust code such that `op2DupSafe` is consistently ordered with respect to `retNode`(it happens to end up before now) - this is the likely cause of the test failures * jit format * Use `compIsEvexOpportunisticallySupported` instead of explicit AVX-10 in remaining places * jit format * Update gentree.cpp - Ensure `op1` side effects are done before `op2` side effects * Update Vector128.cs * Use `BlockNonDeterministicIntrinsics` instead of `CompExactlyDependsOn` - As per feedback * Ensure V128<byte> ShuffleUnsafe is not regressed on mono * Update gentree.cpp * Rename `ShuffleUnsafe` to `ShuffleNative` - And do a full review of the logic & comments of all my code - Fix 3 bugs (2 pre-existing) - Simplify/improve some code * jit format * jit format again * um * Move assertion on arm64 to correct spot * Normalize indices for `vpshufb` in all cases of constant indices Shuffle - This is a nice-to-have, not a correctness issue - it matches what we're doing throughout the rest of the function, and keeps the asm more readable imo * jit format * Revert last pair of commits - There's other places where they're not normalised. could be done, but not really worth the trouble probably * Feedback * Address some feedback & a bug fix - Implement a bunch of feedback - Fix an oversight in `IsValidForShuffle` where we don't treat something as variable indices that gets emitted as such * Ensure ShuffleNative's behaviour with reflection/function pointers/etc. is the same as a normal call - Not a particularly pretty solution to the problem, but it should work correctly at least * jit format & compilation error * jit format & fix buuld * jit format & fix build * jit format * Update hwintrinsicxarch.cpp * Update gentree.cpp Missing `!= nullptr` & typo in comment & add additional comments in a few places. * Fix mono code * Mono: also cast arguments * Move mono impl back into c# - This should fix the mono issues (wasm impl assumes indices are constant) - Additionally ensure V128.Shuffle(byte/sbyte) is vectorised for all mono platforms with ssse3 or arm64 advsimd or packedsimd, when given variable indices * Update Vector128.cs * um --------- Co-authored-by: Tanner Gooding <tagoo@outlook.com> Co-authored-by: Egor Bogatov <egorbo@gmail.com>
1 parent e9ebf41 commit 96ec4b9

File tree

23 files changed

+4694
-253
lines changed

23 files changed

+4694
-253
lines changed

src/coreclr/jit/compiler.h

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3340,11 +3340,19 @@ class Compiler
33403340
GenTree* gtNewSimdRoundNode(
33413341
var_types type, GenTree* op1, CorInfoType simdBaseJitType, unsigned simdSize);
33423342

3343+
GenTree* gtNewSimdShuffleVariableNode(var_types type,
3344+
GenTree* op1,
3345+
GenTree* op2,
3346+
CorInfoType simdBaseJitType,
3347+
unsigned simdSize,
3348+
bool isShuffleNative);
3349+
33433350
GenTree* gtNewSimdShuffleNode(var_types type,
33443351
GenTree* op1,
33453352
GenTree* op2,
33463353
CorInfoType simdBaseJitType,
3347-
unsigned simdSize);
3354+
unsigned simdSize,
3355+
bool isShuffleNative);
33483356

33493357
GenTree* gtNewSimdSqrtNode(
33503358
var_types type, GenTree* op1, CorInfoType simdBaseJitType, unsigned simdSize);
@@ -4669,7 +4677,11 @@ class Compiler
46694677
bool mustExpand);
46704678

46714679
#ifdef FEATURE_HW_INTRINSICS
4672-
bool IsValidForShuffle(GenTreeVecCon* vecCon, unsigned simdSize, var_types simdBaseType) const;
4680+
bool IsValidForShuffle(GenTree* indices,
4681+
unsigned simdSize,
4682+
var_types simdBaseType,
4683+
bool* canBecomeValid,
4684+
bool isShuffleNative) const;
46734685

46744686
GenTree* impHWIntrinsic(NamedIntrinsic intrinsic,
46754687
CORINFO_CLASS_HANDLE clsHnd,

0 commit comments

Comments
 (0)