Implement ShuffleNative methods and optimise Shuffle for non-constant indices #99596
tannergooding merged 57 commits into dotnet:main
|
Note regarding the |
|
Benchmark results of my AVX2 code. Yes, this is very much a micro-benchmark, but the results are pretty reproducible on my machine (usually within ~10%) and are probably pretty close to reality, since it should be pretty quick (though this obviously doesn't measure the overhead from surrounding code due to more pipeline usage, etc.). (edit: I'm unsure whether this benchmark turned into a no-op, as it seems quite fast)
MihaZupan left a comment:
Thank you for looking into this again.
We're already using the so-far-internal Vector128.ShuffleUnsafe in a bunch of places. Should we be using Vector256.ShuffleUnsafe somewhere?
It seems to me that all the current uses of |
|
Can someone please check I won't accidentally regress mono :) |
|
Mono changes look good to me. Thanks for your contribution. |
|
Re: 9868e73 |
I don't think there's any issue with the runtime relying on specific behaviour. For external libraries, I think one of the following approaches makes sense:
I think the approach needs to be consistent for all of them, so I removed the
Another option, which I briefly mentioned in a comment somewhere, is to expose a variant like |
|
I'm fine with only documenting "anything above 15 is UB". |
Yes, I've been careful to not use the AVX-512 one for this method for this reason. I will add a comment at some point to explain this in the method (assuming I don't forget). |
I've implemented a solution for the reflection thing in 363ae94. Let me know if you want me to revert it, otherwise it's fixed, just not in an ideal way imo, for now. |
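To illustrate why out-of-range indices came up in the discussion above, here is a minimal scalar sketch. It contrasts the documented zeroing behaviour of `Vector128.Shuffle` for byte vectors with how the x86 `pshufb` instruction treats its index bytes. The class and method names (`ShuffleModel`, `ShuffleZeroing`, `ShufflePshufb`) are hypothetical and are not taken from the PR. The sketch models the instruction semantics only, not the implementation in this change.

```csharp
using System;

// Hypothetical scalar models of a 16-byte shuffle; not code from the PR.
public static class ShuffleModel
{
    // Models the documented Vector128.Shuffle behaviour for bytes:
    // any index outside 0..15 selects zero.
    public static byte[] ShuffleZeroing(byte[] source, byte[] indices)
    {
        var result = new byte[16];
        for (int i = 0; i < 16; i++)
        {
            byte index = indices[i];
            result[i] = index < 16 ? source[index] : (byte)0;
        }
        return result;
    }

    // Models x86 pshufb: if the top bit of the index byte is set the
    // result is zero; otherwise only the low four bits select an element.
    public static byte[] ShufflePshufb(byte[] source, byte[] indices)
    {
        var result = new byte[16];
        for (int i = 0; i < 16; i++)
        {
            byte index = indices[i];
            result[i] = (index & 0x80) != 0 ? (byte)0 : source[index & 0x0F];
        }
        return result;
    }
}
```

With an index of 16, the zeroing model yields 0 while the `pshufb` model wraps around to element 0. Arm64's `TBL` instruction, by contrast, yields 0 for any out-of-range index. A platform-native shuffle therefore cannot promise a single result for indices above 15, which is consistent with documenting them as unspecified.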
|
/azp list |
|
/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, Fuzzlyn, runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-avx512 |
|
Azure Pipelines successfully started running 5 pipeline(s). |
Missing `!= nullptr` & typo in comment & add additional comments in a few places.
- This should fix the mono issues (wasm impl assumes indices are constant)
- Additionally ensure V128.Shuffle(byte/sbyte) is vectorised for all mono platforms with ssse3 or arm64 advsimd or packedsimd, when given variable indices

- `Shuffle` with variable indices on coreclr (for all types)
- `Shuffle` on `Vector256` (with signed/unsigned bytes and shorts)
- `Vector256` shuffle with `Avx2.Shuffle` (for signed/unsigned bytes and shorts)
- `VectorXxx.Shuffle` with constant indices when not crossing lanes, when zeroing, with repeated pattern within each lane, etc.

Todo tasks:
- `VectorXXX.ShuffleNative` for vectors of other element types
- `Shuffle` in `ShuffleNative` fallback

Up-to-date codegen (note: the number of cases I tested is perhaps a bit over the top, so check the C# code to see which cases you'd like to look at before reading through all the assembly. The main ones for non-constant indices are the `IndirectIndirect` ones; this should include most improved scenarios): Codegen

Codegen (outdated: some cases have gotten better, it doesn't include codegen for all relevant platforms (e.g., AVX-512 and arm64), and it doesn't include all improved scenarios):
Shuffle With AVX2
ShuffleUnsafe With AVX2
Shuffle With Sse4.2
ShuffleUnsafe With Sse4.2