Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector128.ShuffleUnsafe #88559

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2384,7 +2384,7 @@ public static Vector128<sbyte> Shuffle(Vector128<sbyte> vector, Vector128<sbyte>
[CompExactlyDependsOn(typeof(AdvSimd))]
[CompExactlyDependsOn(typeof(AdvSimd.Arm64))]
[CompExactlyDependsOn(typeof(PackedSimd))]
internal static Vector128<byte> ShuffleUnsafe(Vector128<byte> vector, Vector128<byte> indices)
public static Vector128<byte> ShuffleUnsafe(Vector128<byte> vector, Vector128<byte> indices)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anywhere else this should be consumed in dotnet/runtime as part of making it public?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I updated all such places when adding the internal helper as part of #80963

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I know you updated places in corelib. I was wondering if there were places outside of corelib you couldn't update. Sounds like no.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two places I saw where we still use the platform-specific instructions directly (BitArray and OptimizedInboxTextEncoder), but both of those are in larger platform-specific blocks of logic, so there was no reason to replace them.

{
if (Ssse3.IsSupported)
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.Wasm;
using System.Runtime.Intrinsics.X86;

namespace System.Runtime.Intrinsics
Expand Down Expand Up @@ -2351,6 +2353,45 @@ public static Vector256<sbyte> Shuffle(Vector256<sbyte> vector, Vector256<sbyte>
return result;
}

/// <summary>Creates a new vector by selecting values from an input vector using a set of indices.
/// Behavior is platform-dependent for out-of-range indices.</summary>
/// <param name="vector">The input vector from which values are selected.</param>
/// <param name="indices">The per-element indices used to select a value from <paramref name="vector" />.</param>
/// <returns>A new vector containing the values from <paramref name="vector" /> selected by the given <paramref name="indices" />.</returns>
/// <remarks>Unlike Shuffle, this method delegates to the underlying hardware intrinsic without ensuring that <paramref name="indices"/> are normalized to [0, 31].
/// On hardware with <see cref="Avx2"/> support, indices are treated as modulo 32, and if the high bit is set, the result will be set to 0 for that element.
/// On hardware with <see cref="AdvSimd.Arm64"/> support, this method behaves the same as Shuffle.</remarks>
/// On hardware with <see cref="PackedSimd"/> support, indices are treated as modulo 32.
Comment on lines +2361 to +2364
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This remarks is confusing. Shuffle doesn't ensure the indices are normalized. It simply treats all out of range indices as setting the result to 0.

Unsafe simply changes how out of range indices are handled and allows it to differ based on the target hardware. In the case of xarch most out of range indices are handled as mod 32. But indices with the MSB set are set to 0 instead.

We can probably exclude the Arm64/Wasm callouts for V256/V512 as they aren't important. They can be combined into a general statement that other platforms will treat this the same as Shuffle

[MethodImpl(MethodImplOptions.AggressiveInlining)]
[CompExactlyDependsOn(typeof(Avx2))]
[CompExactlyDependsOn(typeof(AdvSimd))]
[CompExactlyDependsOn(typeof(AdvSimd.Arm64))]
[CompExactlyDependsOn(typeof(PackedSimd))]
public static Vector256<byte> ShuffleUnsafe(Vector256<byte> vector, Vector256<byte> indices)
{
if (Avx2.IsSupported)
{
return Avx2.Shuffle(vector, indices);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These hardware intrinsics have different semantics than Vector256/512 Shuffles.

Avx2.Shuffle is effectively two Vector128.Shuffles on each half. The indices are still [0, 15]. Similar for Avx512.

Vector256.Shuffle on the other hand works with the whole vector and allows you to pick and index [0, 31].

}

if (AdvSimd.Arm64.IsSupported)
{
(Vector128<byte>, Vector128<byte>) table = (vector.GetLower(), vector.GetUpper());
return Vector256.Create(
AdvSimd.Arm64.VectorTableLookup(table, indices.GetLower()),
AdvSimd.Arm64.VectorTableLookup(table, indices.GetUpper()));
}

if (PackedSimd.IsSupported)
{
return Vector256.Create(
PackedSimd.Shuffle(vector.GetLower(), vector.GetUpper(), indices.GetLower()),
PackedSimd.Shuffle(vector.GetLower(), vector.GetUpper(), indices.GetUpper()));
}
Comment on lines +2377 to +2390
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need this additional handling for V256/V512. It isn't expected that users are calling these, let alone expecting good performance, when IsHardwareAccelerated reports false.

It's just causing the JIT to spend more time doing dead code elimination, managing the inliner budget, etc.


return Shuffle(vector, indices);
}

/// <summary>Creates a new vector by selecting values from an input vector using a set of indices.</summary>
/// <param name="vector">The input vector from which values are selected.</param>
/// <param name="indices">The per-element indices used to select a value from <paramref name="vector" />.</param>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

namespace System.Runtime.Intrinsics
Expand Down Expand Up @@ -2404,6 +2405,41 @@ public static Vector512<sbyte> Shuffle(Vector512<sbyte> vector, Vector512<sbyte>
return result;
}

/// <summary>Creates a new vector by selecting values from an input vector using a set of indices.
/// Behavior is platform-dependent for out-of-range indices.</summary>
/// <param name="vector">The input vector from which values are selected.</param>
/// <param name="indices">The per-element indices used to select a value from <paramref name="vector" />.</param>
/// <returns>A new vector containing the values from <paramref name="vector" /> selected by the given <paramref name="indices" />.</returns>
/// <remarks>Unlike Shuffle, this method delegates to the underlying hardware intrinsic without ensuring that <paramref name="indices"/> are normalized to [0, 63].
/// On hardware with <see cref="Avx512BW"/> support, indices are treated as modulo 64, and if the high bit is set, the result will be set to 0 for that element.
/// On hardware with <see cref="AdvSimd.Arm64"/> support, this method behaves the same as Shuffle.</remarks>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[CompExactlyDependsOn(typeof(Avx512BW))]
[CompExactlyDependsOn(typeof(AdvSimd))]
[CompExactlyDependsOn(typeof(AdvSimd.Arm64))]
public static Vector512<byte> ShuffleUnsafe(Vector512<byte> vector, Vector512<byte> indices)
{
if (Avx512BW.IsSupported)
{
return Avx512BW.Shuffle(vector, indices);
}

if (AdvSimd.Arm64.IsSupported)
{
(Vector128<byte>, Vector128<byte>, Vector128<byte>, Vector128<byte>) table =
(vector.GetLower().GetLower(), vector.GetLower().GetUpper(), vector.GetUpper().GetLower(), vector.GetUpper().GetUpper());
return Vector512.Create(
Vector256.Create(
AdvSimd.Arm64.VectorTableLookup(table, indices.GetLower().GetLower()),
AdvSimd.Arm64.VectorTableLookup(table, indices.GetLower().GetUpper())),
Vector256.Create(
AdvSimd.Arm64.VectorTableLookup(table, indices.GetUpper().GetLower()),
AdvSimd.Arm64.VectorTableLookup(table, indices.GetUpper().GetUpper())));
}

return Shuffle(vector, indices);
}

/// <summary>Creates a new vector by selecting values from an input vector using a set of indices.</summary>
/// <param name="vector">The input vector from which values are selected.</param>
/// <param name="indices">The per-element indices used to select a value from <paramref name="vector" />.</param>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.Wasm;
using System.Runtime.Intrinsics.X86;

namespace System.Runtime.Intrinsics
{
Expand Down Expand Up @@ -2076,6 +2078,38 @@ public static Vector64<sbyte> Shuffle(Vector64<sbyte> vector, Vector64<sbyte> in
return result;
}

/// <summary>Creates a new vector by selecting values from an input vector using a set of indices.
/// Behavior is platform-dependent for out-of-range indices.</summary>
/// <param name="vector">The input vector from which values are selected.</param>
/// <param name="indices">The per-element indices used to select a value from <paramref name="vector" />.</param>
/// <returns>A new vector containing the values from <paramref name="vector" /> selected by the given <paramref name="indices" />.</returns>
/// <remarks>Unlike Shuffle, this method delegates to the underlying hardware intrinsic without ensuring that <paramref name="indices"/> are normalized to [0, 7].
/// On hardware with <see cref="Ssse3"/> support, indices are treated as modulo 8, and if the high bit is set, the result will be set to 0 for that element.
/// On hardware with <see cref="AdvSimd"/> or <see cref="PackedSimd"/> support, this method behaves the same as Shuffle.</remarks>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[CompExactlyDependsOn(typeof(Ssse3))]
[CompExactlyDependsOn(typeof(AdvSimd))]
[CompExactlyDependsOn(typeof(PackedSimd))]
public static Vector64<byte> ShuffleUnsafe(Vector64<byte> vector, Vector64<byte> indices)
{
if (Ssse3.IsSupported)
{
return Ssse3.Shuffle(Vector128.Create(vector, vector), Vector128.Create(indices, Vector64<byte>.Zero)).GetLower();
}

if (AdvSimd.IsSupported)
{
return AdvSimd.VectorTableLookup(Vector128.Create(vector, Vector64<byte>.Zero), indices);
}

if (PackedSimd.IsSupported)
{
return PackedSimd.Swizzle(Vector128.Create(vector, Vector64<byte>.Zero), Vector128.Create(indices, Vector64<byte>.Zero)).GetLower();
}

return Shuffle(vector, indices);
}

/// <summary>Creates a new vector by selecting values from an input vector using a set of indices.</summary>
/// <param name="vector">The input vector from which values are selected.</param>
/// <param name="indices">The per-element indices used to select a value from <paramref name="vector" />.</param>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -253,6 +253,7 @@ public static void CopyTo<T>(this System.Runtime.Intrinsics.Vector128<T> vector,
[System.CLSCompliantAttribute(false)]
public static System.Runtime.Intrinsics.Vector128<ulong> Shuffle(System.Runtime.Intrinsics.Vector128<ulong> vector, System.Runtime.Intrinsics.Vector128<ulong> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector128<double> Shuffle(System.Runtime.Intrinsics.Vector128<double> vector, System.Runtime.Intrinsics.Vector128<long> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector128<byte> ShuffleUnsafe(System.Runtime.Intrinsics.Vector128<byte> vector, System.Runtime.Intrinsics.Vector128<byte> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector128<T> Sqrt<T>(System.Runtime.Intrinsics.Vector128<T> vector) { throw null; }
#pragma warning disable CS8500 // This takes the address of, gets the size of, or declares a pointer to a managed type ('T')
[System.CLSCompliantAttribute(false)]
Expand Down Expand Up @@ -580,6 +581,7 @@ public static void CopyTo<T>(this System.Runtime.Intrinsics.Vector256<T> vector,
[System.CLSCompliantAttribute(false)]
public static System.Runtime.Intrinsics.Vector256<ulong> Shuffle(System.Runtime.Intrinsics.Vector256<ulong> vector, System.Runtime.Intrinsics.Vector256<ulong> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector256<double> Shuffle(System.Runtime.Intrinsics.Vector256<double> vector, System.Runtime.Intrinsics.Vector256<long> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector256<byte> ShuffleUnsafe(System.Runtime.Intrinsics.Vector256<byte> vector, System.Runtime.Intrinsics.Vector256<byte> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector256<T> Sqrt<T>(System.Runtime.Intrinsics.Vector256<T> vector) { throw null; }
#pragma warning disable CS8500 // This takes the address of, gets the size of, or declares a pointer to a managed type ('T')
[System.CLSCompliantAttribute(false)]
Expand Down Expand Up @@ -907,6 +909,7 @@ public static void CopyTo<T>(this System.Runtime.Intrinsics.Vector512<T> vector,
[System.CLSCompliantAttribute(false)]
public static System.Runtime.Intrinsics.Vector512<ulong> Shuffle(System.Runtime.Intrinsics.Vector512<ulong> vector, System.Runtime.Intrinsics.Vector512<ulong> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector512<double> Shuffle(System.Runtime.Intrinsics.Vector512<double> vector, System.Runtime.Intrinsics.Vector512<long> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector512<byte> ShuffleUnsafe(System.Runtime.Intrinsics.Vector512<byte> vector, System.Runtime.Intrinsics.Vector512<byte> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector512<T> Sqrt<T>(System.Runtime.Intrinsics.Vector512<T> vector) { throw null; }
#pragma warning disable CS8500 // This takes the address of, gets the size of, or declares a pointer to a managed type ('T')
[System.CLSCompliantAttribute(false)]
Expand Down Expand Up @@ -1202,6 +1205,7 @@ public static void CopyTo<T>(this System.Runtime.Intrinsics.Vector64<T> vector,
[System.CLSCompliantAttribute(false)]
public static System.Runtime.Intrinsics.Vector64<uint> Shuffle(System.Runtime.Intrinsics.Vector64<uint> vector, System.Runtime.Intrinsics.Vector64<uint> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector64<float> Shuffle(System.Runtime.Intrinsics.Vector64<float> vector, System.Runtime.Intrinsics.Vector64<int> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector64<byte> ShuffleUnsafe(System.Runtime.Intrinsics.Vector64<byte> vector, System.Runtime.Intrinsics.Vector64<byte> indices) { throw null; }
public static System.Runtime.Intrinsics.Vector64<T> Sqrt<T>(System.Runtime.Intrinsics.Vector64<T> vector) { throw null; }
#pragma warning disable CS8500 // This takes the address of, gets the size of, or declares a pointer to a managed type ('T')
[System.CLSCompliantAttribute(false)]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2157,8 +2157,14 @@ public void Vector128UInt64ShiftRightLogicalTest()
public void Vector128ByteShuffleOneInputTest()
{
Vector128<byte> vector = Vector128.Create((byte)1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);

Vector128<byte> result = Vector128.Shuffle(vector, Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)(Vector128<byte>.Count - index), result.GetElement(index));
}

result = Vector128.ShuffleUnsafe(vector, Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)(Vector128<byte>.Count - index), result.GetElement(index));
Expand Down Expand Up @@ -2277,7 +2283,12 @@ public void Vector128UInt64ShuffleOneInputTest()
public void Vector128ByteShuffleOneInputWithDirectVectorTest()
{
Vector128<byte> result = Vector128.Shuffle(Vector128.Create((byte)1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16), Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)(Vector128<byte>.Count - index), result.GetElement(index));
}

result = Vector128.ShuffleUnsafe(Vector128.Create((byte)1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16), Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)(Vector128<byte>.Count - index), result.GetElement(index));
Expand Down Expand Up @@ -2388,8 +2399,14 @@ public void Vector128ByteShuffleOneInputWithLocalIndicesTest()
{
Vector128<byte> vector = Vector128.Create((byte)1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
Vector128<byte> indices = Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);

Vector128<byte> result = Vector128.Shuffle(vector, indices);
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)(Vector128<byte>.Count - index), result.GetElement(index));
}

result = Vector128.ShuffleUnsafe(vector, indices);
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)(Vector128<byte>.Count - index), result.GetElement(index));
Expand Down Expand Up @@ -2517,8 +2534,14 @@ public void Vector128UInt64ShuffleOneInputWithLocalIndicesTest()
public void Vector128ByteShuffleOneInputWithAllBitsSetIndicesTest()
{
Vector128<byte> vector = Vector128.Create((byte)1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);

Vector128<byte> result = Vector128.Shuffle(vector, Vector128<byte>.AllBitsSet);
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)0, result.GetElement(index));
}

result = Vector128.ShuffleUnsafe(vector, Vector128<byte>.AllBitsSet);
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)0, result.GetElement(index));
Expand Down Expand Up @@ -2637,8 +2660,14 @@ public void Vector128UInt64ShuffleOneInputWithAllBitsSetIndicesTest()
public void Vector128ByteShuffleOneInputWithZeroIndicesTest()
{
Vector128<byte> vector = Vector128.Create((byte)1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);

Vector128<byte> result = Vector128.Shuffle(vector, Vector128<byte>.Zero);
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)1, result.GetElement(index));
}

result = Vector128.ShuffleUnsafe(vector, Vector128<byte>.Zero);
for (int index = 0; index < Vector128<byte>.Count; index++)
{
Assert.Equal((byte)1, result.GetElement(index));
Expand Down
Loading
Loading