Constant folding for SIMD comparisons #85584

EgorBo · 2023-05-01T11:36:27Z

This PR extends constant folding for SIMD to support EQ/NE. Can also be extended for GT_GT/GE/LT/LE and, perhaps, floating point, but I was mainly interested in EQ/NE for integral types as I found a few real uses cases. Also, improved constant folding for RVA slightly.

bool Test()
{
    return IsHelloWorldString("Hello World SIMD constant folding!"u8);
}

bool IsHelloWorldString(ReadOnlySpan<byte> data)
{
    Debug.Assert(data.Length is >= 16 and <= 32);

    // Load data into two Vector128, the 2nd vector may overlap if needed
    ref byte ptr = ref MemoryMarshal.GetReference(data);
    Vector128<byte> v1 = Vector128.LoadUnsafe(ref ptr);
    Vector128<byte> v2 = Vector128.LoadUnsafe(ref ptr, (nuint)(data.Length - Vector128<byte>.Count));

    // The constant utf8 string we compare against:
    var cns = "Hello World SIMD constant folding!"u8;
    Vector128<byte> v1Cns = Vector128.Create(cns.Slice(0, Vector128<byte>.Count));
    Vector128<byte> v2Cns = Vector128.Create(cns.Slice(cns.Length - Vector128<byte>.Count));

    // Compare pairwise
    return ((v1 ^ v1Cns) | (v2 ^ v2Cns)) == Vector128<byte>.Zero;
}

Codegen for Test():

; Method Test():bool:this
       mov      eax, 1
       ret      
; Total bytes of code: 6

the whole thing is collapsed into just return true after inlining and constant folding.

ghost · 2023-05-01T11:36:38Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Can be easily extended for GT_GT/GE/LT/LE and, perhaps, floating point, but I was mainly interested in EQ/NE for integeres as I found a few real uses cases. E.g.

bool Test()
{
    var data = "hello world simd folding demo :)"u8;
    return IsHelloWorldString(ref MemoryMarshal.GetReference(data));
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
bool IsHelloWorldString(ref byte data)
{
    var v1 = Vector128.LoadUnsafe(ref data);
    var v2 = Vector128.LoadUnsafe(ref data, 16);

    var v1Cns = Vector128.Create("hello world simd"u8);
    var v2Cns = Vector128.Create(" folding demo :)"u8);

    return ((v1 ^ v1Cns) | (v2 ^ v2Cns)) == Vector128<byte>.Zero;
}

Codegen for Test():

; Method MyClass:Foo():bool:this
       mov      eax, 1
       ret      
; Total bytes of code: 6

the whole thing is collapsed into just return true after inlining and constant folding.

Author:	EgorBo
Assignees:	EgorBo
Labels:	`area-CodeGen-coreclr`
Milestone:	-

…old-eq-ne

tannergooding · 2023-05-01T21:03:17Z

src/coreclr/jit/simd.h

+template <typename TBase>
+TBase GetAllBitsSetScalar()
+{
+    uint8_t bitWidth = (sizeof(TBase) * 8);
+    return static_cast<TBase>((1ULL << bitWidth) - 1);
+}


Won't this fail for int64_t and uint64_t?

Why not simply return ~static_cast<TBase>(0);? Then for float/double you could specialize if needed.

Thanks, addressed

tannergooding · 2023-05-01T21:05:46Z

src/coreclr/jit/simd.h

+        case GT_EQ:
+        {
+            return arg0 == arg1 ? GetAllBitsSetScalar<TBase>() : 0;
+        }
+
+        case GT_NE:
+        {
+            return arg0 != arg1 ? GetAllBitsSetScalar<TBase>() : 0;
+        }


This is going to be incorrect for float/double since it will do a bitwise comparison.

You may need to special-case in EvaluateBinaryScalarSpecialized<float> before it calls this main method.

You might also be able to do it in EvaluateBinaryScalar instead but special-case GetAllBitsSetScalar<float>(), which is probably simpler overall.

This path is never taken for floating point, I added asserts just in case. I was mostly interested in non-fp cases and these can be added if needed

Any reason we aren’t covering floating-point as well? I imagine this might get hits in ImageSharp or PDN.

Potentially in other apps, games, or graphics processing, etc

tannergooding · 2023-05-01T21:10:59Z

src/coreclr/jit/valuenum.cpp

+            assert((vn1Type == vn2Type) && varTypeIsSIMD(vn1Type));
+            assert(!varTypeIsFloating(baseType));
+
+            ValueNum packed     = EvaluateBinarySimd(vns, GT_EQ, scalar, vn1Type, baseType, arg0VN, arg1VN);


shouldn't this be EvaluateBinarySimd(vns, oper, ...)?

I wonder if it would be simpler to implement this as EvaluateVector and then simplify check result IsAllBitsSet or !IsZero

Which would also simplify the other relational comparisons

shouldn't this be EvaluateBinarySimd(vns, oper, ...)?

It makes it harder to check output, doesn't it?
Currently I just pass EQ and then return AllBitsSet or !AllBitsSet depending on source oper

Does it? We have a simple property for any simd_T and it makes it easier to cover the other cases.

That is, if you just evaluate per element you get a result simd_T result and can then just check IsAllBitsSet or IsZero

tannergooding · 2023-05-01T21:12:38Z

src/coreclr/jit/valuenum.cpp

+                    return EvaluateBinarySimd(this, GT_NE, /* scalar */ false, type, baseType, arg0VN, arg1VN);
+                }
+                break;
+            }


Do we want to handle GT, LT, GE, and LE simultaneously?

Likewise given you've added the support for computing the vector version in order to make bool work, should we just handle the intrinsics that produce a vector as well?

Those had no hits so I just didn't want to add more code and tests 🙂 Although, even EQ/NE don't have hits, I just found a use case outside.

I think this is the type of scenario where it’s a core comparison on SIMD and so we want to generally handle it even if our own code doesn’t have any hits today.

it’s odd to cover some of the relational comparisons and not the others

…old-eq-ne

EgorBo · 2023-05-18T12:47:01Z

No diffs and I won't have spare time to cover all cases with tests so closing for now

Constant folding for SIMD comparisons

d1f693d

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 1, 2023

ghost assigned EgorBo May 1, 2023

EgorBo added 5 commits May 1, 2023 14:25

Fix in GetStaticFieldSeqAndAddress

00fd532

fix NE case

999ea40

clean up

83aaf74

Add tests

ac2a191

Merge branch 'main' of github.com:dotnet/runtime into simd-constant-f…

3c3f791

…old-eq-ne

tannergooding reviewed May 1, 2023

View reviewed changes

build-analysis bot mentioned this pull request May 2, 2023

Tracking issue for CI build timeouts #76454

Closed

EgorBo added 3 commits May 6, 2023 22:13

Address feedback

28b759f

Merge branch 'main' of github.com:dotnet/runtime into simd-constant-f…

6c87055

…old-eq-ne

add asserts

18091a3

EgorBo closed this May 18, 2023

ghost locked as resolved and limited conversation to collaborators Jun 17, 2023

Constant folding for SIMD comparisons #85584

Constant folding for SIMD comparisons #85584

Uh oh!

Conversation

EgorBo commented May 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented May 1, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

EgorBo commented May 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

EgorBo commented May 1, 2023 •

edited

Loading

EgorBo commented May 18, 2023 •

edited

Loading