Skip to content

Constant folding for SIMD comparisons #85584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented May 1, 2023

This PR extends constant folding for SIMD to support EQ/NE. Can also be extended for GT_GT/GE/LT/LE and, perhaps, floating point, but I was mainly interested in EQ/NE for integral types as I found a few real uses cases. Also, improved constant folding for RVA slightly.

bool Test()
{
    return IsHelloWorldString("Hello World SIMD constant folding!"u8);
}

bool IsHelloWorldString(ReadOnlySpan<byte> data)
{
    Debug.Assert(data.Length is >= 16 and <= 32);

    // Load data into two Vector128, the 2nd vector may overlap if needed
    ref byte ptr = ref MemoryMarshal.GetReference(data);
    Vector128<byte> v1 = Vector128.LoadUnsafe(ref ptr);
    Vector128<byte> v2 = Vector128.LoadUnsafe(ref ptr, (nuint)(data.Length - Vector128<byte>.Count));

    // The constant utf8 string we compare against:
    var cns = "Hello World SIMD constant folding!"u8;
    Vector128<byte> v1Cns = Vector128.Create(cns.Slice(0, Vector128<byte>.Count));
    Vector128<byte> v2Cns = Vector128.Create(cns.Slice(cns.Length - Vector128<byte>.Count));

    // Compare pairwise
    return ((v1 ^ v1Cns) | (v2 ^ v2Cns)) == Vector128<byte>.Zero;
}

Codegen for Test():

; Method Test():bool:this
       mov      eax, 1
       ret      
; Total bytes of code: 6

the whole thing is collapsed into just return true after inlining and constant folding.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 1, 2023
@ghost ghost assigned EgorBo May 1, 2023
@ghost
Copy link

ghost commented May 1, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Can be easily extended for GT_GT/GE/LT/LE and, perhaps, floating point, but I was mainly interested in EQ/NE for integeres as I found a few real uses cases. E.g.

bool Test()
{
    var data = "hello world simd folding demo :)"u8;
    return IsHelloWorldString(ref MemoryMarshal.GetReference(data));
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
bool IsHelloWorldString(ref byte data)
{
    var v1 = Vector128.LoadUnsafe(ref data);
    var v2 = Vector128.LoadUnsafe(ref data, 16);

    var v1Cns = Vector128.Create("hello world simd"u8);
    var v2Cns = Vector128.Create(" folding demo :)"u8);

    return ((v1 ^ v1Cns) | (v2 ^ v2Cns)) == Vector128<byte>.Zero;
}

Codegen for Test():

; Method MyClass:Foo():bool:this
       mov      eax, 1
       ret      
; Total bytes of code: 6

the whole thing is collapsed into just return true after inlining and constant folding.

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

Comment on lines 443 to 448
template <typename TBase>
TBase GetAllBitsSetScalar()
{
uint8_t bitWidth = (sizeof(TBase) * 8);
return static_cast<TBase>((1ULL << bitWidth) - 1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this fail for int64_t and uint64_t?

Why not simply return ~static_cast<TBase>(0);? Then for float/double you could specialize if needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed

Comment on lines 530 to 538
case GT_EQ:
{
return arg0 == arg1 ? GetAllBitsSetScalar<TBase>() : 0;
}

case GT_NE:
{
return arg0 != arg1 ? GetAllBitsSetScalar<TBase>() : 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be incorrect for float/double since it will do a bitwise comparison.

You may need to special-case in EvaluateBinaryScalarSpecialized<float> before it calls this main method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might also be able to do it in EvaluateBinaryScalar instead but special-case GetAllBitsSetScalar<float>(), which is probably simpler overall.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path is never taken for floating point, I added asserts just in case. I was mostly interested in non-fp cases and these can be added if needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we aren’t covering floating-point as well? I imagine this might get hits in ImageSharp or PDN.

Potentially in other apps, games, or graphics processing, etc

assert((vn1Type == vn2Type) && varTypeIsSIMD(vn1Type));
assert(!varTypeIsFloating(baseType));

ValueNum packed = EvaluateBinarySimd(vns, GT_EQ, scalar, vn1Type, baseType, arg0VN, arg1VN);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be EvaluateBinarySimd(vns, oper, ...)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be simpler to implement this as EvaluateVector and then simplify check result IsAllBitsSet or !IsZero

Which would also simplify the other relational comparisons

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be EvaluateBinarySimd(vns, oper, ...)?

It makes it harder to check output, doesn't it?
Currently I just pass EQ and then return AllBitsSet or !AllBitsSet depending on source oper

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it? We have a simple property for any simd_T and it makes it easier to cover the other cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, if you just evaluate per element you get a result simd_T result and can then just check IsAllBitsSet or IsZero

return EvaluateBinarySimd(this, GT_NE, /* scalar */ false, type, baseType, arg0VN, arg1VN);
}
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to handle GT, LT, GE, and LE simultaneously?

Likewise given you've added the support for computing the vector version in order to make bool work, should we just handle the intrinsics that produce a vector as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those had no hits so I just didn't want to add more code and tests 🙂 Although, even EQ/NE don't have hits, I just found a use case outside.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the type of scenario where it’s a core comparison on SIMD and so we want to generally handle it even if our own code doesn’t have any hits today.

it’s odd to cover some of the relational comparisons and not the others

@EgorBo
Copy link
Member Author

EgorBo commented May 18, 2023

No diffs and I won't have spare time to cover all cases with tests so closing for now

@EgorBo EgorBo closed this May 18, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants