-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Constant folding for SIMD comparisons #85584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsCan be easily extended for GT_GT/GE/LT/LE and, perhaps, floating point, but I was mainly interested in EQ/NE for integeres as I found a few real uses cases. E.g. bool Test()
{
var data = "hello world simd folding demo :)"u8;
return IsHelloWorldString(ref MemoryMarshal.GetReference(data));
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
bool IsHelloWorldString(ref byte data)
{
var v1 = Vector128.LoadUnsafe(ref data);
var v2 = Vector128.LoadUnsafe(ref data, 16);
var v1Cns = Vector128.Create("hello world simd"u8);
var v2Cns = Vector128.Create(" folding demo :)"u8);
return ((v1 ^ v1Cns) | (v2 ^ v2Cns)) == Vector128<byte>.Zero;
} Codegen for ; Method MyClass:Foo():bool:this
mov eax, 1
ret
; Total bytes of code: 6 the whole thing is collapsed into just
|
src/coreclr/jit/simd.h
Outdated
template <typename TBase> | ||
TBase GetAllBitsSetScalar() | ||
{ | ||
uint8_t bitWidth = (sizeof(TBase) * 8); | ||
return static_cast<TBase>((1ULL << bitWidth) - 1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this fail for int64_t
and uint64_t
?
Why not simply return ~static_cast<TBase>(0);
? Then for float
/double
you could specialize if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, addressed
src/coreclr/jit/simd.h
Outdated
case GT_EQ: | ||
{ | ||
return arg0 == arg1 ? GetAllBitsSetScalar<TBase>() : 0; | ||
} | ||
|
||
case GT_NE: | ||
{ | ||
return arg0 != arg1 ? GetAllBitsSetScalar<TBase>() : 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be incorrect for float
/double
since it will do a bitwise comparison.
You may need to special-case in EvaluateBinaryScalarSpecialized<float>
before it calls this main method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might also be able to do it in EvaluateBinaryScalar
instead but special-case GetAllBitsSetScalar<float>()
, which is probably simpler overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This path is never taken for floating point, I added asserts just in case. I was mostly interested in non-fp cases and these can be added if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we aren’t covering floating-point as well? I imagine this might get hits in ImageSharp or PDN.
Potentially in other apps, games, or graphics processing, etc
assert((vn1Type == vn2Type) && varTypeIsSIMD(vn1Type)); | ||
assert(!varTypeIsFloating(baseType)); | ||
|
||
ValueNum packed = EvaluateBinarySimd(vns, GT_EQ, scalar, vn1Type, baseType, arg0VN, arg1VN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be EvaluateBinarySimd(vns, oper, ...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be simpler to implement this as EvaluateVector
and then simplify check result IsAllBitsSet
or !IsZero
Which would also simplify the other relational comparisons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be EvaluateBinarySimd(vns, oper, ...)?
It makes it harder to check output, doesn't it?
Currently I just pass EQ and then return AllBitsSet or !AllBitsSet depending on source oper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it? We have a simple property for any simd_T and it makes it easier to cover the other cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, if you just evaluate per element you get a result simd_T result and can then just check IsAllBitsSet or IsZero
return EvaluateBinarySimd(this, GT_NE, /* scalar */ false, type, baseType, arg0VN, arg1VN); | ||
} | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to handle GT
, LT
, GE
, and LE
simultaneously?
Likewise given you've added the support for computing the vector version in order to make bool
work, should we just handle the intrinsics that produce a vector as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those had no hits so I just didn't want to add more code and tests 🙂 Although, even EQ/NE don't have hits, I just found a use case outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the type of scenario where it’s a core comparison on SIMD and so we want to generally handle it even if our own code doesn’t have any hits today.
it’s odd to cover some of the relational comparisons and not the others
No diffs and I won't have spare time to cover all cases with tests so closing for now |
This PR extends constant folding for SIMD to support EQ/NE. Can also be extended for GT_GT/GE/LT/LE and, perhaps, floating point, but I was mainly interested in EQ/NE for integral types as I found a few real uses cases. Also, improved constant folding for RVA slightly.
Codegen for
Test()
:the whole thing is collapsed into just
return true
after inlining and constant folding.