Arm64/Sve: Some optimizations around loop scenario #101885
Conversation
@dotnet/arm64-contrib
```cpp
case NI_Sve_ConvertMaskToVector:
{
    GenTree* op1 = node->Op(1);

    if (!op1->OperIsHWIntrinsic(NI_Sve_ConvertVectorToMask))
    {
        break;
    }

    unsigned            simdBaseTypeSize = genTypeSize(node->GetSimdBaseType());
    GenTreeHWIntrinsic* cvtOp1           = op1->AsHWIntrinsic();

    if (genTypeSize(cvtOp1->GetSimdBaseType()) != simdBaseTypeSize)
    {
        // We need the operand to be the same kind of mask; otherwise
        // the bitwise operation can differ in how it performs
        break;
    }

    GenTree* vectorNode = op1->AsHWIntrinsic()->Op(1);

    DEBUG_DESTROY_NODE(op1, node);
    INDEBUG(vectorNode->gtDebugFlags |= GTF_DEBUG_NODE_MORPHED);

    return vectorNode;
}
```
The general logic here, except for the intrinsic ID, is identical to the xarch path (and likely always will be, since it's purely an internal helper).
Is it worth sharing most of the code between them?
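For illustration, a minimal self-contained sketch of how that shared fold could be factored out, parameterized only by the inner intrinsic to match. The `Node` struct and `foldMaskRoundTrip` are hypothetical stand-ins for the real JIT types, not code from this PR:

```cpp
#include <cassert>
#include <cstdint>

// Toy IR model (names invented for illustration): a node is either a leaf
// vector or a conversion wrapping one operand, tagged with the element size
// the conversion was created for.
struct Node {
    enum Kind { Vector, MaskToVector, VectorToMask } kind;
    unsigned elemSize; // element size in bytes the conversion assumes
    Node*    op;       // single operand (nullptr for leaf vectors)
};

// Shared fold: MaskToVector(VectorToMask(x)) => x, but only when both
// conversions agree on the element size; otherwise the mask layout differs
// and the fold would be unsound. The inner kind is the only parameter a
// target would need to customize.
Node* foldMaskRoundTrip(Node* node, Node::Kind innerKind) {
    if (node->kind != Node::MaskToVector) return node;
    Node* inner = node->op;
    if (inner->kind != innerKind) return node;
    if (inner->elemSize != node->elemSize) return node; // different mask kind
    return inner->op; // strip both conversions
}
```

The element-size guard mirrors the `simdBaseTypeSize` check in the snippet above: the fold only fires when both conversions were built for the same lane width.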
```cpp
GenTree* op1 = node->Op(1);
GenTree* op2 = node->Op(2);

if (!op1->OperIsHWIntrinsic(NI_Sve_CreateTrueMaskAll) &&
```
Not sure I understand why this one needs to be handled? `CreateTrueMaskAll` produces a `TYP_MASK` and `ConvertVectorToMask` requires a `TYP_SIMD` input, so encountering it as the op1 of `ConvertVectorToMask` would be representative of malformed IR.
Let me get rid of `CreateTrueMaskAll` in a follow-up PR, and then I can combine the logic along with x64.
```cpp
}

unsigned            simdBaseTypeSize = genTypeSize(node->GetSimdBaseType());
GenTreeHWIntrinsic* cvtOp2           = op2->AsHWIntrinsic();
```
Why is this `op2`? I would have expected that `ConvertVectorToMask` only needs a single input, much as it does on xarch.
Yes, currently we create `ConvertVectorToMask` using `ConvertVectorToMask(AllTrue, op)`, and I want to do the work of moving it to lowering in a separate PR. Hence, we are getting `op2` instead of `op1`.
runtime/src/coreclr/jit/hwintrinsicarm64.cpp
Lines 2221 to 2231 in a530a1c
```cpp
GenTree* Compiler::gtNewSimdConvertVectorToMaskNode(var_types   type,
                                                    GenTree*    node,
                                                    CorInfoType simdBaseJitType,
                                                    unsigned    simdSize)
{
    assert(varTypeIsSIMD(node));
    // ConvertVectorToMask uses cmpne which requires an embedded mask.
    GenTree* trueMask = gtNewSimdAllTrueMaskNode(simdBaseJitType, simdSize);
    return gtNewSimdHWIntrinsicNode(TYP_MASK, trueMask, node, NI_Sve_ConvertVectorToMask, simdBaseJitType, simdSize);
}
```
```diff
@@ -10678,6 +10678,63 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
     INDEBUG(node->gtDebugFlags |= GTF_DEBUG_NODE_MORPHED);
     return node;
 }
+#if defined(TARGET_ARM64)
```
Is there an issue tracking mirroring the other mask-related handling that morph does for xarch? In particular, the recognition of `And(MaskToVector(x), MaskToVector(y))` and converting it to `AndMask(x, y)` (as well as similar transforms for other trivial operations where the operation is identical).
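As a sanity check of that rewrite, a small self-contained model (lanes as bytes, masks as one bit per lane, which is a simplification of the real predicate layout; all names here are invented for illustration) shows that expanding masks to vectors commutes with a bitwise AND, which is what makes `And(MaskToVector(x), MaskToVector(y))` → `MaskToVector(AndMask(x, y))` sound:

```cpp
#include <cassert>
#include <cstdint>

constexpr int kLanes = 8;

// Toy ConvertMaskToVector: each mask bit becomes an all-ones/all-zeros lane.
void maskToVector(uint8_t mask, uint8_t vec[kLanes]) {
    for (int i = 0; i < kLanes; i++)
        vec[i] = (mask & (1u << i)) ? 0xFF : 0x00;
}

// Does And(MaskToVector(m1), MaskToVector(m2)) == MaskToVector(AndMask(m1, m2))?
bool transformIsSound(uint8_t m1, uint8_t m2) {
    uint8_t v1[kLanes], v2[kLanes], lhs[kLanes], rhs[kLanes];
    maskToVector(m1, v1);
    maskToVector(m2, v2);
    for (int i = 0; i < kLanes; i++)
        lhs[i] = v1[i] & v2[i];  // vector AND of the two expansions
    maskToVector(m1 & m2, rhs);  // expansion of the AND of the masks
    for (int i = 0; i < kLanes; i++)
        if (lhs[i] != rhs[i])
            return false;
    return true;
}
```

Checking `transformIsSound` over all 256 × 256 mask pairs confirms the equivalence in this model; the real transform additionally has to guard that both masks are of the same kind (element size), as in the snippets above.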
/ba-g the failure seems to be timeout in osx/x64
* Add the missing else
* Max, MaxAcross, MaxNumber, MaxNumberAcross, Min, MinAcross, MinNumber, MinNumberAcross
* Map APIs to instruction
* Add test cases
* Remove the space
* fix the test case
* Add handling of delay free
* fix some errors
* wip: morph optimization
* Track TYP_MASK for arm64
* Enable mov predicate registers
* jit format
* Track `TYP_MASK` for Arm64
* Optimize away `ConvertVectorToMask(ConvertMaskToVector(...))` or `ConvertMaskToVector(ConvertVectorToMask(...))` round-trips when operating on `TYP_MASK`
* Enable `INS_sve_mov` for predicate registers

With these optimizations, the code generated matches the one generated by clang.