-
Notifications
You must be signed in to change notification settings - Fork 5.1k
ARM64 - Optimizing a % b operations part 2 #66407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsAddressing part of this issue: #34937 Description There are various ways to optimize
negs x1, x0
and x0, x0, #(n - 1)
and x1, x1, #(n - 1)
csneg x0, x0, x1, mi Acceptance Criteria
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look generally good. I left a couple comments around the possibility of using existing helpers rather than directly querying gtFlags
@kunalspathak @EgorBo this is ready. CI is failing due to unrelated reasons it looks like. I think @AndyAyersMS and I reviewed this together a few weeks back as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes looks good, just a comment.
Also, can you check why there are regressions in crossgen2 and other collections?
Top method regressions (bytes):
24 (1.69 % of base) : 85784.dasm - System.Collections.Generic.HashSet`1:SymmetricExceptWithEnumerable(System.Collections.Generic.IEnumerable`1[System.__Canon]):this
16 (0.60 % of base) : 107306.dasm - System.Text.GB18030Encoding:GetChars(long,int,long,int,System.Text.DecoderNLS):int:this
12 (0.94 % of base) : 85783.dasm - System.Collections.Generic.HashSet`1:CheckUniqueAndUnfoundElements(System.Collections.Generic.IEnumerable`1[System.__Canon],bool):System.ValueTuple`2[System.Int32, System.Int32]:this
12 (1.94 % of base) : 146511.dasm - System.Formats.Asn1.AsnWriter:WriteNamedBitList(System.Nullable`1[System.Formats.Asn1.Asn1Tag],long):this
8 (0.98 % of base) : 5636.dasm - Microsoft.CodeAnalysis.PEModule:IsNoPiaLocalType(System.Reflection.Metadata.TypeDefinitionHandle,byref):bool:this
8 (7.69 % of base) : 4080.dasm - Microsoft.CodeAnalysis.XmlCharType:SplitSurrogateChar(int,byref,byref)
8 (0.75 % of base) : 85786.dasm - System.Collections.Generic.HashSet`1:IntersectWithEnumerable(System.Collections.Generic.IEnumerable`1[System.__Canon]):this
8 (2.47 % of base) : 59774.dasm - System.Data.RBTree`1:FreeNode(int):this
8 (2.53 % of base) : 62804.dasm - System.Data.RBTree`1:FreeNode(int):this
8 (0.99 % of base) : 97269.dasm - System.Globalization.HebrewNumber:Append(System.Text.StringBuilder,int)
8 (0.73 % of base) : 138410.dasm - System.Net.NetworkInformation.PhysicalAddress:TryParse(System.ReadOnlySpan`1[System.Char],byref):bool
8 (3.92 % of base) : 98078.dasm - System.Net.WebUtility:ConvertSmpToUtf16(int,byref,byref)
8 (7.69 % of base) : 120074.dasm - System.Xml.XmlCharType:SplitSurrogateChar(int,byref,byref)
4 (1.59 % of base) : 5634.dasm - Microsoft.CodeAnalysis.PEModule:RecordNoPiaLocalTypeCheck(System.Reflection.Metadata.TypeDefinitionHandle):this
4 (3.12 % of base) : 85613.dasm - System.Collections.Generic.BitHelper:IsMarked(int):bool:this
4 (3.23 % of base) : 85614.dasm - System.Collections.Generic.BitHelper:MarkBit(int):this
4 (1.43 % of base) : 96963.dasm - System.Globalization.OrdinalCasing:InitCasingTable():System.UInt16[][]
4 (0.83 % of base) : 132782.dasm - System.Net.FtpControlStream:FormatAddress(System.Net.IPAddress,int):System.String
4 (0.83 % of base) : 138407.dasm - System.Net.NetworkInformation.UnicastIPAddressInformation:PrefixLengthToSubnetMask(ubyte,int):System.Net.IPAddress
@@ -1707,7 +1796,7 @@ void Lowering::ContainCheckMul(GenTreeOp* node) | |||
// | |||
void Lowering::ContainCheckDivOrMod(GenTreeOp* node) | |||
{ | |||
assert(node->OperIs(GT_DIV, GT_UDIV)); | |||
assert(node->OperIs(GT_DIV, GT_UDIV, GT_MOD)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not also GT_UMOD
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization doesn't impact GT_UMOD, only GT_MOD so that's why the additional GT_MOD check is there.
I will look into the regressions that are there to see what was impacted. |
@kunalspathak , yes it looks like there is a bit of a regression here: |
Thanks for confirming offline that these are happening for scenarios where there is both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Yea, the regression only occurs in CSE for 'a / b' operations. If we believe the benefits of this optimization outweighs that of CSE for 'a / b', then we should do it. I'm not sure the best way to resolve it either; I thought about matching 'a - (a / cns) * cns' and turn it into this specific optimization, but it gets more complicated as 'multiply' can be optimized to a LSH operation, 'a - ((a / cns) << shift' if the 'cns' is the power of 2. As a test, I tried turning 'a - ((a / cns) << shift' back into a mod op so it would get picked up by the lowering optimization: if (tree->TypeIs(TYP_INT, TYP_LONG) && !tree->IsUnsigned() && op1->OperIs(GT_LCL_VAR))
{
GenTree* mul = op2;
if (mul->OperIs(GT_LSH) && !gtIsActiveCSE_Candidate(mul))
{
GenTree* div = mul->gtGetOp1();
GenTree* cns1 = mul->gtGetOp2();
if (div->OperIs(GT_DIV) && !gtIsActiveCSE_Candidate(div) && cns1->IsIntegralConst())
{
GenTree* a = div->gtGetOp1();
GenTree* cns2 = div->gtGetOp2();
if (a->OperIs(GT_LCL_VAR) && cns2->IsIntegralConstPow2() &&
op1->AsLclVar()->GetLclNum() == a->AsLclVar()->GetLclNum())
{
size_t cnsValue1 = cns1->AsIntConCommon()->IntegralValue();
size_t cnsValue2 = cns2->AsIntConCommon()->IntegralValue();
if ((cnsValue2 >> cnsValue1) == 1)
{
tree->ChangeOper(GT_MOD);
tree->AsOp()->gtOp2 = cns1;
op2 = tree->gtGetOp2();
}
}
}
}
} But, I would like to do this after CSE has occurred; that way if there were CSEs that happened in the expression, it wouldn't match this logic. The next logically thing to look at is lowering if we could do this, but I think we would have to mark multiple nodes as contained - so it still gets more complicated. |
I made an issue regarding this minor regression: #67983 |
Possible regression: #68624 |
Addressing part of this issue: #34937
Description
There are various ways to optimize
%
for integers on ARM64.a % b
can be transformed into a specific sequence of instructions for ARM64 if the operation is signed andb
is a constant with the power of 2.Acceptance Criteria
ARM64 diffs based on tests