Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

[WIP] Fix CSE side effect and definition extraction #19125

Merged
merged 1 commit into from
Sep 14, 2018

Conversation

mikedn
Copy link

@mikedn mikedn commented Jul 25, 2018

When performing CSE on an use both side effects and CSE defs need to be preserved. Their ordering must be preserved as well because side effect trees (as extracted by gtExtractSideEffList) may contain CSE defs (potential assignments) and uses.

However, the current implementation extracts side effects and CSE defs during 2 separate tree traversals and this makes preserving the original order difficult or downright impossible. The implementation contains a workaround for a specific case - a CSE def having a child side effect. Since the side effect tree was already extracted, attempting to also extract the CSE def would result in the side effect tree having multiple uses. The workaround simply rejects CSE in such cases.

Incorrect reordering can still occur as the following example shows:

     ( 47, 69)              [000017] ------------              *  STMT      void  (IL 0x000...0x030)
N026 ( 47, 69)              [000016] ---XG-------              \--*  JTRUE     void  
N024 (  1,  1)              [000014] ------------                 |  /--*  CNS_INT   int    0 $82
N025 ( 45, 67)              [000015] J--XG--N----                 \--*  EQ        int    <l:$347, c:$346>
N021 (  8, 15) CSE #02 (use)[000009] ---XG-------                    |     /--*  IND       int    <l:$343, c:$381>
N019 (  1,  1)              [000037] ------------                    |     |  |  /--*  CNS_INT   long   8 field offset Fseq[F8] $1c2
N020 (  6, 13)              [000038] ----G--N----                    |     |  \--*  ADD       byref  <l:$205, c:$206>
N018 (  5, 12) CSE #01 (use)[000008] x---G-------                    |     |     \--*  IND       ref    <l:$140, c:$183>
N017 (  3, 10)              [000039] ------------                    |     |        \--*  CNS_INT(h) long   0x1b210002970 static Fseq[s_1] $100
N022 ( 20, 31)              [000010] ---XG-------                    |  /--*  NE        int    <l:$82, c:$344>
N016 (  8, 15) CSE #02 (def)[000007] ---XG-------                    |  |  \--*  IND       int    <l:$343, c:$380>
N014 (  1,  1)              [000034] ------------                    |  |     |  /--*  CNS_INT   long   8 field offset Fseq[F8] $1c2
N015 (  6, 13)              [000035] ----G--N----                    |  |     \--*  ADD       byref  <l:$205, c:$204>
N013 (  5, 12) CSE #01 (use)[000006] x---G-------                    |  |        \--*  IND       ref    <l:$140, c:$182>
N012 (  3, 10)              [000036] ------------                    |  |           \--*  CNS_INT(h) long   0x1b210002970 static Fseq[s_1] $100
N023 ( 43, 65) CSE #03 (use)[000013] ---XG-------                    \--*  XOR       int    <l:$341, c:$345>
N010 (  9, 16)              [000004] ---XG-------                       |  /--*  IND       byte   <l:$2c1, c:$300>
N008 (  1,  1)              [000031] ------------                       |  |  |  /--*  CNS_INT   long   14 field offset Fseq[F0] $1c1
N009 (  6, 13)              [000032] ----G--N----                       |  |  \--*  ADD       byref  <l:$203, c:$202>
N007 (  5, 12) CSE #01 (use)[000003] x---G-------                       |  |     \--*  IND       ref    <l:$140, c:$181>
N006 (  3, 10)              [000033] ------------                       |  |        \--*  CNS_INT(h) long   0x1b210002970 static Fseq[s_1] $100
N011 ( 22, 33) CSE #03 (def)[000005] ---XG-------                       \--*  LT        int    <l:$341, c:$340>
N005 (  9, 16) CSE #04 (def)[000002] ---XG-------                          \--*  IND       ushort <l:$241, c:$280>
N003 (  1,  1)              [000028] ------------                             |  /--*  CNS_INT   long   12 field offset Fseq[F7] $1c0
N004 (  6, 13)              [000029] ----G--N----                             \--*  ADD       byref  <l:$201, c:$200>
N002 (  5, 12) CSE #01 (def)[000001] x---G-------                                \--*  IND       ref    <l:$140, c:$180>
N001 (  3, 10)              [000030] ------------                                   \--*  CNS_INT(h) long   0x1b210002970 static Fseq[s_1] $100

Performing CSE on use #3 first extracts the LT subtree due to it having an assignment side effect (the assignment just introduced for CSE #3 def). The remaining NE subtree has a CSE def (CSE #2) that will be extracted separately and appended to the side effect list. This yields:

This CSE use has side effects and/or nested CSE defs. The sideEffectList:
N010 (  9, 16)              [000004] ---XG-------                    /--*  IND       byte   <l:$2c1, c:$300>
N008 (  1,  1)              [000031] ------------                    |  |  /--*  CNS_INT   long   14 field offset Fseq[F0] $1c1
N009 (  6, 13)              [000032] ----G--N----                    |  \--*  ADD       byref  <l:$203, c:$202>
N007 (  5, 12) CSE #01 (use)[000003] x---G-------                    |     \--*  IND       ref    <l:$140, c:$181>
N006 (  3, 10)              [000033] ------------                    |        \--*  CNS_INT(h) long   0x1b210002970 static Fseq[s_1] $100
N011 ( 22, 33)              [000005] ---XG-------                 /--*  LT        int    <l:$341, c:$340>
N005 (  9, 16) CSE #04 (def)[000002] ---XG-------                 |  \--*  IND       ushort <l:$241, c:$280>
N003 (  1,  1)              [000028] ------------                 |     |  /--*  CNS_INT   long   12 field offset Fseq[F7] $1c0
N004 (  6, 13)              [000029] ----G--N----                 |     \--*  ADD       byref  <l:$201, c:$200>
N002 (  5, 12) CSE #01 (def)[000001] x---G-------                 |        \--*  IND       ref    <l:$140, c:$180>
N001 (  3, 10)              [000030] ------------                 |           \--*  CNS_INT(h) long   0x1b210002970 static Fseq[s_1] $100
N013 ( 22, 33)              [000047] -A-XG---R---              /--*  ASG       int    $VN.Void
N012 (  1,  1)              [000046] D------N----              |  \--*  LCL_VAR   int    V01 cse0          <l:$341, c:$340>
                            [000051] -A-XG-------              *  COMMA     void   $VN.Void
N020 (  8, 15) CSE #02 (def)[000007] ---XG-------              \--*  IND       int    <l:$343, c:$380>
N018 (  1,  1)              [000034] ------------                 |  /--*  CNS_INT   long   8 field offset Fseq[F8] $1c2
N019 (  6, 13)              [000035] ----G--N----                 \--*  ADD       byref  <l:$205, c:$204>
N017 (  5, 12) CSE #01 (use)[000006] x---G-------                    \--*  IND       ref    <l:$140, c:$182>
N016 (  3, 10)              [000036] ------------                       \--*  CNS_INT(h) long   0x1b210002970 static Fseq[s_1] $100

But these 2 trees are now in reverse order so CSE use #1 occurs before CSE def #1.

The only reasonable way to preserve the correct order seems to be extracting side effects and CSE defs at the same time, during a single tree traversal. gtExtractSideEffList can be extended rather easily to also look at CSE defs when the already existing GTF_IS_IN_CSE is specified. Additionally, CSE uses can be unmarked during the same traversal, avoiding the need for another traversal that needs to look at removed trees.

While working on this I noticed that optPropagateNonCSE and optHasNonCSEChild were not used so I deleted them. I also deleted GTF_PERSISTENT_SIDE_EFFECTS_IN_CSE, it's used only in 2 places and it's IMO more confusing than ORing GTF_IS_IN_CSE explicitly.

Fixes #18770


return Compiler::WALK_CONTINUE;
if (node->OperIs(GT_LCL_VAR))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndyAyersMS Broken ref count update in existing CSE code? I see no reason to decrement the ref count for GT_LCL_VAR and ignore GT_LCL_FLD.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, the existing CSE code is actually here:

coreclr/src/jit/optcse.cpp

Lines 281 to 307 in d38774d

if (comp->optUnmarkCSE(tree))
{
// The call to optUnmarkCSE(tree) should have cleared any CSE info
//
assert(!IS_CSE_INDEX(tree->gtCSEnum));
// This node is to be removed from the graph of GenTree*
// next decrement any LclVar references.
//
if (tree->gtOper == GT_LCL_VAR)
{
unsigned lclNum;
LclVarDsc* varDsc;
// This variable ref is going away, decrease its ref counts
lclNum = tree->gtLclVarCommon.gtLclNum;
assert(lclNum < comp->lvaCount);
varDsc = comp->lvaTable + lclNum;
// make sure it's been initialized
assert(comp->optCSEweight <= BB_MAX_WEIGHT);
// Decrement its lvRefCnt and lvRefCntWtd
varDsc->decRefCnts(comp->optCSEweight, comp);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My ambition is to remove incremental ref counting entirely.

@mikedn
Copy link
Author

mikedn commented Jul 29, 2018

No framework JIT diffs. Tests (appart from the newly added test) contain a couple of diffs that show similar CSE ordering issues but that have no ill/visible effects. For example https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/Regression/JitBlue/DevDiv_605447/DevDiv_605447.il generates diff https://gist.github.com/mikedn/1f0551dd02c677ac4453aac4a62ef3a7

The newly added test generates diff https://gist.github.com/mikedn/21d772b9cf2fa614276213233cd2c7b9

Though the diffs (with or without PMI) for Test2.M1 do not show the problem. There's a difference in the way statics are accessed when the code is actually run and in that case the diff is:

-mov      rax, 0x218AB502980
-mov      rsi, gword ptr [rax]
 call     Test2:M2():int
-movzx    rax, word  ptr [reloc classVar[0xe2334af8]]
+mov      rax, 0x19713FA2980
+mov      r8, gword ptr [rax]
+movzx    rax, word  ptr [reloc classVar[0xe2694af8]]
 mov      eax, eax
 cdq
-idiv     rdx:rax, qword ptr [reloc classVar[0xe2334b10]]
-mov      rcx, 0x218AB502978
-mov      rdx, rsi
+idiv     rdx:rax, qword ptr [reloc classVar[0xe2694b10]]
+mov      rcx, 0x19713FA2978
+mov      rdx, r8
 call     CORINFO_HELP_CHECKED_ASSIGN_REF

Basically the call to M2 and the static field access are out of order in the original code.

@mikedn
Copy link
Author

mikedn commented Jul 30, 2018

@briansull This is the fix for #18770. I don't know if it will conflict with your work on #8648, it can stay as WIP until you sort that one out.

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good

@briansull briansull merged commit 31c7190 into dotnet:master Sep 14, 2018
@mikedn mikedn deleted the cse-extract branch September 28, 2019 19:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants