Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Improve DCE in LIR liveness. #12892

Merged
merged 2 commits into from
Aug 19, 2017
Merged

Improve DCE in LIR liveness. #12892

merged 2 commits into from
Aug 19, 2017

Conversation

pgavlin
Copy link

@pgavlin pgavlin commented Jul 19, 2017

In particular:

  • Rather than only removing dead code as part of dead store removal,
    remove all side-effect-free nodes that either do not produce a value
    or produce a value that is unused.
  • When optimizing, set fgStmtRemoved in order to indicate that tracked
    lclVar uses or defs have been removed and liveness may need to be
    recalculated. Previously this flag was only set upon eliminating a
    dead store.

@pgavlin pgavlin force-pushed the LivenessDCE branch 4 times, most recently from 9ec809b to 7864a81 Compare August 12, 2017 00:58
@pgavlin pgavlin changed the title WIP: Improve DCE in LIR liveness. Improve DCE in LIR liveness. Aug 12, 2017
@pgavlin
Copy link
Author

pgavlin commented Aug 12, 2017

@dotnet-bot
test Windows_NT x64 Debug Build and Test
test Windows_NT x64 Throughput

@pgavlin
Copy link
Author

pgavlin commented Aug 12, 2017

@BruceForstall @CarolEidt @dotnet/jit-contrib PTAL

Essentially, this performs dead-store removal on SDSU temps as well as LclVars. Throughput results are good: when compared against the baseline, these changes improve throughput by .1% when using mean times for comparison and regress throughput by the same when using median times for comparison. This lines up pretty well with a reported 0.1% throughput improvement reported by callgrind on crossgen S.P.C.dll.

There are a couple small changes to the IR that are necessary in order to enable this:

  • Nodes that read the flags register must be marked with GTF_USE_FLAGS
  • GTF_IND_REFARR_LAYOUT has been removed and GTF_IND_NONFAULTING updated to use its bit. The former was never used in any case and the latter was overlapping with GTF_SET_FLAGS
    These changes allow GTF_SET_FLAGS to universally indicate that a node writes the flags register. The DCE implementation in this change assumes that any write to the flags register may be observed and will not remove nodes marked with GTF_SET_FLAGS (particularly necessary for x86).

I haven't been able to get jit-diff running well on my MacBook, so full diffs will have to wait until Monday.

@pgavlin
Copy link
Author

pgavlin commented Aug 12, 2017

@dotnet-bot test
OSX10.12 x64 Checked Build and Test
Ubuntu x64 Checked Build and Test
Windows_NT x64 Debug Build and Test

@pgavlin
Copy link
Author

pgavlin commented Aug 14, 2017

@dotnet-bot test Windows_NT x64 Debug Build and Test

@pgavlin
Copy link
Author

pgavlin commented Aug 14, 2017

Internal diffs are good. The most common snippets we end up removing are nops and unused compares. The latter is particularly impactful, generally deleting ~19 bytes or so of code:

       mov      rcx, rsi
       lea      rdx, [(reloc)]
       cmp      qword ptr [rcx], rdx
       sete     cl
       movzx    rcx, cl

Rarer are instances of dead helper calls, e.g. type tests:

       xor      rdx, rdx
       mov      rcx, 0xD1FFAB1E
       call     CORINFO_HELP_ISINSTANCEOFCLASS

and allocations:

       mov      rcx, 0xD1FFAB1E
       call     CORINFO_HELP_NEWSFAST

Overall change as reported by SPMI:

Base: 8909259
Diff: 8846092
Delta: 63167
Pct. Delta: 0.71

@AndyAyersMS
Copy link
Member

Can you note where you saw the dead isinst/newobj calls? These may be cases we could be cleaning up earlier.

@mikedn
Copy link

mikedn commented Aug 14, 2017

The latter is particularly impactful, generally deleting ~19 bytes or so of code:

Presumably in the case of the unused compare we keep the indir and remove everything else?

@pgavlin
Copy link
Author

pgavlin commented Aug 14, 2017

Presumably in the case of the unused compare we keep the indir and remove everything else?

We only keep the indir if it might fault. In a number of cases we are able to remove the entire tree, including the indir.

Can you note where you saw the dead isinst/newobj calls? These may be cases we could be cleaning up earlier.

@russellhadley thought the same. I will take a look.

@pgavlin
Copy link
Author

pgavlin commented Aug 14, 2017

jit-diff results are more modest:

Total bytes of diff: -25 (0.00 % of base)
    diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file regressions by size (bytes):
          23 : System.Private.CoreLib.dasm (0.00 % of base)
           8 : Microsoft.CSharp.dasm (0.00 % of base)
Top file improvements by size (bytes):
         -33 : Microsoft.CodeAnalysis.CSharp.dasm (0.00 % of base)
         -10 : System.Linq.Expressions.dasm (0.00 % of base)
          -5 : Microsoft.CodeAnalysis.dasm (0.00 % of base)
          -4 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.00 % of base)
          -2 : System.Threading.Tasks.Parallel.dasm (-0.01 % of base)
9 total files with size differences (7 improved, 2 regressed).
Top method regessions by size (bytes):
          82 : System.Private.CoreLib.dasm - StackTrace:ToString(int):ref:this
          72 : System.Private.CoreLib.dasm - EventProvider:WriteEvent(byref,long,long,long,ref):bool:this
          14 : Microsoft.CSharp.dasm - ExpressionBinder:BindUDBinop(int,ref,ref,bool,byref):ref:this
Top method improvements by size (bytes):
         -39 : System.Private.CoreLib.dasm - TimeSpanFormat:FormatStandard(struct,bool,ref,int):ref
         -29 : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:MakeAndCheckTypeModifiers(int,int,ref,ref,byref):int:this
         -14 : System.Private.CoreLib.dasm - CustomAttributeData:get_ConstructorArguments():ref:this
         -10 : System.Private.CoreLib.dasm - Dictionary`2:System.Collections.IDictionary.set_Item(ref,ref):this (29 methods)
         -10 : System.Private.CoreLib.dasm - Dictionary`2:System.Collections.IDictionary.Add(ref,ref):this (29 methods)
63 total methods with size differences (60 improved, 3 regressed).

My guess is that we're better at removing dead code because we're always precompiling during jit-diff, but it could also be something .Net Core-specific.

@pgavlin
Copy link
Author

pgavlin commented Aug 14, 2017

@mikedn here's an example where we remove a compare but keep the load around (from SerializationInfo:set_AssemblyName in S.P.CoreLib.dll):

       cmp      byte  ptr [rsi+78], 0
       sete     cl
       movzx    rcx, cl

becomes

       movzx    rcx, byte  ptr [rsi+78]

@pgavlin
Copy link
Author

pgavlin commented Aug 15, 2017

@dotnet-bot test Windows_NT arm64 Cross Debug Build

@@ -945,9 +946,8 @@ struct GenTree
#define GTF_INX_REFARR_LAYOUT 0x20000000 // GT_INDEX -- same as GTF_IND_REFARR_LAYOUT
#define GTF_INX_STRING_LAYOUT 0x40000000 // GT_INDEX -- this uses the special string array layout

#define GTF_IND_NONFAULTING GTF_SET_FLAGS // GT_IND -- An indir that cannot fault. GTF_SET_FLAGS is not used on indirs.
#define GTF_IND_NONFAULTING 0x20000000 // GT_IND -- An indir that cannot fault.
#define GTF_IND_VOLATILE 0x40000000 // GT_IND -- the load or store must use volatile sematics (this is a nop on X86)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the bits in the per-class sets are generally listed from high value to low

@@ -885,6 +885,42 @@ bool GenTreeCall::IsPure(Compiler* compiler) const
compiler->s_helperCallProperties.IsPure(compiler->eeGetHelperNum(gtCallMethHnd));
}

bool GenTreeCall::IsSideEffectFree(Compiler* compiler, bool ignoreExceptions, bool ignoreCctors) const
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please add normal function header comment
  2. Mention and call out relationship to gtNodeHasSideEffects() -- can that one use this one?
  3. Should this instead be named HasSideEffects() to avoid the pseudo-double-negative?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: #2, never mind: I see you changed that func.


In reply to: 133565093 [](ancestors = 133565093)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this instead be named HasSideEffects() to avoid the pseudo-double-negative?

I would be in favor of that

@@ -945,9 +946,8 @@ struct GenTree
#define GTF_INX_REFARR_LAYOUT 0x20000000 // GT_INDEX -- same as GTF_IND_REFARR_LAYOUT
#define GTF_INX_STRING_LAYOUT 0x40000000 // GT_INDEX -- this uses the special string array layout

#define GTF_IND_NONFAULTING GTF_SET_FLAGS // GT_IND -- An indir that cannot fault. GTF_SET_FLAGS is not used on indirs.
#define GTF_IND_NONFAULTING 0x20000000 // GT_IND -- An indir that cannot fault.
#define GTF_IND_VOLATILE 0x40000000 // GT_IND -- the load or store must use volatile sematics (this is a nop on X86)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's disconcerting that (at least) GTF_IND_VOLATILE/GTF_IND_UNALIGNED also apply to GT_INDEX, but are not commented as such.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that possible? In the case of GTF_IND_VOLATILE, there would be an ambiguity with GTF_INX_STRING_LAYOUT were that the case. As far as I can tell these flags only apply to GT_IND.

// otherwise.
//
bool Compiler::fgComputeLifeLocal(VARSET_TP& life, VARSET_VALARG_TP keepAliveVars, GenTree* lclVarNode, GenTree* node)
void Compiler::fgComputeLifeTrackedLocalUse(VARSET_TP& life, LclVarDsc& varDsc, GenTreeLclVarCommon* node)
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As usual, please add function headers here (and below)

//------------------------------------------------------------------------
// Compiler::fgComputeLifeLocal: compute the changes to local var liveness
// due to a use or a def of a local var and
// indicates wither the use/def is a dead

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wither => whether

//
// Arguments:
// life - The live set that is being computed.
// keepAliveVars - The currents set of variables to keep alive

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currents => current

Copy link

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo nits

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from some suggestions

@@ -885,6 +885,42 @@ bool GenTreeCall::IsPure(Compiler* compiler) const
compiler->s_helperCallProperties.IsPure(compiler->eeGetHelperNum(gtCallMethHnd));
}

bool GenTreeCall::IsSideEffectFree(Compiler* compiler, bool ignoreExceptions, bool ignoreCctors) const
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this instead be named HasSideEffects() to avoid the pseudo-double-negative?

I would be in favor of that

@@ -945,9 +946,8 @@ struct GenTree
#define GTF_INX_REFARR_LAYOUT 0x20000000 // GT_INDEX -- same as GTF_IND_REFARR_LAYOUT

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you've eliminated GTF_IND_REFARR_LAYOUT you should change the above comment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think that there should be a comment here describing the commonality between the GTF_INX flags and the GTF_IND flags, and explaining why some can overlap.

{
// Is the variable already known to be alive?
if (VarSetOps::IsMember(this, life, varIndex))
/* Mark variable as dead from here to its closest use */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this a C++ style comment, while you're here.
Also, I would rephrase this as simply "Remove variable from the live set." Talking about its closest use almost makes one think we're talking about its downstream use.

return;
}

VARSET_TP varBit(VarSetOps::MakeEmpty(this));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I know this is pre-existing code, but varBit seems to be a strange name for a set that is likely to have more than one member. I would probably rename this fieldVars or something, but IMO it should at least be plural.


// Removing a call does not affect liveness unless it is a tail call in a nethod with P/Invokes or
// is
// itself a P/Invoke, in which case it may affect the liveness of the frame root variable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird comment formatting

There are three cases that are relevant here:
- Defs of tracked locals
- Uses of tracked locals
- Appearances of untracked locals

Each of these cases has been moved into its own function.
@pgavlin pgavlin force-pushed the LivenessDCE branch 2 times, most recently from c1f1b97 to 44dbecb Compare August 18, 2017 14:02
@@ -9511,7 +9560,7 @@ void Compiler::gtDispNode(GenTreePtr tree, IndentStack* indentStack, __in __in_z

case GT_IND:
// We prefer printing R, V or U
if ((tree->gtFlags & (GTF_IND_REFARR_LAYOUT | GTF_IND_VOLATILE | GTF_IND_UNALIGNED)) == 0)
if ((tree->gtFlags & (GTF_IND_VOLATILE | GTF_IND_UNALIGNED)) == 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above should be updated:

// We prefer printing V or U

if (tree->gtCall.gtCallType == CT_HELPER)
GenTreeCall* const call = tree->AsCall();
const bool ignoreExceptions = (flags & GTF_EXCEPT) == 0;
const bool ignoreCctors = (flags & GTF_IS_IN_CSE) != 0; // We can CSE helpers that run cctors.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we also might be able to CSE expressions that throw exceptions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the case, yes. CSE does not pass the GTF_EXCEPT flag to this function.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a function header for this method describing that the caller passes in flags that control the behavior of this method. It seems a bit odd to me to use the flags argument on this way.

Alternatively we could pass in two values : 1) ignoreExceptions and 2 ignoreCctors

In particular:
- Rather than only removing dead code as part of dead store removal,
  remove all side-effect-free nodes that either do not produce a value
  or produce a value that is unused.
- When optimizing, set `fgStmtRemoved` in order to indicate that tracked
  lclVar uses or defs have been removed and liveness may need to be
  recalculated. Previously this flag was only set upon eliminating a
  dead store.
if (tree->gtCall.gtCallType == CT_HELPER)
GenTreeCall* const call = tree->AsCall();
const bool ignoreExceptions = (flags & GTF_EXCEPT) == 0;
const bool ignoreCctors = (flags & GTF_IS_IN_CSE) != 0; // We can CSE helpers that run cctors.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a function header for this method describing that the caller passes in flags that control the behavior of this method. It seems a bit odd to me to use the flags argument on this way.

Alternatively we could pass in two values : 1) ignoreExceptions and 2 ignoreCctors

@pgavlin
Copy link
Author

pgavlin commented Aug 18, 2017

@dotnet-bot test Windows_NT x86 Checked Build and Test

@pgavlin
Copy link
Author

pgavlin commented Aug 18, 2017

@mmitche any idea what's up with the RHEL packages not being found?

@pgavlin
Copy link
Author

pgavlin commented Aug 19, 2017

@dotnet-bot test Windows_NT x86 Checked Build and Test

@pgavlin
Copy link
Author

pgavlin commented Aug 19, 2017

@dotnet-bot
test OSX10.12 x64 Checked Build and Test
test Ubuntu x64 Checked Build and Test
test Windows_NT arm Cross Checked Build and Test
test Windows_NT arm64 Cross Debug Build
test Windows_NT x64 CoreCLR Perf Tests Correctness
test Windows_NT x64 Debug Build and Test
test Windows_NT x64 Release Priority 1 Build and Test
test Windows_NT x86 CoreCLR Perf Tests Correctness

@pgavlin pgavlin merged commit a604580 into dotnet:master Aug 19, 2017
@pgavlin pgavlin deleted the LivenessDCE branch August 19, 2017 13:43
@karelz karelz modified the milestone: 2.1.0 Aug 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants