Skip to content

JIT: cross-block local assertion prop in morph #94363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 10, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion eng/pipelines/common/templates/runtimes/run-test-job.yml
Original file line number Diff line number Diff line change
Expand Up @@ -585,7 +585,7 @@ jobs:
- jitobjectstackallocation
- jitphysicalpromotion_only
- jitphysicalpromotion_full

- jitcrossblocklocalassertionprop
${{ if in(parameters.testGroup, 'jit-cfg') }}:
scenarios:
- jitcfg
Expand Down
90 changes: 68 additions & 22 deletions src/coreclr/jit/assertionprop.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -541,36 +541,83 @@ void Compiler::optAssertionTraitsInit(AssertionIndex assertionCount)

void Compiler::optAssertionInit(bool isLocalProp)
{
// Use a function countFunc to determine a proper maximum assertion count for the
// method being compiled. The function is linear to the IL size for small and
// moderate methods. For large methods, considering throughput impact, we track no
// more than 64 assertions.
// Note this tracks at most only 256 assertions.
static const AssertionIndex countFunc[] = {64, 128, 256, 64};
static const unsigned lowerBound = 0;
static const unsigned upperBound = ArrLen(countFunc) - 1;
const unsigned codeSize = info.compILCodeSize / 512;
optMaxAssertionCount = countFunc[isLocalProp ? lowerBound : min(upperBound, codeSize)];

optLocalAssertionProp = isLocalProp;
optAssertionTabPrivate = new (this, CMK_AssertionProp) AssertionDsc[optMaxAssertionCount];
assert(NO_ASSERTION_INDEX == 0);
const unsigned maxTrackedLocals = (unsigned)JitConfig.JitMaxLocalsToTrack();

if (!isLocalProp)
// We initialize differently for local prop / global prop
//
if (isLocalProp)
{
optLocalAssertionProp = true;
optCrossBlockLocalAssertionProp = true;

// Disable via config
//
if (JitConfig.JitEnableCrossBlockLocalAssertionProp() == 0)
{
JITDUMP("Disabling cross-block assertion prop by config setting\n");
optCrossBlockLocalAssertionProp = false;
}

// Disable if too many locals
//
// The typical number of local assertions is roughly proportional
// to the number of locals. So when we have huge numbers of locals,
// just do within-block local assertion prop.
//
if (lvaCount > maxTrackedLocals)
{
JITDUMP("Disabling cross-block assertion prop: too many locals\n");
optCrossBlockLocalAssertionProp = false;
}
Comment on lines +568 to +572
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it might be very "spurious diff" inducing (add a local and suddenly a bunch of assertions stop being propagated). You might have mentioned it somewhere else, but do you have any plans to make this less "hard threshold-y"?

Copy link
Member

@jakobbotsch jakobbotsch Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also looking at things like optAssertionDep below it kind of feels like we should be calling lvaSortByRefCount here and utilize tracked indices, although I don't think ref counts are up to date here (in particular ones introduced by physical promotion -- might not be so hard to fix that).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's one of those cases where it would be great to have separated "single block" locals and "global" locals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have any plans to make this less "hard threshold-y"?

I don't have any great ideas here, no. Some of this is the inherent cost of a blind forward scheme, we push facts along hoping they will matter. SSA would be helpful, we could just attach assertions to SSA lifetimes.

utilize tracked indices

Some kind of prioritization scheme might work out nicely; presumably frequently referenced locals are more interesting candidates for assertions.

I guess it's one of those cases where it would be great to have separated "single block" locals and "global" locals.

I had the same thought -- I tried an experiment to see how many assertions were killed in the same block they were born in, and it was quite small; most of them get killed at merges.

It might be possible to leverage RCS_EARLY to deduce if newly generated assertions are likely to be referenced later on; if not they could be purged before they "escape" the block. But we're also enabling cross-block copy prop so even getting that right might be tricky, as the ref counts can and will change during morph.


if (optCrossBlockLocalAssertionProp)
{
// We may need a fairly large table.
// Allow for roughly one assertion per local, up to the tracked limit.
// (empirical studies show about 0.6 asserions/local)
//
optMaxAssertionCount = (AssertionIndex)min(maxTrackedLocals, ((lvaCount / 64) + 1) * 64);
}
else
{
// The assertion table will be reset for each block, so it can be smaller.
//
optMaxAssertionCount = 64;
}

// Local assertion prop keeps mappings from each local var to the assertions about that var.
//
optAssertionDep =
new (this, CMK_AssertionProp) JitExpandArray<ASSERT_TP>(getAllocator(CMK_AssertionProp), max(1, lvaCount));
}
else
{
// General assertion prop.
//
optLocalAssertionProp = false;

// Use a function countFunc to determine a proper maximum assertion count for the
// method being compiled. The function is linear to the IL size for small and
// moderate methods. For large methods, considering throughput impact, we track no
// more than 64 assertions.
// Note this tracks at most only 256 assertions.
//
static const AssertionIndex countFunc[] = {64, 128, 256, 64};
static const unsigned upperBound = ArrLen(countFunc) - 1;
const unsigned codeSize = info.compILCodeSize / 512;
optMaxAssertionCount = countFunc[min(upperBound, codeSize)];

optValueNumToAsserts =
new (getAllocator(CMK_AssertionProp)) ValueNumToAssertsMap(getAllocator(CMK_AssertionProp));
optComplementaryAssertionMap = new (this, CMK_AssertionProp)
AssertionIndex[optMaxAssertionCount + 1](); // zero-inited (NO_ASSERTION_INDEX)
}

if (optAssertionDep == nullptr)
{
optAssertionDep =
new (this, CMK_AssertionProp) JitExpandArray<ASSERT_TP>(getAllocator(CMK_AssertionProp), max(1, lvaCount));
}
optAssertionTabPrivate = new (this, CMK_AssertionProp) AssertionDsc[optMaxAssertionCount];

optAssertionTraitsInit(optMaxAssertionCount);

optAssertionCount = 0;
optAssertionOverflow = 0;
optAssertionPropagated = false;
Expand Down Expand Up @@ -4379,12 +4426,11 @@ AssertionIndex Compiler::optAssertionIsNonNullInternal(GenTree* op,
// Find live assertions related to lclNum
//
unsigned const lclNum = op->AsLclVarCommon()->GetLclNum();
ASSERT_TP apDependent = GetAssertionDep(lclNum);
BitVecOps::IntersectionD(apTraits, apDependent, apLocal);
ASSERT_TP apDependent = BitVecOps::Intersection(apTraits, GetAssertionDep(lclNum), assertions);

// Scan those looking for a suitable assertion
//
BitVecOps::Iter iter(apTraits, assertions);
BitVecOps::Iter iter(apTraits, apDependent);
unsigned index = 0;
while (iter.NextElem(&index))
{
Expand Down
1 change: 1 addition & 0 deletions src/coreclr/jit/compiler.h
Original file line number Diff line number Diff line change
Expand Up @@ -7640,6 +7640,7 @@ class Compiler
AssertionDsc* optAssertionTabPrivate; // table that holds info about value assignments
AssertionIndex optAssertionCount; // total number of assertions in the assertion table
AssertionIndex optMaxAssertionCount;
bool optCrossBlockLocalAssertionProp;
unsigned optAssertionOverflow;
bool optCanPropLclVar;
bool optCanPropEqual;
Expand Down
3 changes: 3 additions & 0 deletions src/coreclr/jit/jitconfigvalues.h
Original file line number Diff line number Diff line change
Expand Up @@ -653,6 +653,9 @@ CONFIG_INTEGER(JitEnableHeadTailMerge, W("JitEnableHeadTailMerge"), 1)
// Enable physical promotion
CONFIG_INTEGER(JitEnablePhysicalPromotion, W("JitEnablePhysicalPromotion"), 1)

// Enable cross-block local assertion prop
CONFIG_INTEGER(JitEnableCrossBlockLocalAssertionProp, W("JitEnableCrossBlockLocalAssertionProp"), 0)

#if defined(DEBUG)
// JitFunctionFile: Name of a file that contains a list of functions. If the currently compiled function is in the
// file, certain other JIT config variables will be active. If the currently compiled function is not in the file,
Expand Down
100 changes: 85 additions & 15 deletions src/coreclr/jit/morph.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13780,10 +13780,73 @@ void Compiler::fgMorphBlock(BasicBlock* block)

if (optLocalAssertionProp)
{
// For now, each block starts with an empty table, and no available assertions
//
optAssertionReset(0);
apLocal = BitVecOps::MakeEmpty(apTraits);
if (!optCrossBlockLocalAssertionProp)
{
// Each block starts with an empty table, and no available assertions
//
optAssertionReset(0);
apLocal = BitVecOps::MakeEmpty(apTraits);
}
else
{
// Determine if this block can leverage assertions from its pred blocks.
//
// Some blocks are ineligible.
//
bool canUsePredAssertions = ((block->bbFlags & BBF_CAN_ADD_PRED) == 0) && !bbIsHandlerBeg(block);

// Validate all preds have valid info
//
if (!canUsePredAssertions)
{
JITDUMP(FMT_BB " ineligible for cross-block\n", block->bbNum);
}
else
{
bool hasPredAssertions = false;

for (BasicBlock* const pred : block->PredBlocks())
{
// A smaller pred postorder number means the pred appears later in the postorder.
// An equal number means pred == block (block is a self-loop).
// Either way the assertion info is not available, and we must assume the worst.
//
if (pred->bbPostorderNum <= block->bbPostorderNum)
{
JITDUMP(FMT_BB " pred " FMT_BB " not processed; clearing assertions in\n", block->bbNum,
pred->bbNum);
break;
}

// Yes, pred assertions are available. If this is the first pred, copy.
// If this is a subsequent pred, intersect.
//
if (!hasPredAssertions)
{
apLocal = BitVecOps::MakeCopy(apTraits, pred->bbAssertionOut);
hasPredAssertions = true;
}
else
{
BitVecOps::IntersectionD(apTraits, apLocal, pred->bbAssertionOut);
}
}

if (!hasPredAssertions)
{
// Either no preds, or some preds w/o assertions.
//
canUsePredAssertions = false;
}
}

if (!canUsePredAssertions)
{
apLocal = BitVecOps::MakeEmpty(apTraits);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also need optAssertionReset(0);?

Can the (!optCrossBlockLocalAssertionProp) and the "failure" cases be combined (tail merged)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also need optAssertionReset(0);?

No. There are 3 modes of operation here:

  1. classic: block starts with empty table via Reset(0) and empty BV
  2. cross-block: block starts with current table and intersection of pred BVs (which may end up empty)
  3. cross-block special case: block starts with current table and empty BV

Modes 2 and 3 can mix with each other, but not with mode 1.

We use mode 1 if the feature is disabled, or if the method has a lot of locals.

When the feature is enabled, and the method does not have a lot of locals, we use mode 2 (if canUsePredAssertions) or mode 3 (otherwise) depending on the various block properties.

Can the (!optCrossBlockLocalAssertionProp) and the "failure" cases be combined (tail merged)?

Let me look at tweaking this.

}

JITDUMPEXEC(optDumpAssertionIndices("Assertions in: ", apLocal));
}
}

// Make the current basic block address available globally.
Expand All @@ -13801,6 +13864,14 @@ void Compiler::fgMorphBlock(BasicBlock* block)
}
}

// Publish the live out state.
//
if (optCrossBlockLocalAssertionProp && (block->NumSucc() > 0))
{
assert(optLocalAssertionProp);
block->bbAssertionOut = BitVecOps::MakeCopy(apTraits, apLocal);
}

compCurBB = nullptr;
}

Expand All @@ -13820,15 +13891,18 @@ PhaseStatus Compiler::fgMorphBlocks()
//
fgGlobalMorph = true;

// Local assertion prop is enabled if we are optimized
//
optLocalAssertionProp = opts.OptimizationEnabled();

if (optLocalAssertionProp)
if (opts.OptimizationEnabled())
{
// Local assertion prop is enabled if we are optimizing.
//
optAssertionInit(/* isLocalProp*/ true);
}
else
{
// Initialize for local assertion prop
// Not optimizing. No assertion prop.
//
optAssertionInit(true);
optLocalAssertionProp = false;
optCrossBlockLocalAssertionProp = false;
}

if (!compEnregLocals())
Expand All @@ -13855,10 +13929,6 @@ PhaseStatus Compiler::fgMorphBlocks()
{
// If we aren't optimizing, we just morph in normal bbNext order.
//
// Note morph can add blocks downstream from the current block,
// and alter (but not null out) the current block's bbNext;
// this iterator ensures they all get visited.
//
for (BasicBlock* block : Blocks())
{
fgMorphBlock(block);
Expand Down
2 changes: 2 additions & 0 deletions src/tests/Common/testenvironment.proj
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@
RunningIlasmRoundTrip;
DOTNET_JitSynthesizeCounts;
DOTNET_JitCheckSynthesizedCounts
DOTNET_JitDoCrossBlockLocalAssertionProp
</DOTNETVariables>
</PropertyGroup>
<ItemGroup>
Expand Down Expand Up @@ -220,6 +221,7 @@
<TestEnvironment Include="jitobjectstackallocation" JitObjectStackAllocation="1" TieredCompilation="0" />
<TestEnvironment Include="jitphysicalpromotion_only" JitStressModeNames="STRESS_NO_OLD_PROMOTION" TieredCompilation="0" />
<TestEnvironment Include="jitphysicalpromotion_full" JitStressModeNames="STRESS_PHYSICAL_PROMOTION_COST STRESS_NO_OLD_PROMOTION" TieredCompilation="0" />
<TestEnvironment Include="jitcrossblocklocalassertionprop" JitEnableCrossBlockLocalAssertionProp="1" TieredCompilation="0" />
<TestEnvironment Include="jitcfg" JitForceControlFlowGuard="1" />
<TestEnvironment Include="jitcfg_dispatcher_always" JitForceControlFlowGuard="1" JitCFGUseDispatcher="1" />
<TestEnvironment Include="jitcfg_dispatcher_never" JitForceControlFlowGuard="1" JitCFGUseDispatcher="0" />
Expand Down