Tune CCMP for better Perf #116445

Open
wants to merge 3 commits into base: main
6 changes: 5 additions & 1 deletion src/coreclr/jit/compiler.h
@@ -7028,6 +7028,10 @@ class Compiler
void optComputeInterestingVarSets();

private:
    // The BitVec below tracks the blocks that will be converted to a switch, so that they are not also converted to CCMP
BitVecTraits* ccmp_traits;
BitVec ccmp_vec;
Comment on lines +7031 to +7033 (Member):

I think you should rather make ccmp_vec a parameter. We should avoid global ambient state when possible. It makes it very hard to understand the flow of how this gets defined and used.

BitVecTraits can be constructed on-the-fly when the bit vector is used.

I have a question. Why is all this special casing required for xarch? For arm64 we simply disable the optimization. Why not do the same?

// Limit to XARCH, ARM is already doing a great job with such comparisons using
// a series of ccmp instruction (see ifConvert phase).
#ifdef TARGET_XARCH

Is this comment not also true for xarch now?

Member Author:

The current changes are not special-casing but an attempt to improve codegen and throughput for CCMP on XARCH. XARCH also has a way to disable this optimization, but these changes aim to improve performance when the CCMP optimization is enabled on XARCH.

Is this comment not also true for xarch now?

Regarding the comment, I will look into it. It might be stale, but I need to check why it is there. That may be out of scope for this PR, but it seems related to the CCMP optimization on XARCH.


// Given a loop mark it and any nested loops as having 'memoryHavoc'
void optRecordLoopNestsMemoryHavoc(FlowGraphNaturalLoop* loop, MemoryKindSet memoryHavoc);

@@ -7040,7 +7044,7 @@ class Compiler
public:
PhaseStatus optOptimizeBools();
PhaseStatus optRecognizeAndOptimizeSwitchJumps();
bool optSwitchConvert(BasicBlock* firstBlock, int testsCount, ssize_t* testValues, weight_t falseLikelihood, GenTree* nodeToTest);
bool optSwitchConvert(BasicBlock* firstBlock, int testsCount, ssize_t* testValues, weight_t falseLikelihood, GenTree* nodeToTest, bool testingForConversion = false);
bool optSwitchDetectAndConvert(BasicBlock* firstBlock, bool testingForConversion = false);

PhaseStatus optInvertLoops(); // Invert loops so they're entered at top and tested at bottom.
49 changes: 42 additions & 7 deletions src/coreclr/jit/lower.cpp
@@ -11589,6 +11589,39 @@ bool Lowering::TryLowerAndNegativeOne(GenTreeOp* node, GenTree** nextNode)
}

#if defined(TARGET_AMD64) || defined(TARGET_ARM64)
//------------------------------------------------------------------------
// CanConvertOpToCCMP : Checks whether operand can be converted to CCMP
//
// Arguments:
// operand - operand to check for CCMP conversion
// tree - parent of the operand
//
// Return Value:
// true if operand can be converted to CCMP
//
bool Lowering::CanConvertOpToCCMP(GenTree* operand, GenTree* tree)
{
return operand->OperIsCmpCompare() && varTypeIsIntegralOrI(operand->gtGetOp1()) &&
IsInvariantInRange(operand, tree);
}

//------------------------------------------------------------------------
// IsOpPreferredForCCMP : Checks if operand is preferred for conversion to CCMP
//
// Arguments:
// operand - operand to check for CCMP conversion
//
// Return Value:
// true if operand is preferred for CCMP
//
bool Lowering::IsOpPreferredForCCMP(GenTree* operand)
{
assert(operand->OperIsCmpCompare());
return (operand->gtGetOp1()->IsIntegralConst() || !operand->gtGetOp1()->isContained()) &&
(operand->gtGetOp2() == nullptr || operand->gtGetOp2()->IsIntegralConst() ||
!operand->gtGetOp2()->isContained());
}

//------------------------------------------------------------------------
// TryLowerAndOrToCCMP : Lower AND/OR of two conditions into test + CCMP + SETCC nodes.
//
@@ -11618,6 +11651,10 @@ bool Lowering::TryLowerAndOrToCCMP(GenTreeOp* tree, GenTree** next)
DISPTREERANGE(BlockRange(), tree);
JITDUMP("\n");
}
else
{
return false;
}

// Find out whether an operand is eligible to be converted to a conditional
// compare. It must be a normal integral relop; for example, we cannot
@@ -11630,17 +11667,15 @@ bool Lowering::TryLowerAndOrToCCMP(GenTreeOp* tree, GenTree** next)
// by TryLowerConditionToFlagsNode.
//
GenCondition cond1;
if (op2->OperIsCmpCompare() && varTypeIsIntegralOrI(op2->gtGetOp1()) && IsInvariantInRange(op2, tree) &&
(op2->gtGetOp1()->IsIntegralConst() || !op2->gtGetOp1()->isContained()) &&
(op2->gtGetOp2() == nullptr || op2->gtGetOp2()->IsIntegralConst() || !op2->gtGetOp2()->isContained()) &&
bool canConvertOp2ToCCMP = CanConvertOpToCCMP(op2, tree);
bool canConvertOp1ToCCMP = CanConvertOpToCCMP(op1, tree);

if (canConvertOp2ToCCMP && (!canConvertOp1ToCCMP || IsOpPreferredForCCMP(op2)) &&
TryLowerConditionToFlagsNode(tree, op1, &cond1, false))
{
// Fall through, converting op2 to the CCMP
}
else if (op1->OperIsCmpCompare() && varTypeIsIntegralOrI(op1->gtGetOp1()) && IsInvariantInRange(op1, tree) &&
(op1->gtGetOp1()->IsIntegralConst() || !op1->gtGetOp1()->isContained()) &&
(op1->gtGetOp2() == nullptr || op1->gtGetOp2()->IsIntegralConst() || !op1->gtGetOp2()->isContained()) &&
TryLowerConditionToFlagsNode(tree, op2, &cond1, false))
else if (canConvertOp1ToCCMP && TryLowerConditionToFlagsNode(tree, op2, &cond1, false))
{
std::swap(op1, op2);
}
2 changes: 2 additions & 0 deletions src/coreclr/jit/lower.h
@@ -84,6 +84,8 @@ class Lowering final : public Phase
void ContainCheckLclHeap(GenTreeOp* node);
void ContainCheckRet(GenTreeUnOp* ret);
#if defined(TARGET_ARM64) || defined(TARGET_AMD64)
bool IsOpPreferredForCCMP(GenTree* operand);
bool CanConvertOpToCCMP(GenTree* operand, GenTree* tree);
bool TryLowerAndOrToCCMP(GenTreeOp* tree, GenTree** next);
insCflags TruthifyingFlags(GenCondition cond);
void ContainCheckConditionalCompare(GenTreeCCMP* ccmp);
2 changes: 2 additions & 0 deletions src/coreclr/jit/optimizebools.cpp
@@ -1581,6 +1581,8 @@ PhaseStatus Compiler::optOptimizeBools()
unsigned numCond = 0;
unsigned numPasses = 0;
unsigned stress = false;
ccmp_traits = new BitVecTraits(fgBBNumMax + 1, this);
ccmp_vec = BitVecOps::MakeEmpty(ccmp_traits);

do
{
76 changes: 54 additions & 22 deletions src/coreclr/jit/switchrecognition.cpp
@@ -11,6 +11,9 @@
#define SWITCH_MAX_DISTANCE ((TARGET_POINTER_SIZE * BITS_PER_BYTE) - 1)
#define SWITCH_MIN_TESTS 3

// This is a heuristic value tuned for performance
#define CONVERT_SWITCH_TO_CCMP_MIN_TEST 5

//-----------------------------------------------------------------------------
// optRecognizeAndOptimizeSwitchJumps: Optimize range check for `x == cns1 || x == cns2 || x == cns3 ...`
// pattern and convert it to a BBJ_SWITCH block (jump table), which then *might* be converted
@@ -186,6 +189,18 @@ bool Compiler::optSwitchDetectAndConvert(BasicBlock* firstBlock, bool testingFor
// TODO: make it more flexible and support cases like "x != cns1 && x != cns2 && ..."
return false;
}
if (testingForConversion)
{
if (BitVecOps::IsMember(ccmp_traits, ccmp_vec, firstBlock->bbNum))
{
return true;
}
else
{
ccmp_vec = BitVecOps::MakeEmpty(ccmp_traits);
}
}


// No more than SWITCH_MAX_TABLE_SIZE blocks are allowed (arbitrary limit in this context)
int testValueIndex = 0;
@@ -209,8 +224,8 @@
{
// Only the first conditional block can have multiple statements.
// Stop searching and process what we already have.
return !testingForConversion &&
optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode);
return optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode,
testingForConversion);
}

// Inspect secondary blocks
@@ -220,29 +235,29 @@
if (currTrueTarget != trueTarget)
{
// This blocks jumps to a different target, stop searching and process what we already have.
return !testingForConversion &&
optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode);
return optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode,
testingForConversion);
}

if (!GenTree::Compare(currVariableNode, variableNode->gtEffectiveVal()))
{
// A different variable node is used, stop searching and process what we already have.
return !testingForConversion &&
optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode);
return optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode,
testingForConversion);
}

if (currBb->GetUniquePred(this) != prevBlock)
{
// Multiple preds in a secondary block, stop searching and process what we already have.
return !testingForConversion &&
optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode);
return optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode,
testingForConversion);
}

if (!BasicBlock::sameEHRegion(prevBlock, currBb))
{
// Current block is in a different EH region, stop searching and process what we already have.
return !testingForConversion &&
optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode);
return optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode,
testingForConversion);
}

// Ok we can work with that, add the test value to the list
@@ -252,27 +267,23 @@
if (testValueIndex == SWITCH_MAX_DISTANCE)
{
// Too many suitable tests found - stop and process what we already have.
return !testingForConversion &&
optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode);
return optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode,
testingForConversion);
}

if (isReversed)
{
// We only support reversed test (GT_NE) for the last block.
return !testingForConversion &&
optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode);
return optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode,
testingForConversion);
}

if (testingForConversion)
return true;

prevBlock = currBb;
}
else
{
// Current block is not a suitable test, stop searching and process what we already have.
return !testingForConversion &&
optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode);
return optSwitchConvert(firstBlock, testValueIndex, testValues, falseLikelihood, variableNode,
testingForConversion);
}
}
}
@@ -296,12 +307,21 @@
// Return Value:
// True if the conversion was successful, false otherwise
//
bool Compiler::optSwitchConvert(
BasicBlock* firstBlock, int testsCount, ssize_t* testValues, weight_t falseLikelihood, GenTree* nodeToTest)
bool Compiler::optSwitchConvert(BasicBlock* firstBlock,
int testsCount,
ssize_t* testValues,
weight_t falseLikelihood,
GenTree* nodeToTest,
bool testingForConversion)
{
assert(firstBlock->KindIs(BBJ_COND));
assert(!varTypeIsSmall(nodeToTest));

if (testingForConversion && (testsCount < CONVERT_SWITCH_TO_CCMP_MIN_TEST))
{
return false;
}

if (testsCount < SWITCH_MIN_TESTS)
{
// Early out - short chains.
@@ -376,6 +396,18 @@
FlowEdge* const trueEdge = firstBlock->GetTrueEdge();
FlowEdge* const falseEdge = firstBlock->GetFalseEdge();

if (testingForConversion)
{
// Return if we are just checking for a possibility of a switch convert and not actually making the conversion
// to switch here.
BasicBlock* iterBlock = firstBlock;
for (int i = 0; i < testsCount; i++)
{
BitVecOps::AddElemD(ccmp_traits, ccmp_vec, iterBlock->bbNum);
iterBlock = iterBlock->GetFalseTarget();
}
return true;
}
// Convert firstBlock to a switch block
firstBlock->SetSwitch(new (this, CMK_BasicBlock) BBswtDesc);
firstBlock->bbCodeOffsEnd = lastBlock->bbCodeOffsEnd;