-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Tune CCMP for better Perf #116445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Tune CCMP for better Perf #116445
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the JIT’s CCMP optimization by ensuring full switch chains are detected across basic blocks and by broadening the lowering phase to consider more compare operations for CCMP.
- Introduces a
testingForConversion
mode inoptSwitchDetectAndConvert
with a minimum-test threshold to avoid partial CCMP. - Declares and defines
CanConvertOpToCCMP
andIsOpPreferredForCCMP
to guide lowering choices. - Adds
BBF_SWITCH_CONVERSION_LIKELY
to mark candidate blocks and clears it on block splits.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
switchrecognition.cpp | Added conversion-testing path, BBF_SWITCH_CONVERSION_LIKELY , and CONVERT_SWITCH_TO_CCMP_MIN_TEST . |
lower.h | Declared CanConvertOpToCCMP and IsOpPreferredForCCMP . |
lower.cpp | Defined CCMP helper methods and updated TryLowerAndOrToCCMP . |
fgbasic.cpp | Cleared BBF_SWITCH_CONVERSION_LIKELY on block splits. |
compiler.h | Extended optSwitchConvert and optSwitchDetectAndConvert APIs. |
block.h | Defined new BBF_SWITCH_CONVERSION_LIKELY flag. |
Comments suppressed due to low confidence (2)
src/coreclr/jit/switchrecognition.cpp:15
- There are no tests covering the new minimum-test threshold for CCMP conversion (fewer than 5 comparisons). Please add unit tests that verify behavior both below and above this threshold to prevent regressions.
#define CONVERT_SWITCH_TO_CCMP_MIN_TEST 5
src/coreclr/jit/lower.cpp:11664
- The newly added 'else { return false; }' appears to pair with the preceding debug-only block (e.g., after JITDUMP), causing TryLowerAndOrToCCMP to return false in normal builds. This likely disables CCMP lowering when not in verbose mode. Adjust the else so it’s scoped to the intended condition or remove it.
else
BasicBlock* iterBlock = firstBlock; | ||
for (int i = 0; i < testsCount; i++) | ||
{ | ||
iterBlock->SetFlags(BBF_SWITCH_CONVERSION_LIKELY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is an anti pattern to propagate information via a flag like this.
Why is it necessary? Can the code be refactored into separate "check for eligibility" and "perform the conversion" methods? If not, it would be better to use a block set to track this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh.. Let me look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakobbotsch this review has been handled in my latest changes.
@dotnet/intel for further review. |
// The BitVec below is used to track the blocks which will be converted to switch and hence avoid getting converted to ccmp | ||
BitVecTraits* ccmp_traits; | ||
BitVec ccmp_vec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should rather make ccmp_vec
a parameter. We should avoid global ambient state when possible. It makes it very hard to understand the flow of how this gets defined and used.
BitVecTraits
can be constructed on-the-fly when the bit vector is used.
I have a question. Why is all this special casing required for xarch? For arm64 we simply disable the optimization. Why not do the same?
runtime/src/coreclr/jit/switchrecognition.cpp
Lines 35 to 37 in a0397e7
// Limit to XARCH, ARM is already doing a great job with such comparisons using | |
// a series of ccmp instruction (see ifConvert phase). | |
#ifdef TARGET_XARCH |
Is this comment not also true for xarch now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current changes are not special handling but rather an attempt to improve codegen / throughput for CCMP on XARCH. XARCH also has a way to disable this optimization but the current changes are rather aimed at improving the perf when CCMP optimization is enabled on XARCH.
Is this comment not also true for xarch now?
Regarding the comment, I will look into it. It might be a stale comment but I need to check its functionality and why it is there. It might be out of scope for this PR but will look into it since it seems to be related to CCMP optimization on XARCH
This PR enable more paths for CCMP(#111072) by doing the following
Control when and how many switches are converted to CCMP - A switch conversion can span across blocks but CCMP did not check across blocks and hence converted potential switch candidates to ccmp partially hence reducing the effectiveness of switch. This has been handled in this PR to make sure existing switches do not regress.
Let all candidates for CCMP go through lowering - There were priori conditions for a CCMP to happen. Although it can handle all types of nodes, it was limited to certain types of node right now. I have gone ahead and enabled CCMP on all nodes while carefully checking for which node to convert to CCMP.
Superpmi run
Clean Superpmi replay
TESTING
SDE test RUN

APX + CCMP + PR SDE test RUN
Superpmi Results -
Base - APX + CCMP
Diff - APX + CCMP + PR
Overall (-53,202 bytes)
Details
Instruction Count improvements/regressions per collection