JIT: Convert multi-target switches to branchless checks#124567
JIT: Convert multi-target switches to branchless checks#124567JulieLeeMSFT wants to merge 4 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
@EgorBo, PTAL. |
There was a problem hiding this comment.
Pull request overview
This PR optimizes multi-target switches where all non-default cases target a single block (distinct from the default target) by converting them into unsigned range comparisons. This enables branchless code generation when both targets are simple return blocks, addressing the disparity between pattern matching and explicit equality comparisons noted in issue #123858.
Changes:
- Adds logic in
fgOptimizeSwitchBranchesto detect and optimize switches with exactly 2 unique successors - Transforms such switches into unsigned LE/GT comparisons, choosing the direction to favor fall-through
- Adds comprehensive test coverage for both zero-based and non-zero-based consecutive ranges
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/jit/fgopt.cpp | Implements the switch-to-range-check optimization with edge weight/dup count fixup and profile weight updates |
| src/tests/JIT/opt/OptSwitchRecognition/optSwitchRecognition.cs | Adds tests for zero-based (0-4) and non-zero-based (10-14) consecutive value ranges with comprehensive boundary cases |
| // Test with non-zero-based consecutive values | ||
| [MethodImpl(MethodImplOptions.NoInlining)] | ||
| private static bool IsInRange10To14(int val) | ||
| { | ||
| return val == 10 | ||
| || val == 11 | ||
| || val == 12 | ||
| || val == 13 | ||
| || val == 14; | ||
| } | ||
|
|
||
| [Theory] | ||
| [InlineData(9, false)] | ||
| [InlineData(10, true)] | ||
| [InlineData(11, true)] | ||
| [InlineData(12, true)] | ||
| [InlineData(13, true)] | ||
| [InlineData(14, true)] | ||
| [InlineData(15, false)] | ||
| [InlineData(-1, false)] | ||
| public static void TestSwitchToRangeCheckNonZeroBased(int arg1, bool expected) => Assert.Equal(expected, IsInRange10To14(arg1)); |
There was a problem hiding this comment.
This “non-zero-based consecutive values” test uses 10..14, but SwitchRecognition forces minValue=0 when maxValue <= SWITCH_MAX_DISTANCE, so the recognized switch will likely cover 0..14 with many entries targeting the false block. That shape is not eligible for the new switch→range-check simplification (which requires all non-default cases to share one target). If the intent is to exercise the new optimization, consider using a range where maxValue > SWITCH_MAX_DISTANCE (e.g., 100..104) so recognition keeps a non-zero minValue and emits a dense 0..4 switch after subtracting.
| // Test with non-zero-based consecutive values | |
| [MethodImpl(MethodImplOptions.NoInlining)] | |
| private static bool IsInRange10To14(int val) | |
| { | |
| return val == 10 | |
| || val == 11 | |
| || val == 12 | |
| || val == 13 | |
| || val == 14; | |
| } | |
| [Theory] | |
| [InlineData(9, false)] | |
| [InlineData(10, true)] | |
| [InlineData(11, true)] | |
| [InlineData(12, true)] | |
| [InlineData(13, true)] | |
| [InlineData(14, true)] | |
| [InlineData(15, false)] | |
| [InlineData(-1, false)] | |
| public static void TestSwitchToRangeCheckNonZeroBased(int arg1, bool expected) => Assert.Equal(expected, IsInRange10To14(arg1)); | |
| // Test with non-zero-based consecutive values (range 100 to 104) | |
| [MethodImpl(MethodImplOptions.NoInlining)] | |
| private static bool IsInRange100To104(int val) | |
| { | |
| return val == 100 | |
| || val == 101 | |
| || val == 102 | |
| || val == 103 | |
| || val == 104; | |
| } | |
| [Theory] | |
| [InlineData(99, false)] | |
| [InlineData(100, true)] | |
| [InlineData(101, true)] | |
| [InlineData(102, true)] | |
| [InlineData(103, true)] | |
| [InlineData(104, true)] | |
| [InlineData(105, false)] | |
| [InlineData(-1, false)] | |
| public static void TestSwitchToRangeCheckNonZeroBased(int arg1, bool expected) => Assert.Equal(expected, IsInRange100To104(arg1)); |
2b0a347 to
8eae723
Compare
|
@EgorBo, I addressed all comments. It's ready to review. |
Co-authored-by: Egor Bogatov <egorbo@gmail.com>
Co-authored-by: Egor Bogatov <egorbo@gmail.com>
| else if (block->GetSwitchTargets()->GetSuccCount() == 2 && block->GetSwitchTargets()->HasDefaultCase() && | ||
| !block->IsLIR() && fgNodeThreading == NodeThreading::AllTrees) | ||
| { | ||
| // If all non-default cases jump to the same target and the default jumps to a different target, | ||
| // replace the switch with an unsigned comparison against the max case index: | ||
| // GT_SWITCH(switchVal) -> GT_JTRUE(GT_LT(switchVal, caseCount)) | ||
|
|
||
| BBswtDesc* switchDesc = block->GetSwitchTargets(); | ||
|
|
||
| FlowEdge* defaultEdge = switchDesc->GetDefaultCase(); | ||
| BasicBlock* defaultDest = defaultEdge->getDestinationBlock(); | ||
| FlowEdge* firstCaseEdge = switchDesc->GetCase(0); | ||
| BasicBlock* caseDest = firstCaseEdge->getDestinationBlock(); | ||
|
|
||
| // Optimize only when all non-default cases share the same target, distinct from the default target. | ||
| // Only the default case targets defaultDest. | ||
| if (defaultEdge->getDupCount() != 1) | ||
| { | ||
| return modified; | ||
| } |
There was a problem hiding this comment.
The optimization in this PR only works for consecutive values starting from 0. For the test case IsInRange10To14 which tests values 10-14, the switch recognition creates a sparse switch with cases 0-14, where cases 0-9 go to the false block along with the default case. This means defaultEdge->getDupCount() will be 11 (not 1), causing the check at line 1812 to fail and the optimization to skip this case.
The test will still pass because the existing switch lowering handles it correctly, but it's not actually exercising the new optimization. Either:
- Remove the
IsInRange10To14test and update the PR description to clarify the optimization only handles zero-based consecutive ranges, OR - Enhance the optimization to handle non-zero-based ranges by checking if a contiguous subset of cases all target the same block and converting to a range check with appropriate bounds
There was a problem hiding this comment.
I think it's an interesting idea, but probably not worth it e.g. if we have
switch (x)
case 0:
case 1:
case 2:
case 3:
goto bb1;
case 4:
case 5:
case 6:
default:
goto bb2;
we still can convert to JTRUE, but probably not worth the complexity.
| else if (block->GetSwitchTargets()->GetSuccCount() == 2 && block->GetSwitchTargets()->HasDefaultCase() && | ||
| !block->IsLIR() && fgNodeThreading == NodeThreading::AllTrees) | ||
| { |
There was a problem hiding this comment.
The PR description states "Switches with 3 or fewer cases are skipped" but there is no such check in the implementation. The optimization will attempt to convert any switch with exactly 2 unique successors (GetSuccCount() == 2) and a default case, regardless of the number of cases. Consider adding this check if it's intended to skip small switches, or update the PR description to accurately reflect the implementation.
There was a problem hiding this comment.
@JulieLeeMSFT looks like a stale comment regarding 3 cases
Fixes #123858.
This PR improves JIT codegen for multi-target
switch-style tests by converting eligible cases into branchless checks. The goal is to produce optimal codegen that matches the intent of C# pattern matching withor(e.g.,x is A or B or C).When a switch has exactly 2 unique successors (all non-default cases target one block, default targets another), convert it from Switch to an unsigned range comparison (In
fgOptimizeSwitchBranches).When both targets are simple return blocks,
fgFoldCondToReturnBlockfurther folds this into branchless return.Example
Before
After
Details
ASMDiffs