[DAGCombiner] Attempt to fold 'add' nodes to funnel-shift or rotate #125612
Conversation
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-llvm-selectiondag

Author: Alex MacLean (AlexMaclean)

Changes: Almost all of the rotate idioms that are valid for an 'or' are also valid when the halves are combined with an 'add'. Further, many of these cases are not handled by common bits tracking, meaning that the 'add' is not converted to a 'disjoint or'.

Full diff: https://github.com/llvm/llvm-project/pull/125612.diff (2 files affected)
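For example (mirroring the test_rotl_var test added below), the variable-amount form cannot be proven disjoint by known-bits analysis, so it is never turned into a disjoint 'or', yet it is still a valid rotate:

define i32 @rotl_var(i32 %x, i32 %y) {
  %shl = shl i32 %x, %y
  %sub = sub i32 32, %y
  %shr = lshr i32 %x, %sub   ; no common bits with %shl for any %y in [1, 31]
  %add = add i32 %shl, %shr  ; with this patch: folds to rotl(%x, %y)
  ret i32 %add
}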
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index f4caaf426de6a07..d671a6d555621c1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -662,14 +662,15 @@ namespace {
bool DemandHighBits = true);
SDValue MatchBSwapHWord(SDNode *N, SDValue N0, SDValue N1);
SDValue MatchRotatePosNeg(SDValue Shifted, SDValue Pos, SDValue Neg,
- SDValue InnerPos, SDValue InnerNeg, bool HasPos,
- unsigned PosOpcode, unsigned NegOpcode,
- const SDLoc &DL);
+ SDValue InnerPos, SDValue InnerNeg, bool FromAdd,
+ bool HasPos, unsigned PosOpcode,
+ unsigned NegOpcode, const SDLoc &DL);
SDValue MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos, SDValue Neg,
- SDValue InnerPos, SDValue InnerNeg, bool HasPos,
- unsigned PosOpcode, unsigned NegOpcode,
- const SDLoc &DL);
- SDValue MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL);
+ SDValue InnerPos, SDValue InnerNeg, bool FromAdd,
+ bool HasPos, unsigned PosOpcode,
+ unsigned NegOpcode, const SDLoc &DL);
+ SDValue MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL,
+ bool FromAdd);
SDValue MatchLoadCombine(SDNode *N);
SDValue mergeTruncStores(StoreSDNode *N);
SDValue reduceLoadWidth(SDNode *N);
@@ -2992,6 +2993,9 @@ SDValue DAGCombiner::visitADD(SDNode *N) {
if (SDValue V = foldAddSubOfSignBit(N, DL, DAG))
return V;
+ if (SDValue V = MatchRotate(N0, N1, SDLoc(N), /*FromAdd=*/true))
+ return V;
+
// Try to match AVGFLOOR fixedwidth pattern
if (SDValue V = foldAddToAvg(N, DL))
return V;
@@ -8161,7 +8165,7 @@ SDValue DAGCombiner::visitOR(SDNode *N) {
return V;
// See if this is some rotate idiom.
- if (SDValue Rot = MatchRotate(N0, N1, DL))
+ if (SDValue Rot = MatchRotate(N0, N1, DL, /*FromAdd=*/false))
return Rot;
if (SDValue Load = MatchLoadCombine(N))
@@ -8350,7 +8354,7 @@ static SDValue extractShiftForRotate(SelectionDAG &DAG, SDValue OppShift,
// The IsRotate flag should be set when the LHS of both shifts is the same.
// Otherwise if matching a general funnel shift, it should be clear.
static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
- SelectionDAG &DAG, bool IsRotate) {
+ SelectionDAG &DAG, bool IsRotate, bool FromAdd) {
const auto &TLI = DAG.getTargetLoweringInfo();
// If EltSize is a power of 2 then:
//
@@ -8389,7 +8393,7 @@ static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
// NOTE: We can only do this when matching operations which won't modify the
// least Log2(EltSize) significant bits and not a general funnel shift.
unsigned MaskLoBits = 0;
- if (IsRotate && isPowerOf2_64(EltSize)) {
+ if (IsRotate && !FromAdd && isPowerOf2_64(EltSize)) {
unsigned Bits = Log2_64(EltSize);
unsigned NegBits = Neg.getScalarValueSizeInBits();
if (NegBits >= Bits) {
@@ -8472,9 +8476,9 @@ static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
// Neg with outer conversions stripped away.
SDValue DAGCombiner::MatchRotatePosNeg(SDValue Shifted, SDValue Pos,
SDValue Neg, SDValue InnerPos,
- SDValue InnerNeg, bool HasPos,
- unsigned PosOpcode, unsigned NegOpcode,
- const SDLoc &DL) {
+ SDValue InnerNeg, bool FromAdd,
+ bool HasPos, unsigned PosOpcode,
+ unsigned NegOpcode, const SDLoc &DL) {
// fold (or (shl x, (*ext y)),
// (srl x, (*ext (sub 32, y)))) ->
// (rotl x, y) or (rotr x, (sub 32, y))
@@ -8484,10 +8488,9 @@ SDValue DAGCombiner::MatchRotatePosNeg(SDValue Shifted, SDValue Pos,
// (rotr x, y) or (rotl x, (sub 32, y))
EVT VT = Shifted.getValueType();
if (matchRotateSub(InnerPos, InnerNeg, VT.getScalarSizeInBits(), DAG,
- /*IsRotate*/ true)) {
+ /*IsRotate*/ true, FromAdd))
return DAG.getNode(HasPos ? PosOpcode : NegOpcode, DL, VT, Shifted,
HasPos ? Pos : Neg);
- }
return SDValue();
}
@@ -8500,9 +8503,9 @@ SDValue DAGCombiner::MatchRotatePosNeg(SDValue Shifted, SDValue Pos,
// TODO: Merge with MatchRotatePosNeg.
SDValue DAGCombiner::MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos,
SDValue Neg, SDValue InnerPos,
- SDValue InnerNeg, bool HasPos,
- unsigned PosOpcode, unsigned NegOpcode,
- const SDLoc &DL) {
+ SDValue InnerNeg, bool FromAdd,
+ bool HasPos, unsigned PosOpcode,
+ unsigned NegOpcode, const SDLoc &DL) {
EVT VT = N0.getValueType();
unsigned EltBits = VT.getScalarSizeInBits();
@@ -8513,10 +8516,10 @@ SDValue DAGCombiner::MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos,
// fold (or (shl x0, (*ext (sub 32, y))),
// (srl x1, (*ext y))) ->
// (fshr x0, x1, y) or (fshl x0, x1, (sub 32, y))
- if (matchRotateSub(InnerPos, InnerNeg, EltBits, DAG, /*IsRotate*/ N0 == N1)) {
+ if (matchRotateSub(InnerPos, InnerNeg, EltBits, DAG, /*IsRotate*/ N0 == N1,
+ FromAdd))
return DAG.getNode(HasPos ? PosOpcode : NegOpcode, DL, VT, N0, N1,
HasPos ? Pos : Neg);
- }
// Matching the shift+xor cases, we can't easily use the xor'd shift amount
// so for now just use the PosOpcode case if its legal.
@@ -8561,11 +8564,12 @@ SDValue DAGCombiner::MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos,
return SDValue();
}
-// MatchRotate - Handle an 'or' of two operands. If this is one of the many
-// idioms for rotate, and if the target supports rotation instructions, generate
-// a rot[lr]. This also matches funnel shift patterns, similar to rotation but
-// with different shifted sources.
-SDValue DAGCombiner::MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL) {
+// MatchRotate - Handle an 'or' or 'add' of two operands. If this is one of the
+// many idioms for rotate, and if the target supports rotation instructions,
+// generate a rot[lr]. This also matches funnel shift patterns, similar to
+// rotation but with different shifted sources.
+SDValue DAGCombiner::MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL,
+ bool FromAdd) {
EVT VT = LHS.getValueType();
// The target must have at least one rotate/funnel flavor.
@@ -8592,9 +8596,9 @@ SDValue DAGCombiner::MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL) {
if (LHS.getOpcode() == ISD::TRUNCATE && RHS.getOpcode() == ISD::TRUNCATE &&
LHS.getOperand(0).getValueType() == RHS.getOperand(0).getValueType()) {
assert(LHS.getValueType() == RHS.getValueType());
- if (SDValue Rot = MatchRotate(LHS.getOperand(0), RHS.getOperand(0), DL)) {
+ if (SDValue Rot =
+ MatchRotate(LHS.getOperand(0), RHS.getOperand(0), DL, FromAdd))
return DAG.getNode(ISD::TRUNCATE, SDLoc(LHS), LHS.getValueType(), Rot);
- }
}
// Match "(X shl/srl V1) & V2" where V2 may not be present.
@@ -8773,30 +8777,36 @@ SDValue DAGCombiner::MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL) {
RExtOp0 = RHSShiftAmt.getOperand(0);
}
+ // // If we are here from visitADD() we must ensure the Right-Shift Amt is
+ // // non-zero when the pattern includes AND op. So, allow optimizing to ROTL
+ // // only if it is recognized as a non-zero constant. Same for ROTR.
+ // auto RotateSafe = [FromAdd](const SDValue& ExtOp0) {
+ // if (!FromAdd || ExtOp0.getOpcode() != ISD::AND)
+ // return true;
+ // auto *ExtOp0Const = dyn_cast<ConstantSDNode>(ExtOp0);
+ // return ExtOp0Const && !ExtOp0Const->isZero();
+ // };
+
if (IsRotate && (HasROTL || HasROTR)) {
- SDValue TryL =
- MatchRotatePosNeg(LHSShiftArg, LHSShiftAmt, RHSShiftAmt, LExtOp0,
- RExtOp0, HasROTL, ISD::ROTL, ISD::ROTR, DL);
- if (TryL)
+ if (SDValue TryL = MatchRotatePosNeg(LHSShiftArg, LHSShiftAmt, RHSShiftAmt,
+ LExtOp0, RExtOp0, FromAdd, HasROTL,
+ ISD::ROTL, ISD::ROTR, DL))
return TryL;
- SDValue TryR =
- MatchRotatePosNeg(RHSShiftArg, RHSShiftAmt, LHSShiftAmt, RExtOp0,
- LExtOp0, HasROTR, ISD::ROTR, ISD::ROTL, DL);
- if (TryR)
+ if (SDValue TryR = MatchRotatePosNeg(RHSShiftArg, RHSShiftAmt, LHSShiftAmt,
+ RExtOp0, LExtOp0, FromAdd, HasROTR,
+ ISD::ROTR, ISD::ROTL, DL))
return TryR;
}
- SDValue TryL =
- MatchFunnelPosNeg(LHSShiftArg, RHSShiftArg, LHSShiftAmt, RHSShiftAmt,
- LExtOp0, RExtOp0, HasFSHL, ISD::FSHL, ISD::FSHR, DL);
- if (TryL)
+ if (SDValue TryL = MatchFunnelPosNeg(LHSShiftArg, RHSShiftArg, LHSShiftAmt,
+ RHSShiftAmt, LExtOp0, RExtOp0, FromAdd,
+ HasFSHL, ISD::FSHL, ISD::FSHR, DL))
return TryL;
- SDValue TryR =
- MatchFunnelPosNeg(LHSShiftArg, RHSShiftArg, RHSShiftAmt, LHSShiftAmt,
- RExtOp0, LExtOp0, HasFSHR, ISD::FSHR, ISD::FSHL, DL);
- if (TryR)
+ if (SDValue TryR = MatchFunnelPosNeg(LHSShiftArg, RHSShiftArg, RHSShiftAmt,
+ LHSShiftAmt, RExtOp0, LExtOp0, FromAdd,
+ HasFSHR, ISD::FSHR, ISD::FSHL, DL))
return TryR;
return SDValue();
diff --git a/llvm/test/CodeGen/NVPTX/add-rotate.ll b/llvm/test/CodeGen/NVPTX/add-rotate.ll
new file mode 100644
index 000000000000000..f777ac964154343
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/add-rotate.ll
@@ -0,0 +1,118 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_50 | FileCheck %s
+
+target triple = "nvptx64-nvidia-cuda"
+
+define i32 @test_rotl(i32 %x) {
+; CHECK-LABEL: test_rotl(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.u32 %r1, [test_rotl_param_0];
+; CHECK-NEXT: shf.l.wrap.b32 %r2, %r1, %r1, 7;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %shl = shl i32 %x, 7
+ %shr = lshr i32 %x, 25
+ %add = add i32 %shl, %shr
+ ret i32 %add
+}
+
+define i32 @test_rotr(i32 %x) {
+; CHECK-LABEL: test_rotr(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.u32 %r1, [test_rotr_param_0];
+; CHECK-NEXT: shf.l.wrap.b32 %r2, %r1, %r1, 25;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %shr = lshr i32 %x, 7
+ %shl = shl i32 %x, 25
+ %add = add i32 %shr, %shl
+ ret i32 %add
+}
+
+define i32 @test_rotl_var(i32 %x, i32 %y) {
+; CHECK-LABEL: test_rotl_var(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.u32 %r1, [test_rotl_var_param_0];
+; CHECK-NEXT: ld.param.u32 %r2, [test_rotl_var_param_1];
+; CHECK-NEXT: shf.l.wrap.b32 %r3, %r1, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %shl = shl i32 %x, %y
+ %sub = sub i32 32, %y
+ %shr = lshr i32 %x, %sub
+ %add = add i32 %shl, %shr
+ ret i32 %add
+}
+
+define i32 @test_rotr_var(i32 %x, i32 %y) {
+; CHECK-LABEL: test_rotr_var(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.u32 %r1, [test_rotr_var_param_0];
+; CHECK-NEXT: ld.param.u32 %r2, [test_rotr_var_param_1];
+; CHECK-NEXT: shf.r.wrap.b32 %r3, %r1, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %shr = lshr i32 %x, %y
+ %sub = sub i32 32, %y
+ %shl = shl i32 %x, %sub
+ %add = add i32 %shr, %shl
+ ret i32 %add
+}
+
+define i32 @test_rotl_var_and(i32 %x, i32 %y) {
+; CHECK-LABEL: test_rotl_var_and(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<8>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.u32 %r1, [test_rotl_var_and_param_0];
+; CHECK-NEXT: ld.param.u32 %r2, [test_rotl_var_and_param_1];
+; CHECK-NEXT: shl.b32 %r3, %r1, %r2;
+; CHECK-NEXT: neg.s32 %r4, %r2;
+; CHECK-NEXT: and.b32 %r5, %r4, 31;
+; CHECK-NEXT: shr.u32 %r6, %r1, %r5;
+; CHECK-NEXT: add.s32 %r7, %r6, %r3;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r7;
+; CHECK-NEXT: ret;
 %shl = shl i32 %x, %y
 %sub = sub nsw i32 0, %y
 %and = and i32 %sub, 31
 %shr = lshr i32 %x, %and
 %add = add i32 %shr, %shl
 ret i32 %add
+}
+
+define i32 @test_rotr_var_and(i32 %x, i32 %y) {
+; CHECK-LABEL: test_rotr_var_and(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<8>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.u32 %r1, [test_rotr_var_and_param_0];
+; CHECK-NEXT: ld.param.u32 %r2, [test_rotr_var_and_param_1];
+; CHECK-NEXT: shr.u32 %r3, %r1, %r2;
+; CHECK-NEXT: neg.s32 %r4, %r2;
+; CHECK-NEXT: and.b32 %r5, %r4, 31;
+; CHECK-NEXT: shl.b32 %r6, %r1, %r5;
+; CHECK-NEXT: add.s32 %r7, %r3, %r6;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r7;
+; CHECK-NEXT: ret;
+ %shr = lshr i32 %x, %y
+ %sub = sub nsw i32 0, %y
+ %and = and i32 %sub, 31
+ %shl = shl i32 %x, %and
+ %add = add i32 %shr, %shl
+ ret i32 %add
+}
Thanks @AlexMaclean. We had a situation downstream which needed this transformation at the SelectionDAG level too. I just wanted to mention that I tried the following, similar to your change at the IR level in 59ced72bc211f (using the pattern-matching framework at the SD level). This then triggers the 'add' to 'or' transformation in DAGCombiner, which in turn helps in the selection of the rotate instruction.
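A minimal sketch of the kind of case that route handles (my own example, not from the downstream patch): when known bits already prove the two halves disjoint, the 'add' can be rewritten as an 'or', and the existing rotate matching then fires:

define i32 @rotl_const(i32 %x) {
  %shl = shl i32 %x, 7       ; low 7 bits known zero
  %shr = lshr i32 %x, 25     ; all but the low 7 bits known zero
  %add = add i32 %shl, %shr  ; provably disjoint: 'add' becomes 'or', then rotl(%x, 7)
  ret i32 %add
}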
Nice, the SDAG matching is new to me and does seem quite useful. However, I think at the SelectionDAG level it makes more sense to fold 'add' to rotate directly, reusing the existing rotate matching code which looks for many different rotate idioms beyond just this simple case (e.g. https://github.com/llvm/llvm-project/pull/125612/files#diff-7cb0d03bd314924f121a96a010f963383fe24079e25c71399b68ba979e8453abR140).
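For instance (a hand-written sketch based on the fold comments in MatchRotatePosNeg, not a test from this patch), the shared matcher strips outer extensions of the shift amounts, so it also catches patterns such as:

define i64 @rotl_ext(i64 %x, i32 %y) {
  %exty = zext i32 %y to i64
  %sub  = sub i32 64, %y
  %exts = zext i32 %sub to i64
  %shl  = shl i64 %x, %exty
  %shr  = lshr i64 %x, %exts
  %add  = add i64 %shl, %shr  ; matched via the (*ext y) / (*ext (sub 64, y)) idiom
  ret i64 %add
}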
Can we update the comments to show the 'add' case?
I've updated all comments to reflect the 'add' case as well.
Please can we get some alive2 coverage for these? I'd still be a lot more comfortable with getting the new tests committed with existing codegen and seeing how far the haveNoCommonBitsSetCommutative improvements would get us, tbh.
Here are alive2 proofs for the cases this PR adds: https://alive2.llvm.org/ce/z/yCPZtE
Adding the support in …
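One representative pair in src/tgt form, of the sort such proofs check (my rendering, not copied from the link): the two-source funnel-shift case.

define i32 @src(i32 %a, i32 %b, i32 %y) {
  %shl = shl i32 %a, %y
  %sub = sub i32 32, %y
  %shr = lshr i32 %b, %sub
  %r   = add i32 %shl, %shr
  ret i32 %r
}

declare i32 @llvm.fshl.i32(i32, i32, i32)

define i32 @tgt(i32 %a, i32 %b, i32 %y) {
  %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %y)
  ret i32 %r
}

For %y in [1, 31] the two halves are disjoint, so the 'add' equals the 'or' that fshl computes; for %y == 0 the src is poison (shift by the bit width) and may be refined to %a.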
@RKSimon ping on this?
@@ -8379,7 +8383,7 @@ static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
 // NOTE: We can only do this when matching operations which won't modify the
 // least Log2(EltSize) significant bits and not a general funnel shift.
 unsigned MaskLoBits = 0;
- if (IsRotate && isPowerOf2_64(EltSize)) {
+ if (IsRotate && !FromAdd && isPowerOf2_64(EltSize)) {
Is it incorrect to do this for 'add'? Or do we just not want to?
It is incorrect to do this for an ADD or a non-rotate funnel shift, because when the shift amount is 0 the LHS and RHS will be combined with each other. If they are equal (IsRotate) and the combining operation is an OR (!FromAdd), then this will be a no-op (LHS = RHS = Result), which is consistent with a funnel shift of 0. If either of these conditions is not met, then they cannot be safely combined like this for the 0 case.
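To make the zero-amount hazard concrete, a minimal sketch (my example): with a shift amount of 0, both halves degenerate to the unshifted value, and 'or' and 'add' diverge:

define i32 @combine_or(i32 %x) {    ; the amount == 0 case
  %shl = shl i32 %x, 0      ; == %x
  %shr = lshr i32 %x, 0     ; == %x
  %or  = or i32 %shl, %shr  ; == %x, consistent with rotl(%x, 0)
  ret i32 %or
}

define i32 @combine_add(i32 %x) {
  %shl = shl i32 %x, 0       ; == %x
  %shr = lshr i32 %x, 0      ; == %x
  %add = add i32 %shl, %shr  ; == 2 * %x, not rotl(%x, 0)
  ret i32 %add
}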
@@ -8340,7 +8344,7 @@ static SDValue extractShiftForRotate(SelectionDAG &DAG, SDValue OppShift,
 // The IsRotate flag should be set when the LHS of both shifts is the same.
 // Otherwise if matching a general funnel shift, it should be clear.
static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize, |
Suggested change:
- static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
+ static bool matchRotateSubAdd(SDValue Pos, SDValue Neg, unsigned EltSize,
I don't think this rename makes sense. The 'Sub' here reflects how the shift amounts are related to each other, not something about the combining operation. The fact that we can now handle ADD cases in this function doesn't mean we should have an 'Add' suffix on the function, just as the fact that we already handle OR cases doesn't mean we should have an 'Or' suffix.
Hi All, if any of the reviewers have some spare time I'd really appreciate another round of review on this change. I've addressed all the specific stylistic issues raised.

It seems like the main hesitation around this change might be that it can be a little hard to tell whether all the rotate folds are going to be correct for an ADD node. I'm guessing this is what @RKSimon's comment (#125612 (comment)) is getting at. However, I think more trivially correct solutions are a bit less efficient and will require lots of logic duplication to match the full set of transforms this approach allows. To try to address this concern, I've written up a set of alive2 proofs that demonstrate the transforms that could occur (https://alive2.llvm.org/ce/z/yCPZtE). Further, for what it's worth, this change has been present in NVIDIA's internal fork of llvm for a fairly long time and we haven't noticed any correctness issues. If there are edge cases I'm missing, though, I'll be happy to revert the change ASAP. Are there any further objections on these grounds? Is there anything I can do to assuage them?

Beyond this, I don't think there are any open issues or blockers for this change. Also, if there is anyone else you can think of who should take a look at this change, please add them or let me know!
LGTM in principle, but please wait if someone else wants to chime in.
If there's no objection, I'd suggest landing the patch early next week and watching the bots in case something unexpected pops up.
I'm not sure if there are existing tests covering add/shift on other platforms. Given that there are no changes in existing tests, that code path may not have enough coverage. It may be useful to add the tests for NVPTX and some common targets first, so we can observe the effects of this patch. It's much easier to reason about the impact of the patch when the test diffs show what changed, as opposed to the new state of affairs only.
I've pre-committed the tests within this PR, so c9d6c84 shows only the diff associated with the code change: c9d6c84#diff-7cb0d03bd314924f121a96a010f963383fe24079e25c71399b68ba979e8453ab. Are you asking that I land the pre-commit on main as a separate commit? Happy to do so if this would help. I can also add this same IR with a different triple to demonstrate the impacts on other targets, if there are particular back-ends people would like to see coverage of.
LGTM - I am still worried that we're possibly missing a corner case with the ADD nodes, but I can't see anything. I'd have still preferred we tried haveNoCommonBitsSet first, tbh, as although there's duplication, it's a lot more self-contained.
Not quite. Or, rather, that, and a bit more. We're concerned about corner cases that may be exposed by the change, and it could benefit from wider testing. For testing to be useful we need confidence that the tests themselves are correct, and it helps to have an established known-good baseline. For this particular mix of instructions we appear to have no tests at the moment, but we do know that the current code in LLVM has been around for a while and is likely correct.

What I'm proposing is to create a set of tests capturing the status quo, across multiple "important" architectures. I would target x86, arm, amdgpu and nvptx. Then land your patch, and see what changes in those baseline tests and verify that the changes do work and make sense. It's not, strictly speaking, necessary to do it that way, but it's a relatively easy way to capture potential issues early.
Sounds good. I've opened #132842 to cover the backends you've suggested; please let me know what you think! I've also rebased this PR atop that branch so you can see the impact this change would have across those targets.
For some reason I see only complete test files added. It looks like the rebase didn't make it.
The diff on the tests should be viewable within commit 0f8aef9. Does that work? Once I land #132842, the diff will be apparent in the main PR diff.
Yes, I can see the test diffs now. That looks like a very nice improvement across all tested platforms.
Add tests to various targets covering rotate idioms where an 'ADD' node is used to combine the halves instead of an 'OR'. Some of these cases will be better optimized following #125612, while others are already well optimized or do not have a valid fold to a rotate or funnel-shift.
Almost all of the rotate idioms that are valid for an 'or' are also valid when the halves are combined with an 'add'. Further, many of these cases are not handled by common bits tracking, meaning that the 'add' is not converted to a 'disjoint or'.