Skip to content

[NVPTX] Fix lowering of i1 SETCC #115035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 7 additions & 3 deletions llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -18763,9 +18763,13 @@ SDValue DAGCombiner::rebuildSetCC(SDValue N) {
EVT SetCCVT = N.getValueType();
if (LegalTypes)
SetCCVT = getSetCCResultType(SetCCVT);
// Replace the uses of XOR with SETCC
return DAG.getSetCC(SDLoc(N), SetCCVT, Op0, Op1,
Equal ? ISD::SETEQ : ISD::SETNE);
// Replace the uses of XOR with SETCC. Note, avoid this transformation if
// it would introduce illegal operations post-legalization as this can
// result in infinite looping between converting xor->setcc here, and
// expanding setcc->xor in LegalizeSetCCCondCode if requested.
const ISD::CondCode CC = Equal ? ISD::SETEQ : ISD::SETNE;
if (!LegalOperations || TLI.isCondCodeLegal(CC, Op0.getSimpleValueType()))
return DAG.getSetCC(SDLoc(N), SetCCVT, Op0, Op1, CC);
}
}

Expand Down
41 changes: 41 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11889,6 +11889,47 @@ bool TargetLowering::LegalizeSetCCCondCode(SelectionDAG &DAG, EVT VT,
return true;
}

// Special case: expand i1 comparisons using logical operations.
if (OpVT == MVT::i1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need the type special case? LegalizeSetCCCondCode should do as asked regardless of the type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need the type special case?

The logical simplifications we're using to expand setcc for i1 will not be correct for larger types.

LegalizeSetCCCondCode should do as asked regardless of the type

In theory yes, but this isn't the case today, and creating a valid expansion for all types x cond-codes would be a time consuming way to add a lot of in practice dead code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a more specific example for "not correct for larger types"? It's not an arbitrary wide typed value. The result conforms to getTargetBooleanContents, which in turn should respect all of the logical expansions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these expansions take advantage of the unique nature of the i1 type. For example consider signed greater than, this will only be true for i1s if and only if the first operand is 0 and the second is -1, hence it can be expressed as ~X & Y. This sort of bit manipulation is of course totally invalid for any larger type.

To see that this transformation is okay for i1, and not for a larger type here is an alive2 where the transform has been expressed on llvm IR: https://alive2.llvm.org/ce/z/6jS_rA

SDValue Ret;
switch (CCCode) {
default:
llvm_unreachable("Unknown integer setcc!");
case ISD::SETEQ: // X == Y --> ~(X ^ Y)
Ret = DAG.getNOT(dl, DAG.getNode(ISD::XOR, dl, MVT::i1, LHS, RHS),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be guarded by TLI.isLegal(ISD::XOR)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to. NVPTX is the only target using these expansions and we don't have any way to expand if not. I don't see what the point of adding them would be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question boils down to: will xor be legalized if it's generated for a target that doesn't support it? In other words, can Expand generate illegal code?

My assumption was no, but based on your answer, I'm gathering that Expand can generate illegal code and rely on the legalizer to legalize it. Is that correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue here is that there's more than one possible interpretation of "lower SetCC(i1) differently". NVPTX can do it via logical ops on predicates, other targets may not. On other targets it may be something else. Whoever would use "setCondCodeAction(...,Expand)" would have no idea what it would expand into. This should at least be documented. If/when we would encounter a target which would have a different way of expanding setcc, we would probably need a more nuanced hint to tell LLVM which flavor we actually want.

I think at the moment only Hexagon uses setCondCodeAction(...,i1, Expand). It would be good to check what effect this change has on them.

MVT::i1);
break;
case ISD::SETNE: // X != Y --> (X ^ Y)
Ret = DAG.getNode(ISD::XOR, dl, MVT::i1, LHS, RHS);
break;
case ISD::SETGT: // X >s Y --> X == 0 & Y == 1 --> ~X & Y
case ISD::SETULT: // X <u Y --> X == 0 & Y == 1 --> ~X & Y
Ret = DAG.getNode(ISD::AND, dl, MVT::i1, RHS,
DAG.getNOT(dl, LHS, MVT::i1));
break;
case ISD::SETLT: // X <s Y --> X == 1 & Y == 0 --> ~Y & X
case ISD::SETUGT: // X >u Y --> X == 1 & Y == 0 --> ~Y & X
Ret = DAG.getNode(ISD::AND, dl, MVT::i1, LHS,
DAG.getNOT(dl, RHS, MVT::i1));
break;
case ISD::SETULE: // X <=u Y --> X == 0 | Y == 1 --> ~X | Y
case ISD::SETGE: // X >=s Y --> X == 0 | Y == 1 --> ~X | Y
Ret = DAG.getNode(ISD::OR, dl, MVT::i1, RHS,
DAG.getNOT(dl, LHS, MVT::i1));
break;
case ISD::SETUGE: // X >=u Y --> X == 1 | Y == 0 --> ~Y | X
case ISD::SETLE: // X <=s Y --> X == 1 | Y == 0 --> ~Y | X
Ret = DAG.getNode(ISD::OR, dl, MVT::i1, LHS,
DAG.getNOT(dl, RHS, MVT::i1));
break;
}

LHS = DAG.getZExtOrTrunc(Ret, dl, VT);
RHS = SDValue();
CC = SDValue();
return true;
}

ISD::CondCode CC1 = ISD::SETCC_INVALID, CC2 = ISD::SETCC_INVALID;
unsigned Opc = 0;
switch (CCCode) {
Expand Down
5 changes: 5 additions & 0 deletions llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -667,6 +667,11 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
setTruncStoreAction(VT, MVT::i1, Expand);
}

setCondCodeAction({ISD::SETNE, ISD::SETEQ, ISD::SETUGE, ISD::SETULE,
ISD::SETUGT, ISD::SETULT, ISD::SETGT, ISD::SETLT,
ISD::SETGE, ISD::SETLE},
MVT::i1, Expand);

// expand extload of vector of integers.
setLoadExtAction({ISD::EXTLOAD, ISD::SEXTLOAD, ISD::ZEXTLOAD}, MVT::v2i16,
MVT::v2i8, Expand);
Expand Down
11 changes: 0 additions & 11 deletions llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
Original file line number Diff line number Diff line change
Expand Up @@ -2120,17 +2120,6 @@ defm : ISET_FORMAT_UNSIGNED<setule, CmpLE>;
defm : ISET_FORMAT_UNSIGNED<setueq, CmpEQ>;
defm : ISET_FORMAT_UNSIGNED<setune, CmpNE>;

// i1 compares
def : Pat<(setne Int1Regs:$a, Int1Regs:$b),
(XORb1rr Int1Regs:$a, Int1Regs:$b)>;
def : Pat<(setune Int1Regs:$a, Int1Regs:$b),
(XORb1rr Int1Regs:$a, Int1Regs:$b)>;

def : Pat<(seteq Int1Regs:$a, Int1Regs:$b),
(NOT1 (XORb1rr Int1Regs:$a, Int1Regs:$b))>;
def : Pat<(setueq Int1Regs:$a, Int1Regs:$b),
(NOT1 (XORb1rr Int1Regs:$a, Int1Regs:$b))>;

// comparisons of i8 extracted with BFE as i32
// It's faster to do comparison directly on i32 extracted by BFE,
// instead of the long conversion and sign extending.
Expand Down
Loading
Loading