[DAG] Support saturated truncate #99418

Merged: 6 commits, Aug 14, 2024

20 changes: 20 additions & 0 deletions llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -814,6 +814,26 @@ enum NodeType {

/// TRUNCATE - Completely drop the high bits.
TRUNCATE,
/// TRUNCATE_[SU]SAT_[SU] - Truncate a saturated operand.
/// The [SU] before `SAT` indicates whether the operand being truncated was
/// clamped by signed or unsigned operations. For example,
/// `truncate(smin(smax(x, C), C))` is signed and becomes `S`, while
/// `truncate(umin(x, C))` is unsigned and becomes `U`.
/// The trailing [SU] indicates whether the truncated values are saturated to
/// the signed or the unsigned range of the result type. For example, when
/// `truncate(smin(smax(x, C), C))` truncates to `i8` and the constants clamp
/// x to `-128 to 127`, the result is saturated against signed values, giving
/// `S` and the node `TRUNCATE_SSAT_S`. When the constants clamp x to
/// `0 to 255`, the result is saturated against unsigned values, giving `U`
/// and the node `TRUNCATE_SSAT_U`. Similarly, when `truncate(umin(x, C))`
/// clamps x to `0 to 255`, the result is saturated against unsigned values
/// and the node is `TRUNCATE_USAT_U`.
TRUNCATE_SSAT_S, // saturate signed input to signed result -
// truncate(smin(smax(x, C), C))
TRUNCATE_SSAT_U, // saturate signed input to unsigned result -
// truncate(smin(smax(x, 0), C))
TRUNCATE_USAT_U, // saturate unsigned input to unsigned result -
// truncate(umin(x, C))

/// [SU]INT_TO_FP - These operators convert integers (whose interpreted sign
/// depends on the first letter) to floating point.
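
As a rough scalar illustration of the three opcodes documented in this hunk, the sketch below models an `i16` to `i8` truncation in plain C++. The helper names are hypothetical and are not part of LLVM; the real nodes operate on SelectionDAG values, typically vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical scalar models of the new nodes for an i16 -> i8 truncation.
// These helpers are illustrative only; they are not LLVM APIs.

// TRUNCATE_SSAT_S: signed input, clamped to the signed range [-128, 127].
int8_t truncSSatS(int16_t X) {
  return static_cast<int8_t>(
      std::min<int16_t>(std::max<int16_t>(X, -128), 127));
}

// TRUNCATE_SSAT_U: signed input, clamped to the unsigned range [0, 255].
uint8_t truncSSatU(int16_t X) {
  return static_cast<uint8_t>(
      std::min<int16_t>(std::max<int16_t>(X, 0), 255));
}

// TRUNCATE_USAT_U: unsigned input, clamped to the unsigned range [0, 255].
uint8_t truncUSatU(uint16_t X) {
  return static_cast<uint8_t>(std::min<uint16_t>(X, 255));
}

// Spot checks: in-range values pass through, out-of-range values saturate.
void checkTruncSatExamples() {
  assert(truncSSatS(42) == 42);
  assert(truncSSatS(300) == 127);
  assert(truncSSatS(-300) == -128);
  assert(truncSSatU(-5) == 0);
  assert(truncSSatU(300) == 255);
  assert(truncUSatU(40000) == 255);
}
```

The first letter only changes how the input is interpreted: `truncSSatU(-5)` sees a negative number and yields 0, whereas the same bit pattern passed to `truncUSatU` is a large unsigned value and yields 255.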
3 changes: 3 additions & 0 deletions llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -477,6 +477,9 @@ def sext : SDNode<"ISD::SIGN_EXTEND", SDTIntExtendOp>;
def zext : SDNode<"ISD::ZERO_EXTEND", SDTIntExtendOp>;
def anyext : SDNode<"ISD::ANY_EXTEND" , SDTIntExtendOp>;
def trunc : SDNode<"ISD::TRUNCATE" , SDTIntTruncOp>;
def truncssat_s : SDNode<"ISD::TRUNCATE_SSAT_S", SDTIntTruncOp>;
def truncssat_u : SDNode<"ISD::TRUNCATE_SSAT_U", SDTIntTruncOp>;
def truncusat_u : SDNode<"ISD::TRUNCATE_USAT_U", SDTIntTruncOp>;
def bitconvert : SDNode<"ISD::BITCAST" , SDTUnaryOp>;
def addrspacecast : SDNode<"ISD::ADDRSPACECAST", SDTUnaryOp>;
def freeze : SDNode<"ISD::FREEZE" , SDTFreeze>;
136 changes: 135 additions & 1 deletion llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -486,6 +486,7 @@ namespace {
SDValue visitSIGN_EXTEND_INREG(SDNode *N);
SDValue visitEXTEND_VECTOR_INREG(SDNode *N);
SDValue visitTRUNCATE(SDNode *N);
SDValue visitTRUNCATE_USAT_U(SDNode *N);
SDValue visitBITCAST(SDNode *N);
SDValue visitFREEZE(SDNode *N);
SDValue visitBUILD_PAIR(SDNode *N);
@@ -1908,6 +1909,7 @@ SDValue DAGCombiner::visit(SDNode *N) {
case ISD::ZERO_EXTEND_VECTOR_INREG:
case ISD::ANY_EXTEND_VECTOR_INREG: return visitEXTEND_VECTOR_INREG(N);
case ISD::TRUNCATE: return visitTRUNCATE(N);
case ISD::TRUNCATE_USAT_U: return visitTRUNCATE_USAT_U(N);
case ISD::BITCAST: return visitBITCAST(N);
Collaborator:

Should this be called visitTRUNCATE_SAT_U? Or should we just have a visitTRUNCATE_SAT call and handle the unsigned cases inside it?

Collaborator:

Did we not mean to change this to just case ISD::TRUNCATE_USAT_U: return visitTRUNCATE_USAT(N);, without the TRUNCATE_SSAT_U case? (Sorry if I missed that.) Otherwise it will change TRUNCATE_SSAT_U(FP_TO_UINT(x)) to FP_TO_UINT_SAT(x), which will not clamp to the same bounds.

Contributor Author:

visitTRUNCATE_USAT has a small task, so I don't see the need to separate SSAT and USAT.
Is it LLVM practice to avoid unnecessary separation and to split functions only once they grow reasonably large?

Collaborator:

No, we can generalize it later if the need arises - but we need to confirm whether we should be handling the TRUNCATE_SSAT_U case or not (do we have test coverage?)

Contributor Author:

If that's the case, then we don't have a test for truncate_ssat_u. As @davemgreen commented, the right change is to call visitTRUNCATE_USAT() only for case TRUNCATE_USAT_U.

Contributor Author:

Should I change it so that only case ISD::TRUNCATE_USAT_U: return visitTRUNCATE_USAT(N); remains?

Collaborator:

I think so. Perhaps call it visitTRUNCATE_USAT_U too?
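
The bounds mismatch raised in this thread can be sketched with scalars. The code below is an illustrative model only, assuming an `i32` intermediate and an `i8` result; all function names are hypothetical and none of them are LLVM APIs. It shows that folding `TRUNCATE_USAT_U(FP_TO_UINT(x))` to a saturating conversion preserves the result, while the same fold applied to `TRUNCATE_SSAT_U` would not.

```cpp
#include <cassert>
#include <cstdint>

// Models FP_TO_UINT producing i32. Out-of-range inputs are poison in LLVM,
// so this sketch only accepts values that fit in 32 bits.
uint32_t fpToUint32(float X) {
  assert(X >= 0.0f && X < 4294967296.0f && "input must fit in 32 bits");
  return static_cast<uint32_t>(X);
}

// Models FP_TO_UINT_SAT with an i8 saturation width: clamp to [0, 255].
uint8_t fpToUintSat8(float X) {
  if (!(X > 0.0f)) // zero, negative values and NaN all map to 0
    return 0;
  if (X >= 255.0f)
    return 255;
  return static_cast<uint8_t>(X);
}

// Models TRUNCATE_USAT_U from i32 to i8: the input is treated as unsigned.
uint8_t truncUSatU8(uint32_t V) {
  return V > 255u ? 255 : static_cast<uint8_t>(V);
}

// Models TRUNCATE_SSAT_U from i32 to i8: the same bits are treated as signed,
// so anything with the top bit set counts as negative and clamps to 0.
uint8_t truncSSatU8(uint32_t V) {
  if (V & 0x80000000u)
    return 0;
  return V > 255u ? 255 : static_cast<uint8_t>(V);
}
```

For `x = 3e9f` the conversion yields 3000000000, which has the top bit set. `truncUSatU8` returns 255, matching `fpToUintSat8`, but `truncSSatU8` returns 0, which is why the thread settles on handling only `TRUNCATE_USAT_U`.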

case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);
case ISD::FADD: return visitFADD(N);
@@ -13203,7 +13205,9 @@ SDValue DAGCombiner::matchVSelectOpSizesWithSetCC(SDNode *Cast) {
   unsigned CastOpcode = Cast->getOpcode();
   assert((CastOpcode == ISD::SIGN_EXTEND || CastOpcode == ISD::ZERO_EXTEND ||
           CastOpcode == ISD::TRUNCATE || CastOpcode == ISD::FP_EXTEND ||
-          CastOpcode == ISD::FP_ROUND) &&
+          CastOpcode == ISD::TRUNCATE_SSAT_S ||
+          CastOpcode == ISD::TRUNCATE_SSAT_U ||
+          CastOpcode == ISD::TRUNCATE_USAT_U || CastOpcode == ISD::FP_ROUND) &&
          "Unexpected opcode for vector select narrowing/widening");

   // We only do this transform before legal ops because the pattern may be
@@ -14915,6 +14919,132 @@ SDValue DAGCombiner::visitEXTEND_VECTOR_INREG(SDNode *N) {
return SDValue();
}

SDValue DAGCombiner::visitTRUNCATE_USAT_U(SDNode *N) {
  EVT VT = N->getValueType(0);
  SDValue N0 = N->getOperand(0);

  std::function<SDValue(SDValue)> MatchFPTOINT = [&](SDValue Val) -> SDValue {

Collaborator:

Do you need to check the other operand of this SMAX?

Contributor Author:

Yes, missed it. I'll fix it.

    if (Val.getOpcode() == ISD::FP_TO_UINT)
      return Val;
    return SDValue();
  };

  SDValue FPInstr = MatchFPTOINT(N0);
  if (!FPInstr)
    return SDValue();

  EVT FPVT = FPInstr.getOperand(0).getValueType();

Collaborator:

return DAG.getNode... no need for temporary variable.

Contributor Author:

I'm not sure how I can do it. Could you give me a little advice, please?

Contributor Author:

@topperc, how about now? Do you think I'm doing it properly?

  if (!DAG.getTargetLoweringInfo().shouldConvertFpToSat(ISD::FP_TO_UINT_SAT,
                                                        FPVT, VT))
    return SDValue();
  return DAG.getNode(ISD::FP_TO_UINT_SAT, SDLoc(FPInstr), VT,
                     FPInstr.getOperand(0),
                     DAG.getValueType(VT.getScalarType()));
}

/// Detect patterns of truncation with unsigned saturation:
///
/// (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
/// Return the source value x to be truncated or SDValue() if the pattern was
/// not matched.
///
static SDValue detectUSatUPattern(SDValue In, EVT VT) {
  unsigned NumDstBits = VT.getScalarSizeInBits();
  unsigned NumSrcBits = In.getScalarValueSizeInBits();
  // Saturation with truncation. We truncate from InVT to VT.
  assert(NumSrcBits > NumDstBits && "Unexpected types for truncate operation");

  SDValue Min;
  APInt UnsignedMax = APInt::getMaxValue(NumDstBits).zext(NumSrcBits);
  if (sd_match(In, m_UMin(m_Value(Min), m_SpecificInt(UnsignedMax))))
    return Min;

  return SDValue();
}

/// Detect patterns of truncation with signed saturation:
/// (truncate (smin (smax (x, signed_min_of_dest_type),
/// signed_max_of_dest_type)) to dest_type)
/// or:
/// (truncate (smax (smin (x, signed_max_of_dest_type),
/// signed_min_of_dest_type)) to dest_type).
///
/// Return the source value to be truncated or SDValue() if the pattern was not
/// matched.
static SDValue detectSSatSPattern(SDValue In, EVT VT) {
  unsigned NumDstBits = VT.getScalarSizeInBits();
  unsigned NumSrcBits = In.getScalarValueSizeInBits();
  // Saturation with truncation. We truncate from InVT to VT.
  assert(NumSrcBits > NumDstBits && "Unexpected types for truncate operation");

  SDValue Val;
  APInt SignedMax = APInt::getSignedMaxValue(NumDstBits).sext(NumSrcBits);
  APInt SignedMin = APInt::getSignedMinValue(NumDstBits).sext(NumSrcBits);

  if (sd_match(In, m_SMin(m_SMax(m_Value(Val), m_SpecificInt(SignedMin)),
                          m_SpecificInt(SignedMax))))
    return Val;

  if (sd_match(In, m_SMax(m_SMin(m_Value(Val), m_SpecificInt(SignedMax)),
                          m_SpecificInt(SignedMin))))
    return Val;

  return SDValue();
}
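
The matcher above accepts both nestings of `smin` and `smax`. A minimal scalar sketch, assuming an `i16` source and an `i8` destination, suggests why that is safe: with the lower bound below the upper bound, both orders compute the same clamp. The names here are hypothetical and only mirror `SignedMin` and `SignedMax` from the matcher.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Signed bounds of an i8 destination, sign-extended to the i16 source type,
// mirroring SignedMin and SignedMax in the matcher (illustrative values).
const int16_t SignedMin = -128; // bit pattern 0xFF80 as i16
const int16_t SignedMax = 127;  // bit pattern 0x007F as i16

// truncate(smin(smax(x, SignedMin), SignedMax))
int8_t sminOfSmax(int16_t X) {
  return static_cast<int8_t>(std::min(std::max(X, SignedMin), SignedMax));
}

// truncate(smax(smin(x, SignedMax), SignedMin))
int8_t smaxOfSmin(int16_t X) {
  return static_cast<int8_t>(std::max(std::min(X, SignedMax), SignedMin));
}

// Exhaustively compares both orderings over every i16 input.
bool orderingsAgree() {
  for (int V = INT16_MIN; V <= INT16_MAX; ++V)
    if (sminOfSmax(static_cast<int16_t>(V)) !=
        smaxOfSmin(static_cast<int16_t>(V)))
      return false;
  return true;
}
```

Because the two forms agree on every input, a single `TRUNCATE_SSAT_S` node can stand in for either one.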

/// Detect patterns of truncation of a signed value with unsigned saturation:
///   (truncate (smax (smin (x, unsigned_max_of_dest_type), 0)) to dest_type)
/// or:
///   (truncate (smin (smax (x, 0), unsigned_max_of_dest_type)) to dest_type)
/// or:
///   (truncate (umin (smax (x, 0), unsigned_max_of_dest_type)) to dest_type).
/// Return the source value to be truncated or SDValue() if the pattern was not
/// matched.
static SDValue detectSSatUPattern(SDValue In, EVT VT, SelectionDAG &DAG,
                                  const SDLoc &DL) {
  unsigned NumDstBits = VT.getScalarSizeInBits();
  unsigned NumSrcBits = In.getScalarValueSizeInBits();
  // Saturation with truncation. We truncate from InVT to VT.
  assert(NumSrcBits > NumDstBits && "Unexpected types for truncate operation");

  SDValue Val;
  APInt UnsignedMax = APInt::getMaxValue(NumDstBits).zext(NumSrcBits);
  // Min == 0, Max is unsigned max of destination type.
  if (sd_match(In, m_SMax(m_SMin(m_Value(Val), m_SpecificInt(UnsignedMax)),
                          m_Zero())))
    return Val;

  if (sd_match(In, m_SMin(m_SMax(m_Value(Val), m_Zero()),
                          m_SpecificInt(UnsignedMax))))
    return Val;

  if (sd_match(In, m_UMin(m_SMax(m_Value(Val), m_Zero()),
                          m_SpecificInt(UnsignedMax))))
    return Val;

  return SDValue();
}
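
The three shapes accepted by `detectSSatUPattern` can be checked against each other with a scalar model. This is a sketch assuming an `i16` source and an `i8` destination; the function names are hypothetical. The third form uses an unsigned minimum, which is safe only because the inner `smax(x, 0)` has already made the value non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Unsigned max of an i8 destination, zero-extended to the i16 source type.
const int16_t UnsignedMax = 255;

// truncate(smax(smin(x, 255), 0))
uint8_t smaxOfSmin(int16_t X) {
  return static_cast<uint8_t>(
      std::max<int16_t>(std::min(X, UnsignedMax), 0));
}

// truncate(smin(smax(x, 0), 255))
uint8_t sminOfSmax(int16_t X) {
  return static_cast<uint8_t>(
      std::min(std::max<int16_t>(X, 0), UnsignedMax));
}

// truncate(umin(smax(x, 0), 255)): the outer minimum compares as unsigned.
uint8_t uminOfSmax(int16_t X) {
  uint16_t NonNegative = static_cast<uint16_t>(std::max<int16_t>(X, 0));
  return static_cast<uint8_t>(std::min<uint16_t>(NonNegative, 255));
}

// Exhaustively compares the three forms over every i16 input.
bool formsAgree() {
  for (int V = INT16_MIN; V <= INT16_MAX; ++V) {
    int16_t X = static_cast<int16_t>(V);
    if (smaxOfSmin(X) != sminOfSmax(X) || sminOfSmax(X) != uminOfSmax(X))
      return false;
  }
  return true;
}
```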

static SDValue foldToSaturated(SDNode *N, EVT &VT, SDValue &Src, EVT &SrcVT,
                               SDLoc &DL, const TargetLowering &TLI,
                               SelectionDAG &DAG) {
  auto AllowedTruncateSat = [&](unsigned Opc, EVT SrcVT, EVT VT) -> bool {
    return (TLI.isOperationLegalOrCustom(Opc, SrcVT) &&
            TLI.isTypeDesirableForOp(Opc, VT));
  };

  if (Src.getOpcode() == ISD::SMIN || Src.getOpcode() == ISD::SMAX) {
    if (AllowedTruncateSat(ISD::TRUNCATE_SSAT_S, SrcVT, VT))
      if (SDValue SSatVal = detectSSatSPattern(Src, VT))
        return DAG.getNode(ISD::TRUNCATE_SSAT_S, DL, VT, SSatVal);
    if (AllowedTruncateSat(ISD::TRUNCATE_SSAT_U, SrcVT, VT))
      if (SDValue SSatVal = detectSSatUPattern(Src, VT, DAG, DL))
        return DAG.getNode(ISD::TRUNCATE_SSAT_U, DL, VT, SSatVal);
  } else if (Src.getOpcode() == ISD::UMIN) {
    if (AllowedTruncateSat(ISD::TRUNCATE_SSAT_U, SrcVT, VT))
      if (SDValue SSatVal = detectSSatUPattern(Src, VT, DAG, DL))
        return DAG.getNode(ISD::TRUNCATE_SSAT_U, DL, VT, SSatVal);
    if (AllowedTruncateSat(ISD::TRUNCATE_USAT_U, SrcVT, VT))
      if (SDValue USatVal = detectUSatUPattern(Src, VT))
        return DAG.getNode(ISD::TRUNCATE_USAT_U, DL, VT, USatVal);
  }

  return SDValue();
}
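
The dispatch order in `foldToSaturated` can be imitated on a toy expression tree. The sketch below is a simplified stand-in, not the SelectionDAG API: `Node`, `Op`, `matchClamp`, and `classify` are all hypothetical, every clamp is assumed to have a constant second operand, and the target legality checks performed by `AllowedTruncateSat` are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy expression node: either a leaf value or a clamp against a constant.
enum class Op { Value, SMin, SMax, UMin };

struct Node {
  Op Kind;
  const Node *Operand; // clamped operand, null for Op::Value
  int64_t Bound;       // constant operand, unused for Op::Value
};

// Returns true if N is `Kind(Inner, Bound)` and stores the operand in Inner.
static bool matchClamp(const Node *N, Op Kind, int64_t Bound,
                       const Node **Inner) {
  if (N->Kind != Kind || N->Bound != Bound)
    return false;
  *Inner = N->Operand;
  return true;
}

// Names the saturating node that truncating Src to DstBits would fold to,
// or returns an empty string when no pattern matches.
std::string classify(const Node *Src, unsigned DstBits) {
  assert(DstBits > 0 && DstBits < 63 && "toy model uses 64-bit bounds");
  const int64_t UMaxV = (int64_t(1) << DstBits) - 1;
  const int64_t SMaxV = (int64_t(1) << (DstBits - 1)) - 1;
  const int64_t SMinV = -SMaxV - 1;
  const Node *Mid = nullptr;
  const Node *X = nullptr;

  if (Src->Kind == Op::SMin || Src->Kind == Op::SMax) {
    // Signed clamp to the signed range, in either nesting order.
    if ((matchClamp(Src, Op::SMin, SMaxV, &Mid) &&
         matchClamp(Mid, Op::SMax, SMinV, &X)) ||
        (matchClamp(Src, Op::SMax, SMinV, &Mid) &&
         matchClamp(Mid, Op::SMin, SMaxV, &X)))
      return "TRUNCATE_SSAT_S";
    // Signed clamp to the unsigned range, in either nesting order.
    if ((matchClamp(Src, Op::SMax, 0, &Mid) &&
         matchClamp(Mid, Op::SMin, UMaxV, &X)) ||
        (matchClamp(Src, Op::SMin, UMaxV, &Mid) &&
         matchClamp(Mid, Op::SMax, 0, &X)))
      return "TRUNCATE_SSAT_U";
  } else if (Src->Kind == Op::UMin) {
    if (matchClamp(Src, Op::UMin, UMaxV, &Mid)) {
      // umin(smax(x, 0), UMax) is tried before the plain umin(x, UMax).
      if (matchClamp(Mid, Op::SMax, 0, &X))
        return "TRUNCATE_SSAT_U";
      return "TRUNCATE_USAT_U";
    }
  }
  return "";
}
```

As in the function above, a `umin` root is first tested for the signed-input form and only then for the plain unsigned form, and a clamp against any constant other than the exact type bounds is left alone.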

SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
  SDValue N0 = N->getOperand(0);
  EVT VT = N->getValueType(0);
@@ -14930,6 +15060,10 @@ SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
  if (N0.getOpcode() == ISD::TRUNCATE)
    return DAG.getNode(ISD::TRUNCATE, DL, VT, N0.getOperand(0));

  // fold saturated truncate
  if (SDValue SaturatedTR = foldToSaturated(N, VT, N0, SrcVT, DL, TLI, DAG))
    return SaturatedTR;

  // fold (truncate c1) -> c1
  if (SDValue C = DAG.FoldConstantArithmetic(ISD::TRUNCATE, DL, VT, {N0}))
    return C;
3 changes: 3 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -380,6 +380,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::SIGN_EXTEND_VECTOR_INREG: return "sign_extend_vector_inreg";
case ISD::ZERO_EXTEND_VECTOR_INREG: return "zero_extend_vector_inreg";
case ISD::TRUNCATE: return "truncate";
case ISD::TRUNCATE_SSAT_S: return "truncate_ssat_s";
case ISD::TRUNCATE_SSAT_U: return "truncate_ssat_u";
case ISD::TRUNCATE_USAT_U: return "truncate_usat_u";
case ISD::FP_ROUND: return "fp_round";
case ISD::STRICT_FP_ROUND: return "strict_fp_round";
case ISD::FP_EXTEND: return "fp_extend";
5 changes: 5 additions & 0 deletions llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -753,6 +753,11 @@ void TargetLoweringBase::initActions() {
// Absolute difference
setOperationAction({ISD::ABDS, ISD::ABDU}, VT, Expand);

// Saturated trunc
setOperationAction(ISD::TRUNCATE_SSAT_S, VT, Expand);
setOperationAction(ISD::TRUNCATE_SSAT_U, VT, Expand);
setOperationAction(ISD::TRUNCATE_USAT_U, VT, Expand);

// These default to Expand so they will be expanded to CTLZ/CTTZ by default.
setOperationAction({ISD::CTLZ_ZERO_UNDEF, ISD::CTTZ_ZERO_UNDEF}, VT,
Expand);
18 changes: 18 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1410,6 +1410,12 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
}
}

  for (MVT VT : {MVT::v8i16, MVT::v4i32, MVT::v2i64}) {
    setOperationAction(ISD::TRUNCATE_SSAT_S, VT, Legal);
    setOperationAction(ISD::TRUNCATE_SSAT_U, VT, Legal);
    setOperationAction(ISD::TRUNCATE_USAT_U, VT, Legal);
  }

  if (Subtarget->hasSME()) {
    setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::Other, Custom);
  }
@@ -28730,6 +28736,18 @@ bool AArch64TargetLowering::hasInlineStackProbe(
MF.getInfo<AArch64FunctionInfo>()->hasStackProbing();
}

bool AArch64TargetLowering::isTypeDesirableForOp(unsigned Opc, EVT VT) const {
  switch (Opc) {
  case ISD::TRUNCATE_SSAT_S:
  case ISD::TRUNCATE_SSAT_U:
  case ISD::TRUNCATE_USAT_U:
    if (VT == MVT::v8i8 || VT == MVT::v4i16 || VT == MVT::v2i32)
      return true;
  }

  return TargetLowering::isTypeDesirableForOp(Opc, VT);
}

#ifndef NDEBUG
void AArch64TargetLowering::verifyTargetSDNode(const SDNode *N) const {
switch (N->getOpcode()) {
5 changes: 5 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -743,6 +743,11 @@ class AArch64TargetLowering : public TargetLowering {
bool generateFMAsInMachineCombiner(EVT VT,
CodeGenOptLevel OptLevel) const override;

/// Return true if the target has native support for
/// the specified value type and it is 'desirable' to use the type for the
/// given node type.
bool isTypeDesirableForOp(unsigned Opc, EVT VT) const override;

const MCPhysReg *getScratchRegisters(CallingConv::ID CC) const override;
ArrayRef<MCPhysReg> getRoundingControlRegisters() const override;

83 changes: 47 additions & 36 deletions llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -5418,64 +5418,75 @@ def VImm7FFF: PatLeaf<(AArch64movi_msl (i32 127), (i32 264))>;
 def VImm8000: PatLeaf<(AArch64mvni_msl (i32 127), (i32 264))>;

 // trunc(umin(X, 255)) -> UQXTRN v8i8
-def : Pat<(v8i8 (trunc (umin (v8i16 V128:$Vn), (v8i16 VImmFF)))),
+def : Pat<(v8i8 (truncusat_u (v8i16 V128:$Vn))),
           (UQXTNv8i8 V128:$Vn)>;
 // trunc(umin(X, 65535)) -> UQXTRN v4i16
-def : Pat<(v4i16 (trunc (umin (v4i32 V128:$Vn), (v4i32 VImmFFFF)))),
+def : Pat<(v4i16 (truncusat_u (v4i32 V128:$Vn))),
           (UQXTNv4i16 V128:$Vn)>;
+// trunc(umin(X, 4294967295)) -> UQXTRN v2i32
+def : Pat<(v2i32 (truncusat_u (v2i64 V128:$Vn))),
+          (UQXTNv2i32 V128:$Vn)>;
 // trunc(smin(smax(X, -128), 128)) -> SQXTRN
-//  with reversed min/max
-def : Pat<(v8i8 (trunc (smin (smax (v8i16 V128:$Vn), (v8i16 VImm80)),
-                             (v8i16 VImm7F)))),
-          (SQXTNv8i8 V128:$Vn)>;
-def : Pat<(v8i8 (trunc (smax (smin (v8i16 V128:$Vn), (v8i16 VImm7F)),
-                             (v8i16 VImm80)))),
+def : Pat<(v8i8 (truncssat_s (v8i16 V128:$Vn))),
           (SQXTNv8i8 V128:$Vn)>;
 // trunc(smin(smax(X, -32768), 32767)) -> SQXTRN
-//  with reversed min/max
-def : Pat<(v4i16 (trunc (smin (smax (v4i32 V128:$Vn), (v4i32 VImm8000)),
-                              (v4i32 VImm7FFF)))),
-          (SQXTNv4i16 V128:$Vn)>;
-def : Pat<(v4i16 (trunc (smax (smin (v4i32 V128:$Vn), (v4i32 VImm7FFF)),
-                              (v4i32 VImm8000)))),
+def : Pat<(v4i16 (truncssat_s (v4i32 V128:$Vn))),
           (SQXTNv4i16 V128:$Vn)>;
-
-// concat_vectors(Vd, trunc(umin(X, 255))) -> UQXTRN(Vd, Vn)
+// trunc(smin(smax(X, -2147483648), 2147483647)) -> SQXTRN
+def : Pat<(v2i32 (truncssat_s (v2i64 V128:$Vn))),
+          (SQXTNv2i32 V128:$Vn)>;
+// trunc(umin(smax(X, 0), 255)) -> SQXTUN
+def : Pat<(v8i8 (truncssat_u (v8i16 V128:$Vn))),
+          (SQXTUNv8i8 V128:$Vn)>;
+// trunc(umin(smax(X, 0), 65535)) -> SQXTUN
+def : Pat<(v4i16 (truncssat_u (v4i32 V128:$Vn))),
+          (SQXTUNv4i16 V128:$Vn)>;
+// trunc(umin(smax(X, 0), 4294967295)) -> SQXTUN
+def : Pat<(v2i32 (truncssat_u (v2i64 V128:$Vn))),
+          (SQXTUNv2i32 V128:$Vn)>;
+
+// truncusat_u
+// concat_vectors(Vd, truncusat_u(Vn)) ~> UQXTRN(Vd, Vn)
 def : Pat<(v16i8 (concat_vectors
                  (v8i8 V64:$Vd),
-                 (v8i8 (trunc (umin (v8i16 V128:$Vn), (v8i16 VImmFF)))))),
+                 (v8i8 (truncusat_u (v8i16 V128:$Vn))))),
           (UQXTNv16i8 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;
-// concat_vectors(Vd, trunc(umin(X, 65535))) -> UQXTRN(Vd, Vn)
 def : Pat<(v8i16 (concat_vectors
                  (v4i16 V64:$Vd),
-                 (v4i16 (trunc (umin (v4i32 V128:$Vn), (v4i32 VImmFFFF)))))),
+                 (v4i16 (truncusat_u (v4i32 V128:$Vn))))),
           (UQXTNv8i16 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;
+def : Pat<(v4i32 (concat_vectors
+                 (v2i32 V64:$Vd),
+                 (v2i32 (truncusat_u (v2i64 V128:$Vn))))),
+          (UQXTNv4i32 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;

-// concat_vectors(Vd, trunc(smin(smax Vm, -128), 127) ~> SQXTN2(Vd, Vn)
-// with reversed min/max
+// concat_vectors(Vd, truncssat_s(Vn)) ~> SQXTN2(Vd, Vn)
 def : Pat<(v16i8 (concat_vectors
                  (v8i8 V64:$Vd),
-                 (v8i8 (trunc (smin (smax (v8i16 V128:$Vn), (v8i16 VImm80)),
-                                    (v8i16 VImm7F)))))),
+                 (v8i8 (truncssat_s (v8i16 V128:$Vn))))),
           (SQXTNv16i8 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;
-def : Pat<(v16i8 (concat_vectors
-                 (v8i8 V64:$Vd),
-                 (v8i8 (trunc (smax (smin (v8i16 V128:$Vn), (v8i16 VImm7F)),
-                                    (v8i16 VImm80)))))),
-          (SQXTNv16i8 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;
-
-// concat_vectors(Vd, trunc(smin(smax Vm, -32768), 32767) ~> SQXTN2(Vd, Vn)
-// with reversed min/max
 def : Pat<(v8i16 (concat_vectors
                  (v4i16 V64:$Vd),
-                 (v4i16 (trunc (smin (smax (v4i32 V128:$Vn), (v4i32 VImm8000)),
-                                     (v4i32 VImm7FFF)))))),
+                 (v4i16 (truncssat_s (v4i32 V128:$Vn))))),
           (SQXTNv8i16 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;
+def : Pat<(v4i32 (concat_vectors
+                 (v2i32 V64:$Vd),
+                 (v2i32 (truncssat_s (v2i64 V128:$Vn))))),
+          (SQXTNv4i32 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;
+
+// concat_vectors(Vd, truncssat_u(Vn)) ~> SQXTUN2(Vd, Vn)
+def : Pat<(v16i8 (concat_vectors
+                 (v8i8 V64:$Vd),
+                 (v8i8 (truncssat_u (v8i16 V128:$Vn))))),
+          (SQXTUNv16i8 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;
 def : Pat<(v8i16 (concat_vectors
                  (v4i16 V64:$Vd),
-                 (v4i16 (trunc (smax (smin (v4i32 V128:$Vn), (v4i32 VImm7FFF)),
-                                     (v4i32 VImm8000)))))),
-          (SQXTNv8i16 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;
+                 (v4i16 (truncssat_u (v4i32 V128:$Vn))))),
+          (SQXTUNv8i16 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;
+def : Pat<(v4i32 (concat_vectors
+                 (v2i32 V64:$Vd),
+                 (v2i32 (truncssat_u (v2i64 V128:$Vn))))),
+          (SQXTUNv4i32 (INSERT_SUBREG (IMPLICIT_DEF), V64:$Vd, dsub), V128:$Vn)>;

 // Select BSWAP vector instructions into REV instructions
 def : Pat<(v4i16 (bswap (v4i16 V64:$Rn))),