[RISCV][TTI] Scale the cost of FP-Int conversion with LMUL #87506

Merged: 2 commits merged into llvm:main from the tti-fp-conv-cost branch on Sep 2, 2024

Contributor

@arcbbb arcbbb commented Apr 3, 2024

Widening or narrowing the source data type to match the destination data type may require multiple steps.
To model these costs, the patch derives each interim type by following the logic in RISCVTargetLowering::lowerVPFPIntConvOp.
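
As a rough illustration of the multi-step narrowing described above, the sketch below lists the interim integer element widths for an FP-to-int conversion. The helper name narrowingSteps is hypothetical and not part of LLVM; it only mirrors the halving loop that the patch adds to getCastInstrCost.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, for illustration only (not an LLVM API).
// Returns the integer element widths produced while narrowing an FP element
// of SrcEltSize bits down to an integer element of DstEltSize bits.
std::vector<unsigned> narrowingSteps(unsigned SrcEltSize, unsigned DstEltSize) {
  assert(DstEltSize < SrcEltSize && "only the narrowing case is modeled");
  std::vector<unsigned> Widths;
  // First step: a narrowing FP-to-int convert (vfncvt.rtz.x.f.w) halves the width.
  SrcEltSize /= 2;
  Widths.push_back(SrcEltSize);
  // Remaining steps: integer narrowing shifts (vnsrl.wi), halving each time.
  while (DstEltSize < SrcEltSize) {
    SrcEltSize /= 2;
    Widths.push_back(SrcEltSize);
  }
  return Widths;
}
```

Under this model, narrowingSteps(64, 8) yields the widths 32, 16, 8: one narrowing convert followed by two integer narrowing steps.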

@llvmbot
Member

llvmbot commented Apr 3, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-analysis

Author: Shih-Po Hung (arcbbb)

Changes

Widening or narrowing the source data type to match the destination data type may require multiple steps.
To model these costs, the patch derives each interim type by following the logic in RISCVTargetLowering::lowerVPFPIntConvOp.


Patch is 352.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87506.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+95-20)
  • (modified) llvm/test/Analysis/CostModel/RISCV/cast.ll (+1108-1108)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 38304ff90252f0..6ea17aa1130963 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -988,31 +988,106 @@ InstructionCost RISCVTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
     return Cost;
   }
   case ISD::FP_TO_SINT:
-  case ISD::FP_TO_UINT:
+  case ISD::FP_TO_UINT: {
+    unsigned IsSigned = ISD == ISD::FP_TO_SINT;
+    unsigned FCVT = IsSigned ? RISCV::VFCVT_RTZ_X_F_V : RISCV::VFCVT_RTZ_XU_F_V;
+    unsigned FWCVT =
+        IsSigned ? RISCV::VFWCVT_RTZ_X_F_V : RISCV::VFWCVT_RTZ_XU_F_V;
+    unsigned FNCVT =
+        IsSigned ? RISCV::VFNCVT_RTZ_X_F_W : RISCV::VFNCVT_RTZ_XU_F_W;
+    unsigned SrcEltSize = Src->getScalarSizeInBits();
+    unsigned DstEltSize = Dst->getScalarSizeInBits();
+    if (DstEltSize == 1) {
+      // For fp vector to mask, we use:
+      // vfncvt.rtz.x.f.w v9, v8
+      // vand.vi v8, v9, 1
+      // vmsne.vi v0, v8, 0
+      SrcEltSize /= 2;
+      MVT ElementVT = MVT::getIntegerVT(SrcEltSize);
+      MVT InterimVT = SrcLT.second.changeVectorElementType(ElementVT);
+      return getRISCVInstructionCost(FNCVT, InterimVT, CostKind) +
+             getRISCVInstructionCost({RISCV::VAND_VI, RISCV::VMSNE_VI},
+                                     DstLT.second, CostKind);
+    }
+    if (DstEltSize == SrcEltSize)
+      return getRISCVInstructionCost(FCVT, DstLT.second, CostKind);
+    if (DstEltSize == (2 * SrcEltSize))
+      return getRISCVInstructionCost(FWCVT, DstLT.second, CostKind);
+    if (DstEltSize == (4 * SrcEltSize) && (SrcEltSize == 16)) {
+      // Convert f16 to f32 then convert f32 to i64.
+      MVT VecF32VT = DstLT.second.changeVectorElementType(MVT::f32);
+      return getRISCVInstructionCost(RISCV::VFWCVT_F_F_V, VecF32VT, CostKind) +
+             getRISCVInstructionCost(FWCVT, DstLT.second, CostKind);
+    }
+    if (DstEltSize < SrcEltSize) {
+      SrcEltSize /= 2;
+      MVT ElementVT = MVT::getIntegerVT(SrcEltSize);
+      MVT InterimVT = DstLT.second.changeVectorElementType(ElementVT);
+      InstructionCost Cost =
+          getRISCVInstructionCost(FNCVT, InterimVT, CostKind);
+      while (DstEltSize < SrcEltSize) {
+        SrcEltSize /= 2;
+        ElementVT = MVT::getIntegerVT(SrcEltSize);
+        InterimVT = DstLT.second.changeVectorElementType(ElementVT);
+        Cost += getRISCVInstructionCost(RISCV::VNSRL_WI, InterimVT, CostKind);
+      }
+      return Cost;
+    }
+    return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
+  }
   case ISD::SINT_TO_FP:
-  case ISD::UINT_TO_FP:
-    if (Src->getScalarSizeInBits() == 1 || Dst->getScalarSizeInBits() == 1) {
-      // The cost of convert from or to mask vector is different from other
-      // cases. We could not use PowDiff to calculate it.
-      // For mask vector to fp, we should use the following instructions:
+  case ISD::UINT_TO_FP: {
+    unsigned IsSigned = ISD == ISD::SINT_TO_FP;
+    unsigned FCVT = IsSigned ? RISCV::VFCVT_F_X_V : RISCV::VFCVT_F_XU_V;
+    unsigned FWCVT = IsSigned ? RISCV::VFWCVT_F_X_V : RISCV::VFWCVT_F_XU_V;
+    unsigned FNCVT = IsSigned ? RISCV::VFNCVT_F_X_W : RISCV::VFNCVT_F_XU_W;
+    unsigned SrcEltSize = Src->getScalarSizeInBits();
+    unsigned DstEltSize = Dst->getScalarSizeInBits();
+
+    if (SrcEltSize == 1) {
+      // For mask vector to fp, we use:
       // vmv.v.i v8, 0
       // vmerge.vim v8, v8, -1, v0
-      // vfcvt.f.x.v v8, v8
+      // vfwcvt.f.x.v v8, v8
+      MVT ElementVT = MVT::getIntegerVT(DstEltSize >> 1);
+      MVT VecHalfVT = DstLT.second.changeVectorElementType(ElementVT);
+      return getRISCVInstructionCost({RISCV::VMV_V_I, RISCV::VMERGE_VIM},
+                                     VecHalfVT, CostKind) +
+             getRISCVInstructionCost(FWCVT, DstLT.second, CostKind);
+    }
 
-      // And for fp vector to mask, we use:
-      // vfncvt.rtz.x.f.w v9, v8
-      // vand.vi v8, v9, 1
-      // vmsne.vi v0, v8, 0
-      return 3;
+    if (DstEltSize == SrcEltSize)
+      return getRISCVInstructionCost(FCVT, DstLT.second, CostKind);
+
+    if (DstEltSize == (2 * SrcEltSize))
+      return getRISCVInstructionCost(FWCVT, DstLT.second, CostKind);
+
+    if (DstEltSize == (4 * SrcEltSize)) {
+      unsigned WidenIntOp = IsSigned ? RISCV::VSEXT_VF2 : RISCV::VZEXT_VF2;
+      MVT ElementVT = MVT::getIntegerVT(DstEltSize >> 1);
+      MVT VecVT = DstLT.second.changeVectorElementType(ElementVT);
+      return getRISCVInstructionCost(WidenIntOp, VecVT, CostKind) +
+             getRISCVInstructionCost(FWCVT, DstLT.second, CostKind);
     }
-    if (std::abs(PowDiff) <= 1)
-      return 1;
-    // Backend could lower (v[sz]ext i8 to double) to vfcvt(v[sz]ext.f8 i8),
-    // so it only need two conversion.
-    if (Src->isIntOrIntVectorTy())
-      return 2;
-    // Counts of narrow/widen instructions.
-    return std::abs(PowDiff);
+    if (DstEltSize == (8 * SrcEltSize)) {
+      unsigned WidenIntOp = IsSigned ? RISCV::VSEXT_VF4 : RISCV::VZEXT_VF4;
+      MVT ElementVT = MVT::getIntegerVT(DstEltSize >> 1);
+      MVT VecVT = DstLT.second.changeVectorElementType(ElementVT);
+      return getRISCVInstructionCost(WidenIntOp, VecVT, CostKind) +
+             getRISCVInstructionCost(FWCVT, DstLT.second, CostKind);
+    }
+    if (SrcEltSize == (2 * DstEltSize))
+      return getRISCVInstructionCost(FNCVT, DstLT.second, CostKind);
+
+    if ((SrcEltSize == (4 * DstEltSize)) && (DstEltSize == 16)) {
+      // Handle i64 to f16: vfncvt.f.x/xu + vfncvt.f.f
+      MVT DstVT = DstLT.second.changeVectorElementType(MVT::f32);
+      return getRISCVInstructionCost(FNCVT, DstVT, CostKind) +
+             getRISCVInstructionCost(RISCV::VFNCVT_F_F_W, DstLT.second,
+                                     CostKind);
+    }
+    return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
+  }
   }
   return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
 }
diff --git a/llvm/test/Analysis/CostModel/RISCV/cast.ll b/llvm/test/Analysis/CostModel/RISCV/cast.ll
index 6ddd57a24c51f5..616310b30d0da9 100644
--- a/llvm/test/Analysis/CostModel/RISCV/cast.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/cast.ll
@@ -1725,87 +1725,87 @@ define void @fptosi() {
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v4f16_v4i32 = fptosi <4 x half> undef to <4 x i32>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v4f32_v4i32 = fptosi <4 x float> undef to <4 x i32>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v4f64_v4i32 = fptosi <4 x double> undef to <4 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4f16_v4i64 = fptosi <4 x half> undef to <4 x i64>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v4f32_v4i64 = fptosi <4 x float> undef to <4 x i64>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v4f64_v4i64 = fptosi <4 x double> undef to <4 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v4f16_v4i64 = fptosi <4 x half> undef to <4 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4f32_v4i64 = fptosi <4 x float> undef to <4 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4f64_v4i64 = fptosi <4 x double> undef to <4 x i64>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v4f16_v4i1 = fptosi <4 x half> undef to <4 x i1>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v4f32_v4i1 = fptosi <4 x float> undef to <4 x i1>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v4f64_v4i1 = fptosi <4 x double> undef to <4 x i1>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v8f16_v8i8 = fptosi <8 x half> undef to <8 x i8>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8f32_v8i8 = fptosi <8 x float> undef to <8 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v8f64_v8i8 = fptosi <8 x double> undef to <8 x i8>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v8f64_v8i8 = fptosi <8 x double> undef to <8 x i8>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v8f16_v8i16 = fptosi <8 x half> undef to <8 x i16>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v8f32_v8i16 = fptosi <8 x float> undef to <8 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8f64_v8i16 = fptosi <8 x double> undef to <8 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v8f16_v8i32 = fptosi <8 x half> undef to <8 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v8f32_v8i32 = fptosi <8 x float> undef to <8 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v8f64_v8i32 = fptosi <8 x double> undef to <8 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8f16_v8i64 = fptosi <8 x half> undef to <8 x i64>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v8f32_v8i64 = fptosi <8 x float> undef to <8 x i64>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v8f64_v8i64 = fptosi <8 x double> undef to <8 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v8f64_v8i16 = fptosi <8 x double> undef to <8 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8f16_v8i32 = fptosi <8 x half> undef to <8 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8f32_v8i32 = fptosi <8 x float> undef to <8 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8f64_v8i32 = fptosi <8 x double> undef to <8 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v8f16_v8i64 = fptosi <8 x half> undef to <8 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v8f32_v8i64 = fptosi <8 x float> undef to <8 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v8f64_v8i64 = fptosi <8 x double> undef to <8 x i64>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v8f16_v8i1 = fptosi <8 x half> undef to <8 x i1>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v8f32_v8i1 = fptosi <8 x float> undef to <8 x i1>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v8f64_v8i1 = fptosi <8 x double> undef to <8 x i1>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v8f64_v8i1 = fptosi <8 x double> undef to <8 x i1>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16f16_v16i8 = fptosi <16 x half> undef to <16 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16f32_v16i8 = fptosi <16 x float> undef to <16 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v16f64_v16i8 = fptosi <16 x double> undef to <16 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16f16_v16i16 = fptosi <16 x half> undef to <16 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16f32_v16i16 = fptosi <16 x float> undef to <16 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16f64_v16i16 = fptosi <16 x double> undef to <16 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16f16_v16i32 = fptosi <16 x half> undef to <16 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16f32_v16i32 = fptosi <16 x float> undef to <16 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16f64_v16i32 = fptosi <16 x double> undef to <16 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16f16_v16i64 = fptosi <16 x half> undef to <16 x i64>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16f32_v16i64 = fptosi <16 x float> undef to <16 x i64>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v16f64_v16i64 = fptosi <16 x double> undef to <16 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v16f32_v16i8 = fptosi <16 x float> undef to <16 x i8>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v16f64_v16i8 = fptosi <16 x double> undef to <16 x i8>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16f16_v16i16 = fptosi <16 x half> undef to <16 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16f32_v16i16 = fptosi <16 x float> undef to <16 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v16f64_v16i16 = fptosi <16 x double> undef to <16 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v16f16_v16i32 = fptosi <16 x half> undef to <16 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v16f32_v16i32 = fptosi <16 x float> undef to <16 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v16f64_v16i32 = fptosi <16 x double> undef to <16 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %v16f16_v16i64 = fptosi <16 x half> undef to <16 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v16f32_v16i64 = fptosi <16 x float> undef to <16 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v16f64_v16i64 = fptosi <16 x double> undef to <16 x i64>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v16f16_v16i1 = fptosi <16 x half> undef to <16 x i1>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v16f32_v16i1 = fptosi <16 x float> undef to <16 x i1>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v16f64_v16i1 = fptosi <16 x double> undef to <16 x i1>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32f16_v32i8 = fptosi <32 x half> undef to <32 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32f32_v32i8 = fptosi <32 x float> undef to <32 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v32f64_v32i8 = fptosi <32 x double> undef to <32 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32f16_v32i16 = fptosi <32 x half> undef to <32 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32f32_v32i16 = fptosi <32 x float> undef to <32 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v32f64_v32i16 = fptosi <32 x double> undef to <32 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32f16_v32i32 = fptosi <32 x half> undef to <32 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32f32_v32i32 = fptosi <32 x float> undef to <32 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v32f64_v32i32 = fptosi <32 x double> undef to <32 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v32f16_v32i64 = fptosi <32 x half> undef to <32 x i64>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v32f32_v32i64 = fptosi <32 x float> undef to <32 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v16f32_v16i1 = fptosi <16 x float> undef to <16 x i1>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v16f64_v16i1 = fptosi <16 x double> undef to <16 x i1>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32f16_v32i8 = fptosi <32 x half> undef to <32 x i8>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v32f32_v32i8 = fptosi <32 x float> undef to <32 x i8>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %v32f64_v32i8 = fptosi <32 x double> undef to <32 x i8>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v32f16_v32i16 = fptosi <32 x half> undef to <32 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v32f32_v32i16 = fptosi <32 x float> undef to <32 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 13 for instruction: %v32f64_v32i16 = fptosi <32 x double> undef to <32 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v32f16_v32i32 = fptosi <32 x half> undef to <32 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v32f32_v32i32 = fptosi <32 x float> undef to <32 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %v32f64_v32i32 = fptosi <32 x double> undef to <32 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %v32f16_v32i64 = fptosi <32 x half> undef to <32 x i64>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %v32f32_v32i64 = fptosi <32 x float> undef to <32 x i64>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32f64_v32i64 = fptosi <32 x double> undef to <32 x i64>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v32f16_v32i1 = fptosi <32 x half> undef to <32 x i1>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v32f32_v32i1 = fptosi <32 x float> undef to <32 x i1>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %v32f64_v32i1 = fptosi <32 x double> undef to <32 x i1>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v64f16_v64i8 = fptosi <64 x half> undef to <64 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %v64f32_v64i8 = fptosi <64 x float> undef to <64 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %v64f64_v64i8 = fptosi <64 x double> undef to <64 x i8>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v64f16_v64i16 = fptosi <64 x half> undef to <64 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v64f32_v64i16 = fptosi <64 x float> undef to <64 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 11 for instruction: %v64f64_v64i16 = fptosi <64 x double> undef to <64 x i16>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v64f16_v64i32 = fptosi <64 x half> undef to <64 x i32>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v32f16_v32i1 = fptosi <32 x half> undef to <32 x i1>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v32f32_v32i1 = fptosi <32 x float> undef to <32 x i1>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 13 for instruction: %v32f64_v32i1 = fptosi <32 x double> undef to <32 x i1>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v64f16_v64i8 = fptosi <64 x half> undef to <64 x i8>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 13 for instruction: %v64f32_v64i8 = fptosi <64 x float> undef to <64 x i8>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 31 for instruction: %v64f64_v64i8 = fptosi <64 x double> undef to <64 x i8>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %v64f16_v64i16 = fptosi <64 x half> undef to <64 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %v64f32_v64i16 = fptosi <64 x float> undef to <64 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 27 for instruction: %v64f64_v64i16 = fptosi <64 x double> undef to <64 x i16>
+; RV32-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %v64f16_v64i32 = fptosi <64 x half> undef to <64 x i32>
 ; RV32-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v64f32_v64i32 = fptosi <64 x float> undef to <64 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %v64f64_v64i32 = fptosi <64 x double> undef to <64 x i32>
-; RV32-NEXT:  Cost Model: Found an estimated cost of 11 for instruction: %v64f16_v64i64 = fptosi <64 x half> undef to <64 x i64>
-; RV32-NEXT:  Cost Model: Found an estimated cost...
[truncated]
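
The updated expectations in the test diff grow with the size of the vector register group. The sketch below is a simplified model of that scaling, not the actual getRISCVInstructionCost logic. It assumes VLEN is 128 bits and that each instruction costs about max(1, LMUL) of the type it operates on. The names kAssumedVLen, lmulCost and fpToIntNarrowCost are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumption for this sketch only: VLEN is 128 bits.
constexpr unsigned kAssumedVLen = 128;

// Assumed per-instruction cost: the number of vector registers the type
// occupies, with fractional LMUL rounded up to 1.
unsigned lmulCost(unsigned NumElts, unsigned EltBits) {
  assert(NumElts > 0 && EltBits > 0);
  return std::max(1u, NumElts * EltBits / kAssumedVLen);
}

// Cost of a narrowing FP-to-int conversion under the assumptions above.
// Each step is costed on the destination vector shape with the interim
// element width, as the patch does with DstLT.second.
unsigned fpToIntNarrowCost(unsigned NumElts, unsigned SrcEltBits,
                           unsigned DstEltBits) {
  assert(DstEltBits < SrcEltBits && "only the narrowing case is modeled");
  SrcEltBits /= 2;
  unsigned Cost = lmulCost(NumElts, SrcEltBits); // vfncvt.rtz.x.f.w
  while (DstEltBits < SrcEltBits) {
    SrcEltBits /= 2;
    Cost += lmulCost(NumElts, SrcEltBits); // vnsrl.wi
  }
  return Cost;
}
```

With these assumptions, fpToIntNarrowCost(8, 64, 8) gives 4 and fpToIntNarrowCost(16, 64, 8) gives 7, matching the new expectations for the <8 x double> to <8 x i8> and <16 x double> to <16 x i8> cases in the diff. Types that need more than LMUL 8 are split during legalization, which this sketch does not model.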

// vmv.v.i v8, 0
// vmerge.vim v8, v8, -1, v0
// vfcvt.f.x.v v8, v8
// vfwcvt.f.x.v v8, v8
Collaborator

Weirdly vp.sitofp does not use a widening fwcvt here, but that's a separate issue that doesn't affect this patch.

@arcbbb
Contributor Author

arcbbb commented May 27, 2024

ping

@arcbbb
Contributor Author

arcbbb commented Jul 24, 2024

ping

@arcbbb arcbbb force-pushed the tti-fp-conv-cost branch from 6d01c61 to 0398f01 on July 30, 2024 08:33
Comment on lines 1138 to 1164
if ((SrcEltSize >> 1) > DstEltSize) {
// For mask type, we use:
// vand.vi v8, v9, 1
// vmsne.vi v0, v8, 0
VectorType *VecTy =
VectorType::get(IntegerType::get(Dst->getContext(), SrcEltSize >> 1),
cast<VectorType>(Dst)->getElementCount());
Cost +=
getCastInstrCost(Instruction::Trunc, Dst, VecTy, CCH, CostKind, I);
}
Contributor

Can this be moved into the SrcEltSize > DstEltSize branch above, so we can reuse VecVT? Also I'm happy if you want to leave out the mask type comment, thanks for clarifying it in the reply to my review.

Contributor Author

I moved it, but I cannot reuse VecVT since one is an MVT and the other is a VectorType.

Contributor

Nit, can this be renamed to cast-half since it tests both zvfh and zvfhmin?

Contributor Author

Fixed. Thanks!

Comment on lines 1138 to 1140
VectorType *VecTy = VectorType::get(
IntegerType::get(Dst->getContext(), SrcEltSize >> 1),
cast<VectorType>(Dst)->getElementCount());
Contributor

Can we use getTypeForEVT so we reuse VecVT? Otherwise I don't think we use the legalized type.

Contributor Author

Cool, thanks for catching that!

unsigned DstEltSize = Dst->getScalarSizeInBits();
InstructionCost Cost = 0;
if ((SrcEltSize == 16) &&
(!ST->hasVInstructionsF16() || ((DstEltSize >> 1) > SrcEltSize))) {
Collaborator

@topperc topperc Aug 5, 2024

Use / 2 instead of >> 1. Leave converting divide to shift as a compiler optimization.
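
A minimal sketch of the point above: for unsigned values, dividing by 2 and shifting right by 1 produce the same result, so the more readable division can be written and the shift left to the optimizer. The helper halfEltSize is hypothetical and exists only for this illustration.

```cpp
// Hypothetical helper, for illustration only: halves an element size in bits.
constexpr unsigned halfEltSize(unsigned EltSize) { return EltSize / 2; }

// For unsigned operands the division and the shift agree, so nothing is lost
// by writing the division in source code.
static_assert(halfEltSize(64) == (64u >> 1), "division matches shift");
static_assert(halfEltSize(16) == (16u >> 1), "division matches shift");
```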

Contributor Author

Fixed. Thanks!

@arcbbb
Contributor Author

arcbbb commented Aug 30, 2024

gentle ping

Contributor

Is it possible to precommit splitting out the half tests?

Contributor Author

Sure, created in #106692

arcbbb added a commit that referenced this pull request Aug 30, 2024
Widening/narrowing the source data type to match the destination data
type may require multiple steps.
To model the costs, the patch generated the interim type by following
the logic in RISCVTargetLowering::lowerVPFPIntConvOp.
Contributor

@lukel97 lukel97 left a comment

LGTM

@arcbbb arcbbb merged commit 837ee5b into llvm:main Sep 2, 2024
8 checks passed
@arcbbb arcbbb deleted the tti-fp-conv-cost branch September 2, 2024 01:38
@llvm-ci
Collaborator

llvm-ci commented Sep 2, 2024

LLVM Buildbot has detected a new failure on builder clang-armv7-global-isel running on linaro-clang-armv7-global-isel while building llvm at step 7 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/39/builds/1391

Here is the relevant piece of the build log for the reference
Step 7 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'ClangPseudo :: cxx/unsized-array.cpp' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: clang-pseudo -grammar=cxx -source=/home/tcwg-buildbot/worker/clang-armv7-global-isel/llvm/clang-tools-extra/pseudo/test/cxx/unsized-array.cpp --print-forest | /home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-global-isel/llvm/clang-tools-extra/pseudo/test/cxx/unsized-array.cpp
+ clang-pseudo -grammar=cxx -source=/home/tcwg-buildbot/worker/clang-armv7-global-isel/llvm/clang-tools-extra/pseudo/test/cxx/unsized-array.cpp --print-forest
clang-pseudo: ../llvm/clang-tools-extra/pseudo/lib/cxx/CXX.cpp:437: auto clang::pseudo::cxx::getLanguage()::(anonymous class)::operator()() const: Assertion `Diags.empty()' failed.
+ /home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-global-isel/llvm/clang-tools-extra/pseudo/test/cxx/unsized-array.cpp
#0 0x00c5535c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/clang-pseudo+0x5f35c)
#1 0x00c530e4 llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/clang-pseudo+0x5d0e4)
#2 0x00c55db0 SignalHandler(int) Signals.cpp:0:0
#3 0xf792d6e0 __default_sa_restorer ./signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:67:0
#4 0xf791db06 ./csu/../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47:0
#5 0xf795d292 __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
#6 0xf792c840 gsignal ./signal/../sysdeps/posix/raise.c:27:6
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-global-isel/llvm/clang-tools-extra/pseudo/test/cxx/unsized-array.cpp

--

********************

