Skip to content

[VPlan] Implement VPWidenCallRecipe::computeCost (NFCI). #106047

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 1, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Aug 26, 2024

Implement cost computation for VPWidenCallRecipe. In some cases, targets use argument info to compute intrinsic costs. If all operands of the call are VPValues with an underlying IR value, use the IR values as arguments.

@llvmbot
Copy link
Member

llvmbot commented Aug 26, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Implement cost computation for VPWidenCallRecipe. In some cases, targets use argument info to compute intrinsic costs. If all operands of the call are VPValues with an underlying IR value, use the IR values as arguments.


Full diff: https://github.com/llvm/llvm-project/pull/106047.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+40)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 36a1aa08654d5b..ea88fe6720d96f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1565,6 +1565,10 @@ class VPWidenCallRecipe : public VPSingleDefRecipe {
   /// Produce a widened version of the call instruction.
   void execute(VPTransformState &State) override;
 
+  /// Return the cost of this VPWidenCallRecipe.
+  InstructionCost computeCost(ElementCount VF,
+                              VPCostContext &Ctx) const override;
+
   Function *getCalledScalarFunction() const {
     return cast<Function>(getOperand(getNumOperands() - 1)->getLiveInIRValue());
   }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index fe1325f4163004..8bc0c6621fbde2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -924,6 +924,46 @@ void VPWidenCallRecipe::execute(VPTransformState &State) {
   }
 }
 
+InstructionCost VPWidenCallRecipe::computeCost(ElementCount VF,
+                                               VPCostContext &Ctx) const {
+  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+  if (Variant) {
+    return Ctx.TTI.getCallInstrCost(nullptr, Variant->getReturnType(),
+                                    Variant->getFunctionType()->params(),
+                                    CostKind);
+  }
+
+  FastMathFlags FMF;
+  // TODO: Manage flags via VPRecipeWithIRFlags.
+  if (auto *FPMO = dyn_cast_or_null<FPMathOperator>(getUnderlyingValue()))
+    FMF = FPMO->getFastMathFlags();
+
+  // Some backends analyze intrinsic arguments to determine cost. If all
+  // operands are VPValues with an underlying IR value, use the original IR
+  // values for cost computations.
+  SmallVector<const Value *> Arguments;
+  for (VPValue *Op : operands()) {
+    auto *V = Op->getUnderlyingValue();
+    if (!V) {
+      Arguments.clear();
+      break;
+    }
+    Arguments.push_back(V);
+  }
+
+  Type *RetTy =
+      ToVectorTy(Ctx.Types.inferScalarType(this->getVPSingleValue()), VF);
+  SmallVector<Type *> ParamTys;
+  for (unsigned I = 0; I != getNumOperands(); ++I)
+    ParamTys.push_back(
+        ToVectorTy(Ctx.Types.inferScalarType(getOperand(I)), VF));
+
+  IntrinsicCostAttributes CostAttrs(VectorIntrinsicID, RetTy, Arguments,
+                                    ParamTys, FMF);
+  return Ctx.TTI.getIntrinsicInstrCost(
+      CostAttrs, TargetTransformInfo::TCK_RecipThroughput);
+}
+
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 void VPWidenCallRecipe::print(raw_ostream &O, const Twine &Indent,
                               VPSlotTracker &SlotTracker) const {

@fhahn fhahn requested review from arcbbb and ElvisWang123 August 27, 2024 10:58
Copy link
Contributor

@arcbbb arcbbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

IntrinsicCostAttributes CostAttrs(VectorIntrinsicID, RetTy, Arguments,
ParamTys, FMF);
return Ctx.TTI.getIntrinsicInstrCost(
CostAttrs, TargetTransformInfo::TCK_RecipThroughput);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can just use CostKind here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

Implement cost computation for VPWidenCallRecipe. In some cases, targets
use argument info to compute intrinsic costs. If all operands of the
call are VPValues with an underlying IR value, use the IR values as
arguments.
@fhahn fhahn force-pushed the vpwidencall-cost branch from 3fc1eee to 3730ebf Compare August 30, 2024 14:22
@fhahn fhahn merged commit 9ccf825 into llvm:main Sep 1, 2024
5 of 8 checks passed
@fhahn fhahn deleted the vpwidencall-cost branch September 1, 2024 15:26
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 1, 2024

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vls-2stage running on linaro-g3-03 while building llvm at step 18 "test-suite".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/4/builds/1815

Here is the relevant piece of the build log for the reference
Step 18 (test-suite) failure: test (failure)
...
cd /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Fortran/gfortran/regression && /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage2.install/bin/llvm-size --format=sysv /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Fortran/gfortran/regression/gfortran-regression-execute-regression__widechar_intrinsics_5_f90 > /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Fortran/gfortran/regression/gfortran-regression-execute-regression__widechar_intrinsics_5_f90.size
gmake[2]: Leaving directory '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build'
[ 36%] Built target gfortran-regression-execute-regression__widechar_intrinsics_5_f90
gmake[1]: Leaving directory '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build'
gmake: *** [Makefile:136: all] Error 2

2024-09-01 16:38:36 INFO: Testing...
2024-09-01 16:38:36 INFO: Execute: /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage1/bin/llvm-lit -v -j 32 /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build -o /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/outputi47f1v5p.json
-- Testing: 10005 tests, 32 workers --
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1026.test (1 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1026.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_1026' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1028.test (2 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1028.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_1028' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_366.test (3 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_366.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_366' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1027.test (4 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1027.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_1027' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_367.test (5 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_367.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_367' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_38.test (6 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_38.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_38' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_698.test (7 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_698.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_698' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_add_1029.test (8 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_add_1029.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_add_1029' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_add_1030.test (9 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_add_1030.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_add_1030' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_697.test (10 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_697.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_697' is missing
********************

@chapuni
Copy link
Contributor

chapuni commented Sep 2, 2024

Looks like this causes buildbot failures.

In my case, stage1-clang crashes while compiling ADTTests on the aarch64 host.

FAILED: unittests/ADT/CMakeFiles/ADTTests.dir/BitVectorTest.cpp.o 
clang++: llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7366: llvm::VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop)) && " VPlan cost model and legacy cost model disagreed"' failed.

@fhahn
Copy link
Contributor Author

fhahn commented Sep 2, 2024

@chapuni could you share the full crashing compiler invocation? Bootstrapping on ARM64 macOS works fine for me.

@DavidSpickett
Copy link
Collaborator

This is also the cause of test suite failures on AArch64 Linux: https://lab.llvm.org/buildbot/#/builders/41/builds/1722

It's just that build configuration issues are hiding them under a pile of unintentional NOEXE results. We'll find what is actually failing.

@fhahn
Copy link
Contributor Author

fhahn commented Sep 2, 2024

@DavidSpickett thanks, I think it looks like an assertion is triggered when building a Fortran workload from the llvm-test-suite with Flang.

Is there any chance to extract the IR reproducer from Flang?

@antmox
Copy link
Contributor

antmox commented Sep 2, 2024

Looks like the failing test is typebound_operator_9.f03

@ye-luo ye-luo mentioned this pull request Sep 2, 2024
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 3, 2024
…m#106047)"

Breaks build of 534.hpgmg

This reverts commit
9ccf825 and
b0de7fa [VPlan] Use op from underlying call in computeCost if needed.

Change-Id: Ib8b10cc72bfb787d5f295971aa871e625148aa9d
@mstorsjo
Copy link
Member

mstorsjo commented Sep 3, 2024

I also ran into assert failures with this, on a multitude of architectures. (I would have caught it already yesterday, but my nightly build yesterday was broken due to other reasons, so yesterday's build went untested otherwise.)

It's reproducible with the following snippets:

float *a;
int b;
double fabs();
__attribute__((cold)) int c() {
  for (; b; b++)
    a[b] = 1. - fabs(b);
}
$ clang -target aarch64-linux-gnu -c -O2 repro.c
clang: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7369: llvm::VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop)) && " VPlan cost model and legacy cost model disagreed"' failed.

And:

int *a;
__attribute__((cold)) int b(int c) {
  long d, f;
  double e;
  for (int g = 0; g < c; g++) {
    e = g * 3.14159265358979323846;
    f = e;
    if (f > 0)
      d = 0;
    else
      d = f;
    a[g] = d;
  }
}
$ clang -target x86_64-w64-mingw32 -c -O2 repro.c
clang: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7369: llvm::VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop)) && " VPlan cost model and legacy cost model disagreed"' failed.

@fhahn
Copy link
Contributor Author

fhahn commented Sep 3, 2024

@mstorsjo thanks, should be fixed by dd94537

@antmox @DavidSpickett AFAICT now the bots with Flang crashes should be back to green?

@antmox
Copy link
Contributor

antmox commented Sep 3, 2024

@fhahn yes indeed, bots are back to green, thanks

@aeubanks
Copy link
Contributor

aeubanks commented Sep 3, 2024

this is causing crashes on

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64-unknown-linux-android26"

define fastcc void @selfguided_filter(ptr %arrayidx32, ptr %dav1d_sgr_x_by_x) {
entry:
  %sumsq = alloca [26520 x i32], i32 0, align 4
  br label %for.body27

for.cond22.for.cond.cleanup26_crit_edge:          ; preds = %for.body27
  ret void

for.body27:                                       ; preds = %for.body27, %entry
  %indvars.iv1 = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body27 ]
  %0 = load i32, ptr %arrayidx32, align 4
  %cond.i = tail call i32 @llvm.smax.i32(i32 %0, i32 0)
  %cond.i186 = tail call i32 @llvm.umin.i32(i32 %cond.i, i32 1)
  %idxprom45 = zext i32 %cond.i186 to i64
  %arrayidx46 = getelementptr [256 x i8], ptr %dav1d_sgr_x_by_x, i64 0, i64 %idxprom45
  %1 = load i8, ptr %arrayidx46, align 1
  %conv47 = zext i8 %1 to i32
  %mul51 = mul i32 0, %conv47
  store i32 %mul51, ptr %sumsq, align 4
  store i32 0, ptr null, align 4
  %indvars.iv.next = add i64 %indvars.iv1, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 0
  br i1 %exitcond.not, label %for.cond22.for.cond.cleanup26_crit_edge, label %for.body27
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.smax.i32(i32, i32) #0

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.umin.i32(i32, i32) #0

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
$ opt -passes="function<eager-inv>(loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>)" -disable-output /tmp/a.ll
opt: ../../llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7382: VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop)) && " VPlan cost model and legacy cost model disagreed"' failed.                                            

@fhahn
Copy link
Contributor Author

fhahn commented Sep 3, 2024

@aeubanks thanks for the report, it was the same issue as #107051: setVectorizedCallDecision did not consider forced scalars, meaning the VPlans created did not reflect the legacy cost decisions. Should be fixed now

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 3, 2024
Implement cost computation for VPWidenCallRecipe. In some cases, targets
use argument info to compute intrinsic costs. If all operands of the
call are VPValues with an underlying IR value, use the IR values as
arguments.

PR: llvm#106731

Change-Id: I9eea7ea07341d8662d5cb0798dcb7151f0fb04e2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants