[VPlan] Implementation of onlyFirstLaneUsed for VPLiveOut class #93513


Closed
wants to merge 3 commits from the vpliveout-onlyfirstlane branch

Conversation

@arcbbb (Contributor) commented May 28, 2024

Following up on #83068: when scalarizing a VPWidenPointerInductionRecipe,
onlyScalarsGenerated checks whether the VF is scalable. With a scalable
VF, it requires every user to use only the first lane.
However, if any user happens to be a VPLiveOut, the check inevitably fails.

This patch addresses that by implementing onlyFirstLaneUsed for the
VPLiveOut class: it returns true when the operand is a
VPWidenPointerInductionRecipe.
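
For orientation, here is a minimal, self-contained model of the check described above. This is a sketch with simplified stand-in types, not LLVM's actual classes (those live in llvm/lib/Transforms/Vectorize/VPlan.h); only the names mirror VPlan's.

// Model: under a scalable VF, a widened pointer induction may be
// scalarized only if every user reads just lane 0 of it.
#include <algorithm>
#include <vector>

struct VPValue;

struct VPUser {
  // Default is conservative: assume all lanes of the operand may be used.
  virtual bool onlyFirstLaneUsed(const VPValue *) const { return false; }
  virtual ~VPUser() = default;
};

struct VPValue {
  std::vector<VPUser *> Users;
};

// Stand-in for the onlyScalarsGenerated check from the description.
bool onlyScalarsGenerated(const VPValue &PtrIV, bool ScalableVF) {
  if (!ScalableVF)
    return true; // the fixed-VF case is decided differently in the real code
  return std::all_of(PtrIV.Users.begin(), PtrIV.Users.end(),
                     [&](const VPUser *U) {
                       return U->onlyFirstLaneUsed(&PtrIV);
                     });
}

// Model of the patch: the live-out reports first-lane-only use when its
// operand is the pointer induction.
struct VPLiveOut : VPUser {
  bool OperandIsPointerIV = false;
  bool onlyFirstLaneUsed(const VPValue *) const override {
    return OperandIsPointerIV;
  }
};

int main() {
  VPValue PtrIV;
  VPLiveOut LO;
  LO.OperandIsPointerIV = true;
  PtrIV.Users.push_back(&LO);
  // Without the override, the conservative default (all lanes used) would
  // make the scalable-VF check fail; with it, scalarization is allowed.
  return onlyScalarsGenerated(PtrIV, /*ScalableVF=*/true) ? 0 : 1;
}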

@llvmbot (Member) commented May 28, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/93513.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+7)
  • (added) llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll (+98)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e75a1de548f7d..a0140f64eb643 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -693,6 +693,8 @@ class VPLiveOut : public VPUser {
     return true;
   }
 
+  bool onlyFirstLaneUsed(const VPValue *Op) const override;
+
   PHINode *getPhi() const { return Phi; }
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5eb99ffd1e10e..f7a9fac8fb3d7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -186,6 +186,13 @@ bool VPRecipeBase::mayHaveSideEffects() const {
   }
 }
 
+bool VPLiveOut::onlyFirstLaneUsed(const VPValue *Op) const {
+  assert(is_contained(operands(), Op) &&
+         "Op must be an operand of the recipe");
+
+  return vputils::isUniformAfterVectorization(getOperand(0)) ||
+         isa<VPWidenPointerInductionRecipe>(Op);
+}
+
 void VPLiveOut::fixPhi(VPlan &Plan, VPTransformState &State) {
   auto Lane = VPLane::getLastLaneForVF(State.VF);
   VPValue *ExitValue = getOperand(0);
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll b/llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll
new file mode 100644
index 0000000000000..25d6c64a5fce6
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll
@@ -0,0 +1,98 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -scalable-vectorization=on  -force-target-supports-scalable-vectors -passes=loop-vectorize < %s -S | FileCheck %s
+define ptr @foo(ptr %y, float %alpha, i32 %N) {
+; CHECK-LABEL: define ptr @foo(
+; CHECK-SAME: ptr [[Y:%.*]], float [[ALPHA:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    [[CMP3:%.*]] = icmp sgt i32 [[N]], 0
+; CHECK-NEXT:    br i1 [[CMP3]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; CHECK:       [[FOR_BODY_PREHEADER]]:
+; CHECK-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[N]] to i64
+; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], [[TMP1]]
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, ptr [[Y]], i64 [[N_VEC]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x float> poison, float [[ALPHA]], i64 0
+; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x float> [[BROADCAST_SPLATINSERT]], <vscale x 1 x float> poison, <vscale x 1 x i32> zeroinitializer
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call <vscale x 1 x i64> @llvm.experimental.stepvector.nxv1i64()
+; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[TMP6]], i64 0
+; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[DOTSPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP8:%.*]] = add <vscale x 1 x i64> zeroinitializer, [[TMP3]]
+; CHECK-NEXT:    [[VECTOR_GEP:%.*]] = mul <vscale x 1 x i64> [[TMP8]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 1, i64 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
+; CHECK-NEXT:    [[TMP9:%.*]] = add <vscale x 1 x i64> [[DOTSPLAT]], [[VECTOR_GEP]]
+; CHECK-NEXT:    [[TMP7:%.*]] = add i64 [[TMP6]], 0
+; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[Y]], i64 [[TMP7]]
+; CHECK-NEXT:    [[TMP10:%.*]] = add i64 [[TMP6]], 0
+; CHECK-NEXT:    [[TMP11:%.*]] = getelementptr inbounds float, ptr [[Y]], i64 [[TMP10]]
+; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr inbounds float, ptr [[TMP11]], i32 0
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 1 x float>, ptr [[TMP12]], align 4
+; CHECK-NEXT:    [[TMP13:%.*]] = fadd fast <vscale x 1 x float> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; CHECK-NEXT:    store <vscale x 1 x float> [[TMP13]], ptr [[TMP12]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[TMP6]], [[TMP2]]
+; CHECK-NEXT:    [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
+; CHECK-NEXT:    [[CMO:%.*]] = sub i64 [[N_VEC]], 1
+; CHECK-NEXT:    [[IND_ESCAPE:%.*]] = getelementptr i8, ptr [[Y]], i64 [[CMO]]
+; CHECK-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[Y]], %[[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK:       [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; CHECK-NEXT:    [[END_0_LCSSA:%.*]] = phi ptr [ [[END_0:%.*]], %[[FOR_BODY]] ], [ [[IND_ESCAPE]], %[[MIDDLE_BLOCK]] ]
+; CHECK-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; CHECK:       [[FOR_COND_CLEANUP]]:
+; CHECK-NEXT:    [[RESULT:%.*]] = phi ptr [ [[Y]], %[[ENTRY]] ], [ [[END_0_LCSSA]], %[[FOR_COND_CLEANUP_LOOPEXIT]] ]
+; CHECK-NEXT:    ret ptr [[RESULT]]
+; CHECK:       [[FOR_BODY]]:
+; CHECK-NEXT:    [[END_0]] = phi ptr [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INCDEC_PTR:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL1]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[Y]], i64 [[INDVARS_IV]]
+; CHECK-NEXT:    [[TMP15:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = fadd fast float [[TMP15]], [[ALPHA]]
+; CHECK-NEXT:    store float [[ADD]], ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[END_0]], i64 1
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+;
+entry:
+  %cmp3 = icmp sgt i32 %N, 0
+  br i1 %cmp3, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:
+  %wide.trip.count = zext nneg i32 %N to i64
+  br label %for.body
+
+for.cond.cleanup:
+  %result = phi ptr [ %y, %entry ], [ %end.0, %for.body ]
+  ret ptr %result
+
+for.body:
+  %end.0 = phi ptr [ %y, %for.body.preheader ], [ %incdec.ptr, %for.body ]
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %y, i64 %indvars.iv
+  %0 = load float, ptr %arrayidx, align 4
+  %add = fadd fast float %0, %alpha
+  store float %add, ptr %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+  %incdec.ptr = getelementptr inbounds i8, ptr %end.0, i64 1
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.


github-actions bot commented May 28, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@arcbbb force-pushed the vpliveout-onlyfirstlane branch from 6ebc786 to 7d5cb13 on May 28, 2024 08:50
assert(is_contained(operands(), Op) && "Op must be an operand of the recipe");

return vputils::isUniformAfterVectorization(getOperand(0)) ||
       isa<VPWidenPointerInductionRecipe>(Op);
@arcbbb (Contributor, Author) commented on the lines above:

fixupIVUsers removes the VPLiveOut after fixing it up. Given that the end value of the IV is derived from the trip count, I think we might alternatively eliminate the use of the pointer IV by creating a new VPValue for the VPLiveOut.
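
As a reading aid (not part of the diff): the function under review, restated with comments spelling out why each clause holds. The trip-count point is taken from the comment just above.

bool VPLiveOut::onlyFirstLaneUsed(const VPValue *Op) const {
  assert(is_contained(operands(), Op) &&
         "Op must be an operand of the recipe");
  // If the exit value is uniform after vectorization, every lane carries
  // the same value, so reading only lane 0 is trivially sufficient.
  // For a widened pointer induction, the IV's end value is derived from
  // the trip count (see the fixupIVUsers note above), so the live-out
  // never needs lanes beyond the first.
  return vputils::isUniformAfterVectorization(getOperand(0)) ||
         isa<VPWidenPointerInductionRecipe>(Op);
}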

@arcbbb (Contributor, Author) commented Jul 8, 2024

ping

@fhahn (Contributor) commented Sep 12, 2024

Is this still needed?

@arcbbb (Contributor, Author) commented Sep 26, 2024

> Is this still needed?

No; after commit fc9cd32 ([VPlan] Don't add live-outs for IV phis), the need is gone. Thanks!

@arcbbb closed this Sep 26, 2024