[VPlan] Implementation of onlyFirstLaneUsed for VPLiveOut class #93513
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes

Following up on #83068, when scalarizing VPWidenPointerInductionRecipe, the onlyScalarsGenerated check tests whether the VF is scalable. With a scalable VF, it requires all users to use only the first lane. However, if any user happens to be a VPLiveOut, the check inevitably fails.

This patch addresses this by implementing onlyFirstLaneUsed for the VPLiveOut class. It ensures that if the operand is a VPWidenPointerInductionRecipe, the query returns true.

Full diff: https://github.com/llvm/llvm-project/pull/93513.diff

3 Files Affected:
- llvm/lib/Transforms/Vectorize/VPlan.h (+2)
- llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+7)
- llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll (+98)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e75a1de548f7d..a0140f64eb643 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -693,6 +693,8 @@ class VPLiveOut : public VPUser {
return true;
}
+ bool onlyFirstLaneUsed(const VPValue *Op) const override;
+
PHINode *getPhi() const { return Phi; }
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5eb99ffd1e10e..f7a9fac8fb3d7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -186,6 +186,13 @@ bool VPRecipeBase::mayHaveSideEffects() const {
}
}
+bool VPLiveOut::onlyFirstLaneUsed(const VPValue *Op) const {
+ assert(is_contained(operands(), Op) &&
+ "Op must be an operand of the recipe");
+
+ return vputils::isUniformAfterVectorization(getOperand(0)) || isa<VPWidenPointerInductionRecipe>(Op);
+}
+
void VPLiveOut::fixPhi(VPlan &Plan, VPTransformState &State) {
auto Lane = VPLane::getLastLaneForVF(State.VF);
VPValue *ExitValue = getOperand(0);
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll b/llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll
new file mode 100644
index 0000000000000..25d6c64a5fce6
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll
@@ -0,0 +1,98 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -scalable-vectorization=on -force-target-supports-scalable-vectors -passes=loop-vectorize < %s -S | FileCheck %s
+define ptr @foo(ptr %y, float %alpha, i32 %N) {
+; CHECK-LABEL: define ptr @foo(
+; CHECK-SAME: ptr [[Y:%.*]], float [[ALPHA:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[CMP3:%.*]] = icmp sgt i32 [[N]], 0
+; CHECK-NEXT: br i1 [[CMP3]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; CHECK: [[FOR_BODY_PREHEADER]]:
+; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[N]] to i64
+; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], [[TMP0]]
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], [[TMP1]]
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
+; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[Y]], i64 [[N_VEC]]
+; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x float> poison, float [[ALPHA]], i64 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x float> [[BROADCAST_SPLATINSERT]], <vscale x 1 x float> poison, <vscale x 1 x i32> zeroinitializer
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[TMP6:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP3:%.*]] = call <vscale x 1 x i64> @llvm.experimental.stepvector.nxv1i64()
+; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[TMP6]], i64 0
+; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[DOTSPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 1 x i64> zeroinitializer, [[TMP3]]
+; CHECK-NEXT: [[VECTOR_GEP:%.*]] = mul <vscale x 1 x i64> [[TMP8]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 1, i64 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
+; CHECK-NEXT: [[TMP9:%.*]] = add <vscale x 1 x i64> [[DOTSPLAT]], [[VECTOR_GEP]]
+; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[TMP6]], 0
+; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[Y]], i64 [[TMP7]]
+; CHECK-NEXT: [[TMP10:%.*]] = add i64 [[TMP6]], 0
+; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, ptr [[Y]], i64 [[TMP10]]
+; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, ptr [[TMP11]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 1 x float>, ptr [[TMP12]], align 4
+; CHECK-NEXT: [[TMP13:%.*]] = fadd fast <vscale x 1 x float> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; CHECK-NEXT: store <vscale x 1 x float> [[TMP13]], ptr [[TMP12]], align 4
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[TMP6]], [[TMP2]]
+; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
+; CHECK-NEXT: [[CMO:%.*]] = sub i64 [[N_VEC]], 1
+; CHECK-NEXT: [[IND_ESCAPE:%.*]] = getelementptr i8, ptr [[Y]], i64 [[CMO]]
+; CHECK-NEXT: br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[Y]], %[[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT: br label %[[FOR_BODY:.*]]
+; CHECK: [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; CHECK-NEXT: [[END_0_LCSSA:%.*]] = phi ptr [ [[END_0:%.*]], %[[FOR_BODY]] ], [ [[IND_ESCAPE]], %[[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: br label %[[FOR_COND_CLEANUP]]
+; CHECK: [[FOR_COND_CLEANUP]]:
+; CHECK-NEXT: [[RESULT:%.*]] = phi ptr [ [[Y]], %[[ENTRY]] ], [ [[END_0_LCSSA]], %[[FOR_COND_CLEANUP_LOOPEXIT]] ]
+; CHECK-NEXT: ret ptr [[RESULT]]
+; CHECK: [[FOR_BODY]]:
+; CHECK-NEXT: [[END_0]] = phi ptr [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INCDEC_PTR:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL1]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[Y]], i64 [[INDVARS_IV]]
+; CHECK-NEXT: [[TMP15:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP15]], [[ALPHA]]
+; CHECK-NEXT: store float [[ADD]], ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[END_0]], i64 1
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+;
+entry:
+ %cmp3 = icmp sgt i32 %N, 0
+ br i1 %cmp3, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:
+ %wide.trip.count = zext nneg i32 %N to i64
+ br label %for.body
+
+for.cond.cleanup:
+ %result = phi ptr [ %y, %entry ], [ %end.0, %for.body ]
+ ret ptr %result
+
+for.body:
+ %end.0 = phi ptr [ %y, %for.body.preheader ], [ %incdec.ptr, %for.body ]
+ %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+ %arrayidx = getelementptr inbounds float, ptr %y, i64 %indvars.iv
+ %0 = load float, ptr %arrayidx, align 4
+ %add = fadd fast float %0, %alpha
+ store float %add, ptr %arrayidx, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+ %incdec.ptr = getelementptr inbounds i8, ptr %end.0, i64 1
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.
✅ With the latest revision this PR passed the C/C++ code formatter.
Following up on llvm#83068, when scalarizing VPWidenPointerInductionRecipe, the `onlyScalarsGenerated` check tests whether the VF is scalable. With a scalable VF, it requires all users to use only the first lane. However, if any user happens to be a VPLiveOut, the check inevitably fails. This patch addresses this by implementing onlyFirstLaneUsed for the VPLiveOut class. It ensures that if the operand is a VPWidenPointerInductionRecipe, the query returns true.
Force-pushed from 6ebc786 to 7d5cb13 (Compare)
assert(is_contained(operands(), Op) &&
       "Op must be an operand of the recipe");

return vputils::isUniformAfterVectorization(getOperand(0)) ||
       isa<VPWidenPointerInductionRecipe>(Op);
fixupIVUsers removes the VPLiveOut after its fix-up. Given that the end value of the IV is derived from TripCount, I think we might alternatively eliminate the use of the pointer IV by creating a new VPValue for the VPLiveOut.
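To make the suggested alternative concrete, here is a small, self-contained C++ sketch of the arithmetic the reviewer is pointing at; the function name is invented for illustration and is not a VPlan API. It mirrors the `%ind.end = getelementptr i8, ptr %y, i64 %n.vec` value computed in the vectorized preheader of the test above: the escaping value of the pointer IV depends only on its start and the trip count, not on the widened IV itself.

#include <cstdint>

// Hypothetical helper (not VPlan API): compute the escaping end value of a
// pointer IV that advances by StrideInBytes each iteration. Because this
// depends only on the start address and the trip count, an exit phi could be
// fed from such a derived value instead of from the widened pointer IV.
static std::uintptr_t derivePtrIVEndValue(std::uintptr_t StartAddr,
                                          std::uint64_t TripCount,
                                          std::uint64_t StrideInBytes) {
  return StartAddr + TripCount * StrideInBytes;
}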
ping
Is this still needed?
No, it is no longer needed after commit fc9cd32.
Following up on #83068, when scalarizing VPWidenPointerInductionRecipe, the `onlyScalarsGenerated` check tests whether the VF is scalable. With a scalable VF, it requires all users to use only the first lane. However, if any user happens to be a VPLiveOut, the check inevitably fails.

This patch addresses this by implementing onlyFirstLaneUsed for the VPLiveOut class. It ensures that if the operand is a VPWidenPointerInductionRecipe, the query returns true.
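For context, the "all users must use only the first lane" requirement boils down to a query of roughly the following shape. This is a minimal sketch written against the VPValue/VPUser interfaces from the internal VPlan.h header, not the exact upstream helper; in particular, the assumption that the base VPUser::onlyFirstLaneUsed answers false by default is inferred from the description above, where a VPLiveOut user made the check fail.

#include "VPlan.h"              // VPValue, VPUser (internal LoopVectorize header)
#include "llvm/ADT/STLExtras.h" // llvm::all_of

using namespace llvm;

// Sketch only: a VPValue needs just its first lane if every user reports so.
// Before this patch, a VPLiveOut user fell back to the conservative default
// answer, so a query like this returned false and blocked scalarizing the
// pointer induction under a scalable VF.
static bool onlyFirstLaneUsedByAllUsers(VPValue *Def) {
  return all_of(Def->users(),
                [Def](VPUser *U) { return U->onlyFirstLaneUsed(Def); });
}

With the override added in VPlanRecipes.cpp, a VPLiveOut whose exit value is a VPWidenPointerInductionRecipe (or is uniform after vectorization) now answers true, so the query above no longer fails merely because the pointer IV escapes the loop.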