[VPlan] Implementation of onlyFirstLaneUsed for VPLiveOut class #93513


Closed
wants to merge 3 commits from the vpliveout-onlyfirstlane branch

Conversation

@arcbbb (Contributor) commented May 28, 2024

Following up on #83068: when scalarizing a VPWidenPointerInductionRecipe,
onlyScalarsGenerated checks whether the VF is scalable. With a scalable
VF, it requires every user to use only the first lane.
However, if any user happens to be a VPLiveOut, the check inevitably fails.

This patch addresses that by implementing onlyFirstLaneUsed for the
VPLiveOut class: it returns true when the operand is a
VPWidenPointerInductionRecipe.
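
For orientation, here is a minimal, self-contained model of the check described above. This is a sketch with simplified stand-in types, not LLVM's actual classes (those live in llvm/lib/Transforms/Vectorize/VPlan.h); only the names mirror VPlan's.

// Model: under a scalable VF, a widened pointer induction may be
// scalarized only if every user reads just lane 0 of it.
#include <algorithm>
#include <vector>

struct VPValue;

struct VPUser {
  // Default is conservative: assume all lanes of the operand may be used.
  virtual bool onlyFirstLaneUsed(const VPValue *) const { return false; }
  virtual ~VPUser() = default;
};

struct VPValue {
  std::vector<VPUser *> Users;
};

// Stand-in for the onlyScalarsGenerated check from the description.
bool onlyScalarsGenerated(const VPValue &PtrIV, bool ScalableVF) {
  if (!ScalableVF)
    return true; // the fixed-VF case is decided differently in the real code
  return std::all_of(PtrIV.Users.begin(), PtrIV.Users.end(),
                     [&](const VPUser *U) {
                       return U->onlyFirstLaneUsed(&PtrIV);
                     });
}

// Model of the patch: the live-out reports first-lane-only use when its
// operand is the pointer induction.
struct VPLiveOut : VPUser {
  bool OperandIsPointerIV = false;
  bool onlyFirstLaneUsed(const VPValue *) const override {
    return OperandIsPointerIV;
  }
};

int main() {
  VPValue PtrIV;
  VPLiveOut LO;
  LO.OperandIsPointerIV = true;
  PtrIV.Users.push_back(&LO);
  // Without the override, the conservative default (all lanes used) would
  // make the scalable-VF check fail; with it, scalarization is allowed.
  return onlyScalarsGenerated(PtrIV, /*ScalableVF=*/true) ? 0 : 1;
}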

@llvmbot (Member) commented May 28, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/93513.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+7)
  • (added) llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll (+98)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e75a1de548f7d..a0140f64eb643 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -693,6 +693,8 @@ class VPLiveOut : public VPUser {
     return true;
   }
 
+  bool onlyFirstLaneUsed(const VPValue *Op) const override;
+
   PHINode *getPhi() const { return Phi; }
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5eb99ffd1e10e..f7a9fac8fb3d7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -186,6 +186,13 @@ bool VPRecipeBase::mayHaveSideEffects() const {
   }
 }
 
+bool VPLiveOut::onlyFirstLaneUsed(const VPValue *Op) const {
+  assert(is_contained(operands(), Op) &&
+         "Op must be an operand of the recipe");
+
+  return vputils::isUniformAfterVectorization(getOperand(0)) ||
+         isa<VPWidenPointerInductionRecipe>(Op);
+}
+
 void VPLiveOut::fixPhi(VPlan &Plan, VPTransformState &State) {
   auto Lane = VPLane::getLastLaneForVF(State.VF);
   VPValue *ExitValue = getOperand(0);
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll b/llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll
new file mode 100644
index 0000000000000..25d6c64a5fce6
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/vplan-optimize-ptr-induction.ll
@@ -0,0 +1,98 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -scalable-vectorization=on  -force-target-supports-scalable-vectors -passes=loop-vectorize < %s -S | FileCheck %s
+define ptr @foo(ptr %y, float %alpha, i32 %N) {
+; CHECK-LABEL: define ptr @foo(
+; CHECK-SAME: ptr [[Y:%.*]], float [[ALPHA:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    [[CMP3:%.*]] = icmp sgt i32 [[N]], 0
+; CHECK-NEXT:    br i1 [[CMP3]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; CHECK:       [[FOR_BODY_PREHEADER]]:
+; CHECK-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext nneg i32 [[N]] to i64
+; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], [[TMP1]]
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, ptr [[Y]], i64 [[N_VEC]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x float> poison, float [[ALPHA]], i64 0
+; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x float> [[BROADCAST_SPLATINSERT]], <vscale x 1 x float> poison, <vscale x 1 x i32> zeroinitializer
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call <vscale x 1 x i64> @llvm.experimental.stepvector.nxv1i64()
+; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[TMP6]], i64 0
+; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[DOTSPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP8:%.*]] = add <vscale x 1 x i64> zeroinitializer, [[TMP3]]
+; CHECK-NEXT:    [[VECTOR_GEP:%.*]] = mul <vscale x 1 x i64> [[TMP8]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 1, i64 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
+; CHECK-NEXT:    [[TMP9:%.*]] = add <vscale x 1 x i64> [[DOTSPLAT]], [[VECTOR_GEP]]
+; CHECK-NEXT:    [[TMP7:%.*]] = add i64 [[TMP6]], 0
+; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[Y]], i64 [[TMP7]]
+; CHECK-NEXT:    [[TMP10:%.*]] = add i64 [[TMP6]], 0
+; CHECK-NEXT:    [[TMP11:%.*]] = getelementptr inbounds float, ptr [[Y]], i64 [[TMP10]]
+; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr inbounds float, ptr [[TMP11]], i32 0
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 1 x float>, ptr [[TMP12]], align 4
+; CHECK-NEXT:    [[TMP13:%.*]] = fadd fast <vscale x 1 x float> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; CHECK-NEXT:    store <vscale x 1 x float> [[TMP13]], ptr [[TMP12]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[TMP6]], [[TMP2]]
+; CHECK-NEXT:    [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
+; CHECK-NEXT:    [[CMO:%.*]] = sub i64 [[N_VEC]], 1
+; CHECK-NEXT:    [[IND_ESCAPE:%.*]] = getelementptr i8, ptr [[Y]], i64 [[CMO]]
+; CHECK-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[Y]], %[[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK:       [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; CHECK-NEXT:    [[END_0_LCSSA:%.*]] = phi ptr [ [[END_0:%.*]], %[[FOR_BODY]] ], [ [[IND_ESCAPE]], %[[MIDDLE_BLOCK]] ]
+; CHECK-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; CHECK:       [[FOR_COND_CLEANUP]]:
+; CHECK-NEXT:    [[RESULT:%.*]] = phi ptr [ [[Y]], %[[ENTRY]] ], [ [[END_0_LCSSA]], %[[FOR_COND_CLEANUP_LOOPEXIT]] ]
+; CHECK-NEXT:    ret ptr [[RESULT]]
+; CHECK:       [[FOR_BODY]]:
+; CHECK-NEXT:    [[END_0]] = phi ptr [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INCDEC_PTR:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL1]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[Y]], i64 [[INDVARS_IV]]
+; CHECK-NEXT:    [[TMP15:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = fadd fast float [[TMP15]], [[ALPHA]]
+; CHECK-NEXT:    store float [[ADD]], ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[END_0]], i64 1
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+;
+entry:
+  %cmp3 = icmp sgt i32 %N, 0
+  br i1 %cmp3, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:
+  %wide.trip.count = zext nneg i32 %N to i64
+  br label %for.body
+
+for.cond.cleanup:
+  %result = phi ptr [ %y, %entry ], [ %end.0, %for.body ]
+  ret ptr %result
+
+for.body:
+  %end.0 = phi ptr [ %y, %for.body.preheader ], [ %incdec.ptr, %for.body ]
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %y, i64 %indvars.iv
+  %0 = load float, ptr %arrayidx, align 4
+  %add = fadd fast float %0, %alpha
+  store float %add, ptr %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+  %incdec.ptr = getelementptr inbounds i8, ptr %end.0, i64 1
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.


github-actions bot commented May 28, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@arcbbb force-pushed the vpliveout-onlyfirstlane branch from 6ebc786 to 7d5cb13 on May 28, 2024 08:50
assert(is_contained(operands(), Op) && "Op must be an operand of the recipe");

return vputils::isUniformAfterVectorization(getOperand(0)) ||
       isa<VPWidenPointerInductionRecipe>(Op);
@arcbbb (Contributor, Author) commented on the lines above:

fixupIVUsers removes the VPLiveOut after fixing it up. Given that the end value of the IV is derived from the trip count, I think we might alternatively eliminate the use of the pointer IV by creating a new VPValue for the VPLiveOut.
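
As a reading aid (not part of the diff): the function under review, restated with comments spelling out why each clause holds. The trip-count point is taken from the comment just above.

bool VPLiveOut::onlyFirstLaneUsed(const VPValue *Op) const {
  assert(is_contained(operands(), Op) &&
         "Op must be an operand of the recipe");
  // If the exit value is uniform after vectorization, every lane carries
  // the same value, so reading only lane 0 is trivially sufficient.
  // For a widened pointer induction, the IV's end value is derived from
  // the trip count (see the fixupIVUsers note above), so the live-out
  // never needs lanes beyond the first.
  return vputils::isUniformAfterVectorization(getOperand(0)) ||
         isa<VPWidenPointerInductionRecipe>(Op);
}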

@arcbbb (Contributor, Author) commented Jul 8, 2024

ping

@fhahn (Contributor) commented Sep 12, 2024

Is this still needed?

@arcbbb (Contributor, Author) commented Sep 26, 2024

> Is this still needed?

No; after commit fc9cd32 ([VPlan] Don't add live-outs for IV phis), the need is gone. Thanks!

@arcbbb closed this Sep 26, 2024