[VPlan] Update VPInst::onlyFirstLaneUsed to check users. #80269
Conversation
A VPInstruction only has its first lane used if all users use its first lane only. Use vputils::onlyFirstLaneUsed to continue checking the recipe's users to handle more cases. Besides allowing additional introduction of scalar steps when interleaving in some cases, this also enables using an Add VPInstruction to model the increment.
@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Patch is 22.95 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/80269.diff

5 Files Affected:
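For intuition, here is a minimal self-contained sketch of the pattern (simplified stand-in types, not the actual VPlan classes): a lane-wise instruction demands only the first lane of its operands exactly when its own users demand only the first lane of its result, so the check walks transitively up the use chain.

// Toy model of the transitive first-lane-only check; Node is a simplified
// stand-in for VPValue/VPUser/VPInstruction, not LLVM's actual API.
#include <algorithm>
#include <vector>

struct Node {
  enum Kind { BinaryOp, BranchOnCount, Other };
  Kind K = Other;
  std::vector<Node *> Operands;
  std::vector<Node *> Users;

  // As a user: does this node demand only lane 0 of Op?
  bool onlyFirstLaneUsedOf(const Node *Op) const {
    switch (K) {
    case BinaryOp:
      // A lane-wise op needs only lane 0 of its operands iff only lane 0
      // of its own result is ever demanded -- recurse into its users.
      return onlyFirstLaneUsed(this);
    case BranchOnCount:
      // The branch condition (operand 0) is inherently scalar.
      return !Operands.empty() && Op == Operands[0];
    case Other:
      return false;
    }
    return false;
  }

  // As a value: first-lane-only iff *all* users demand only lane 0.
  // (The toy model assumes an acyclic walk; in VPlan most recipes answer
  // directly without recursing, which bounds the traversal.)
  static bool onlyFirstLaneUsed(const Node *Def) {
    return std::all_of(
        Def->Users.begin(), Def->Users.end(),
        [Def](const Node *U) { return U->onlyFirstLaneUsedOf(Def); });
  }
};

With this shape, an Add whose sole user is a BranchOnCount is recognized as first-lane-only even though Add never appears in an opcode list; that is what the patch below adds for binary ops and ICmp.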
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index a1bd6aaf0e551..1ca2cfef447f6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1397,9 +1397,9 @@ void VPSlotTracker::assignSlots(const VPBasicBlock *VPBB) {
assignSlot(Def);
}
-bool vputils::onlyFirstLaneUsed(VPValue *Def) {
+bool vputils::onlyFirstLaneUsed(const VPValue *Def) {
return all_of(Def->users(),
- [Def](VPUser *U) { return U->onlyFirstLaneUsed(Def); });
+ [Def](const VPUser *U) { return U->onlyFirstLaneUsed(Def); });
}
bool vputils::onlyFirstPartUsed(VPValue *Def) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 20792cb9ac7c1..30dc521947b3b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1256,23 +1256,7 @@ class VPInstruction : public VPRecipeWithIRFlags {
}
}
- /// Returns true if the recipe only uses the first lane of operand \p Op.
- bool onlyFirstLaneUsed(const VPValue *Op) const override {
- assert(is_contained(operands(), Op) &&
- "Op must be an operand of the recipe");
- if (getOperand(0) != Op)
- return false;
- switch (getOpcode()) {
- default:
- return false;
- case VPInstruction::ActiveLaneMask:
- case VPInstruction::CalculateTripCountMinusVF:
- case VPInstruction::CanonicalIVIncrementForPart:
- case VPInstruction::BranchOnCount:
- return true;
- };
- llvm_unreachable("switch should return");
- }
+ bool onlyFirstLaneUsed(const VPValue *Op) const override;
/// Returns true if the recipe only uses the first part of operand \p Op.
bool onlyFirstPartUsed(const VPValue *Op) const override {
@@ -3385,7 +3369,7 @@ class VPlanSlp {
namespace vputils {
/// Returns true if only the first lane of \p Def is used.
-bool onlyFirstLaneUsed(VPValue *Def);
+bool onlyFirstLaneUsed(const VPValue *Def);
/// Returns true if only the first part of \p Def is used.
bool onlyFirstPartUsed(VPValue *Def);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index e51184b0dd1fe..21b8d1eb77bf9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -515,6 +515,24 @@ void VPInstruction::execute(VPTransformState &State) {
State.set(this, GeneratedValue, Part);
}
}
+bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
+ assert(is_contained(operands(), Op) && "Op must be an operand of the recipe");
+ if (Instruction::isBinaryOp(getOpcode()))
+ return vputils::onlyFirstLaneUsed(this);
+
+ switch (getOpcode()) {
+ default:
+ return false;
+ case Instruction::ICmp:
+ return vputils::onlyFirstLaneUsed(this);
+ case VPInstruction::ActiveLaneMask:
+ case VPInstruction::CalculateTripCountMinusVF:
+ case VPInstruction::CanonicalIVIncrementForPart:
+ case VPInstruction::BranchOnCount:
+ return getOperand(0) == Op;
+ };
+ llvm_unreachable("switch should return");
+}
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void VPInstruction::dump() const {
diff --git a/llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll b/llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
index e81fb66239bd4..f05ec30619c5d 100644
--- a/llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
+++ b/llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
@@ -67,7 +67,7 @@ define void @pr45679(ptr %A) optsize {
; CHECK-NEXT: store i32 13, ptr [[ARRAYIDX]], align 1
; CHECK-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1
; CHECK-NEXT: [[COND:%.*]] = icmp eq i32 [[RIVPLUS1]], 14
-; CHECK-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: exit:
; CHECK-NEXT: ret void
;
@@ -129,7 +129,7 @@ define void @pr45679(ptr %A) optsize {
; VF2UF2-NEXT: store i32 13, ptr [[ARRAYIDX]], align 1
; VF2UF2-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1
; VF2UF2-NEXT: [[COND:%.*]] = icmp eq i32 [[RIVPLUS1]], 14
-; VF2UF2-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP2:![0-9]+]]
+; VF2UF2-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
; VF2UF2: exit:
; VF2UF2-NEXT: ret void
;
@@ -139,46 +139,42 @@ define void @pr45679(ptr %A) optsize {
; VF1UF4: vector.ph:
; VF1UF4-NEXT: br label [[VECTOR_BODY:%.*]]
; VF1UF4: vector.body:
-; VF1UF4-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
-; VF1UF4-NEXT: [[VEC_IV:%.*]] = add i32 [[INDEX]], 0
-; VF1UF4-NEXT: [[VEC_IV4:%.*]] = add i32 [[INDEX]], 1
-; VF1UF4-NEXT: [[VEC_IV5:%.*]] = add i32 [[INDEX]], 2
-; VF1UF4-NEXT: [[VEC_IV6:%.*]] = add i32 [[INDEX]], 3
-; VF1UF4-NEXT: [[TMP0:%.*]] = icmp ule i32 [[VEC_IV]], 13
-; VF1UF4-NEXT: [[TMP1:%.*]] = icmp ule i32 [[VEC_IV4]], 13
-; VF1UF4-NEXT: [[TMP2:%.*]] = icmp ule i32 [[VEC_IV5]], 13
-; VF1UF4-NEXT: [[TMP3:%.*]] = icmp ule i32 [[VEC_IV6]], 13
-; VF1UF4-NEXT: br i1 [[TMP0]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
+; VF1UF4-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE6:%.*]] ]
+; VF1UF4-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
+; VF1UF4-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1
+; VF1UF4-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 2
+; VF1UF4-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 3
+; VF1UF4-NEXT: [[TMP4:%.*]] = icmp ule i32 [[TMP0]], 13
+; VF1UF4-NEXT: [[TMP5:%.*]] = icmp ule i32 [[TMP1]], 13
+; VF1UF4-NEXT: [[TMP6:%.*]] = icmp ule i32 [[TMP2]], 13
+; VF1UF4-NEXT: [[TMP7:%.*]] = icmp ule i32 [[TMP3]], 13
+; VF1UF4-NEXT: br i1 [[TMP4]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
; VF1UF4: pred.store.if:
-; VF1UF4-NEXT: [[INDUCTION:%.*]] = add i32 [[INDEX]], 0
-; VF1UF4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i32 [[INDUCTION]]
-; VF1UF4-NEXT: store i32 13, ptr [[TMP4]], align 1
+; VF1UF4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i32 [[TMP0]]
+; VF1UF4-NEXT: store i32 13, ptr [[TMP8]], align 1
; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE]]
; VF1UF4: pred.store.continue:
-; VF1UF4-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
-; VF1UF4: pred.store.if4:
-; VF1UF4-NEXT: [[INDUCTION1:%.*]] = add i32 [[INDEX]], 1
-; VF1UF4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[INDUCTION1]]
-; VF1UF4-NEXT: store i32 13, ptr [[TMP5]], align 1
-; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE8]]
-; VF1UF4: pred.store.continue5:
-; VF1UF4-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
-; VF1UF4: pred.store.if6:
-; VF1UF4-NEXT: [[INDUCTION2:%.*]] = add i32 [[INDEX]], 2
-; VF1UF4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[INDUCTION2]]
-; VF1UF4-NEXT: store i32 13, ptr [[TMP6]], align 1
-; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE10]]
-; VF1UF4: pred.store.continue7:
-; VF1UF4-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
-; VF1UF4: pred.store.if8:
-; VF1UF4-NEXT: [[INDUCTION3:%.*]] = add i32 [[INDEX]], 3
-; VF1UF4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[INDUCTION3]]
-; VF1UF4-NEXT: store i32 13, ptr [[TMP7]], align 1
-; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE12]]
-; VF1UF4: pred.store.continue9:
+; VF1UF4-NEXT: br i1 [[TMP5]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2:%.*]]
+; VF1UF4: pred.store.if1:
+; VF1UF4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[TMP1]]
+; VF1UF4-NEXT: store i32 13, ptr [[TMP9]], align 1
+; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE2]]
+; VF1UF4: pred.store.continue2:
+; VF1UF4-NEXT: br i1 [[TMP6]], label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]
+; VF1UF4: pred.store.if3:
+; VF1UF4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[TMP2]]
+; VF1UF4-NEXT: store i32 13, ptr [[TMP10]], align 1
+; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE4]]
+; VF1UF4: pred.store.continue4:
+; VF1UF4-NEXT: br i1 [[TMP7]], label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6]]
+; VF1UF4: pred.store.if5:
+; VF1UF4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[TMP3]]
+; VF1UF4-NEXT: store i32 13, ptr [[TMP11]], align 1
+; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE6]]
+; VF1UF4: pred.store.continue6:
; VF1UF4-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
-; VF1UF4-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
-; VF1UF4-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; VF1UF4-NEXT: [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
+; VF1UF4-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; VF1UF4: middle.block:
; VF1UF4-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; VF1UF4: scalar.ph:
@@ -190,7 +186,7 @@ define void @pr45679(ptr %A) optsize {
; VF1UF4-NEXT: store i32 13, ptr [[ARRAYIDX]], align 1
; VF1UF4-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1
; VF1UF4-NEXT: [[COND:%.*]] = icmp eq i32 [[RIVPLUS1]], 14
-; VF1UF4-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP2:![0-9]+]]
+; VF1UF4-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
; VF1UF4: exit:
; VF1UF4-NEXT: ret void
;
@@ -356,54 +352,50 @@ define void @load_variant(ptr noalias %a, ptr noalias %b) {
; VF1UF4: vector.ph:
; VF1UF4-NEXT: br label [[VECTOR_BODY:%.*]]
; VF1UF4: vector.body:
-; VF1UF4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
-; VF1UF4-NEXT: [[VEC_IV:%.*]] = add i64 [[INDEX]], 0
-; VF1UF4-NEXT: [[VEC_IV4:%.*]] = add i64 [[INDEX]], 1
-; VF1UF4-NEXT: [[VEC_IV5:%.*]] = add i64 [[INDEX]], 2
-; VF1UF4-NEXT: [[VEC_IV6:%.*]] = add i64 [[INDEX]], 3
-; VF1UF4-NEXT: [[TMP0:%.*]] = icmp ule i64 [[VEC_IV]], 13
-; VF1UF4-NEXT: [[TMP1:%.*]] = icmp ule i64 [[VEC_IV4]], 13
-; VF1UF4-NEXT: [[TMP2:%.*]] = icmp ule i64 [[VEC_IV5]], 13
-; VF1UF4-NEXT: [[TMP3:%.*]] = icmp ule i64 [[VEC_IV6]], 13
-; VF1UF4-NEXT: br i1 [[TMP0]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
+; VF1UF4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE6:%.*]] ]
+; VF1UF4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
+; VF1UF4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
+; VF1UF4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
+; VF1UF4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
+; VF1UF4-NEXT: [[TMP4:%.*]] = icmp ule i64 [[TMP0]], 13
+; VF1UF4-NEXT: [[TMP5:%.*]] = icmp ule i64 [[TMP1]], 13
+; VF1UF4-NEXT: [[TMP6:%.*]] = icmp ule i64 [[TMP2]], 13
+; VF1UF4-NEXT: [[TMP7:%.*]] = icmp ule i64 [[TMP3]], 13
+; VF1UF4-NEXT: br i1 [[TMP4]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
; VF1UF4: pred.store.if:
-; VF1UF4-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0
-; VF1UF4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDUCTION]]
-; VF1UF4-NEXT: [[TMP5:%.*]] = load i64, ptr [[TMP4]], align 8
-; VF1UF4-NEXT: store i64 [[TMP5]], ptr [[B:%.*]], align 8
+; VF1UF4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
+; VF1UF4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP8]], align 8
+; VF1UF4-NEXT: store i64 [[TMP9]], ptr [[B:%.*]], align 8
; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE]]
; VF1UF4: pred.store.continue:
-; VF1UF4-NEXT: [[TMP6:%.*]] = phi i64 [ poison, [[VECTOR_BODY]] ], [ [[TMP5]], [[PRED_STORE_IF]] ]
-; VF1UF4-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
-; VF1UF4: pred.store.if4:
-; VF1UF4-NEXT: [[INDUCTION1:%.*]] = add i64 [[INDEX]], 1
-; VF1UF4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDUCTION1]]
-; VF1UF4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP7]], align 8
-; VF1UF4-NEXT: store i64 [[TMP8]], ptr [[B]], align 8
-; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE8]]
-; VF1UF4: pred.store.continue5:
-; VF1UF4-NEXT: [[TMP9:%.*]] = phi i64 [ poison, [[PRED_STORE_CONTINUE]] ], [ [[TMP8]], [[PRED_STORE_IF7]] ]
-; VF1UF4-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
-; VF1UF4: pred.store.if6:
-; VF1UF4-NEXT: [[INDUCTION2:%.*]] = add i64 [[INDEX]], 2
-; VF1UF4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDUCTION2]]
-; VF1UF4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP10]], align 8
-; VF1UF4-NEXT: store i64 [[TMP11]], ptr [[B]], align 8
-; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE10]]
-; VF1UF4: pred.store.continue7:
-; VF1UF4-NEXT: [[TMP12:%.*]] = phi i64 [ poison, [[PRED_STORE_CONTINUE8]] ], [ [[TMP11]], [[PRED_STORE_IF9]] ]
-; VF1UF4-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
-; VF1UF4: pred.store.if8:
-; VF1UF4-NEXT: [[INDUCTION3:%.*]] = add i64 [[INDEX]], 3
-; VF1UF4-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDUCTION3]]
-; VF1UF4-NEXT: [[TMP14:%.*]] = load i64, ptr [[TMP13]], align 8
-; VF1UF4-NEXT: store i64 [[TMP14]], ptr [[B]], align 8
-; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE12]]
-; VF1UF4: pred.store.continue9:
-; VF1UF4-NEXT: [[TMP15:%.*]] = phi i64 [ poison, [[PRED_STORE_CONTINUE10]] ], [ [[TMP14]], [[PRED_STORE_IF11]] ]
+; VF1UF4-NEXT: [[TMP10:%.*]] = phi i64 [ poison, [[VECTOR_BODY]] ], [ [[TMP9]], [[PRED_STORE_IF]] ]
+; VF1UF4-NEXT: br i1 [[TMP5]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2:%.*]]
+; VF1UF4: pred.store.if1:
+; VF1UF4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
+; VF1UF4-NEXT: [[TMP12:%.*]] = load i64, ptr [[TMP11]], align 8
+; VF1UF4-NEXT: store i64 [[TMP12]], ptr [[B]], align 8
+; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE2]]
+; VF1UF4: pred.store.continue2:
+; VF1UF4-NEXT: [[TMP13:%.*]] = phi i64 [ poison, [[PRED_STORE_CONTINUE]] ], [ [[TMP12]], [[PRED_STORE_IF1]] ]
+; VF1UF4-NEXT: br i1 [[TMP6]], label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]
+; VF1UF4: pred.store.if3:
+; VF1UF4-NEXT: [[TMP14:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
+; VF1UF4-NEXT: [[TMP15:%.*]] = load i64, ptr [[TMP14]], align 8
+; VF1UF4-NEXT: store i64 [[TMP15]], ptr [[B]], align 8
+; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE4]]
+; VF1UF4: pred.store.continue4:
+; VF1UF4-NEXT: [[TMP16:%.*]] = phi i64 [ poison, [[PRED_STORE_CONTINUE2]] ], [ [[TMP15]], [[PRED_STORE_IF3]] ]
+; VF1UF4-NEXT: br i1 [[TMP7]], label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6]]
+; VF1UF4: pred.store.if5:
+; VF1UF4-NEXT: [[TMP17:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
+; VF1UF4-NEXT: [[TMP18:%.*]] = load i64, ptr [[TMP17]], align 8
+; VF1UF4-NEXT: store i64 [[TMP18]], ptr [[B]], align 8
+; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE6]]
+; VF1UF4: pred.store.continue6:
+; VF1UF4-NEXT: [[TMP19:%.*]] = phi i64 [ poison, [[PRED_STORE_CONTINUE4]] ], [ [[TMP18]], [[PRED_STORE_IF5]] ]
; VF1UF4-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
-; VF1UF4-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
-; VF1UF4-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; VF1UF4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
+; VF1UF4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; VF1UF4: middle.block:
; VF1UF4-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
; VF1UF4: scalar.ph:
@@ -416,7 +408,7 @@ define void @load_variant(ptr noalias %a, ptr noalias %b) {
; VF1UF4-NEXT: store i64 [[V]], ptr [[B]], align 8
; VF1UF4-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
; VF1UF4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 14
-; VF1UF4-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; VF1UF4-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; VF1UF4: for.end:
; VF1UF4-NEXT: ret void
;
diff --git a/llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll b/llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll
index c07512644f721..0c659a550b31e 100644
--- a/llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll
+++ b/llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll
@@ -16,43 +16,39 @@ define void @VF1-VPlanExe(ptr %dst) {
; CHECK: vector.ph:
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
-; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE9:%.*]] ]
-; CHECK-NEXT: [[VEC_IV:%.*]] = add i64 [[INDEX]], 0
-; CHECK-NEXT: [[VEC_IV1:%.*]] = add i64 [[INDEX]], 1
-; CHECK-NEXT: [[VEC_IV2:%.*]] = add i64 [[INDEX]], 2
-; CHECK-NEXT: [[VEC_IV3:%.*]] = add i64 [[INDEX]], 3
-; CHECK-NEXT: [[TMP0:%.*]] = icmp ule i64 [[VEC_IV]], 14
-; CHECK-NEXT: [[TMP1:%.*]] = icmp ule i64 [[VEC_IV1]], 14
-; CHECK-NEXT: [[TMP2:%.*]] = icmp ule i64 [[VEC_IV2]], 14
-; CHECK-NEXT: [[TMP3:%.*]] = icmp ule i64 [[VEC_IV3]], 14
-; CHECK-NEXT: br i1 [[TMP0]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE6:%.*]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
+; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
+; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
+; CHECK-NEXT: [[TMP4:%.*]] = icmp ule i64 [[TMP0]], 14
+; CHECK-NEXT: [[TMP5:%.*]] = icmp ule i64 [[TMP1]], 14
+; CHECK-NEXT: [[TMP6:%.*]] = icmp ule i64 [[TMP2]], 14
+; CHECK-NEXT: [[TMP7:%.*]] = icmp ule i64 [[TMP3]], 14
+; CHECK-NEXT: br i1 [[TMP4]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
; CHECK: pred.store.if:
-; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 0
-; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[DST:%.*]], i64 [[TMP4]]
-; CHECK-NEXT: store i32 0, ptr [[TMP5]], align 4
+; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[DST:%.*]], i64 [[TMP0]]
+; CHECK-NEXT: store i32 0, ptr [[TMP8]], align 4
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
; CHECK: pred.store.continue:
-; CHECK-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF4:%.*]], label [[PRED_STORE_CONTINUE5:%.*]]
-; CHECK: pred.store.if4:
-; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 1
-; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP6]]
-; CHECK-NEXT: store i32 0, ptr [[TMP7]], align 4
-; CHECK-NEXT: br label [[PRED_STORE_CONTINUE5]]
-; CHECK: pred.store.continue5:
-; CHECK-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF6:%.*]], label [[PRED_STORE_CONTINUE7:%.*]]
-; CHECK: pred.store.if6:
-; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 2
-; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP8]]
+; CHECK-NEXT: br i1 [[TMP5]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2:%.*]]
+; CHECK: pred.store.if1:
+; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP1]]
; CHECK-NEXT: store i32 0, ptr [[TMP9]], align 4
-; CHECK-NEXT: br label [[PRED_STORE_CONTINUE7]]
-; CHECK: pred.store.continue7:
-; CHECK-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]
-; C...
[truncated]
LGTM, adding various comments.
The only observable changes to existing tests involve CSE'ing IV steps from replicate regions when unrolling without vectorizing, if I'm not mistaken. Is there some other scenario worth testing?
"... this also enables using an Add VPInstruction to model the increment" - as a follow-up?
 bool vputils::onlyFirstLaneUsed(const VPValue *Def) {
   return all_of(Def->users(),
-                [Def](VPUser *U) { return U->onlyFirstLaneUsed(Def); });
+                [Def](const VPUser *U) { return U->onlyFirstLaneUsed(Def); });
nit: this NFC of adding const could be pushed separately?
Split off to 6936479, thanks!
   return all_of(Def->users(),
-                [Def](VPUser *U) { return U->onlyFirstLaneUsed(Def); });
+                [Def](const VPUser *U) { return U->onlyFirstLaneUsed(Def); });
 }

 bool vputils::onlyFirstPartUsed(VPValue *Def) {
nit: can add const here as well?
Done in 3444240, thanks!
@@ -1256,23 +1256,7 @@ class VPInstruction : public VPRecipeWithIRFlags {
     }
   }
nit: leave the documentation line here in .h even when the implementation moves to .cpp?
Added back, thanks!
   case VPInstruction::CalculateTripCountMinusVF:
   case VPInstruction::CanonicalIVIncrementForPart:
   case VPInstruction::BranchOnCount:
     return getOperand(0) == Op;
This maintains current functionality, but leaves room for further improvement - to cover additional operands (of ActiveLaneMask, BranchOnCount) and opcodes (Not, Select, BranchOnCond) - worth a TODO.
Done, thanks!
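For concreteness, the suggested TODO could eventually take a shape along these lines (a hypothetical sketch, not part of this patch; whether Not/Select/BranchOnCond and the extra operands can safely be treated this way is exactly what would need verifying):

// Hypothetical extension of VPInstruction::onlyFirstLaneUsed covering the
// TODO above -- a sketch, not committed code.
bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
  assert(is_contained(operands(), Op) && "Op must be an operand of the recipe");
  if (Instruction::isBinaryOp(getOpcode()))
    return vputils::onlyFirstLaneUsed(this);

  switch (getOpcode()) {
  default:
    return false;
  case Instruction::ICmp:
  case Instruction::Select:   // lane-wise, like binary ops (assumed safe)
  case VPInstruction::Not:    // lane-wise (assumed safe)
    return vputils::onlyFirstLaneUsed(this);
  case VPInstruction::BranchOnCond:
    return true;              // scalar branch condition (assumed safe)
  case VPInstruction::ActiveLaneMask:
  case VPInstruction::CalculateTripCountMinusVF:
  case VPInstruction::CanonicalIVIncrementForPart:
  case VPInstruction::BranchOnCount:
    // Operand 0 is known scalar; the remaining operands (e.g. the trip
    // count of ActiveLaneMask/BranchOnCount) are plausibly scalar too.
    return getOperand(0) == Op;
  }
}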
-; VF1UF4-NEXT:    [[INDUCTION:%.*]] = add i32 [[INDEX]], 0
-; VF1UF4-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i32 [[INDUCTION]]
-; VF1UF4-NEXT:    store i32 13, ptr [[TMP4]], align 1
+; VF1UF4-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i32 [[TMP0]]
Noting: this CSE's TMP0 (as with TMP1, TMP2, TMP3 below) instead of re-computing it as the current code does. In this case it should be dropped entirely, being a zero-add.
I wasn't able to come up with a different test with vectorizing.
Split off llvm#80269 as suggested.
A VPInstruction only has its first lane used if all users use its first lane only. Use vputils::onlyFirstLaneUsed to continue checking the recipe's users to handle more cases. Besides allowing additional introduction of scalar steps when interleaving in some cases, this also enables using an Add VPInstruction to model the increment - as a follow up.
At the moment, some VPInstructions create only a single scalar value but use VPTransformState's 'vector' storage for this value. Those values are effectively uniform-per-VF (or in some cases uniform-across-VF-and-UF). Using the vector/per-part storage doesn't interact well with other recipes that more accurately use (Part, Lane) to look up scalar values, and it prevents VPInstructions creating scalars from interacting with other recipes working with scalars. This PR tries to unify handling of scalars by using (Part, 0) for scalar values where only the first lane is demanded. This allows using VPInstructions with other recipes like VPScalarCastRecipe and is also needed when using VPInstructions in more cases outside the vector loop region to generate scalars. Depends on #80269
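As a rough illustration of the storage unification described there, with simplified stand-in types (VPTransformState's real interface differs):

// Toy stand-in for VPTransformState's scalar storage; not LLVM's API.
#include <cassert>
#include <map>
#include <tuple>

struct ToyState {
  using Key = std::tuple<const void *, unsigned, unsigned>; // (Def, Part, Lane)
  std::map<Key, int> PerLane; // per-lane scalar storage

  // A uniform-per-VF def stores its single scalar at lane 0 of each part...
  void setScalar(const void *Def, unsigned Part, int V) {
    PerLane[Key{Def, Part, 0}] = V;
  }

  // ...so recipes that look scalars up by (Part, Lane) resolve every lane
  // query for a first-lane-only def to lane 0.
  int getScalar(const void *Def, unsigned Part, unsigned Lane) const {
    (void)Lane; // first-lane-only: all lanes share lane 0's value
    auto It = PerLane.find(Key{Def, Part, 0});
    assert(It != PerLane.end() && "expected scalar stored at (Part, 0)");
    return It->second;
  }
};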