-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LV] Limits the splat operations be hoisted must not be defined by a recipe. #117138
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Mel Chen (Mel-Chen) ChangesFull diff: https://github.com/llvm/llvm-project/pull/117138.diff 4 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 8b1a4aeb88f81f..32996426c28490 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -266,7 +266,7 @@ Value *VPTransformState::get(VPValue *Def, bool NeedsScalar) {
return Data.VPV2Vector[Def];
auto GetBroadcastInstrs = [this, Def](Value *V) {
- bool SafeToHoist = Def->isDefinedOutsideLoopRegions();
+ bool SafeToHoist = !Def->hasDefiningRecipe();
if (VF.isScalar())
return V;
// Place the code for broadcasting invariant variables in the new preheader.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll b/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
index aa78113ebaa48c..b1c202eab9dd3d 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
@@ -23,9 +23,9 @@
; Check that the extractvalue operands are actually free in vector code.
; FORCED: [[E1:%.+]] = extractvalue { i64, i64 } %sv, 0
-; FORCED-NEXT: [[E2:%.+]] = extractvalue { i64, i64 } %sv, 1
; FORCED-NEXT: %broadcast.splatinsert = insertelement <2 x i64> poison, i64 [[E1]], i64 0
; FORCED-NEXT: %broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> poison, <2 x i32> zeroinitializer
+; FORCED-NEXT: [[E2:%.+]] = extractvalue { i64, i64 } %sv, 1
; FORCED-NEXT: %broadcast.splatinsert1 = insertelement <2 x i64> poison, i64 [[E2]], i64 0
; FORCED-NEXT: %broadcast.splat2 = shufflevector <2 x i64> %broadcast.splatinsert1, <2 x i64> poison, <2 x i32> zeroinitializer
; FORCED-NEXT: [[ADD:%.+]] = add <2 x i64> %broadcast.splat, %broadcast.splat2
@@ -75,9 +75,9 @@ declare float @powf(float, float) readnone nounwind
; FORCED-LABEL: define void @test_getVectorCallCost
; FORCED: [[E1:%.+]] = extractvalue { float, float } %sv, 0
-; FORCED-NEXT: [[E2:%.+]] = extractvalue { float, float } %sv, 1
; FORCED-NEXT: %broadcast.splatinsert = insertelement <2 x float> poison, float [[E1]], i64 0
; FORCED-NEXT: %broadcast.splat = shufflevector <2 x float> %broadcast.splatinsert, <2 x float> poison, <2 x i32> zeroinitializer
+; FORCED-NEXT: [[E2:%.+]] = extractvalue { float, float } %sv, 1
; FORCED-NEXT: %broadcast.splatinsert1 = insertelement <2 x float> poison, float [[E2]], i64 0
; FORCED-NEXT: %broadcast.splat2 = shufflevector <2 x float> %broadcast.splatinsert1, <2 x float> poison, <2 x i32> zeroinitializer
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll
index 7778f01c58dc34..fb5087db254b23 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll
@@ -5,9 +5,9 @@ target triple = "aarch64-unknown-linux-gnu"
define void @widen_extractvalue(ptr %dst, {i64, i64} %sv) #0 {
; CHECK-LABEL: @widen_extractvalue(
; CHECK: [[EXTRACT0:%.*]] = extractvalue { i64, i64 } [[SV:%.*]], 0
-; CHECK-NEXT: [[EXTRACT1:%.*]] = extractvalue { i64, i64 } [[SV]], 1
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[EXTRACT0]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
+; CHECK-NEXT: [[EXTRACT1:%.*]] = extractvalue { i64, i64 } [[SV]], 1
; CHECK-NEXT: [[DOTSPLATINSERT1:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[EXTRACT1]], i64 0
; CHECK-NEXT: [[DOTSPLAT2:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT1]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
; CHECK: [[ADD:%.*]] = add <vscale x 2 x i64> [[DOTSPLAT]], [[DOTSPLAT2]]
diff --git a/llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll b/llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll
index e94bd841360256..7840a9dec794b3 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll
@@ -137,6 +137,8 @@ define void @test_induction_step_needs_expansion(ptr noalias %j, ptr %k, i64 %l,
; CHECK-LABEL: @test_induction_step_needs_expansion(
; CHECK-NEXT: iter.check:
; CHECK-NEXT: [[TMP0:%.*]] = sub i16 0, [[OFF:%.*]]
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i16> poison, i16 [[TMP0]], i64 0
+; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[BROADCAST_SPLATINSERT]], <16 x i16> poison, <16 x i32> zeroinitializer
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[L:%.*]], 8
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
; CHECK: vector.main.loop.iter.check:
@@ -145,8 +147,6 @@ define void @test_induction_step_needs_expansion(ptr noalias %j, ptr %k, i64 %l,
; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[L]], 64
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[L]], [[N_MOD_VF]]
-; CHECK-NEXT: [[DOTSPLATINSERT2:%.*]] = insertelement <16 x i16> poison, i16 [[TMP0]], i64 0
-; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[DOTSPLATINSERT2]], <16 x i16> poison, <16 x i32> zeroinitializer
; CHECK-NEXT: [[TMP1:%.*]] = mul <16 x i16> splat (i16 16), [[TMP2]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <16 x i16> poison, i16 [[TMP0]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <16 x i16> [[DOTSPLATINSERT]], <16 x i16> poison, <16 x i32> zeroinitializer
|
How about adding a VPDominatorTree to VPTransformState, using that to check for dominance? VPlan's CFG shouldn't change during execution, so no need to worry about updates |
ca00f44
to
2153bf3
Compare
@@ -389,6 +391,9 @@ struct VPTransformState { | |||
|
|||
/// VPlan-based type analysis. | |||
VPTypeAnalysis TypeAnalysis; | |||
|
|||
/// VPlan-based dominator tree. | |||
VPDominatorTree *VPDT = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fhahn I attempted to use VPDominatorTree instead of VPDominatorTree *. However, there's a circular dependency between VPlan.h and VPlanDominatorTree.h. I've been trying to break the dependency for a few days but haven't had much success. Do you have any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, let me check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fhahn Any news on this? Or could we proceed with using the VPDominatorTree pointer?
@@ -267,7 +271,11 @@ Value *VPTransformState::get(VPValue *Def, bool NeedsScalar) { | |||
return Data.VPV2Vector[Def]; | |||
|
|||
auto GetBroadcastInstrs = [this, Def](Value *V) { | |||
bool SafeToHoist = Def->isDefinedOutsideLoopRegions(); | |||
bool SafeToHoist = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there might be a bug in a function called licm
in VPlanTransforms.cpp. Perhaps worth fixing that in this patch too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
licm
only operates on the loop region, so we should not visit recipes in the middle block (or successors of it) and using isDefinedOutsideLoopRegions
should be sufficient?
2153bf3
to
4b77adf
Compare
Ping @fhahn, do you have any thoughts on circular dependency between VPlan.h and VPlanDominatorTree.h? Or could we proceed with using the VPDominatorTree pointer? |
4b77adf
to
40c2dee
Compare
@fhahn 40c2dee
And also, could you please take a look the issue about circular dependency between VPlan.h and VPlanDominatorTree.h? |
// Update VPDominatorTree since VPBasicBlock may be removed after State wsa | ||
// constucted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some typos in the comment: wsa & constucted
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (llvm#117138).
I think it should be possible to organize the declarations in a better way to enable this (with the main other benefit being that it also reduces the size of VPlan.h): #124104 |
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (llvm#117138).
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (llvm#117138).
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (#117138). PR: #124104
…(#124104) Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (llvm/llvm-project#117138). PR: llvm/llvm-project#124104
Issue #117139