-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[VPlan] Don't fold live ins with both scalar and vector operands #154067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VPlan] Don't fold live ins with both scalar and vector operands #154067
Conversation
If we end up with a extract_element VPInstruction where both operands are live-ins, we will try to fold the live-ins even though the first operand is vector and the live-in is scalar. I don't think it's possible for us to fold these. We can't create a vector version of the live-in either because we don't know VF at this stage. This removes the handling for opcodes that may have both scalar and vector operands. From some quick testing we previously never hit these folds anyway, and were probably just missing test coverage. Fixes llvm#154045
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers Author: Luke Lau (lukel97) ChangesIf we end up with a extract_element VPInstruction where both operands are live-ins, we will try to fold the live-ins even though the first operand is vector and the live-in is scalar. I don't think it's possible for us to fold these. We can't create a vector version of the live-in either because we don't know VF at this stage. This removes the handling for opcodes that may have both scalar and vector operands. From some quick testing we previously never hit these folds anyway, and were probably just missing test coverage. Fixes #154045 Full diff: https://github.com/llvm/llvm-project/pull/154067.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index de0c1e4d177bb..022d9b749f978 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -976,14 +976,9 @@ static Value *tryToFoldLiveIns(const VPRecipeBase &R, unsigned Opcode,
RFlags.getGEPNoWrapFlags());
}
case VPInstruction::PtrAdd:
- case VPInstruction::WidePtrAdd:
return Folder.FoldGEP(IntegerType::getInt8Ty(TypeInfo.getContext()), Ops[0],
Ops[1],
cast<VPRecipeWithIRFlags>(R).getGEPNoWrapFlags());
- case Instruction::InsertElement:
- return Folder.FoldInsertElement(Ops[0], Ops[1], Ops[2]);
- case Instruction::ExtractElement:
- return Folder.FoldExtractElement(Ops[0], Ops[1]);
}
return nullptr;
}
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll b/llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll
new file mode 100644
index 0000000000000..185071707b6e8
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll
@@ -0,0 +1,75 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; RUN: opt -p loop-vectorize -S %s | FileCheck %s
+
+; Make sure we don't try to fold a Instruction::ExtractElement ir<0>, ir<0>, ,
+; since we can't materialize the live-in for the vector operand.
+
+target triple = "x86_64"
+
+define void @test(ptr %p, i1 %c, i64 %x) {
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: ptr [[P:%.*]], i1 [[C:%.*]], i64 [[X:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i1> poison, i1 [[C]], i64 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i1> [[BROADCAST_SPLATINSERT]], <2 x i1> poison, <2 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP0:%.*]] = xor <2 x i1> [[BROADCAST_SPLAT]], splat (i1 true)
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[TMP0]], i32 0
+; CHECK-NEXT: br i1 [[TMP1]], label %[[PRED_SREM_IF:.*]], label %[[PRED_SREM_CONTINUE:.*]]
+; CHECK: [[PRED_SREM_IF]]:
+; CHECK-NEXT: br label %[[PRED_SREM_CONTINUE]]
+; CHECK: [[PRED_SREM_CONTINUE]]:
+; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP0]], i32 1
+; CHECK-NEXT: br i1 [[TMP2]], label %[[PRED_SREM_IF1:.*]], label %[[PRED_SREM_CONTINUE2:.*]]
+; CHECK: [[PRED_SREM_IF1]]:
+; CHECK-NEXT: br label %[[PRED_SREM_CONTINUE2]]
+; CHECK: [[PRED_SREM_CONTINUE2]]:
+; CHECK-NEXT: store i32 0, ptr [[P]], align 4
+; CHECK-NEXT: br label %[[MIDDLE_BLOCK:.*]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK: [[SCALAR_PH]]:
+; CHECK-NEXT: br label %[[LOOP:.*]]
+; CHECK: [[LOOP]]:
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ]
+; CHECK-NEXT: br i1 [[C]], label %[[LATCH]], label %[[ELSE:.*]]
+; CHECK: [[ELSE]]:
+; CHECK-NEXT: [[REM:%.*]] = srem i64 0, [[X]]
+; CHECK-NEXT: br label %[[LATCH]]
+; CHECK: [[LATCH]]:
+; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[REM]], %[[ELSE]] ], [ 0, %[[LOOP]] ]
+; CHECK-NEXT: [[PHI_TRUNC:%.*]] = trunc i64 [[PHI]] to i32
+; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[PHI_TRUNC]], 0
+; CHECK-NEXT: store i32 [[SHL]], ptr [[P]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 1
+; CHECK-NEXT: br i1 [[EXITCOND]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
+ br i1 %c, label %latch, label %else
+
+else:
+ %rem = srem i64 0, %x
+ br label %latch
+
+latch:
+ %phi = phi i64 [ %rem, %else], [ 0, %loop ]
+ %phi.trunc = trunc i64 %phi to i32
+ %shl = shl i32 %phi.trunc, 0
+ store i32 %shl, ptr %p
+ %iv.next = add i64 %iv, 1
+ %exitcond = icmp eq i64 %iv, 1
+ br i1 %exitcond, label %exit, label %loop
+
+exit:
+ ret void
+}
|
artagnon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure I understand the problem yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make this test target-independent and combine it with constantfolder.ll, for instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this a try but I couldn't get it to trigger without -mtriple x86_64. I think it has something to do with how the srem can't be widened on x86 and needs to be replicated instead. I'm not sure if there's an easy way of forcing that on the command line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to move this to the target-independent file, it probably needs -force-widen-divrem-via-safe-divisor=false passing and the loop must have a trip count of at least 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out it specifically needed a trip count of 2, otherwise a widened canonical IV recipe was created which seemed to throw things off. I've moved it into a separate target-agnostic file in caca6ea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that's strange, it worked as expected for me when I changed it to have trip count of 4 with VF=4. The wide canonical IV is created if the trip count is < VF, as then it tries to fold the tail instead of creating an un-excutable vector loop
llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll
Outdated
Show resolved
Hide resolved
llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll
Outdated
Show resolved
Hide resolved
live-ins are always scalar, and just broadcasted during recipe execution if a vector is needed AFAIK. So currently we try to constant fold |
| RFlags.getGEPNoWrapFlags()); | ||
| } | ||
| case VPInstruction::PtrAdd: | ||
| case VPInstruction::WidePtrAdd: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? If both are constants, they both should be scalars, and the fold should just work, and the result re-broadcasted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second operand to WidePtrAdd is a vector. I don't think it will ever end up being a live-in? At least not the way it's used by expandVPWidenPointerInduction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment yes, but that may change n the future. If it is a live-in, the value must be a scalar.
| case Instruction::InsertElement: | ||
| return Folder.FoldInsertElement(Ops[0], Ops[1], Ops[2]); | ||
| case Instruction::ExtractElement: | ||
| return Folder.FoldExtractElement(Ops[0], Ops[1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could still fold extracts if we have a constant for Ops[0], for which we could create a vector constant fairly easily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't create a vector constant because we don't know the exact VF, so we can't create a vector type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so extracting any lane will return the live-in operand, which we can do w/o calling the folder, together with comment for why we are not calling the folder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I don't think we can do that for InsertElement though, so are we ok to remove that fold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to move this to the target-independent file, it probably needs -force-widen-divrem-via-safe-divisor=false passing and the loop must have a trip count of at least 4
If we end up with a extract_element VPInstruction where both operands are live-ins, we will try to fold the live-ins even though the first operand is a vector whilst the live-in is scalar.
This fixes it by just returning the vector live-in instead of calling the folder, and removes the handling for insertelement where we aren't able to do the fold. From some quick testing we previously never hit this fold anyway, and were probably just missing test coverage.
Fixes #154045