-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[VPlan] Don't fold live ins with both scalar and vector operands #154067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
4c0ebae
2d656bf
5dc33bc
caca6ea
46ddab5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -976,14 +976,9 @@ static Value *tryToFoldLiveIns(const VPRecipeBase &R, unsigned Opcode, | |
| RFlags.getGEPNoWrapFlags()); | ||
| } | ||
| case VPInstruction::PtrAdd: | ||
| case VPInstruction::WidePtrAdd: | ||
| return Folder.FoldGEP(IntegerType::getInt8Ty(TypeInfo.getContext()), Ops[0], | ||
| Ops[1], | ||
| cast<VPRecipeWithIRFlags>(R).getGEPNoWrapFlags()); | ||
| case Instruction::InsertElement: | ||
| return Folder.FoldInsertElement(Ops[0], Ops[1], Ops[2]); | ||
| case Instruction::ExtractElement: | ||
| return Folder.FoldExtractElement(Ops[0], Ops[1]); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess we could still fold extracts if we have a constant for Ops[0], for which we could create a vector constant fairly easily
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't create a vector constant because we don't know the exact VF, so we can't create a vector type
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, so extracting any lane will return the live-in operand, which we can do w/o calling the folder, together with comment for why we are not calling the folder
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see. I don't think we can do that for InsertElement though, so are we ok to remove that fold? |
||
| } | ||
| return nullptr; | ||
| } | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to make this test target-independent and combine it with constantfolder.ll, for instance?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I gave this a try but I couldn't get it to trigger without -mtriple x86_64. I think it has something to do with how the srem can't be widened on x86 and needs to be replicated instead. I'm not sure if there's an easy way of forcing that on the command line
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You should be able to move this to the target-independent file, it probably needs
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Turns out it specifically needed a trip count of 2, otherwise a widened canonical IV recipe was created which seemed to throw things off. I've moved it into a separate target-agnostic file in caca6ea
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok that's strange, it worked as expected for me when I changed it to have trip count of 4 with VF=4. The wide canonical IV is created if the trip count is < VF, as then it tries to fold the tail instead of creating an un-excutable vector loop |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5 | ||
| ; RUN: opt -p loop-vectorize -S %s | FileCheck %s | ||
|
|
||
| ; Make sure we don't try to fold a Instruction::ExtractElement ir<0>, ir<0>, | ||
| ; since we can't materialize the live-in for the vector operand. | ||
|
|
||
| target triple = "x86_64" | ||
|
|
||
| define void @test(ptr %p, i1 %c, i64 %x) { | ||
lukel97 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| ; CHECK-LABEL: define void @test( | ||
| ; CHECK-SAME: ptr [[P:%.*]], i1 [[C:%.*]], i64 [[X:%.*]]) { | ||
| ; CHECK-NEXT: [[ENTRY:.*:]] | ||
| ; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]] | ||
| ; CHECK: [[VECTOR_PH]]: | ||
| ; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i1> poison, i1 [[C]], i64 0 | ||
| ; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i1> [[BROADCAST_SPLATINSERT]], <2 x i1> poison, <2 x i32> zeroinitializer | ||
| ; CHECK-NEXT: [[TMP0:%.*]] = xor <2 x i1> [[BROADCAST_SPLAT]], splat (i1 true) | ||
| ; CHECK-NEXT: br label %[[VECTOR_BODY:.*]] | ||
| ; CHECK: [[VECTOR_BODY]]: | ||
| ; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[TMP0]], i32 0 | ||
| ; CHECK-NEXT: br i1 [[TMP1]], label %[[PRED_SREM_IF:.*]], label %[[PRED_SREM_CONTINUE:.*]] | ||
| ; CHECK: [[PRED_SREM_IF]]: | ||
| ; CHECK-NEXT: br label %[[PRED_SREM_CONTINUE]] | ||
| ; CHECK: [[PRED_SREM_CONTINUE]]: | ||
| ; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP0]], i32 1 | ||
| ; CHECK-NEXT: br i1 [[TMP2]], label %[[PRED_SREM_IF1:.*]], label %[[PRED_SREM_CONTINUE2:.*]] | ||
| ; CHECK: [[PRED_SREM_IF1]]: | ||
| ; CHECK-NEXT: br label %[[PRED_SREM_CONTINUE2]] | ||
| ; CHECK: [[PRED_SREM_CONTINUE2]]: | ||
| ; CHECK-NEXT: store i32 0, ptr [[P]], align 4 | ||
| ; CHECK-NEXT: br label %[[MIDDLE_BLOCK:.*]] | ||
| ; CHECK: [[MIDDLE_BLOCK]]: | ||
| ; CHECK-NEXT: br label %[[EXIT:.*]] | ||
| ; CHECK: [[SCALAR_PH]]: | ||
| ; CHECK-NEXT: br label %[[LOOP:.*]] | ||
| ; CHECK: [[LOOP]]: | ||
| ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ] | ||
| ; CHECK-NEXT: br i1 [[C]], label %[[LATCH]], label %[[ELSE:.*]] | ||
| ; CHECK: [[ELSE]]: | ||
| ; CHECK-NEXT: [[REM:%.*]] = srem i64 0, [[X]] | ||
| ; CHECK-NEXT: br label %[[LATCH]] | ||
| ; CHECK: [[LATCH]]: | ||
| ; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[REM]], %[[ELSE]] ], [ 0, %[[LOOP]] ] | ||
| ; CHECK-NEXT: [[PHI_TRUNC:%.*]] = trunc i64 [[PHI]] to i32 | ||
| ; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[PHI_TRUNC]], 0 | ||
| ; CHECK-NEXT: store i32 [[SHL]], ptr [[P]], align 4 | ||
| ; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1 | ||
| ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 1 | ||
| ; CHECK-NEXT: br i1 [[EXITCOND]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP0:![0-9]+]] | ||
| ; CHECK: [[EXIT]]: | ||
| ; CHECK-NEXT: ret void | ||
| ; | ||
| entry: | ||
| br label %loop | ||
|
|
||
| loop: | ||
| %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ] | ||
| br i1 %c, label %latch, label %else | ||
|
|
||
| else: | ||
| %rem = srem i64 0, %x | ||
| br label %latch | ||
|
|
||
| latch: | ||
| %phi = phi i64 [ %rem, %else], [ 0, %loop ] | ||
lukel97 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| %phi.trunc = trunc i64 %phi to i32 | ||
| %shl = shl i32 %phi.trunc, 0 | ||
| store i32 %shl, ptr %p | ||
| %iv.next = add i64 %iv, 1 | ||
| %exitcond = icmp eq i64 %iv, 1 | ||
| br i1 %exitcond, label %exit, label %loop | ||
|
|
||
| exit: | ||
| ret void | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? If both are constants, they both should be scalars, and the fold should just work, and the result re-broadcasted?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second operand to WidePtrAdd is a vector. I don't think it will ever end up being a live-in? At least not the way it's used by
expandVPWidenPointerInduction.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment yes, but that may change n the future. If it is a live-in, the value must be a scalar.