Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 0 additions & 5 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -976,14 +976,9 @@ static Value *tryToFoldLiveIns(const VPRecipeBase &R, unsigned Opcode,
RFlags.getGEPNoWrapFlags());
}
case VPInstruction::PtrAdd:
case VPInstruction::WidePtrAdd:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? If both are constants, they both should be scalars, and the fold should just work, and the result re-broadcasted?

Copy link
Contributor Author

@lukel97 lukel97 Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second operand to WidePtrAdd is a vector. I don't think it will ever end up being a live-in? At least not the way it's used by expandVPWidenPointerInduction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment yes, but that may change n the future. If it is a live-in, the value must be a scalar.

return Folder.FoldGEP(IntegerType::getInt8Ty(TypeInfo.getContext()), Ops[0],
Ops[1],
cast<VPRecipeWithIRFlags>(R).getGEPNoWrapFlags());
case Instruction::InsertElement:
return Folder.FoldInsertElement(Ops[0], Ops[1], Ops[2]);
case Instruction::ExtractElement:
return Folder.FoldExtractElement(Ops[0], Ops[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could still fold extracts if we have a constant for Ops[0], for which we could create a vector constant fairly easily

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't create a vector constant because we don't know the exact VF, so we can't create a vector type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so extracting any lane will return the live-in operand, which we can do w/o calling the folder, together with comment for why we are not calling the folder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I don't think we can do that for InsertElement though, so are we ok to remove that fold?

}
return nullptr;
}
Expand Down
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make this test target-independent and combine it with constantfolder.ll, for instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a try but I couldn't get it to trigger without -mtriple x86_64. I think it has something to do with how the srem can't be widened on x86 and needs to be replicated instead. I'm not sure if there's an easy way of forcing that on the command line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to move this to the target-independent file, it probably needs -force-widen-divrem-via-safe-divisor=false passing and the loop must have a trip count of at least 4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out it specifically needed a trip count of 2, otherwise a widened canonical IV recipe was created which seemed to throw things off. I've moved it into a separate target-agnostic file in caca6ea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that's strange, it worked as expected for me when I changed it to have trip count of 4 with VF=4. The wide canonical IV is created if the trip count is < VF, as then it tries to fold the tail instead of creating an un-excutable vector loop

Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
; RUN: opt -p loop-vectorize -S %s | FileCheck %s

; Make sure we don't try to fold a Instruction::ExtractElement ir<0>, ir<0>,
; since we can't materialize the live-in for the vector operand.

target triple = "x86_64"

define void @test(ptr %p, i1 %c, i64 %x) {
; CHECK-LABEL: define void @test(
; CHECK-SAME: ptr [[P:%.*]], i1 [[C:%.*]], i64 [[X:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i1> poison, i1 [[C]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i1> [[BROADCAST_SPLATINSERT]], <2 x i1> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP0:%.*]] = xor <2 x i1> [[BROADCAST_SPLAT]], splat (i1 true)
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[TMP0]], i32 0
; CHECK-NEXT: br i1 [[TMP1]], label %[[PRED_SREM_IF:.*]], label %[[PRED_SREM_CONTINUE:.*]]
; CHECK: [[PRED_SREM_IF]]:
; CHECK-NEXT: br label %[[PRED_SREM_CONTINUE]]
; CHECK: [[PRED_SREM_CONTINUE]]:
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP0]], i32 1
; CHECK-NEXT: br i1 [[TMP2]], label %[[PRED_SREM_IF1:.*]], label %[[PRED_SREM_CONTINUE2:.*]]
; CHECK: [[PRED_SREM_IF1]]:
; CHECK-NEXT: br label %[[PRED_SREM_CONTINUE2]]
; CHECK: [[PRED_SREM_CONTINUE2]]:
; CHECK-NEXT: store i32 0, ptr [[P]], align 4
; CHECK-NEXT: br label %[[MIDDLE_BLOCK:.*]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br label %[[EXIT:.*]]
; CHECK: [[SCALAR_PH]]:
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ]
; CHECK-NEXT: br i1 [[C]], label %[[LATCH]], label %[[ELSE:.*]]
; CHECK: [[ELSE]]:
; CHECK-NEXT: [[REM:%.*]] = srem i64 0, [[X]]
; CHECK-NEXT: br label %[[LATCH]]
; CHECK: [[LATCH]]:
; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[REM]], %[[ELSE]] ], [ 0, %[[LOOP]] ]
; CHECK-NEXT: [[PHI_TRUNC:%.*]] = trunc i64 [[PHI]] to i32
; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[PHI_TRUNC]], 0
; CHECK-NEXT: store i32 [[SHL]], ptr [[P]], align 4
; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 1
; CHECK-NEXT: br i1 [[EXITCOND]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: ret void
;
entry:
br label %loop

loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
br i1 %c, label %latch, label %else

else:
%rem = srem i64 0, %x
br label %latch

latch:
%phi = phi i64 [ %rem, %else], [ 0, %loop ]
%phi.trunc = trunc i64 %phi to i32
%shl = shl i32 %phi.trunc, 0
store i32 %shl, ptr %p
%iv.next = add i64 %iv, 1
%exitcond = icmp eq i64 %iv, 1
br i1 %exitcond, label %exit, label %loop

exit:
ret void
}