[VPlan] Don't fold live ins with both scalar and vector operands #154067

lukel97 · 2025-08-18T06:24:37Z

If we end up with a extract_element VPInstruction where both operands are live-ins, we will try to fold the live-ins even though the first operand is a vector whilst the live-in is scalar.

This fixes it by just returning the vector live-in instead of calling the folder, and removes the handling for insertelement where we aren't able to do the fold. From some quick testing we previously never hit this fold anyway, and were probably just missing test coverage.

Fixes #154045

If we end up with a extract_element VPInstruction where both operands are live-ins, we will try to fold the live-ins even though the first operand is vector and the live-in is scalar. I don't think it's possible for us to fold these. We can't create a vector version of the live-in either because we don't know VF at this stage. This removes the handling for opcodes that may have both scalar and vector operands. From some quick testing we previously never hit these folds anyway, and were probably just missing test coverage. Fixes llvm#154045

llvmbot · 2025-08-18T06:25:13Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Luke Lau (lukel97)

Changes

If we end up with a extract_element VPInstruction where both operands are live-ins, we will try to fold the live-ins even though the first operand is vector and the live-in is scalar.

I don't think it's possible for us to fold these. We can't create a vector version of the live-in either because we don't know VF at this stage.

This removes the handling for opcodes that may have both scalar and vector operands. From some quick testing we previously never hit these folds anyway, and were probably just missing test coverage.

Fixes #154045

Full diff: https://github.com/llvm/llvm-project/pull/154067.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (-5)
(added) llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll (+75)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index de0c1e4d177bb..022d9b749f978 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -976,14 +976,9 @@ static Value *tryToFoldLiveIns(const VPRecipeBase &R, unsigned Opcode,
                           RFlags.getGEPNoWrapFlags());
   }
   case VPInstruction::PtrAdd:
-  case VPInstruction::WidePtrAdd:
     return Folder.FoldGEP(IntegerType::getInt8Ty(TypeInfo.getContext()), Ops[0],
                           Ops[1],
                           cast<VPRecipeWithIRFlags>(R).getGEPNoWrapFlags());
-  case Instruction::InsertElement:
-    return Folder.FoldInsertElement(Ops[0], Ops[1], Ops[2]);
-  case Instruction::ExtractElement:
-    return Folder.FoldExtractElement(Ops[0], Ops[1]);
   }
   return nullptr;
 }
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll b/llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll
new file mode 100644
index 0000000000000..185071707b6e8
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll
@@ -0,0 +1,75 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; RUN: opt -p loop-vectorize -S %s | FileCheck %s
+
+; Make sure we don't try to fold a Instruction::ExtractElement ir<0>, ir<0>, ,
+; since we can't materialize the live-in for the vector operand.
+
+target triple = "x86_64"
+
+define void @test(ptr %p, i1 %c, i64 %x) {
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: ptr [[P:%.*]], i1 [[C:%.*]], i64 [[X:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i1> poison, i1 [[C]], i64 0
+; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i1> [[BROADCAST_SPLATINSERT]], <2 x i1> poison, <2 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP0:%.*]] = xor <2 x i1> [[BROADCAST_SPLAT]], splat (i1 true)
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x i1> [[TMP0]], i32 0
+; CHECK-NEXT:    br i1 [[TMP1]], label %[[PRED_SREM_IF:.*]], label %[[PRED_SREM_CONTINUE:.*]]
+; CHECK:       [[PRED_SREM_IF]]:
+; CHECK-NEXT:    br label %[[PRED_SREM_CONTINUE]]
+; CHECK:       [[PRED_SREM_CONTINUE]]:
+; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <2 x i1> [[TMP0]], i32 1
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[PRED_SREM_IF1:.*]], label %[[PRED_SREM_CONTINUE2:.*]]
+; CHECK:       [[PRED_SREM_IF1]]:
+; CHECK-NEXT:    br label %[[PRED_SREM_CONTINUE2]]
+; CHECK:       [[PRED_SREM_CONTINUE2]]:
+; CHECK-NEXT:    store i32 0, ptr [[P]], align 4
+; CHECK-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ]
+; CHECK-NEXT:    br i1 [[C]], label %[[LATCH]], label %[[ELSE:.*]]
+; CHECK:       [[ELSE]]:
+; CHECK-NEXT:    [[REM:%.*]] = srem i64 0, [[X]]
+; CHECK-NEXT:    br label %[[LATCH]]
+; CHECK:       [[LATCH]]:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i64 [ [[REM]], %[[ELSE]] ], [ 0, %[[LOOP]] ]
+; CHECK-NEXT:    [[PHI_TRUNC:%.*]] = trunc i64 [[PHI]] to i32
+; CHECK-NEXT:    [[SHL:%.*]] = shl i32 [[PHI_TRUNC]], 0
+; CHECK-NEXT:    store i32 [[SHL]], ptr [[P]], align 4
+; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 1
+; CHECK-NEXT:    br i1 [[EXITCOND]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
+  br i1 %c, label %latch, label %else
+
+else:
+  %rem = srem i64 0, %x
+  br label %latch
+
+latch:
+  %phi = phi i64 [ %rem, %else], [ 0, %loop ]
+  %phi.trunc = trunc i64 %phi to i32
+  %shl = shl i32 %phi.trunc, 0
+  store i32 %shl, ptr %p
+  %iv.next = add i64 %iv, 1
+  %exitcond = icmp eq i64 %iv, 1
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}

artagnon

Not 100% sure I understand the problem yet.

artagnon · 2025-08-18T10:45:39Z

llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll

Is it possible to make this test target-independent and combine it with constantfolder.ll, for instance?

I gave this a try but I couldn't get it to trigger without -mtriple x86_64. I think it has something to do with how the srem can't be widened on x86 and needs to be replicated instead. I'm not sure if there's an easy way of forcing that on the command line

You should be able to move this to the target-independent file, it probably needs -force-widen-divrem-via-safe-divisor=false passing and the loop must have a trip count of at least 4

Turns out it specifically needed a trip count of 2, otherwise a widened canonical IV recipe was created which seemed to throw things off. I've moved it into a separate target-agnostic file in caca6ea

Ok that's strange, it worked as expected for me when I changed it to have trip count of 4 with VF=4. The wide canonical IV is created if the trip count is < VF, as then it tries to fold the tail instead of creating an un-excutable vector loop

llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll

lukel97 · 2025-08-18T10:50:12Z

Not 100% sure I understand the problem yet.

live-ins are always scalar, and just broadcasted during recipe execution if a vector is needed AFAIK.

So currently we try to constant fold FoldExtractElement(Ops[0], Ops[1]) with two scalars. But the first operand needs to be a vector, hence the crash.

fhahn · 2025-08-18T10:56:08Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

                          RFlags.getGEPNoWrapFlags());
  }
  case VPInstruction::PtrAdd:
-  case VPInstruction::WidePtrAdd:


Do we need this? If both are constants, they both should be scalars, and the fold should just work, and the result re-broadcasted?

The second operand to WidePtrAdd is a vector. I don't think it will ever end up being a live-in? At least not the way it's used by expandVPWidenPointerInduction.

At the moment yes, but that may change n the future. If it is a live-in, the value must be a scalar.

fhahn · 2025-08-18T10:57:38Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

-  case Instruction::InsertElement:
-    return Folder.FoldInsertElement(Ops[0], Ops[1], Ops[2]);
-  case Instruction::ExtractElement:
-    return Folder.FoldExtractElement(Ops[0], Ops[1]);


I guess we could still fold extracts if we have a constant for Ops[0], for which we could create a vector constant fairly easily

We can't create a vector constant because we don't know the exact VF, so we can't create a vector type

Right, so extracting any lane will return the live-in operand, which we can do w/o calling the folder, together with comment for why we are not calling the folder

I see. I don't think we can do that for InsertElement though, so are we ok to remove that fold?

fhahn · 2025-08-18T20:10:10Z

llvm/test/Transforms/LoopVectorize/X86/pr154045-dont-fold-extractelement-livein.ll

You should be able to move this to the target-independent file, it probably needs -force-widen-divrem-via-safe-divisor=false passing and the loop must have a trip count of at least 4

lukel97 requested review from artagnon, david-arm and fhahn August 18, 2025 06:24

llvmbot added vectorizers llvm:transforms labels Aug 18, 2025

Fix typo

2d656bf

artagnon reviewed Aug 18, 2025

View reviewed changes

fhahn reviewed Aug 18, 2025

View reviewed changes

Add back WidePtrAdd + InsertElement, adjust test

5dc33bc

fhahn approved these changes Aug 18, 2025

View reviewed changes

lukel97 added 2 commits August 19, 2025 11:27

Make test target agnostic

caca6ea

Merge branch 'main' into loop-vectorize/dont-fold-mixed-type-liveins

46ddab5

lukel97 enabled auto-merge (squash) August 19, 2025 03:31

lukel97 merged commit 144736b into llvm:main Aug 19, 2025
9 checks passed

[VPlan] Don't fold live ins with both scalar and vector operands #154067

[VPlan] Don't fold live ins with both scalar and vector operands #154067

Uh oh!

Conversation

lukel97 commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

artagnon left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

lukel97 commented Aug 18, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukel97 Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

lukel97 commented Aug 18, 2025 •

edited

Loading

llvmbot commented Aug 18, 2025 •

edited

Loading

lukel97 Aug 18, 2025 •

edited

Loading