-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[InstCombine] Remove the canonicalization of trunc
to i1
#84628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-llvm-transforms Author: Monad (YanWQ-monad) ChangesRemove the canonicalization of llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp Lines 737 to 745 in a84e66a
Patch is 66.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84628.diff 16 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
index 45afa6363ae01f..2f34d825ceed58 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
@@ -734,15 +734,6 @@ Instruction *InstCombinerImpl::visitTrunc(TruncInst &Trunc) {
if (DestWidth == 1) {
Value *Zero = Constant::getNullValue(SrcTy);
- if (DestTy->isIntegerTy()) {
- // Canonicalize trunc x to i1 -> icmp ne (and x, 1), 0 (scalar only).
- // TODO: We canonicalize to more instructions here because we are probably
- // lacking equivalent analysis for trunc relative to icmp. There may also
- // be codegen concerns. If those trunc limitations were removed, we could
- // remove this transform.
- Value *And = Builder.CreateAnd(Src, ConstantInt::get(SrcTy, 1));
- return new ICmpInst(ICmpInst::ICMP_NE, And, Zero);
- }
// For vectors, we do not canonicalize all truncs to icmp, so optimize
// patterns that would be covered within visitICmpInst.
diff --git a/llvm/test/Transforms/InstCombine/X86/x86-avx512-inseltpoison.ll b/llvm/test/Transforms/InstCombine/X86/x86-avx512-inseltpoison.ll
index 9b990480709c97..80d8e1b16ed28b 100644
--- a/llvm/test/Transforms/InstCombine/X86/x86-avx512-inseltpoison.ll
+++ b/llvm/test/Transforms/InstCombine/X86/x86-avx512-inseltpoison.ll
@@ -39,10 +39,9 @@ define <4 x float> @test_add_ss_mask(<4 x float> %a, <4 x float> %b, <4 x float>
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = fadd float [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i8 [[MASK:%.*]] to i1
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float [[TMP5]], float [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP4]], float [[TMP3]], float [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[A]], float [[TMP6]], i64 0
; CHECK-NEXT: ret <4 x float> [[TMP7]]
;
@@ -117,10 +116,9 @@ define <2 x double> @test_add_sd_mask(<2 x double> %a, <2 x double> %b, <2 x dou
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x double> [[A:%.*]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = fadd double [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i8 [[MASK:%.*]] to i1
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double [[TMP5]], double [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP4]], double [[TMP3]], double [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[A]], double [[TMP6]], i64 0
; CHECK-NEXT: ret <2 x double> [[TMP7]]
;
@@ -191,10 +189,9 @@ define <4 x float> @test_sub_ss_mask(<4 x float> %a, <4 x float> %b, <4 x float>
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = fsub float [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i8 [[MASK:%.*]] to i1
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float [[TMP5]], float [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP4]], float [[TMP3]], float [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[A]], float [[TMP6]], i64 0
; CHECK-NEXT: ret <4 x float> [[TMP7]]
;
@@ -269,10 +266,9 @@ define <2 x double> @test_sub_sd_mask(<2 x double> %a, <2 x double> %b, <2 x dou
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x double> [[A:%.*]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = fsub double [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i8 [[MASK:%.*]] to i1
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double [[TMP5]], double [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP4]], double [[TMP3]], double [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[A]], double [[TMP6]], i64 0
; CHECK-NEXT: ret <2 x double> [[TMP7]]
;
@@ -343,10 +339,9 @@ define <4 x float> @test_mul_ss_mask(<4 x float> %a, <4 x float> %b, <4 x float>
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = fmul float [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i8 [[MASK:%.*]] to i1
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float [[TMP5]], float [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP4]], float [[TMP3]], float [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[A]], float [[TMP6]], i64 0
; CHECK-NEXT: ret <4 x float> [[TMP7]]
;
@@ -421,10 +416,9 @@ define <2 x double> @test_mul_sd_mask(<2 x double> %a, <2 x double> %b, <2 x dou
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x double> [[A:%.*]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = fmul double [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i8 [[MASK:%.*]] to i1
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double [[TMP5]], double [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP4]], double [[TMP3]], double [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[A]], double [[TMP6]], i64 0
; CHECK-NEXT: ret <2 x double> [[TMP7]]
;
@@ -495,10 +489,9 @@ define <4 x float> @test_div_ss_mask(<4 x float> %a, <4 x float> %b, <4 x float>
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = fdiv float [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i8 [[MASK:%.*]] to i1
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float [[TMP5]], float [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP4]], float [[TMP3]], float [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[A]], float [[TMP6]], i64 0
; CHECK-NEXT: ret <4 x float> [[TMP7]]
;
@@ -573,10 +566,9 @@ define <2 x double> @test_div_sd_mask(<2 x double> %a, <2 x double> %b, <2 x dou
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x double> [[A:%.*]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = fdiv double [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i8 [[MASK:%.*]] to i1
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double [[TMP5]], double [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP4]], double [[TMP3]], double [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[A]], double [[TMP6]], i64 0
; CHECK-NEXT: ret <2 x double> [[TMP7]]
;
@@ -981,9 +973,8 @@ define <4 x float> @test_mask_vfmadd_ss(<4 x float> %a, <4 x float> %b, <4 x flo
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call float @llvm.fma.f32(float [[TMP1]], float [[TMP2]], float [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float [[TMP1]], float [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], float [[TMP4]], float [[TMP1]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[A]], float [[TMP6]], i64 0
; CHECK-NEXT: ret <4 x float> [[TMP7]]
;
@@ -1011,9 +1002,8 @@ define float @test_mask_vfmadd_ss_0(<4 x float> %a, <4 x float> %b, <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call float @llvm.fma.f32(float [[TMP1]], float [[TMP2]], float [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float [[TMP1]], float [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], float [[TMP4]], float [[TMP1]]
; CHECK-NEXT: ret float [[TMP6]]
;
%1 = insertelement <4 x float> %a, float 1.000000e+00, i32 1
@@ -1060,9 +1050,8 @@ define <2 x double> @test_mask_vfmadd_sd(<2 x double> %a, <2 x double> %b, <2 x
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call double @llvm.fma.f64(double [[TMP1]], double [[TMP2]], double [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double [[TMP1]], double [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], double [[TMP4]], double [[TMP1]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[A]], double [[TMP6]], i64 0
; CHECK-NEXT: ret <2 x double> [[TMP7]]
;
@@ -1086,9 +1075,8 @@ define double @test_mask_vfmadd_sd_0(<2 x double> %a, <2 x double> %b, <2 x doub
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call double @llvm.fma.f64(double [[TMP1]], double [[TMP2]], double [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double [[TMP1]], double [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], double [[TMP4]], double [[TMP1]]
; CHECK-NEXT: ret double [[TMP6]]
;
%1 = insertelement <2 x double> %a, double 1.000000e+00, i32 1
@@ -1129,9 +1117,8 @@ define <4 x float> @test_maskz_vfmadd_ss(<4 x float> %a, <4 x float> %b, <4 x fl
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call float @llvm.fma.f32(float [[TMP1]], float [[TMP2]], float [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float 0.000000e+00, float [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], float [[TMP4]], float 0.000000e+00
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[A]], float [[TMP6]], i64 0
; CHECK-NEXT: ret <4 x float> [[TMP7]]
;
@@ -1159,9 +1146,8 @@ define float @test_maskz_vfmadd_ss_0(<4 x float> %a, <4 x float> %b, <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call float @llvm.fma.f32(float [[TMP1]], float [[TMP2]], float [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float 0.000000e+00, float [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], float [[TMP4]], float 0.000000e+00
; CHECK-NEXT: ret float [[TMP6]]
;
%1 = insertelement <4 x float> %a, float 1.000000e+00, i32 1
@@ -1206,9 +1192,8 @@ define <2 x double> @test_maskz_vfmadd_sd(<2 x double> %a, <2 x double> %b, <2 x
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call double @llvm.fma.f64(double [[TMP1]], double [[TMP2]], double [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double 0.000000e+00, double [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], double [[TMP4]], double 0.000000e+00
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[A]], double [[TMP6]], i64 0
; CHECK-NEXT: ret <2 x double> [[TMP7]]
;
@@ -1232,9 +1217,8 @@ define double @test_maskz_vfmadd_sd_0(<2 x double> %a, <2 x double> %b, <2 x dou
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call double @llvm.fma.f64(double [[TMP1]], double [[TMP2]], double [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double 0.000000e+00, double [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], double [[TMP4]], double 0.000000e+00
; CHECK-NEXT: ret double [[TMP6]]
;
%1 = insertelement <2 x double> %a, double 1.000000e+00, i32 1
@@ -1275,9 +1259,8 @@ define <4 x float> @test_mask3_vfmadd_ss(<4 x float> %a, <4 x float> %b, <4 x fl
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call float @llvm.fma.f32(float [[TMP1]], float [[TMP2]], float [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float [[TMP3]], float [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], float [[TMP4]], float [[TMP3]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[C]], float [[TMP6]], i64 0
; CHECK-NEXT: ret <4 x float> [[TMP7]]
;
@@ -1305,9 +1288,8 @@ define float @test_mask3_vfmadd_ss_0(<4 x float> %a, <4 x float> %b, <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call float @llvm.fma.f32(float [[TMP1]], float [[TMP2]], float [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], float [[TMP3]], float [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], float [[TMP4]], float [[TMP3]]
; CHECK-NEXT: ret float [[TMP6]]
;
%1 = insertelement <4 x float> %c, float 1.000000e+00, i32 1
@@ -1352,9 +1334,8 @@ define <2 x double> @test_mask3_vfmadd_sd(<2 x double> %a, <2 x double> %b, <2 x
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call double @llvm.fma.f64(double [[TMP1]], double [[TMP2]], double [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double [[TMP3]], double [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], double [[TMP4]], double [[TMP3]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[C]], double [[TMP6]], i64 0
; CHECK-NEXT: ret <2 x double> [[TMP7]]
;
@@ -1378,9 +1359,8 @@ define double @test_mask3_vfmadd_sd_0(<2 x double> %a, <2 x double> %b, <2 x dou
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[B:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[C:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = call double @llvm.fma.f64(double [[TMP1]], double [[TMP2]], double [[TMP3]])
-; CHECK-NEXT: [[TMP5:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP5]], 0
-; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[DOTNOT]], double [[TMP3]], double [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], double [[TMP4]], double [[TMP3]]
; CHECK-NEXT: ret double [[TMP6]]
;
%1 = insertelement <2 x double> %c, double 1.000000e+00, i32 1
@@ -1423,9 +1403,8 @@ define <4 x float> @test_mask3_vfmsub_ss(<4 x float> %a, <4 x float> %b, <4 x fl
; CHECK-NEXT: [[TMP4:%.*]] = fneg float [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = call float @llvm.fma.f32(float [[TMP1]], float [[TMP2]], float [[TMP4]])
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[C]], i64 0
-; CHECK-NEXT: [[TMP7:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP7]], 0
-; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[DOTNOT]], float [[TMP6]], float [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> [[C]], float [[TMP8]], i64 0
; CHECK-NEXT: ret <4 x float> [[TMP9]]
;
@@ -1457,9 +1436,8 @@ define float @test_mask3_vfmsub_ss_0(<4 x float> %a, <4 x float> %b, <4 x float>
; CHECK-NEXT: [[TMP4:%.*]] = fneg float [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = call float @llvm.fma.f32(float [[TMP1]], float [[TMP2]], float [[TMP4]])
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[C]], i64 0
-; CHECK-NEXT: [[TMP7:%.*]] = and i8 [[MASK:%.*]], 1
-; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP7]], 0
-; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[DOTNOT]], float [[TMP6]], float [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = trunc i8 [[MASK:%.*]] to i1
+; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]
; CHECK-NEXT: ret float [[TMP8]]
;
%1 = insertelement <4 x float> %c, float 1.000000e+00, i32 1
@@ -1532,9 +1510,8 @@ define <2 x double> @test_mask3_vfmsub_sd(<2 x double> %a, <2 x double> %b, <2 x
; CHECK-NEXT: [[TMP4:%.*]] = fneg double [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = call double @llvm.fma.f64(double [[TMP1]], double [[TMP2]], double [[TMP4]])
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[C]], i64 0
-; CHECK-NEXT: [[TMP7:%.*]] = an...
[truncated]
|
Please update the test. |
This isn't a bad thought, but think the comment still applies, looking through the test changes there are some regressions that indicate we handle I am supportive of this effort, but can you please also make patches to cleanup the regressions? |
May I ask a question about fixing this regression? With the original canonicalization, llvm-project/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp Lines 3678 to 3682 in b4001e3
The problem is that, the parameters for that series of functions are passed using |
It is a limitation that these folds take completed Another approach might be to create a pattern match |
After examining the code, I found that those folds are not the key problem.
Making those folds inside |
Seems reasonable, not sure if this canonicalization is relied on elsewhere thought. |
ping? Please let me know if there are further changes needed, thanks. |
Im generally fine with this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Value *X; | ||
const APInt *C1; | ||
Constant *C2; | ||
if (match(Src, m_OneUse(m_LShr(m_Shl(m_Power2(C1), m_Value(X)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide the alive2 proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it's https://alive2.llvm.org/ce/z/6Frpdc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could actually be any right shift: https://alive2.llvm.org/ce/z/vmPhjM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see m_Shr
)
ping? |
@YanWQ-monad Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested Please check whether problems have been caused by your change specifically, as How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
With the addition of llvm#84628, truncs to i1 are being emitted as conditions to branch instructions. This caused significant regressions in cases which were previously improved by loop unswitch. Adding truncs to i1 restore the previous performance seen.
With the addition of #84628, truncs to i1 are being emitted as conditions to branch instructions. This caused significant regressions in cases which were previously improved by loop unswitch. Adding truncs to i1 restore the previous performance seen.
This patch attempts to help the SelectOpt pass detect select groups made up of conditions and not(conditions). Usually these are canonicalized in instcombine to remove the not and invert the true/false values, but this will not happen for Loginal operations, which can be beneficial to convert if they are part of a larger select group. The handling for not's are mostly handled in the SelectLike, which can be marked as Inverted in order to reverse the TrueValue and FalseValue. This helps fix a regression in fortran minloc constructs, after llvm#84628 helped simplify a loop with branches into a loop with selects.
This patch attempts to help the SelectOpt pass detect select groups made up of conditions and not(conditions). Usually these are canonicalized in instcombine to remove the not and invert the true/false values, but this will not happen for Loginal operations, which can be beneficial to convert if they are part of a larger select group. The handling for not's are mostly handled in the SelectLike, which can be marked as Inverted in order to reverse the TrueValue and FalseValue. This helps fix a regression in fortran minloc constructs, after llvm#84628 helped simplify a loop with branches into a loop with selects.
This patch attempts to help the SelectOpt pass detect select groups made up of conditions and not(conditions). Usually these are canonicalized in instcombine to remove the not and invert the true/false values, but this will not happen for Loginal operations, which can be beneficial to convert if they are part of a larger select group. The handling for not's are mostly handled in the SelectLike, which can be marked as Inverted in order to reverse the TrueValue and FalseValue. This helps fix a regression in fortran minloc constructs, after llvm#84628 helped simplify a loop with branches into a loop with selects.
This patch attempts to help the SelectOpt pass detect select groups made up of conditions and not(conditions). Usually these are canonicalized in instcombine to remove the not and invert the true/false values, but this will not happen for Loginal operations, which can be beneficial to convert if they are part of a larger select group. The handling for not's are mostly handled in the SelectLike, which can be marked as Inverted in order to reverse the TrueValue and FalseValue. This helps fix a regression in fortran minloc constructs, after #84628 helped simplify a loop with branches into a loop with selects.
…lvm#84628)" This reverts commit 56b3222.
…lvm#84628)" This reverts commit 56b3222.
Remove the canonicalization of
trunc
toi1
according to the suggestion of #83829 (comment)llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
Lines 737 to 745 in a84e66a
Alive2: https://alive2.llvm.org/ce/z/cacYVA