-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[AArch64] Change cost of (s|z)ext <4 x i8> -> <4 x i32> to 2. #141029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Increase the cost for zext <4 x i8> -> <4 x i32> to 2 to match codegen, which currently uses 2 instructions (commonly ushll.8h, ushll.4s). Example lowering: https://llvm.godbolt.org/z/58TEoxh4v
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-aarch64 Author: Florian Hahn (fhahn) ChangesIncrease the cost for zext <4 x i8> -> <4 x i32> to 2 to match codegen, which currently uses 2 instructions (commonly ushll.8h, ushll.4s). Example lowering: https://llvm.godbolt.org/z/58TEoxh4v Full diff: https://github.com/llvm/llvm-project/pull/141029.diff 6 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 3f10da23b3494..43d03de2f46cf 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3161,6 +3161,8 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
{ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 3},
{ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i32, 2},
{ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 2},
+ {ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i8, 2},
+ {ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i8, 2},
{ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i8, 3},
{ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i8, 3},
{ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 2},
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll b/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
index 7e1588f427be4..d1b93667c5506 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
@@ -293,15 +293,15 @@ define void @extaddv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_16 = add <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asw_8_32 = add <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asl_8_32 = add <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azw_8_32 = add <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_32 = add <4 x i32> %zl1_8_32, %zl2_8_32
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = add <4 x i64> %i64, %sw_8_64
@@ -988,15 +988,15 @@ define void @extsubv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_16 = sub <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asw_8_32 = sub <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asl_8_32 = sub <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azw_8_32 = sub <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_32 = sub <4 x i32> %zl1_8_32, %zl2_8_32
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = sub <4 x i64> %i64, %sw_8_64
@@ -1683,15 +1683,15 @@ define void @extmulv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_16 = mul <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asw_8_32 = mul <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asl_8_32 = mul <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azw_8_32 = mul <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_32 = mul <4 x i32> %zl1_8_32, %zl2_8_32
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of RThru:28 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = mul <4 x i64> %i64, %sw_8_64
diff --git a/llvm/test/Analysis/CostModel/AArch64/cast.ll b/llvm/test/Analysis/CostModel/AArch64/cast.ll
index 38bd98ffd343f..618d71b6c125d 100644
--- a/llvm/test/Analysis/CostModel/AArch64/cast.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/cast.ll
@@ -41,8 +41,8 @@ define void @ext() {
; CHECK-NEXT: Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -883,8 +883,8 @@ define i32 @load_extends() {
; CHECK-NEXT: Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
diff --git a/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll b/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
index 7595abc71a9d8..1d1a5169691cd 100644
--- a/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
@@ -582,7 +582,7 @@ define <8 x i16> @neg_dissimilar_operand_kind_0(<8 x i8> %a, <8 x i8> %b) {
}
; COST-LABEL: neg_dissimilar_operand_kind_1
-; COST-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp0 = zext <4 x i8> %a to <4 x i32>
+; COST-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %tmp0 = zext <4 x i8> %a to <4 x i32>
; COST-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %tmp1 = zext <4 x i16> %b to <4 x i32>
define <4 x i32> @neg_dissimilar_operand_kind_1(<4 x i8> %a, <4 x i16> %b) {
%tmp0 = zext <4 x i8> %a to <4 x i32>
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
index cfb130eb5ec32..9c114ca85bd3b 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
@@ -42,8 +42,8 @@ define void @ext() {
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -184,8 +184,8 @@ define void @ext() {
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -255,8 +255,8 @@ define void @ext() {
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -1809,8 +1809,8 @@ define i32 @load_extends() #0 {
; CHECK-SVE-NEXT: Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
@@ -1899,8 +1899,8 @@ define i32 @load_extends() #0 {
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
@@ -1944,8 +1944,8 @@ define i32 @load_extends() #0 {
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
index 36826eb6681c8..4ef5221e93147 100644
--- a/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
@@ -883,7 +883,7 @@ entry:
; COST-LABEL: Function: mla_v4i8_i32
-; COST: Cost: '-6'
+; COST: Cost: '-4'
define i32 @mla_v4i8_i32(ptr %x, ptr %y) "target-features"="+dotprod" {
; CHECK-LABEL: @mla_v4i8_i32(
; CHECK-NEXT: entry:
|
@llvm/pr-subscribers-llvm-analysis Author: Florian Hahn (fhahn) ChangesIncrease the cost for zext <4 x i8> -> <4 x i32> to 2 to match codegen, which currently uses 2 instructions (commonly ushll.8h, ushll.4s). Example lowering: https://llvm.godbolt.org/z/58TEoxh4v Full diff: https://github.com/llvm/llvm-project/pull/141029.diff 6 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 3f10da23b3494..43d03de2f46cf 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3161,6 +3161,8 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
{ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 3},
{ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i32, 2},
{ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 2},
+ {ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i8, 2},
+ {ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i8, 2},
{ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i8, 3},
{ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i8, 3},
{ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 2},
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll b/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
index 7e1588f427be4..d1b93667c5506 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-widening.ll
@@ -293,15 +293,15 @@ define void @extaddv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_16 = add <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asw_8_32 = add <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asl_8_32 = add <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azw_8_32 = add <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_32 = add <4 x i32> %zl1_8_32, %zl2_8_32
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = add <4 x i64> %i64, %sw_8_64
@@ -988,15 +988,15 @@ define void @extsubv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_16 = sub <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asw_8_32 = sub <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asl_8_32 = sub <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azw_8_32 = sub <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_32 = sub <4 x i32> %zl1_8_32, %zl2_8_32
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = sub <4 x i64> %i64, %sw_8_64
@@ -1683,15 +1683,15 @@ define void @extmulv4(<4 x i8> %i8, <4 x i16> %i16, <4 x i32> %i32, <4 x i64> %i
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_16 = zext <4 x i8> %i8 to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_16 = mul <4 x i16> %zl1_8_16, %zl2_8_16
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asw_8_32 = mul <4 x i32> %i32, %sw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl1_8_32 = sext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %sl2_8_32 = sext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %asl_8_32 = mul <4 x i32> %sl1_8_32, %sl2_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zw_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azw_8_32 = mul <4 x i32> %i32, %zw_8_32
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl1_8_32 = zext <4 x i8> %i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %zl2_8_32 = zext <4 x i8> %i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %azl_8_32 = mul <4 x i32> %zl1_8_32, %zl2_8_32
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %sw_8_64 = sext <4 x i8> %i8 to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of RThru:28 CodeSize:1 Lat:1 SizeLat:1 for: %asw_8_64 = mul <4 x i64> %i64, %sw_8_64
diff --git a/llvm/test/Analysis/CostModel/AArch64/cast.ll b/llvm/test/Analysis/CostModel/AArch64/cast.ll
index 38bd98ffd343f..618d71b6c125d 100644
--- a/llvm/test/Analysis/CostModel/AArch64/cast.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/cast.ll
@@ -41,8 +41,8 @@ define void @ext() {
; CHECK-NEXT: Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -883,8 +883,8 @@ define i32 @load_extends() {
; CHECK-NEXT: Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; CHECK-NEXT: Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
; CHECK-NEXT: Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
diff --git a/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll b/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
index 7595abc71a9d8..1d1a5169691cd 100644
--- a/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll
@@ -582,7 +582,7 @@ define <8 x i16> @neg_dissimilar_operand_kind_0(<8 x i8> %a, <8 x i8> %b) {
}
; COST-LABEL: neg_dissimilar_operand_kind_1
-; COST-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp0 = zext <4 x i8> %a to <4 x i32>
+; COST-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %tmp0 = zext <4 x i8> %a to <4 x i32>
; COST-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %tmp1 = zext <4 x i16> %b to <4 x i32>
define <4 x i32> @neg_dissimilar_operand_kind_1(<4 x i8> %a, <4 x i16> %b) {
%tmp0 = zext <4 x i8> %a to <4 x i32>
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
index cfb130eb5ec32..9c114ca85bd3b 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-cast.ll
@@ -42,8 +42,8 @@ define void @ext() {
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:3 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -184,8 +184,8 @@ define void @ext() {
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -255,8 +255,8 @@ define void @ext() {
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %z2i32i64 = zext <2 x i32> undef to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %s4i8i16 = sext <4 x i8> undef to <4 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %z4i8i16 = zext <4 x i8> undef to <4 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
-; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %s4i8i32 = sext <4 x i8> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %z4i8i32 = zext <4 x i8> undef to <4 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %s4i8i64 = sext <4 x i8> undef to <4 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %z4i8i64 = zext <4 x i8> undef to <4 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %s4i16i32 = sext <4 x i16> undef to <4 x i32>
@@ -1809,8 +1809,8 @@ define i32 @load_extends() #0 {
; CHECK-SVE-NEXT: Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; CHECK-SVE-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
; CHECK-SVE-NEXT: Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
@@ -1899,8 +1899,8 @@ define i32 @load_extends() #0 {
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
@@ -1944,8 +1944,8 @@ define i32 @load_extends() #0 {
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 0 for: %r11 = zext i32 %loadi32 to i64
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v0 = sext <8 x i8> %loadv8i8 to <8 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v1 = zext <8 x i8> %loadv8i8 to <8 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
-; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v2 = sext <4 x i8> %loadv4i8 to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %v3 = zext <4 x i8> %loadv4i8 to <4 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v4 = sext <2 x i8> %loadv2i8 to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v5 = zext <2 x i8> %loadv2i8 to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found costs of 1 for: %v6 = sext <4 x i16> %loadv4i16 to <4 x i32>
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
index 36826eb6681c8..4ef5221e93147 100644
--- a/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/vecreduceadd.ll
@@ -883,7 +883,7 @@ entry:
; COST-LABEL: Function: mla_v4i8_i32
-; COST: Cost: '-6'
+; COST: Cost: '-4'
define i32 @mla_v4i8_i32(ptr %x, ptr %y) "target-features"="+dotprod" {
; CHECK-LABEL: @mla_v4i8_i32(
; CHECK-NEXT: entry:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to note separately is that the costs from the custom tables are only used for RThru costs, the others default to 1, which is not really accurate either, but that's the same for all entries in the lookup tables
Do you have any details about what this improves for you? I am not against it, it sounds sensible, but if you look at the cost of the load it is already 2, as it includes the cost of
I think it might have ran into problems with unrolling, when I tried it. I may give it another go, I had a big stack of patches for them that I didn't do much with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any details about what this improves for you? I am not against it, it sounds sensible, but if you look at the cost of the load it is already 2, as it includes the cost of
v4i8 load + zext to v4i16
. A v4i8 is legalized to a v4i16 and the cost model somewhat attempts to account for that. In some cases the cost of will certainly be 2 though.
I don't have a super strong motivation, this is just something I noticed while working on #141109. This also changes vectorization factors in a number of cases, but I still need to check those. As you said, it will probably be a wash, with some cases more accruate while others less accurate, due to not being context-sensitive enough
Increase the cost for zext <4 x i8> -> <4 x i32> to 2 to match codegen, which currently uses 2 instructions (commonly ushll.8h, ushll.4s).
Example lowering: https://llvm.godbolt.org/z/58TEoxh4v