-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[TTI][AArch64] Detect OperandInfo from scalable splats. #122469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: David Green (davemgreen) ChangesPulled out of #122236, this allows Splats constants to be recognized by getOperandInfo, allowing "better" costs for instructions like divides by constants to be produced (which are expanded into mul+add+shift). Some of the costs are not very accurate yet, but the comparison of scalar vs fixed-width vs scalable for the same div can become more accurate, especially with patches like #122236. Patch is 44.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122469.diff 3 Files Affected:
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index b32dffa9f0fe86..13a56709ed10f5 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -893,7 +893,8 @@ TargetTransformInfo::getOperandInfo(const Value *V) {
// Check for a splat of a constant or for a non uniform vector of constants
// and check if the constant(s) are all powers of two.
- if (isa<ConstantVector>(V) || isa<ConstantDataVector>(V)) {
+ if (isa<ConstantVector>(V) || isa<ConstantDataVector>(V) ||
+ isa<ConstantExpr>(V)) {
OpInfo = OK_NonUniformConstantValue;
if (Splat) {
OpInfo = OK_UniformConstantValue;
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-div.ll b/llvm/test/Analysis/CostModel/AArch64/sve-div.ll
index 4c25e3003177d9..ac5638de332039 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-div.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-div.ll
@@ -181,22 +181,22 @@ define void @sdiv_uniformconst() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = sdiv <16 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V32i8 = sdiv <32 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V64i8 = sdiv <64 x i8> undef, splat (i8 7)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i64 = sdiv <vscale x 2 x i64> undef, splat (i64 7)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV2i64 = sdiv <vscale x 2 x i64> undef, splat (i64 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV4i64 = sdiv <vscale x 4 x i64> undef, splat (i64 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i64 = sdiv <vscale x 8 x i64> undef, splat (i64 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i32 = sdiv <vscale x 2 x i32> undef, splat (i32 7)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i32 = sdiv <vscale x 4 x i32> undef, splat (i32 7)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV4i32 = sdiv <vscale x 4 x i32> undef, splat (i32 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV8i32 = sdiv <vscale x 8 x i32> undef, splat (i32 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV16i32 = sdiv <vscale x 16 x i32> undef, splat (i32 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i16 = sdiv <vscale x 2 x i16> undef, splat (i16 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i16 = sdiv <vscale x 4 x i16> undef, splat (i16 7)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i16 = sdiv <vscale x 8 x i16> undef, splat (i16 7)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV8i16 = sdiv <vscale x 8 x i16> undef, splat (i16 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i16 = sdiv <vscale x 16 x i16> undef, splat (i16 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i16 = sdiv <vscale x 32 x i16> undef, splat (i16 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i8 = sdiv <vscale x 2 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i8 = sdiv <vscale x 4 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i8 = sdiv <vscale x 8 x i8> undef, splat (i8 7)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i8 = sdiv <vscale x 16 x i8> undef, splat (i8 7)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV16i8 = sdiv <vscale x 16 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i8 = sdiv <vscale x 32 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %NV64i8 = sdiv <vscale x 64 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
@@ -260,22 +260,22 @@ define void @udiv_uniformconst() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V32i8 = udiv <32 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V64i8 = udiv <64 x i8> undef, splat (i8 7)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 7)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV4i64 = udiv <vscale x 4 x i64> undef, splat (i64 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i64 = udiv <vscale x 8 x i64> undef, splat (i64 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i32 = udiv <vscale x 2 x i32> undef, splat (i32 7)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i32 = udiv <vscale x 4 x i32> undef, splat (i32 7)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV4i32 = udiv <vscale x 4 x i32> undef, splat (i32 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV8i32 = udiv <vscale x 8 x i32> undef, splat (i32 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV16i32 = udiv <vscale x 16 x i32> undef, splat (i32 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i16 = udiv <vscale x 2 x i16> undef, splat (i16 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i16 = udiv <vscale x 4 x i16> undef, splat (i16 7)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i16 = udiv <vscale x 8 x i16> undef, splat (i16 7)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV8i16 = udiv <vscale x 8 x i16> undef, splat (i16 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i16 = udiv <vscale x 16 x i16> undef, splat (i16 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i16 = udiv <vscale x 32 x i16> undef, splat (i16 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i8 = udiv <vscale x 2 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i8 = udiv <vscale x 4 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i8 = udiv <vscale x 8 x i8> undef, splat (i8 7)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i8 = udiv <vscale x 16 x i8> undef, splat (i8 7)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV16i8 = udiv <vscale x 16 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i8 = udiv <vscale x 32 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %NV64i8 = udiv <vscale x 64 x i8> undef, splat (i8 7)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
@@ -339,24 +339,24 @@ define void @sdiv_uniformconstpow2() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V16i8 = sdiv <16 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 198 for instruction: %V32i8 = sdiv <32 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 396 for instruction: %V64i8 = sdiv <64 x i8> undef, splat (i8 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i64 = sdiv <vscale x 2 x i64> undef, splat (i64 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV4i64 = sdiv <vscale x 4 x i64> undef, splat (i64 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i64 = sdiv <vscale x 8 x i64> undef, splat (i64 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i32 = sdiv <vscale x 2 x i32> undef, splat (i32 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i32 = sdiv <vscale x 4 x i32> undef, splat (i32 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV8i32 = sdiv <vscale x 8 x i32> undef, splat (i32 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV16i32 = sdiv <vscale x 16 x i32> undef, splat (i32 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i16 = sdiv <vscale x 2 x i16> undef, splat (i16 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i16 = sdiv <vscale x 4 x i16> undef, splat (i16 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i16 = sdiv <vscale x 8 x i16> undef, splat (i16 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i16 = sdiv <vscale x 16 x i16> undef, splat (i16 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i16 = sdiv <vscale x 32 x i16> undef, splat (i16 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i8 = sdiv <vscale x 2 x i8> undef, splat (i8 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i8 = sdiv <vscale x 4 x i8> undef, splat (i8 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i8 = sdiv <vscale x 8 x i8> undef, splat (i8 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i8 = sdiv <vscale x 16 x i8> undef, splat (i8 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i8 = sdiv <vscale x 32 x i8> undef, splat (i8 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %NV64i8 = sdiv <vscale x 64 x i8> undef, splat (i8 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV2i64 = sdiv <vscale x 2 x i64> undef, splat (i64 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %NV4i64 = sdiv <vscale x 4 x i64> undef, splat (i64 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %NV8i64 = sdiv <vscale x 8 x i64> undef, splat (i64 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV2i32 = sdiv <vscale x 2 x i32> undef, splat (i32 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV4i32 = sdiv <vscale x 4 x i32> undef, splat (i32 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %NV8i32 = sdiv <vscale x 8 x i32> undef, splat (i32 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %NV16i32 = sdiv <vscale x 16 x i32> undef, splat (i32 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV2i16 = sdiv <vscale x 2 x i16> undef, splat (i16 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV4i16 = sdiv <vscale x 4 x i16> undef, splat (i16 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV8i16 = sdiv <vscale x 8 x i16> undef, splat (i16 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %NV16i16 = sdiv <vscale x 16 x i16> undef, splat (i16 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %NV32i16 = sdiv <vscale x 32 x i16> undef, splat (i16 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV2i8 = sdiv <vscale x 2 x i8> undef, splat (i8 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV4i8 = sdiv <vscale x 4 x i8> undef, splat (i8 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV8i8 = sdiv <vscale x 8 x i8> undef, splat (i8 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV16i8 = sdiv <vscale x 16 x i8> undef, splat (i8 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %NV32i8 = sdiv <vscale x 32 x i8> undef, splat (i8 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %NV64i8 = sdiv <vscale x 64 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%V2i64 = sdiv <2 x i64> undef, splat (i64 16)
@@ -418,22 +418,22 @@ define void @udiv_uniformconstpow2() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V32i8 = udiv <32 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V64i8 = udiv <64 x i8> undef, splat (i8 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV4i64 = udiv <vscale x 4 x i64> undef, splat (i64 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i64 = udiv <vscale x 8 x i64> undef, splat (i64 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i32 = udiv <vscale x 2 x i32> undef, splat (i32 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i32 = udiv <vscale x 4 x i32> undef, splat (i32 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV4i32 = udiv <vscale x 4 x i32> undef, splat (i32 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV8i32 = udiv <vscale x 8 x i32> undef, splat (i32 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV16i32 = udiv <vscale x 16 x i32> undef, splat (i32 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i16 = udiv <vscale x 2 x i16> undef, splat (i16 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i16 = udiv <vscale x 4 x i16> undef, splat (i16 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i16 = udiv <vscale x 8 x i16> undef, splat (i16 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV8i16 = udiv <vscale x 8 x i16> undef, splat (i16 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i16 = udiv <vscale x 16 x i16> undef, splat (i16 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i16 = udiv <vscale x 32 x i16> undef, splat (i16 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i8 = udiv <vscale x 2 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i8 = udiv <vscale x 4 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i8 = udiv <vscale x 8 x i8> undef, splat (i8 16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i8 = udiv <vscale x 16 x i8> undef, splat (i8 16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV16i8 = udiv <vscale x 16 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i8 = udiv <vscale x 32 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %NV64i8 = udiv <vscale x 64 x i8> undef, splat (i8 16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
@@ -497,22 +497,22 @@ define void @sdiv_uniformconstnegpow2() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = sdiv <16 x i8> undef, splat (i8 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V32i8 = sdiv <32 x i8> undef, splat (i8 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V64i8 = sdiv <64 x i8> undef, splat (i8 -16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i64 = sdiv <vscale x 2 x i64> undef, splat (i64 -16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV2i64 = sdiv <vscale x 2 x i64> undef, splat (i64 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV4i64 = sdiv <vscale x 4 x i64> undef, splat (i64 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i64 = sdiv <vscale x 8 x i64> undef, splat (i64 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i32 = sdiv <vscale x 2 x i32> undef, splat (i32 -16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i32 = sdiv <vscale x 4 x i32> undef, splat (i32 -16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV4i32 = sdiv <vscale x 4 x i32> undef, splat (i32 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV8i32 = sdiv <vscale x 8 x i32> undef, splat (i32 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV16i32 = sdiv <vscale x 16 x i32> undef, splat (i32 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i16 = sdiv <vscale x 2 x i16> undef, splat (i16 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i16 = sdiv <vscale x 4 x i16> undef, splat (i16 -16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i16 = sdiv <vscale x 8 x i16> undef, splat (i16 -16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV8i16 = sdiv <vscale x 8 x i16> undef, splat (i16 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i16 = sdiv <vscale x 16 x i16> undef, splat (i16 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i16 = sdiv <vscale x 32 x i16> undef, splat (i16 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i8 = sdiv <vscale x 2 x i8> undef, splat (i8 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV4i8 = sdiv <vscale x 4 x i8> undef, splat (i8 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %NV8i8 = sdiv <vscale x 8 x i8> undef, splat (i8 -16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i8 = sdiv <vscale x 16 x i8> undef, splat (i8 -16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV16i8 = sdiv <vscale x 16 x i8> undef, splat (i8 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i8 = sdiv <vscale x 32 x i8> undef, splat (i8 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %NV64i8 = sdiv <vscale x 64 x i8> undef, splat (i8 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
@@ -576,22 +576,22 @@ define void @udiv_uniformconstnegpow2() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, splat (i8 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V32i8 = udiv <32 x i8> undef, splat (i8 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V64i8 = udiv <64 x i8> undef, splat (i8 -16)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 -16)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 -16)
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %NV4i64 = udiv <vscale x 4 x i64> undef, splat (i64 -16)
; CHECK-NEXT: Cost Model: Found a...
[truncated]
|
@@ -260,22 +260,22 @@ define void @udiv_uniformconst() { | |||
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, splat (i8 7) | |||
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V32i8 = udiv <32 x i8> undef, splat (i8 7) | |||
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V64i8 = udiv <64 x i8> undef, splat (i8 7) | |||
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 7) | |||
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %NV2i64 = udiv <vscale x 2 x i64> undef, splat (i64 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi David,
I see why the cost could be changed, but I can't understand how it makes sense that the legal type of <vscal x 2 x i64> has higher cost than a custom type like the one below: <vscale x 4 x i64>.
If expanding the div op for <vscal x 2 x i64> to the sequence (MULHS + ADD/SUB + SRA + SRL + ADD) results in higher cost, why do we do that expansion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi - yeah I agree it is a bit odd, the costs are not super accurate so far. It is hard to find a very accurate cost for something that depends on the input data, but they should be improved in future patches.
The cost of a udiv/sdiv should not be 1 (or 2), which I think some of these are falling back to, neither for scalar or for sve vectors. The divides use an iterative algorithm that takes multiple cycles to complete and block other operations until they finish. So whilst the codesize cost can be 1, the recip-throughput and latency should be higher.
I believe the expansion is still better than using udiv/sdiv instructions (from experiments). These costs look better in #122236, and the others should be brought into line in future patches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @davemgreen, thanks for splitting this patch out from #122236, it's really helpful to see which changes are affecting which costs. However, I do agree with @hassnaaHamdi that something looks wrong here, which isn't fixed by #122236 either. I would assume that in @sdiv_uniformconst
sdiv <vscale x 8 x i32> undef, splat (i64 7)
gets legalised into two sdiv <vscale x 4 x i32> undef, splat (i64 7)
instructions with the results being concatenated together. Right now the vectoriser will believe that VF=vscale x 8 is almost half the cost of VF=vscale x 4, which doesn't seem right. Something similar happens in sdiv_uniformconstnegpow2
.
Having said that, in general I do agree this is a step in the right direction for most cases because previously the divide costs were too low. The throughputs for udiv/sdiv are at best 1/7 on neoverse-v1 according to the optimisation guide. This patch just increases the costs for divides of splats, and then #122236 follows on for the more general case. I do like this patch and happy to accept it, but perhaps worth understanding what's going on first with the illegal types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it makes sense to investigate why the cost for illegal types gets assigned a lower cost. In a way this regresses the cost-model because for some types it now assumes that a div of a legal type is more expensive than an illegal type that requires legalisation.
if (isa<ConstantVector>(V) || isa<ConstantDataVector>(V) || | ||
isa<ConstantExpr>(V)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than adding another case, which I think makes this sensitive to the plethora of constant subclasses, I think the logic here would be easier to follow if it would be structured as:
// Handle splats first since those are all uniform
if (const Value *Splat = getSplatValue(V)) {
if (auto *CI = dyn_cast<ConstantInt>(Splat)) {
OpInfo = OK_UniformConstantValue;
...
} else if (isa<Constant>(Splat))
OpInfo = OK_UniformConstantValue;
else if (isa<Argument>(Splat) || isa<GlobalValue>(Splat))
OpInfo = OK_UniformValue;
} else if (isa<Constant>(V)) {
OpInfo = OK_NonUniformConstantValue;
if (const auto *CDS = dyn_cast<ConstantDataSequential>(V)) {
...
}
}
This also allows folding in the argument/global value splats case.
7321d54
to
d48e165
Compare
Changed to use the splat to detect constants - this will make the difference that zeroinitializer is now treated as a OK_UniformConstantValue too. That can be changed if necessary. |
Pulled out of llvm#122236, this allows Splats contants to be recognized in by getOperandInfo, allowing "better" costs for instructions like divides by constants to be produced (which are expanded into mul+add+shift). Some of the costs are not very accurate yet, but the comparison of scalar vs fixed-with vs scalable for the same fiv can become more accurate, especially with patches like llvm#122236.
d48e165
to
6fb39ff
Compare
Rebase and ping - this is hopefully a little simpler now. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping
; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %NV16i8 = sdiv <vscale x 16 x i8> undef, splat (i8 16) | ||
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %NV32i8 = sdiv <vscale x 32 x i8> undef, splat (i8 16) | ||
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %NV64i8 = sdiv <vscale x 64 x i8> undef, splat (i8 16) | ||
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %NV2i64 = sdiv <vscale x 2 x i64> undef, splat (i64 16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it that the issue that @hassnaaHamdi reported has gone away with the latest revision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep for the moment, due to the opt-opt in llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp. They should change again in a future patch, but that should happen in a more reliable way where it applies to all vector types.
Pulled out of #122236, this allows Splats constants to be recognized by getOperandInfo, allowing "better" costs for instructions like divides by constants to be produced (which are expanded into mul+add+shift). Some of the costs are not very accurate yet, but the comparison of scalar vs fixed-width vs scalable for the same div can become more accurate, especially with patches like #122236.