Commit 5be6551

[AArch64] Tweak truncate costs for some scalable vector types

* We were previously returning an invalid cost when truncating anything
  to <vscale x 2 x i1>, which is incorrect since we can generate
  perfectly good code for this.
* The costs for truncating legal or unpacked types to predicates seemed
  overly optimistic. For example, when truncating
  <vscale x 8 x i16> to <vscale x 8 x i1> we typically do something like

    and z0.h, z0.h, #0x1
    cmpne p0.h, p0/z, z0.h, #0

  I guess it might depend upon whether the input value is generated in
  the same block or not and if we can avoid the inreg zero-extend.
  However, it feels safe to take the more conservative cost here.
* The costs for some truncates such as

    trunc <vscale x 2 x i32> %a to <vscale x 2 x i16>

  were 1, whereas in actual fact they are free and no instructions are
  required. Also, for this

    trunc <vscale x 8 x i32> %a to <vscale x 8 x i16>

  it's just a single uzp1 instruction so I reduced the cost to 1.
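To make the table-driven costing concrete, here is a minimal, self-contained sketch of a (destination, source) cost lookup. It is not the LLVM implementation: the `VT` enum, `TruncCostEntry`, `truncCostTable` and `lookupTruncCost` are hypothetical stand-ins invented for illustration, and only four entries from the new table are copied in. The point is simply that a pair with an entry yields that cost, while a pair without one yields no table cost and must be handled some other way.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for a few machine value types (not LLVM's MVT).
enum class VT { nxv2i1, nxv2i16, nxv2i32, nxv8i1, nxv8i16, nxv8i32 };

// One row of a truncate cost table: destination type, source type, cost.
struct TruncCostEntry {
  VT Dst;
  VT Src;
  unsigned Cost;
};

// A small subset of the entries added by this commit.
inline const std::vector<TruncCostEntry> &truncCostTable() {
  static const std::vector<TruncCostEntry> Table = {
      {VT::nxv2i1, VT::nxv2i16, 2},  // truncate to predicate
      {VT::nxv8i1, VT::nxv8i16, 2},  // and + cmpne
      {VT::nxv2i16, VT::nxv2i32, 0}, // free, no instructions required
      {VT::nxv8i16, VT::nxv8i32, 1}, // a single uzp1
  };
  return Table;
}

// Linear search for an exact (Dst, Src) match; std::nullopt means the
// table has no entry for this pair.
std::optional<unsigned> lookupTruncCost(VT Dst, VT Src) {
  for (const TruncCostEntry &E : truncCostTable())
    if (E.Dst == Dst && E.Src == Src)
      return E.Cost;
  return std::nullopt;
}

// Usage example: the cases discussed in the commit message.
void checkCommitMessageExamples() {
  assert(lookupTruncCost(VT::nxv8i1, VT::nxv8i16) == 2u);
  assert(lookupTruncCost(VT::nxv2i16, VT::nxv2i32) == 0u);
  assert(lookupTruncCost(VT::nxv8i16, VT::nxv8i32) == 1u);
  // A pair that is not in this sketch's table produces no table cost.
  assert(!lookupTruncCost(VT::nxv8i16, VT::nxv2i32).has_value());
}
```

Returning `std::optional` rather than a sentinel keeps "no entry" distinct from a legitimate cost of 0, which matters here because several of the new entries are free.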
1 parent 8ed0278 commit 5be6551

4 files changed, +84 -67 lines changed

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Lines changed: 33 additions & 16 deletions
@@ -2782,22 +2782,39 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
       {ISD::TRUNCATE, MVT::v16i32, MVT::v16i64, 4}, // 4 x uzp1
 
       // Truncations on nxvmiN
-      {ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i16, 1},
-      {ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i32, 1},
-      {ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i64, 1},
-      {ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i16, 1},
-      {ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i32, 1},
-      {ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i64, 2},
-      {ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i16, 1},
-      {ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i32, 3},
-      {ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i64, 5},
-      {ISD::TRUNCATE, MVT::nxv16i1, MVT::nxv16i8, 1},
-      {ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i32, 1},
-      {ISD::TRUNCATE, MVT::nxv2i32, MVT::nxv2i64, 1},
-      {ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i32, 1},
-      {ISD::TRUNCATE, MVT::nxv4i32, MVT::nxv4i64, 2},
-      {ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i32, 3},
-      {ISD::TRUNCATE, MVT::nxv8i32, MVT::nxv8i64, 6},
+      {ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i8, 2},
+      {ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i16, 2},
+      {ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i32, 2},
+      {ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i64, 2},
+      {ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i8, 2},
+      {ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i16, 2},
+      {ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i32, 2},
+      {ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i64, 5},
+      {ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i8, 2},
+      {ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i16, 2},
+      {ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i32, 5},
+      {ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i64, 11},
+      {ISD::TRUNCATE, MVT::nxv16i1, MVT::nxv16i8, 2},
+      {ISD::TRUNCATE, MVT::nxv2i8, MVT::nxv2i16, 0},
+      {ISD::TRUNCATE, MVT::nxv2i8, MVT::nxv2i32, 0},
+      {ISD::TRUNCATE, MVT::nxv2i8, MVT::nxv2i64, 0},
+      {ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i32, 0},
+      {ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i64, 0},
+      {ISD::TRUNCATE, MVT::nxv2i32, MVT::nxv2i64, 0},
+      {ISD::TRUNCATE, MVT::nxv4i8, MVT::nxv4i16, 0},
+      {ISD::TRUNCATE, MVT::nxv4i8, MVT::nxv4i32, 0},
+      {ISD::TRUNCATE, MVT::nxv4i8, MVT::nxv4i64, 1},
+      {ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i32, 0},
+      {ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i64, 1},
+      {ISD::TRUNCATE, MVT::nxv4i32, MVT::nxv4i64, 1},
+      {ISD::TRUNCATE, MVT::nxv8i8, MVT::nxv8i16, 0},
+      {ISD::TRUNCATE, MVT::nxv8i8, MVT::nxv8i32, 1},
+      {ISD::TRUNCATE, MVT::nxv8i8, MVT::nxv8i64, 3},
+      {ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i32, 1},
+      {ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i64, 3},
+      {ISD::TRUNCATE, MVT::nxv16i8, MVT::nxv16i16, 1},
+      {ISD::TRUNCATE, MVT::nxv16i8, MVT::nxv16i32, 3},
+      {ISD::TRUNCATE, MVT::nxv16i8, MVT::nxv16i64, 7},
 
       // The number of shll instructions for the extension.
       {ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 3},
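
The new scalable-vector entries above follow a regular pattern that a short sketch can reproduce. This is an observation about the numbers, not something the commit states: the functions `numSrcRegs`, `intTruncCost` and `predTruncCost` are hypothetical, and the sketch assumes that a source type `<vscale x MinElts x iSrcBits>` occupies `MinElts * SrcBits / 128` SVE registers (at least one, so unpacked types count as one). With N source registers, every integer truncate entry in the table equals N - 1, which matches merging registers pairwise with uzp1, and every predicate entry equals 3N - 1. One possible reading of the latter is an `and` plus `cmpne` per source register followed by N - 1 combining steps, but that interpretation is an assumption.

```cpp
#include <cassert>

// Assumption: one SVE register holds 128 bits per unit of vscale.
const unsigned SVEBlockBits = 128;

// Registers occupied by <vscale x MinElts x iSrcBits>. Unpacked types
// (fewer than 128 bits per vscale unit) still occupy one register.
unsigned numSrcRegs(unsigned MinElts, unsigned SrcBits) {
  unsigned Bits = MinElts * SrcBits;
  assert(Bits > 0 && "type must have a non-zero size");
  if (Bits <= SVEBlockBits)
    return 1;
  assert(Bits % SVEBlockBits == 0 && "expected a whole number of registers");
  return Bits / SVEBlockBits;
}

// Integer-to-integer truncate: N - 1, i.e. one merge per pair of registers
// until a single register remains. Zero when the source fits in one register.
unsigned intTruncCost(unsigned MinElts, unsigned SrcBits) {
  return numSrcRegs(MinElts, SrcBits) - 1;
}

// Truncate to a predicate: 2 per source register (and + cmpne, as in the
// commit message) plus N - 1 further steps, giving 3N - 1 in total.
unsigned predTruncCost(unsigned MinElts, unsigned SrcBits) {
  unsigned N = numSrcRegs(MinElts, SrcBits);
  return 2 * N + (N - 1);
}
```

For example, `intTruncCost(16, 64)` gives 7, matching the `nxv16i8` from `nxv16i64` entry, and `predTruncCost(8, 64)` gives 11, matching `nxv8i1` from `nxv8i64`.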

llvm/test/Analysis/CostModel/AArch64/sve-cast.ll

Lines changed: 18 additions & 18 deletions
@@ -418,27 +418,27 @@ define void @trunc() {
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i8i16 = trunc <2 x i16> undef to <2 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i8i32 = trunc <2 x i32> undef to <2 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i8i64 = trunc <2 x i64> undef to <2 x i8>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s2i16i32 = trunc <2 x i32> undef to <2 x i16>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i16i32 = trunc <2 x i32> undef to <2 x i16>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i16i64 = trunc <2 x i64> undef to <2 x i16>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s2i32i64 = trunc <2 x i64> undef to <2 x i32>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i32i64 = trunc <2 x i64> undef to <2 x i32>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i16 = trunc <4 x i16> undef to <4 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i32 = trunc <4 x i32> undef to <4 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i64 = trunc <4 x i64> undef to <4 x i8>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i16i32 = trunc <4 x i32> undef to <4 x i16>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i16i32 = trunc <4 x i32> undef to <4 x i16>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i16i64 = trunc <4 x i64> undef to <4 x i16>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i16 = trunc <8 x i16> undef to <8 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i32 = trunc <8 x i32> undef to <8 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i64 = trunc <8 x i64> undef to <8 x i8>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i64 = trunc <8 x i64> undef to <8 x i16>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i16 = trunc <16 x i16> undef to <16 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i32 = trunc <16 x i32> undef to <16 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i64 = trunc <16 x i64> undef to <16 x i8>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i64 = trunc <16 x i64> undef to <16 x i16>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; FIXED-MIN-256-LABEL: 'trunc'
@@ -463,19 +463,19 @@ define void @trunc() {
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i64 = trunc <4 x i64> undef to <4 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i16i32 = trunc <4 x i32> undef to <4 x i16>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i16i64 = trunc <4 x i64> undef to <4 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i8i16 = trunc <8 x i16> undef to <8 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i32 = trunc <8 x i32> undef to <8 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i64 = trunc <8 x i64> undef to <8 x i8>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i64 = trunc <8 x i64> undef to <8 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i16 = trunc <16 x i16> undef to <16 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i32 = trunc <16 x i32> undef to <16 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i64 = trunc <16 x i64> undef to <16 x i8>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i64 = trunc <16 x i64> undef to <16 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; FIXED-MIN-2048-LABEL: 'trunc'
@@ -500,19 +500,19 @@ define void @trunc() {
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i64 = trunc <4 x i64> undef to <4 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i16i32 = trunc <4 x i32> undef to <4 x i16>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i16i64 = trunc <4 x i64> undef to <4 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i8i16 = trunc <8 x i16> undef to <8 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i32 = trunc <8 x i32> undef to <8 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i64 = trunc <8 x i64> undef to <8 x i8>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i64 = trunc <8 x i64> undef to <8 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i16 = trunc <16 x i16> undef to <16 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i32 = trunc <16 x i32> undef to <16 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i64 = trunc <16 x i64> undef to <16 x i8>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i64 = trunc <16 x i64> undef to <16 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
   %r8 = trunc i8 undef to i1
