Skip to content

Commit 6c2cc82

Browse files
committed
[AArch64] Improve cost of non-zero lane splats
This adds a cost for non-zero lane splats, which is not included by default in SK_Broadcast but can be handled by aarch64 dup lane instruction.
1 parent 8891fd5 commit 6c2cc82

File tree

2 files changed

+7
-4
lines changed

2 files changed

+7
-4
lines changed

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3939,7 +3939,10 @@ InstructionCost AArch64TTIImpl::getShuffleCost(
39393939
LT.second.getVectorNumElements() == Mask.size() &&
39403940
(Kind == TTI::SK_PermuteTwoSrc || Kind == TTI::SK_PermuteSingleSrc) &&
39413941
(isZIPMask(Mask, LT.second, Unused) ||
3942-
isUZPMask(Mask, LT.second, Unused)))
3942+
isUZPMask(Mask, LT.second, Unused) ||
3943+
// Check for non-zero lane splats
3944+
all_of(drop_begin(Mask),
3945+
[&Mask](int M) { return M < 0 || M == Mask[0]; })))
39433946
return 1;
39443947

39453948
if (Kind == TTI::SK_Broadcast || Kind == TTI::SK_Transpose ||

llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -361,9 +361,9 @@ define void @uzp() {
361361
define void @multipart() {
362362
; CHECK-LABEL: 'multipart'
363363
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16a = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>
364-
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v16b = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
365-
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v16c = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
366-
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v16d = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
364+
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16b = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
365+
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16c = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
366+
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16d = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
367367
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32a = shufflevector <4 x i32> undef, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
368368
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v32a4 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1>
369369
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v32idrev = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 15, i32 14, i32 13, i32 12, i32 16, i32 17, i32 18, i32 19, i32 31, i32 30, i32 29, i32 28>

0 commit comments

Comments
 (0)