-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AArch64] Ensure we transferImpOps on BSP pseudo expansions. #149456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-backend-aarch64 Author: David Green (davemgreen) ChangesThis ensures that we transfer implicit operands to the new expanded pseudos if necessary, similarly to other pseudo expansions. Full diff: https://github.com/llvm/llvm-project/pull/149456.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
index 12fc976a70ea7..201bfe0a443d6 100644
--- a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
@@ -1205,32 +1205,36 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
Register DstReg = MI.getOperand(0).getReg();
if (DstReg == MI.getOperand(3).getReg()) {
// Expand to BIT
- BuildMI(MBB, MBBI, MI.getDebugLoc(),
- TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BITv8i8
- : AArch64::BITv16i8))
- .add(MI.getOperand(0))
- .add(MI.getOperand(3))
- .add(MI.getOperand(2))
- .add(MI.getOperand(1));
+ auto I = BuildMI(MBB, MBBI, MI.getDebugLoc(),
+ TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BITv8i8
+ : AArch64::BITv16i8))
+ .add(MI.getOperand(0))
+ .add(MI.getOperand(3))
+ .add(MI.getOperand(2))
+ .add(MI.getOperand(1));
+ transferImpOps(MI, I, I);
} else if (DstReg == MI.getOperand(2).getReg()) {
// Expand to BIF
- BuildMI(MBB, MBBI, MI.getDebugLoc(),
- TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BIFv8i8
- : AArch64::BIFv16i8))
- .add(MI.getOperand(0))
- .add(MI.getOperand(2))
- .add(MI.getOperand(3))
- .add(MI.getOperand(1));
+ auto I = BuildMI(MBB, MBBI, MI.getDebugLoc(),
+ TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BIFv8i8
+ : AArch64::BIFv16i8))
+ .add(MI.getOperand(0))
+ .add(MI.getOperand(2))
+ .add(MI.getOperand(3))
+ .add(MI.getOperand(1));
+ transferImpOps(MI, I, I);
} else {
// Expand to BSL, use additional move if required
if (DstReg == MI.getOperand(1).getReg()) {
- BuildMI(MBB, MBBI, MI.getDebugLoc(),
- TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BSLv8i8
- : AArch64::BSLv16i8))
- .add(MI.getOperand(0))
- .add(MI.getOperand(1))
- .add(MI.getOperand(2))
- .add(MI.getOperand(3));
+ auto I =
+ BuildMI(MBB, MBBI, MI.getDebugLoc(),
+ TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BSLv8i8
+ : AArch64::BSLv16i8))
+ .add(MI.getOperand(0))
+ .add(MI.getOperand(1))
+ .add(MI.getOperand(2))
+ .add(MI.getOperand(3));
+ transferImpOps(MI, I, I);
} else {
BuildMI(MBB, MBBI, MI.getDebugLoc(),
TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::ORRv8i8
@@ -1240,15 +1244,17 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
getRenamableRegState(MI.getOperand(0).isRenamable()))
.add(MI.getOperand(1))
.add(MI.getOperand(1));
- BuildMI(MBB, MBBI, MI.getDebugLoc(),
- TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BSLv8i8
- : AArch64::BSLv16i8))
- .add(MI.getOperand(0))
- .addReg(DstReg,
- RegState::Kill |
- getRenamableRegState(MI.getOperand(0).isRenamable()))
- .add(MI.getOperand(2))
- .add(MI.getOperand(3));
+ auto I2 =
+ BuildMI(MBB, MBBI, MI.getDebugLoc(),
+ TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BSLv8i8
+ : AArch64::BSLv16i8))
+ .add(MI.getOperand(0))
+ .addReg(DstReg,
+ RegState::Kill | getRenamableRegState(
+ MI.getOperand(0).isRenamable()))
+ .add(MI.getOperand(2))
+ .add(MI.getOperand(3));
+ transferImpOps(MI, I2, I2);
}
}
MI.eraseFromParent();
diff --git a/llvm/test/CodeGen/AArch64/bsp_implicit_ops.mir b/llvm/test/CodeGen/AArch64/bsp_implicit_ops.mir
new file mode 100644
index 0000000000000..bc7cc7887ff59
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/bsp_implicit_ops.mir
@@ -0,0 +1,98 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64-none-linux-gnu -run-pass aarch64-expand-pseudo -verify-machineinstrs %s -o - | FileCheck %s
+
+
+---
+name: BIF
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+
+
+ ; CHECK-LABEL: name: BIF
+ ; CHECK: liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: renamable $q2 = ORRv16i8 killed renamable $q20, killed renamable $q20
+ ; CHECK-NEXT: renamable $q2 = BSLv16i8 killed renamable $q2, renamable $q21, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+ ; CHECK-NEXT: $q22 = ORRv16i8 $q0, killed $q0
+ ; CHECK-NEXT: $q23 = ORRv16i8 $q1, killed $q1
+ ; CHECK-NEXT: $q24 = ORRv16i8 $q2, killed $q2
+ ; CHECK-NEXT: $q25 = ORRv16i8 $q3, killed $q3
+ ; CHECK-NEXT: RET undef $lr, implicit $q22
+ renamable $q2 = BSPv16i8 killed renamable $q20, renamable $q21, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+ $q22 = ORRv16i8 $q0, killed $q0
+ $q23 = ORRv16i8 $q1, killed $q1
+ $q24 = ORRv16i8 $q2, killed $q2
+ $q25 = ORRv16i8 $q3, killed $q3
+ RET_ReallyLR implicit $q22
+...
+---
+name: BSL
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+
+ ; CHECK-LABEL: name: BSL
+ ; CHECK: liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: renamable $q2 = BSLv16i8 killed renamable $q2, renamable $q21, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+ ; CHECK-NEXT: $q22 = ORRv16i8 $q0, killed $q0
+ ; CHECK-NEXT: $q23 = ORRv16i8 $q1, killed $q1
+ ; CHECK-NEXT: $q24 = ORRv16i8 $q2, killed $q2
+ ; CHECK-NEXT: $q25 = ORRv16i8 $q3, killed $q3
+ ; CHECK-NEXT: RET undef $lr, implicit $q22
+ renamable $q2 = BSPv16i8 killed renamable $q2, renamable $q21, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+ $q22 = ORRv16i8 $q0, killed $q0
+ $q23 = ORRv16i8 $q1, killed $q1
+ $q24 = ORRv16i8 $q2, killed $q2
+ $q25 = ORRv16i8 $q3, killed $q3
+ RET_ReallyLR implicit $q22
+...
+---
+name: BIT
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+
+ ; CHECK-LABEL: name: BIT
+ ; CHECK: liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: renamable $q2 = BIFv16i8 renamable $q2, renamable $q6, killed renamable $q20, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+ ; CHECK-NEXT: $q22 = ORRv16i8 $q0, killed $q0
+ ; CHECK-NEXT: $q23 = ORRv16i8 $q1, killed $q1
+ ; CHECK-NEXT: $q24 = ORRv16i8 $q2, killed $q2
+ ; CHECK-NEXT: $q25 = ORRv16i8 $q3, killed $q3
+ ; CHECK-NEXT: RET undef $lr, implicit $q22
+ renamable $q2 = BSPv16i8 killed renamable $q20, renamable $q2, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+ $q22 = ORRv16i8 $q0, killed $q0
+ $q23 = ORRv16i8 $q1, killed $q1
+ $q24 = ORRv16i8 $q2, killed $q2
+ $q25 = ORRv16i8 $q3, killed $q3
+ RET_ReallyLR implicit $q22
+...
+---
+name: BSL_COPY
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+
+ ; CHECK-LABEL: name: BSL_COPY
+ ; CHECK: liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: renamable $q2 = BITv16i8 renamable $q2, renamable $q21, killed renamable $q20, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+ ; CHECK-NEXT: $q22 = ORRv16i8 $q0, killed $q0
+ ; CHECK-NEXT: $q23 = ORRv16i8 $q1, killed $q1
+ ; CHECK-NEXT: $q24 = ORRv16i8 $q2, killed $q2
+ ; CHECK-NEXT: $q25 = ORRv16i8 $q3, killed $q3
+ ; CHECK-NEXT: RET undef $lr, implicit $q22
+ renamable $q2 = BSPv16i8 killed renamable $q20, renamable $q21, renamable $q2, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+ $q22 = ORRv16i8 $q0, killed $q0
+ $q23 = ORRv16i8 $q1, killed $q1
+ $q24 = ORRv16i8 $q2, killed $q2
+ $q25 = ORRv16i8 $q3, killed $q3
+ RET_ReallyLR implicit $q22
+...
|
rj-jesus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change looks good, but the original input from #149380 looks suspicious (unless I'm missing something).
What I mean is this (https://godbolt.org/z/j15MoWf9E):
...
renamable $q16_q17_q18_q19 = LD1Fourv2d $x21 :: (load (s512) from %stack.8, align 16)
renamable $q17 = BSPv16i8 killed renamable $q0, renamable $q1, renamable $q23, implicit killed $q16_q17_q18_q19, implicit-def $q16_q17_q18_q19
ST1Fourv2d killed renamable $q16_q17_q18_q19, killed $x21 :: (store (s512) into %stack.8, align 16)
$q27 = ORRv16i8 $q20, $q20
renamable $q0 = CMHIv2i64 renamable $q1, $q20
renamable $q9 = BSPv16i8 killed renamable $q0, renamable $q1, $q20, implicit killed $q1_q2_q3, implicit killed $q8_q9_q10_q11, implicit-def $q8_q9_q10_q11
$x21 = ADDXri $sp, 976, 0
ST1Fourv2d killed renamable $q8_q9_q10_q11, killed $x21 :: (store (s512) into %stack.23, align 16)
renamable $x21 = ADDXri renamable $x19, 144, 0
renamable $q0 = CMHIv2i64 renamable $q4, renamable $q21
renamable $q1_q2_q3 = LD3Threev2d killed renamable $x21 :: (load (s384) from %ir.35, align 64)
renamable $q7 = CMHIv2i64 renamable $q1, renamable $q21
renamable $q16 = BSPv16i8 killed renamable $q0, renamable $q4, renamable $q21, implicit-def $q16_q17_q18_q19
renamable $q19 = BSPv16i8 killed renamable $q7, renamable $q1, renamable $q21, implicit killed $q16_q17_q18_q19, implicit-def $q16_q17_q18_q19
$q12 = ORRv16i8 $q16, killed $q16
$q13 = ORRv16i8 $q17, killed $q17
$q14 = ORRv16i8 $q18, killed $q18
$q15 = ORRv16i8 $q19, killed $q19
...
The first store seems to kill $q16_q17_q18_q19 before we actually reach the later instructions using q16, q18 and q19. Are they expected to be live then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: would it be worth declaring MachineInstrBuilder MIB before the if-else chain and moving transferImpOps(MI, MIB, MIB); just before MI.eraseFromParent?
Is seems OK to me. What do you think is wrong with it? (Other than subreg-liveness inefficiencies). |
This ensures that we transfer implicit operands to the new expanded pseudos if necessary, similarly to other pseudo expansions.
22c2fad to
0d6557c
Compare
I think it must be something I'm misunderstanding, I agree it doesn't look wrong. What's confusing me is the implicit-def operand in |
|
I think it is fine, if a bit inefficient. They come from register allocation of the large vectors (something that subregliveness helps with). The values are not coming from the store, the BSP will implicit-def $q16_q17_q18_q19 and insert q16 into it. The others are overridden later before they are used, I believe (or unused, but it looks like they updated by other BSPs). q19 is inserted in the last instruction, the others I think get inserted into q13 and q14 after the moves. |
|
I see, thanks very much for the explanation. I was mainly concerned about the |
…9456) This ensures that we transfer implicit operands to the new expanded pseudos if necessary, similarly to other pseudo expansions.
This ensures that we transfer implicit operands to the new expanded pseudos if necessary, similarly to other pseudo expansions.