Skip to content

Conversation

@davemgreen
Copy link
Collaborator

@davemgreen davemgreen commented Jul 18, 2025

This ensures that we transfer implicit operands to the new expanded pseudos if necessary, similarly to other pseudo expansions.

@llvmbot
Copy link
Member

llvmbot commented Jul 18, 2025

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

This ensures that we transfer implicit operands to the new expanded pseudos if necessary, similarly to other pseudo expansions.


Full diff: https://github.com/llvm/llvm-project/pull/149456.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp (+36-30)
  • (added) llvm/test/CodeGen/AArch64/bsp_implicit_ops.mir (+98)
diff --git a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
index 12fc976a70ea7..201bfe0a443d6 100644
--- a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
@@ -1205,32 +1205,36 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
     Register DstReg = MI.getOperand(0).getReg();
     if (DstReg == MI.getOperand(3).getReg()) {
       // Expand to BIT
-      BuildMI(MBB, MBBI, MI.getDebugLoc(),
-              TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BITv8i8
-                                                  : AArch64::BITv16i8))
-          .add(MI.getOperand(0))
-          .add(MI.getOperand(3))
-          .add(MI.getOperand(2))
-          .add(MI.getOperand(1));
+      auto I = BuildMI(MBB, MBBI, MI.getDebugLoc(),
+                       TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BITv8i8
+                                                           : AArch64::BITv16i8))
+                   .add(MI.getOperand(0))
+                   .add(MI.getOperand(3))
+                   .add(MI.getOperand(2))
+                   .add(MI.getOperand(1));
+      transferImpOps(MI, I, I);
     } else if (DstReg == MI.getOperand(2).getReg()) {
       // Expand to BIF
-      BuildMI(MBB, MBBI, MI.getDebugLoc(),
-              TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BIFv8i8
-                                                  : AArch64::BIFv16i8))
-          .add(MI.getOperand(0))
-          .add(MI.getOperand(2))
-          .add(MI.getOperand(3))
-          .add(MI.getOperand(1));
+      auto I = BuildMI(MBB, MBBI, MI.getDebugLoc(),
+                       TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BIFv8i8
+                                                           : AArch64::BIFv16i8))
+                   .add(MI.getOperand(0))
+                   .add(MI.getOperand(2))
+                   .add(MI.getOperand(3))
+                   .add(MI.getOperand(1));
+      transferImpOps(MI, I, I);
     } else {
       // Expand to BSL, use additional move if required
       if (DstReg == MI.getOperand(1).getReg()) {
-        BuildMI(MBB, MBBI, MI.getDebugLoc(),
-                TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BSLv8i8
-                                                    : AArch64::BSLv16i8))
-            .add(MI.getOperand(0))
-            .add(MI.getOperand(1))
-            .add(MI.getOperand(2))
-            .add(MI.getOperand(3));
+        auto I =
+            BuildMI(MBB, MBBI, MI.getDebugLoc(),
+                    TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BSLv8i8
+                                                        : AArch64::BSLv16i8))
+                .add(MI.getOperand(0))
+                .add(MI.getOperand(1))
+                .add(MI.getOperand(2))
+                .add(MI.getOperand(3));
+        transferImpOps(MI, I, I);
       } else {
         BuildMI(MBB, MBBI, MI.getDebugLoc(),
                 TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::ORRv8i8
@@ -1240,15 +1244,17 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
                         getRenamableRegState(MI.getOperand(0).isRenamable()))
             .add(MI.getOperand(1))
             .add(MI.getOperand(1));
-        BuildMI(MBB, MBBI, MI.getDebugLoc(),
-                TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BSLv8i8
-                                                    : AArch64::BSLv16i8))
-            .add(MI.getOperand(0))
-            .addReg(DstReg,
-                    RegState::Kill |
-                        getRenamableRegState(MI.getOperand(0).isRenamable()))
-            .add(MI.getOperand(2))
-            .add(MI.getOperand(3));
+        auto I2 =
+            BuildMI(MBB, MBBI, MI.getDebugLoc(),
+                    TII->get(Opcode == AArch64::BSPv8i8 ? AArch64::BSLv8i8
+                                                        : AArch64::BSLv16i8))
+                .add(MI.getOperand(0))
+                .addReg(DstReg,
+                        RegState::Kill | getRenamableRegState(
+                                             MI.getOperand(0).isRenamable()))
+                .add(MI.getOperand(2))
+                .add(MI.getOperand(3));
+        transferImpOps(MI, I2, I2);
       }
     }
     MI.eraseFromParent();
diff --git a/llvm/test/CodeGen/AArch64/bsp_implicit_ops.mir b/llvm/test/CodeGen/AArch64/bsp_implicit_ops.mir
new file mode 100644
index 0000000000000..bc7cc7887ff59
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/bsp_implicit_ops.mir
@@ -0,0 +1,98 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64-none-linux-gnu -run-pass aarch64-expand-pseudo -verify-machineinstrs %s -o - | FileCheck %s
+
+
+---
+name:            BIF
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+
+
+    ; CHECK-LABEL: name: BIF
+    ; CHECK: liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: renamable $q2 = ORRv16i8 killed renamable $q20, killed renamable $q20
+    ; CHECK-NEXT: renamable $q2 = BSLv16i8 killed renamable $q2, renamable $q21, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+    ; CHECK-NEXT: $q22 = ORRv16i8 $q0, killed $q0
+    ; CHECK-NEXT: $q23 = ORRv16i8 $q1, killed $q1
+    ; CHECK-NEXT: $q24 = ORRv16i8 $q2, killed $q2
+    ; CHECK-NEXT: $q25 = ORRv16i8 $q3, killed $q3
+    ; CHECK-NEXT: RET undef $lr, implicit $q22
+    renamable $q2 = BSPv16i8 killed renamable $q20, renamable $q21, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+    $q22 = ORRv16i8 $q0, killed $q0
+    $q23 = ORRv16i8 $q1, killed $q1
+    $q24 = ORRv16i8 $q2, killed $q2
+    $q25 = ORRv16i8 $q3, killed $q3
+    RET_ReallyLR implicit $q22
+...
+---
+name:            BSL
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+
+    ; CHECK-LABEL: name: BSL
+    ; CHECK: liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: renamable $q2 = BSLv16i8 killed renamable $q2, renamable $q21, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+    ; CHECK-NEXT: $q22 = ORRv16i8 $q0, killed $q0
+    ; CHECK-NEXT: $q23 = ORRv16i8 $q1, killed $q1
+    ; CHECK-NEXT: $q24 = ORRv16i8 $q2, killed $q2
+    ; CHECK-NEXT: $q25 = ORRv16i8 $q3, killed $q3
+    ; CHECK-NEXT: RET undef $lr, implicit $q22
+    renamable $q2 = BSPv16i8 killed renamable $q2, renamable $q21, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+    $q22 = ORRv16i8 $q0, killed $q0
+    $q23 = ORRv16i8 $q1, killed $q1
+    $q24 = ORRv16i8 $q2, killed $q2
+    $q25 = ORRv16i8 $q3, killed $q3
+    RET_ReallyLR implicit $q22
+...
+---
+name:            BIT
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+
+    ; CHECK-LABEL: name: BIT
+    ; CHECK: liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: renamable $q2 = BIFv16i8 renamable $q2, renamable $q6, killed renamable $q20, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+    ; CHECK-NEXT: $q22 = ORRv16i8 $q0, killed $q0
+    ; CHECK-NEXT: $q23 = ORRv16i8 $q1, killed $q1
+    ; CHECK-NEXT: $q24 = ORRv16i8 $q2, killed $q2
+    ; CHECK-NEXT: $q25 = ORRv16i8 $q3, killed $q3
+    ; CHECK-NEXT: RET undef $lr, implicit $q22
+    renamable $q2 = BSPv16i8 killed renamable $q20, renamable $q2, renamable $q6, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+    $q22 = ORRv16i8 $q0, killed $q0
+    $q23 = ORRv16i8 $q1, killed $q1
+    $q24 = ORRv16i8 $q2, killed $q2
+    $q25 = ORRv16i8 $q3, killed $q3
+    RET_ReallyLR implicit $q22
+...
+---
+name:            BSL_COPY
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+
+    ; CHECK-LABEL: name: BSL_COPY
+    ; CHECK: liveins: $q20, $q21, $q22, $q23, $q6, $q1, $q7
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: renamable $q2 = BITv16i8 renamable $q2, renamable $q21, killed renamable $q20, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+    ; CHECK-NEXT: $q22 = ORRv16i8 $q0, killed $q0
+    ; CHECK-NEXT: $q23 = ORRv16i8 $q1, killed $q1
+    ; CHECK-NEXT: $q24 = ORRv16i8 $q2, killed $q2
+    ; CHECK-NEXT: $q25 = ORRv16i8 $q3, killed $q3
+    ; CHECK-NEXT: RET undef $lr, implicit $q22
+    renamable $q2 = BSPv16i8 killed renamable $q20, renamable $q21, renamable $q2, implicit killed $q21_q22_q23, implicit killed $q0_q1_q2_q3, implicit-def $q0_q1_q2_q3
+    $q22 = ORRv16i8 $q0, killed $q0
+    $q23 = ORRv16i8 $q1, killed $q1
+    $q24 = ORRv16i8 $q2, killed $q2
+    $q25 = ORRv16i8 $q3, killed $q3
+    RET_ReallyLR implicit $q22
+...

Copy link
Contributor

@rj-jesus rj-jesus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change looks good, but the original input from #149380 looks suspicious (unless I'm missing something).

What I mean is this (https://godbolt.org/z/j15MoWf9E):

    ...
    renamable $q16_q17_q18_q19 = LD1Fourv2d $x21 :: (load (s512) from %stack.8, align 16)
    renamable $q17 = BSPv16i8 killed renamable $q0, renamable $q1, renamable $q23, implicit killed $q16_q17_q18_q19, implicit-def $q16_q17_q18_q19
    ST1Fourv2d killed renamable $q16_q17_q18_q19, killed $x21 :: (store (s512) into %stack.8, align 16)
    $q27 = ORRv16i8 $q20, $q20
    renamable $q0 = CMHIv2i64 renamable $q1, $q20
    renamable $q9 = BSPv16i8 killed renamable $q0, renamable $q1, $q20, implicit killed $q1_q2_q3, implicit killed $q8_q9_q10_q11, implicit-def $q8_q9_q10_q11
    $x21 = ADDXri $sp, 976, 0
    ST1Fourv2d killed renamable $q8_q9_q10_q11, killed $x21 :: (store (s512) into %stack.23, align 16)
    renamable $x21 = ADDXri renamable $x19, 144, 0
    renamable $q0 = CMHIv2i64 renamable $q4, renamable $q21
    renamable $q1_q2_q3 = LD3Threev2d killed renamable $x21 :: (load (s384) from %ir.35, align 64)
    renamable $q7 = CMHIv2i64 renamable $q1, renamable $q21
    renamable $q16 = BSPv16i8 killed renamable $q0, renamable $q4, renamable $q21, implicit-def $q16_q17_q18_q19
    renamable $q19 = BSPv16i8 killed renamable $q7, renamable $q1, renamable $q21, implicit killed $q16_q17_q18_q19, implicit-def $q16_q17_q18_q19
    $q12 = ORRv16i8 $q16, killed $q16
    $q13 = ORRv16i8 $q17, killed $q17
    $q14 = ORRv16i8 $q18, killed $q18
    $q15 = ORRv16i8 $q19, killed $q19
    ...

The first store seems to kill $q16_q17_q18_q19 before we actually reach the later instructions using q16, q18 and q19. Are they expected to be live then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: would it be worth declaring MachineInstrBuilder MIB before the if-else chain and moving transferImpOps(MI, MIB, MIB); just before MI.eraseFromParent?

@davemgreen
Copy link
Collaborator Author

I think this change looks good, but the original input from #149380 looks suspicious (unless I'm missing something).

What I mean is this (https://godbolt.org/z/j15MoWf9E):

    ...
    renamable $q16_q17_q18_q19 = LD1Fourv2d $x21 :: (load (s512) from %stack.8, align 16)
    renamable $q17 = BSPv16i8 killed renamable $q0, renamable $q1, renamable $q23, implicit killed $q16_q17_q18_q19, implicit-def $q16_q17_q18_q19
    ST1Fourv2d killed renamable $q16_q17_q18_q19, killed $x21 :: (store (s512) into %stack.8, align 16)
    $q27 = ORRv16i8 $q20, $q20
    renamable $q0 = CMHIv2i64 renamable $q1, $q20
    renamable $q9 = BSPv16i8 killed renamable $q0, renamable $q1, $q20, implicit killed $q1_q2_q3, implicit killed $q8_q9_q10_q11, implicit-def $q8_q9_q10_q11
    $x21 = ADDXri $sp, 976, 0
    ST1Fourv2d killed renamable $q8_q9_q10_q11, killed $x21 :: (store (s512) into %stack.23, align 16)
    renamable $x21 = ADDXri renamable $x19, 144, 0
    renamable $q0 = CMHIv2i64 renamable $q4, renamable $q21
    renamable $q1_q2_q3 = LD3Threev2d killed renamable $x21 :: (load (s384) from %ir.35, align 64)
    renamable $q7 = CMHIv2i64 renamable $q1, renamable $q21
    renamable $q16 = BSPv16i8 killed renamable $q0, renamable $q4, renamable $q21, implicit-def $q16_q17_q18_q19
    renamable $q19 = BSPv16i8 killed renamable $q7, renamable $q1, renamable $q21, implicit killed $q16_q17_q18_q19, implicit-def $q16_q17_q18_q19
    $q12 = ORRv16i8 $q16, killed $q16
    $q13 = ORRv16i8 $q17, killed $q17
    $q14 = ORRv16i8 $q18, killed $q18
    $q15 = ORRv16i8 $q19, killed $q19
    ...

The first store seems to kill $q16_q17_q18_q19 before we actually reach the later instructions using q16, q18 and q19. Are they expected to be live then?

Is seems OK to me. What do you think is wrong with it? (Other than subreg-liveness inefficiencies).

This ensures that we transfer implicit operands to the new expanded pseudos if
necessary, similarly to other pseudo expansions.
@davemgreen davemgreen force-pushed the gh-a64-bspimplicit branch from 22c2fad to 0d6557c Compare July 21, 2025 06:35
@rj-jesus
Copy link
Contributor

rj-jesus commented Jul 21, 2025

Is seems OK to me. What do you think is wrong with it? (Other than subreg-liveness inefficiencies).

I think it must be something I'm misunderstanding, I agree it doesn't look wrong. What's confusing me is the implicit-def operand in renamable $q16 = BSPv16i8 killed renamable $q0, renamable $q4, renamable $q21, implicit-def $q16_q17_q18_q19. What's preventing q17, q18 or q19 from being hypothetically defined and killed between ST1Fourv2d killed renamable $q16_q17_q18_q19 (...) and renamable $q16 = BSPv16i8 (...) implicit-def $q16_q17_q18_q19, which would then presumably render the last three ORRs incorrect?

@davemgreen
Copy link
Collaborator Author

I think it is fine, if a bit inefficient. They come from register allocation of the large vectors (something that subregliveness helps with). The values are not coming from the store, the BSP will implicit-def $q16_q17_q18_q19 and insert q16 into it. The others are overridden later before they are used, I believe (or unused, but it looks like they updated by other BSPs). q19 is inserted in the last instruction, the others I think get inserted into q13 and q14 after the moves.

@rj-jesus
Copy link
Contributor

I see, thanks very much for the explanation. I was mainly concerned about the renamable $q17 = BSPv16i8 (...) just before the store (I was worried q17 could hypothetically get clobbered after the store, before reaching the later BSPs and moves). If that's not a concern, then all good. Thanks again. :)

@davemgreen davemgreen merged commit 95201b2 into llvm:main Jul 22, 2025
9 checks passed
@davemgreen davemgreen deleted the gh-a64-bspimplicit branch July 22, 2025 07:10
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
…9456)

This ensures that we transfer implicit operands to the new expanded
pseudos if necessary, similarly to other pseudo expansions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants