[RegAlloc] Sort CopyHint by IsCSR #131046

michaelmaitland · 2025-03-13T00:09:18Z

weightCalcHelper is responsible for adding hints to MRI. Prior to this PR, we fell back on register ID as the last tie breaker for sorting hints. However, there is an opportunity to add an additional sorting characteristic: whether or not a register is a callee-saved register (CSR).

I thought of this idea because I saw that AllocationOrder::create calls RegisterClassInfo::getOrder, which returns a list of registers in which the registers that alias callee-saved registers come last. From this, I conclude that the register allocator prefers to allocate callee-saved registers after non-callee-saved registers, to avoid having to spill the CSR.
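
To make the ordering described above concrete, here is a minimal standalone sketch. It is not the RegisterClassInfo implementation, which also has to deal with reserved registers, aliases, and target-specific details; `orderWithCSRsLast` and the plain `unsigned` register ids are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Sketch only: reorder a raw preference list so that callee-saved
// registers end up last, keeping the relative order inside each group.
// Register ids are plain integers here, not llvm::Register values.
std::vector<unsigned> orderWithCSRsLast(std::vector<unsigned> RawOrder,
                                        const std::set<unsigned> &CalleeSaved) {
  std::stable_partition(
      RawOrder.begin(), RawOrder.end(),
      [&](unsigned Reg) { return CalleeSaved.count(Reg) == 0; });
  return RawOrder;
}
```

With a raw order of 1, 2, 3, 4, 5 and callee-saved registers 2 and 4, the sketch yields 1, 3, 5, 2, 4: the allocator would try every caller-saved register before touching one that needs a save and restore.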

This sorting characteristic applies only as a tie breaker after the Weight comparison. This is a good idea since the weight calculation is pretty complex and I'm sure it is a pretty stable metric. I think it's reasonable to agree that whether a register is callee-saved is a better tie breaker than register ID. I think this is evident from the test diff, since the changes all seem to either have no impact or improve the register allocation.
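
The resulting sort order can be tried outside of LLVM with a small standalone sketch. The comparison mirrors the patched `CopyHint::operator<`, but `HintSketch`, `kFirstVirtual`, and `sortedHintOrder` are hypothetical stand-ins: real code uses `llvm::Register` and `TRI.isCalleeSavedPhysReg`, and the assumption that ids below 1000 are physical exists only in this example.

```cpp
#include <algorithm>
#include <vector>

// Assumption for this sketch only: ids below kFirstVirtual are physical.
constexpr unsigned kFirstVirtual = 1000;

struct HintSketch {
  unsigned RegId;
  float Weight;
  bool IsCSR; // only meaningful for physical registers

  bool isPhysical() const { return RegId < kFirstVirtual; }

  bool operator<(const HintSketch &Rhs) const {
    // 1. Physical-register hints always come first.
    if (isPhysical() != Rhs.isPhysical())
      return isPhysical();
    // 2. Heavier hints come first.
    if (Weight != Rhs.Weight)
      return Weight > Rhs.Weight;
    // 3. New tie breaker: among physical registers, non-CSR before CSR.
    if (isPhysical() && IsCSR != Rhs.IsCSR)
      return !IsCSR;
    // 4. Final tie breaker: lower register id first.
    return RegId < Rhs.RegId;
  }
};

// Returns the register ids in the order the hints would be added.
std::vector<unsigned> sortedHintOrder(std::vector<HintSketch> Hints) {
  std::sort(Hints.begin(), Hints.end());
  std::vector<unsigned> Order;
  for (const HintSketch &H : Hints)
    Order.push_back(H.RegId);
  return Order;
}
```

For two physical hints with equal weight, say id 5 (callee-saved) and id 9 (not callee-saved), the old comparison would place 5 first because of its lower id; with the extra check, 9 is hinted first. Hints with different weights are unaffected.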

llvmbot · 2025-03-13T00:09:43Z

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-regalloc

Author: Michael Maitland (michaelmaitland)

Patch is 24.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131046.diff

19 Files Affected:

(modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+10-3)
(modified) llvm/test/CodeGen/AArch64/aarch64-signedreturnaddress.ll (+3-7)
(modified) llvm/test/CodeGen/AArch64/ptrauth-ret.ll (+2-5)
(modified) llvm/test/CodeGen/AVR/calling-conv/c/basic_aggr.ll (+2-4)
(modified) llvm/test/CodeGen/AVR/calling-conv/c/stack.ll (+7-9)
(modified) llvm/test/CodeGen/AVR/dynalloca.ll (+10-10)
(modified) llvm/test/CodeGen/AVR/return.ll (+1-1)
(modified) llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll (+1-1)
(modified) llvm/test/CodeGen/SPARC/32abi.ll (+21-21)
(modified) llvm/test/CodeGen/SPARC/64abi.ll (+5-9)
(modified) llvm/test/CodeGen/SPARC/bigreturn.ll (+7-7)
(modified) llvm/test/CodeGen/SPARC/fmuladd-soft-float.ll (+16-16)
(modified) llvm/test/CodeGen/SPARC/leafproc.ll (+5-5)
(modified) llvm/test/CodeGen/SPARC/parts.ll (+5-6)
(modified) llvm/test/CodeGen/SPARC/tailcall.ll (+6-6)
(modified) llvm/test/CodeGen/SPARC/umulo-128-legalisation-lowering.ll (+16-16)
(modified) llvm/test/CodeGen/X86/base-pointer-and-mwaitx.ll (+6-6)
(modified) llvm/test/CodeGen/X86/ghc-cc64.ll (+2-2)
(modified) llvm/test/CodeGen/X86/mwaitx.ll (+2-2)

diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index b78c956947a39..8be48bbed47ef 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -210,13 +210,18 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
   struct CopyHint {
     Register Reg;
     float Weight;
-    CopyHint(Register R, float W) : Reg(R), Weight(W) {}
+    bool IsCSR;
+    CopyHint(Register R, float W, bool IsCSR)
+        : Reg(R), Weight(W), IsCSR(IsCSR) {}
     bool operator<(const CopyHint &Rhs) const {
       // Always prefer any physreg hint.
       if (Reg.isPhysical() != Rhs.Reg.isPhysical())
         return Reg.isPhysical();
       if (Weight != Rhs.Weight)
         return (Weight > Rhs.Weight);
+      // Prefer non-CSR to CSR.
+      if (Reg.isPhysical() && IsCSR != Rhs.IsCSR)
+        return !IsCSR;
       return Reg.id() < Rhs.Reg.id(); // Tie-breaker.
     }
   };
@@ -299,10 +304,12 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
     SmallVector<CopyHint, 8> RegHints;
     for (const auto &[Reg, Weight] : Hint) {
       if (Reg != SkipReg)
-        RegHints.emplace_back(Reg, Weight);
+        RegHints.emplace_back(
+            Reg, Weight,
+            Reg.isPhysical() ? TRI.isCalleeSavedPhysReg(Reg, MF) : false);
     }
     sort(RegHints);
-    for (const auto &[Reg, Weight] : RegHints)
+    for (const auto &[Reg, _, __] : RegHints)
       MRI.addRegAllocationHint(LI.reg(), Reg);
 
     // Weakly boost the spill weight of hinted registers.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-signedreturnaddress.ll b/llvm/test/CodeGen/AArch64/aarch64-signedreturnaddress.ll
index e1f63f93f111d..58db1923c6c66 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-signedreturnaddress.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-signedreturnaddress.ll
@@ -15,10 +15,8 @@ entry:
 ; CHECK-NEXT:     mov     x0, x30
 ; CHECK-NEXT:     ldr     x30, [sp], #16
 ; CHECK-NEXT:     ret
-; CHECKV83:       str     x30, [sp, #-16]!
-; CHECKV83-NEXT:  xpaci   x30
-; CHECKV83-NEXT:  mov     x0, x30
-; CHECKV83-NEXT:  ldr     x30, [sp], #16
+; CHECKV83:       mov     x0, x30
+; CHECKV83-NEXT:  xpaci   x0
 ; CHECKV83-NEXT:  ret
   %0 = tail call ptr @llvm.returnaddress(i32 0)
   ret ptr %0
@@ -35,10 +33,8 @@ entry:
 ; CHECK-NEXT:     hint    #29
 ; CHECK-NEXT:     ret
 ; CHECKV83:       paciasp
-; CHECKV83-NEXT:  str     x30, [sp, #-16]!
-; CHECKV83-NEXT:  xpaci   x30
 ; CHECKV83-NEXT:  mov     x0, x30
-; CHECKV83-NEXT:  ldr     x30, [sp], #16
+; CHECKV83-NEXT:  xpaci   x0
 ; CHECKV83-NEXT:  retaa
   %0 = tail call ptr @llvm.returnaddress(i32 0)
   ret ptr %0
diff --git a/llvm/test/CodeGen/AArch64/ptrauth-ret.ll b/llvm/test/CodeGen/AArch64/ptrauth-ret.ll
index 61f5f6d9d23b7..98e62e57c19e2 100644
--- a/llvm/test/CodeGen/AArch64/ptrauth-ret.ll
+++ b/llvm/test/CodeGen/AArch64/ptrauth-ret.ll
@@ -112,12 +112,9 @@ define void @test_noframe() #0 {
 define ptr @test_returnaddress_0() #0 {
 ; CHECK-LABEL: test_returnaddress_0:
 ; CHECK:       %bb.0:
-; CHECK-NEXT:    pacibsp
-; CHECK-NEXT:    str x30, [sp, #-16]!
-; CHECK-NEXT:    xpaci x30
 ; CHECK-NEXT:    mov x0, x30
-; CHECK-NEXT:    ldr x30, [sp], #16
-; CHECK-NEXT:    retab
+; CHECK-NEXT:    xpaci x0
+; CHECK-NEXT:    ret
   %r = call ptr @llvm.returnaddress(i32 0)
   ret ptr %r
 }
diff --git a/llvm/test/CodeGen/AVR/calling-conv/c/basic_aggr.ll b/llvm/test/CodeGen/AVR/calling-conv/c/basic_aggr.ll
index 96f24b0b96e6e..07d58841dd802 100644
--- a/llvm/test/CodeGen/AVR/calling-conv/c/basic_aggr.ll
+++ b/llvm/test/CodeGen/AVR/calling-conv/c/basic_aggr.ll
@@ -132,17 +132,15 @@ define i8 @foo2([6 x i8] %0, [6 x i8] %1, [6 x i8] %2) {
 define i8 @foo3([9 x i8] %0, [9 x i8] %1) {
 ; CHECK-LABEL: foo3:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    push r16
 ; CHECK-NEXT:    push r28
 ; CHECK-NEXT:    push r29
 ; CHECK-NEXT:    in r28, 61
 ; CHECK-NEXT:    in r29, 62
-; CHECK-NEXT:    ldd r24, Y+6
-; CHECK-NEXT:    sub r16, r24
 ; CHECK-NEXT:    mov r24, r16
+; CHECK-NEXT:    ldd r25, Y+5
+; CHECK-NEXT:    sub r24, r25
 ; CHECK-NEXT:    pop r29
 ; CHECK-NEXT:    pop r28
-; CHECK-NEXT:    pop r16
 ; CHECK-NEXT:    ret
   %3 = extractvalue [9 x i8] %0, 0
   %4 = extractvalue [9 x i8] %1, 0
diff --git a/llvm/test/CodeGen/AVR/calling-conv/c/stack.ll b/llvm/test/CodeGen/AVR/calling-conv/c/stack.ll
index ba2e65eb2bcba..60a45f72aad0a 100644
--- a/llvm/test/CodeGen/AVR/calling-conv/c/stack.ll
+++ b/llvm/test/CodeGen/AVR/calling-conv/c/stack.ll
@@ -74,17 +74,15 @@ define i8 @foo1([19 x i8] %a, i8 %b) {
 define i8 @foo2([17 x i8] %a, i8 %b) {
 ; CHECK-LABEL: foo2:
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    push r8
 ; CHECK-NEXT:    push r28
 ; CHECK-NEXT:    push r29
-; CHECK-NEXT:    in r28, 61
-; CHECK-NEXT:    in r29, 62
-; CHECK-NEXT:    ldd r24, Y+6
-; CHECK-NEXT:    sub r8, r24
-; CHECK-NEXT:    mov r24, r8
-; CHECK-NEXT:    pop r29
-; CHECK-NEXT:    pop r28
-; CHECK-NEXT:    pop r8
+; CHECK-NEXT:    in   r28, 61
+; CHECK-NEXT:    in   r29, 62
+; CHECK-NEXT:    mov  r24, r8
+; CHECK-NEXT:    ldd  r25, Y+5
+; CHECK-NEXT:    sub  r24, r25
+; CHECK-NEXT:    pop  r29
+; CHECK-NEXT:    pop  r28
 ; CHECK-NEXT:    ret
   %c = extractvalue [17 x i8] %a, 0
   %d = sub i8 %c, %b
diff --git a/llvm/test/CodeGen/AVR/dynalloca.ll b/llvm/test/CodeGen/AVR/dynalloca.ll
index 774bb76d0a0e0..3face71c988b0 100644
--- a/llvm/test/CodeGen/AVR/dynalloca.ll
+++ b/llvm/test/CodeGen/AVR/dynalloca.ll
@@ -64,16 +64,16 @@ define void @dynalloca2(i16 %x) {
 ; CHECK-NEXT: out 63, r0
 ; CHECK-NEXT: out 61, {{.*}}
 ; Store values on the stack
-; CHECK: ldi r16, 0
-; CHECK: ldi r17, 0
-; CHECK: std Z+8, r17
-; CHECK: std Z+7, r16
-; CHECK: std Z+6, r17
-; CHECK: std Z+5, r16
-; CHECK: std Z+4, r17
-; CHECK: std Z+3, r16
-; CHECK: std Z+2, r17
-; CHECK: std Z+1, r16
+; CHECK: ldi r20, 0
+; CHECK: ldi r21, 0
+; CHECK: std Z+8, r21
+; CHECK: std Z+7, r20
+; CHECK: std Z+6, r21
+; CHECK: std Z+5, r20
+; CHECK: std Z+4, r21
+; CHECK: std Z+3, r20
+; CHECK: std Z+2, r21
+; CHECK: std Z+1, r20
 ; CHECK: call
 ; Call frame restore
 ; CHECK-NEXT: in r30, 61
diff --git a/llvm/test/CodeGen/AVR/return.ll b/llvm/test/CodeGen/AVR/return.ll
index 207ad2f23a737..93dfa257c4b33 100644
--- a/llvm/test/CodeGen/AVR/return.ll
+++ b/llvm/test/CodeGen/AVR/return.ll
@@ -126,8 +126,8 @@ define i32 @return32_arg(i32 %x) {
 define i32 @return32_arg2(i32 %x, i32 %y, i32 %z) {
 ; AVR-LABEL: return32_arg2:
 ; AVR:       ; %bb.0:
-; AVR-NEXT:    movw r22, r14
 ; AVR-NEXT:    movw r24, r16
+; AVR-NEXT:    movw r22, r14
 ; AVR-NEXT:    ret
 ;
 ; TINY-LABEL: return32_arg2:
diff --git a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
index f5cd6c703c9db..9ccd4f1c0ac9a 100644
--- a/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
+++ b/llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll
@@ -60,7 +60,7 @@ entry:
 ;CHECK:      sethi
 ;CHECK:      !NO_APP
 ;CHECK-NEXT: ble
-;CHECK-NEXT: mov
+;CHECK-NEXT: nop
   tail call void asm sideeffect "sethi 0, %g0", ""() nounwind
   %0 = icmp slt i32 %a, 0
   br i1 %0, label %bb, label %bb1
diff --git a/llvm/test/CodeGen/SPARC/32abi.ll b/llvm/test/CodeGen/SPARC/32abi.ll
index 9bf7dcc18f393..928885f027ca0 100644
--- a/llvm/test/CodeGen/SPARC/32abi.ll
+++ b/llvm/test/CodeGen/SPARC/32abi.ll
@@ -143,28 +143,28 @@ define double @floatarg(double %a0,   ; %i0,%i1
 ; CHECK-LABEL: call_floatarg:
 ; HARD: save %sp, -112, %sp
 ; HARD: mov %i2, %o1
+; HARD-NEXT: mov %i0, %o2
 ; HARD-NEXT: mov %i1, %o0
 ; HARD-NEXT: st %i0, [%sp+104]
 ; HARD-NEXT: std %o0, [%sp+96]
 ; HARD-NEXT: st %o1, [%sp+92]
-; HARD-NEXT: mov %i0, %o2
 ; HARD-NEXT: mov %i1, %o3
 ; HARD-NEXT: mov %o1, %o4
 ; HARD-NEXT: mov %i1, %o5
 ; HARD-NEXT: call floatarg
 ; HARD: std %f0, [%i4]
-; SOFT: st %i0, [%sp+104]
-; SOFT-NEXT:  st %i2, [%sp+100]
-; SOFT-NEXT:  st %i1, [%sp+96]
-; SOFT-NEXT:  st %i2, [%sp+92]
-; SOFT-NEXT:  mov  %i1, %o0
-; SOFT-NEXT:  mov  %i2, %o1
-; SOFT-NEXT:  mov  %i0, %o2
-; SOFT-NEXT:  mov  %i1, %o3
-; SOFT-NEXT:  mov  %i2, %o4
-; SOFT-NEXT:  mov  %i1, %o5
-; SOFT-NEXT:  call floatarg
-; SOFT:  std %o0, [%i4]
+; SOFT: mov %i2, %o1
+; SOFT-NEXT: mov %i1, %o0
+; SOFT-NEXT: mov %i0, %o2
+; SOFT-NEXT: st %i0, [%sp+104]
+; SOFT-NEXT: st %i2, [%sp+100]
+; SOFT-NEXT: st %i1, [%sp+96]
+; SOFT-NEXT: st %i2, [%sp+92]
+; SOFT-NEXT: mov %i1, %o3
+; SOFT-NEXT: mov %i2, %o4
+; SOFT-NEXT: mov %i1, %o5
+; SOFT-NEXT: call floatarg
+; SOFT: std %o0, [%i4]
 ; CHECK: restore
 define void @call_floatarg(float %f1, double %d2, float %f5, ptr %p) {
   %r = call double @floatarg(double %d2, float %f1, double %d2, double %d2,
@@ -228,18 +228,18 @@ define i64 @i64arg(i64 %a0,    ; %i0,%i1
 
 ; CHECK-LABEL: call_i64arg:
 ; CHECK: save %sp, -112, %sp
-; CHECK: st %i0, [%sp+104]
+; CHECK: mov %i2, %o1
+; CHECK-NEXT: mov %i1, %o0
+; CHECK-NEXT: mov %i0, %o2
+; CHECK-NEXT: st %i0, [%sp+104]
 ; CHECK-NEXT: st %i2, [%sp+100]
 ; CHECK-NEXT: st %i1, [%sp+96]
 ; CHECK-NEXT: st %i2, [%sp+92]
-; CHECK-NEXT: mov      %i1, %o0
-; CHECK-NEXT: mov      %i2, %o1
-; CHECK-NEXT: mov      %i0, %o2
-; CHECK-NEXT: mov      %i1, %o3
-; CHECK-NEXT: mov      %i2, %o4
-; CHECK-NEXT: mov      %i1, %o5
+; CHECK-NEXT: mov %i1, %o3
+; CHECK-NEXT: mov %i2, %o4
+; CHECK-NEXT: mov %i1, %o5
 ; CHECK-NEXT: call i64arg
-; CHECK: std %o0, [%i3]
+; CHECK:      std %o0, [%i3]
 ; CHECK-NEXT: restore
 
 define void @call_i64arg(i32 %a0, i64 %a1, ptr %p) {
diff --git a/llvm/test/CodeGen/SPARC/64abi.ll b/llvm/test/CodeGen/SPARC/64abi.ll
index 61056f50a8c5d..6485a7f13e8d5 100644
--- a/llvm/test/CodeGen/SPARC/64abi.ll
+++ b/llvm/test/CodeGen/SPARC/64abi.ll
@@ -118,12 +118,10 @@ define double @floatarg(float %a0,    ; %f1
 ; SOFT: stx %i2, [%sp+2239]
 ; SOFT: stx %i2, [%sp+2231]
 ; SOFT: stx %i2, [%sp+2223]
-; SOFT: mov  %i2, %o0
-; SOFT: mov  %i1, %o1
-; SOFT: mov  %i1, %o2
-; SOFT: mov  %i1, %o3
-; SOFT: mov  %i2, %o4
-; SOFT: mov  %i2, %o5
+; SOFT: mov %i1, %o2
+; SOFT: mov %i1, %o3
+; SOFT: mov %i2, %o4
+; SOFT: mov %i2, %o5
 ; CHECK: call floatarg
 ; CHECK-NOT: add %sp
 ; CHECK: restore
@@ -174,11 +172,9 @@ define void @mixedarg(i8 %a0,      ; %i0
 
 ; CHECK-LABEL: call_mixedarg:
 ; CHECK: stx %i2, [%sp+2247]
-; SOFT:  stx %i1, [%sp+2239]
 ; CHECK: stx %i0, [%sp+2223]
 ; HARD: fmovd %f2, %f6
 ; HARD: fmovd %f2, %f16
-; SOFT: mov  %i1, %o3
 ; CHECK: call mixedarg
 ; CHECK-NOT: add %sp
 ; CHECK: restore
@@ -262,8 +258,8 @@ define i32 @inreg_if(float inreg %a0, ; %f0
 }
 
 ; CHECK-LABEL: call_inreg_if:
-; HARD: fmovs %f3, %f0
 ; HARD: mov %i2, %o0
+; HARD: fmovs %f3, %f0
 ; SOFT: srl %i2, 0, %i0
 ; SOFT: sllx %i1, 32, %i1
 ; SOFT: or %i1, %i0, %o0
diff --git a/llvm/test/CodeGen/SPARC/bigreturn.ll b/llvm/test/CodeGen/SPARC/bigreturn.ll
index ef691ef025af6..d546a104f4b6d 100644
--- a/llvm/test/CodeGen/SPARC/bigreturn.ll
+++ b/llvm/test/CodeGen/SPARC/bigreturn.ll
@@ -92,9 +92,9 @@ define i32 @call_ret_i32_arr(i32 %0) {
 ; SPARC-NEXT:    .cfi_def_cfa_register %fp
 ; SPARC-NEXT:    .cfi_window_save
 ; SPARC-NEXT:    .cfi_register %o7, %i7
-; SPARC-NEXT:    add %fp, -64, %i1
-; SPARC-NEXT:    st %i1, [%sp+64]
 ; SPARC-NEXT:    mov %i0, %o0
+; SPARC-NEXT:    add %fp, -64, %i0
+; SPARC-NEXT:    st %i0, [%sp+64]
 ; SPARC-NEXT:    call ret_i32_arr
 ; SPARC-NEXT:    nop
 ; SPARC-NEXT:    unimp 64
@@ -110,8 +110,8 @@ define i32 @call_ret_i32_arr(i32 %0) {
 ; SPARC64-NEXT:    .cfi_def_cfa_register %fp
 ; SPARC64-NEXT:    .cfi_window_save
 ; SPARC64-NEXT:    .cfi_register %o7, %i7
-; SPARC64-NEXT:    add %fp, 1983, %o0
 ; SPARC64-NEXT:    mov %i0, %o1
+; SPARC64-NEXT:    add %fp, 1983, %o0
 ; SPARC64-NEXT:    call ret_i32_arr
 ; SPARC64-NEXT:    nop
 ; SPARC64-NEXT:    ld [%fp+2043], %i0
@@ -220,10 +220,10 @@ define i64 @call_ret_i64_arr(i64 %0) {
 ; SPARC-NEXT:    .cfi_def_cfa_register %fp
 ; SPARC-NEXT:    .cfi_window_save
 ; SPARC-NEXT:    .cfi_register %o7, %i7
-; SPARC-NEXT:    add %fp, -128, %i2
-; SPARC-NEXT:    st %i2, [%sp+64]
-; SPARC-NEXT:    mov %i0, %o0
 ; SPARC-NEXT:    mov %i1, %o1
+; SPARC-NEXT:    mov %i0, %o0
+; SPARC-NEXT:    add %fp, -128, %i0
+; SPARC-NEXT:    st %i0, [%sp+64]
 ; SPARC-NEXT:    call ret_i64_arr
 ; SPARC-NEXT:    nop
 ; SPARC-NEXT:    unimp 128
@@ -239,8 +239,8 @@ define i64 @call_ret_i64_arr(i64 %0) {
 ; SPARC64-NEXT:    .cfi_def_cfa_register %fp
 ; SPARC64-NEXT:    .cfi_window_save
 ; SPARC64-NEXT:    .cfi_register %o7, %i7
-; SPARC64-NEXT:    add %fp, 1919, %o0
 ; SPARC64-NEXT:    mov %i0, %o1
+; SPARC64-NEXT:    add %fp, 1919, %o0
 ; SPARC64-NEXT:    call ret_i64_arr
 ; SPARC64-NEXT:    nop
 ; SPARC64-NEXT:    ldx [%fp+2039], %i0
diff --git a/llvm/test/CodeGen/SPARC/fmuladd-soft-float.ll b/llvm/test/CodeGen/SPARC/fmuladd-soft-float.ll
index b2ea38f294335..26aa46b8a8698 100644
--- a/llvm/test/CodeGen/SPARC/fmuladd-soft-float.ll
+++ b/llvm/test/CodeGen/SPARC/fmuladd-soft-float.ll
@@ -10,9 +10,9 @@ define float @fmuladd_intrinsic_f32(float %a, float %b, float %c) #0 {
 ; SOFT-FLOAT-32-NEXT:    .cfi_def_cfa_register %fp
 ; SOFT-FLOAT-32-NEXT:    .cfi_window_save
 ; SOFT-FLOAT-32-NEXT:    .cfi_register %o7, %i7
-; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
-; SOFT-FLOAT-32-NEXT:    call __mulsf3
 ; SOFT-FLOAT-32-NEXT:    mov %i1, %o1
+; SOFT-FLOAT-32-NEXT:    call __mulsf3
+; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
 ; SOFT-FLOAT-32-NEXT:    call __addsf3
 ; SOFT-FLOAT-32-NEXT:    mov %i2, %o1
 ; SOFT-FLOAT-32-NEXT:    ret
@@ -44,11 +44,11 @@ define double @fmuladd_intrinsic_f64(double %a, double %b, double %c) #0 {
 ; SOFT-FLOAT-32-NEXT:    .cfi_def_cfa_register %fp
 ; SOFT-FLOAT-32-NEXT:    .cfi_window_save
 ; SOFT-FLOAT-32-NEXT:    .cfi_register %o7, %i7
-; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
-; SOFT-FLOAT-32-NEXT:    mov %i1, %o1
+; SOFT-FLOAT-32-NEXT:    mov %i3, %o3
 ; SOFT-FLOAT-32-NEXT:    mov %i2, %o2
+; SOFT-FLOAT-32-NEXT:    mov %i1, %o1
 ; SOFT-FLOAT-32-NEXT:    call __muldf3
-; SOFT-FLOAT-32-NEXT:    mov %i3, %o3
+; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
 ; SOFT-FLOAT-32-NEXT:    mov %i4, %o2
 ; SOFT-FLOAT-32-NEXT:    call __adddf3
 ; SOFT-FLOAT-32-NEXT:    mov %i5, %o3
@@ -63,9 +63,9 @@ define double @fmuladd_intrinsic_f64(double %a, double %b, double %c) #0 {
 ; SOFT-FLOAT-64-NEXT:    .cfi_def_cfa_register %fp
 ; SOFT-FLOAT-64-NEXT:    .cfi_window_save
 ; SOFT-FLOAT-64-NEXT:    .cfi_register %o7, %i7
-; SOFT-FLOAT-64-NEXT:    mov %i0, %o0
-; SOFT-FLOAT-64-NEXT:    call __muldf3
 ; SOFT-FLOAT-64-NEXT:    mov %i1, %o1
+; SOFT-FLOAT-64-NEXT:    call __muldf3
+; SOFT-FLOAT-64-NEXT:    mov %i0, %o0
 ; SOFT-FLOAT-64-NEXT:    call __adddf3
 ; SOFT-FLOAT-64-NEXT:    mov %i2, %o1
 ; SOFT-FLOAT-64-NEXT:    ret
@@ -82,9 +82,9 @@ define float @fmuladd_contract_f32(float %a, float %b, float %c) #0 {
 ; SOFT-FLOAT-32-NEXT:    .cfi_def_cfa_register %fp
 ; SOFT-FLOAT-32-NEXT:    .cfi_window_save
 ; SOFT-FLOAT-32-NEXT:    .cfi_register %o7, %i7
-; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
-; SOFT-FLOAT-32-NEXT:    call __mulsf3
 ; SOFT-FLOAT-32-NEXT:    mov %i1, %o1
+; SOFT-FLOAT-32-NEXT:    call __mulsf3
+; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
 ; SOFT-FLOAT-32-NEXT:    call __addsf3
 ; SOFT-FLOAT-32-NEXT:    mov %i2, %o1
 ; SOFT-FLOAT-32-NEXT:    ret
@@ -117,11 +117,11 @@ define double @fmuladd_contract_f64(double %a, double %b, double %c) #0 {
 ; SOFT-FLOAT-32-NEXT:    .cfi_def_cfa_register %fp
 ; SOFT-FLOAT-32-NEXT:    .cfi_window_save
 ; SOFT-FLOAT-32-NEXT:    .cfi_register %o7, %i7
-; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
-; SOFT-FLOAT-32-NEXT:    mov %i1, %o1
+; SOFT-FLOAT-32-NEXT:    mov %i3, %o3
 ; SOFT-FLOAT-32-NEXT:    mov %i2, %o2
+; SOFT-FLOAT-32-NEXT:    mov %i1, %o1
 ; SOFT-FLOAT-32-NEXT:    call __muldf3
-; SOFT-FLOAT-32-NEXT:    mov %i3, %o3
+; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
 ; SOFT-FLOAT-32-NEXT:    mov %i4, %o2
 ; SOFT-FLOAT-32-NEXT:    call __adddf3
 ; SOFT-FLOAT-32-NEXT:    mov %i5, %o3
@@ -136,9 +136,9 @@ define double @fmuladd_contract_f64(double %a, double %b, double %c) #0 {
 ; SOFT-FLOAT-64-NEXT:    .cfi_def_cfa_register %fp
 ; SOFT-FLOAT-64-NEXT:    .cfi_window_save
 ; SOFT-FLOAT-64-NEXT:    .cfi_register %o7, %i7
-; SOFT-FLOAT-64-NEXT:    mov %i0, %o0
-; SOFT-FLOAT-64-NEXT:    call __muldf3
 ; SOFT-FLOAT-64-NEXT:    mov %i1, %o1
+; SOFT-FLOAT-64-NEXT:    call __muldf3
+; SOFT-FLOAT-64-NEXT:    mov %i0, %o0
 ; SOFT-FLOAT-64-NEXT:    call __adddf3
 ; SOFT-FLOAT-64-NEXT:    mov %i2, %o1
 ; SOFT-FLOAT-64-NEXT:    ret
@@ -162,9 +162,9 @@ define <4 x float> @fmuladd_contract_v4f32(<4 x float> %a, <4 x float> %b, <4 x
 ; SOFT-FLOAT-32-NEXT:    ld [%fp+112], %l3
 ; SOFT-FLOAT-32-NEXT:    ld [%fp+96], %l4
 ; SOFT-FLOAT-32-NEXT:    ld [%fp+92], %l5
-; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
-; SOFT-FLOAT-32-NEXT:    call __mulsf3
 ; SOFT-FLOAT-32-NEXT:    mov %i4, %o1
+; SOFT-FLOAT-32-NEXT:    call __mulsf3
+; SOFT-FLOAT-32-NEXT:    mov %i0, %o0
 ; SOFT-FLOAT-32-NEXT:    mov %o0, %i0
 ; SOFT-FLOAT-32-NEXT:    mov %i1, %o0
 ; SOFT-FLOAT-32-NEXT:    call __mulsf3
diff --git a/llvm/test/CodeGen/SPARC/leafproc.ll b/llvm/test/CodeGen/SPARC/leafproc.ll
index 81dee16159d71..3998ef26f7b20 100644
--- a/llvm/test/CodeGen/SPARC/leafproc.ll
+++ b/llvm/test/CodeGen/SPARC/leafproc.ll
@@ -86,12 +86,12 @@ entry:
 ; CHECK-LABEL: leaf_proc_give_up
 ; CHECK: save %sp, -96, %sp
 ; CHECK: ld [%fp+92], %o5
-; CHECK: mov %i0, %g1
-; CHECK: mov %i1, %o0
-; CHECK: mov %i2, %o1
-; CHECK: mov %i3, %o2
-; CHECK: mov %i4, %o3
 ; CHECK: mov %i5, %o4
+; CHECK: mov %i4, %o3
+; CHECK: mov %i3, %o2
+; CHECK: mov %i2, %o1
+; CHECK: mov %i1, %o0
+; CHECK: mov %i0, %g1
 ; CHECK: ret
 ; CHECK-NEXT: restore %g0, %o0, %o0
 
diff --git a/llvm/test/CodeGen/SPARC/parts.ll b/llvm/test/CodeGen/SPARC/parts.ll
index 938c4471968be..a5fe0212230a0 100644
--- a/llvm/test/CodeGen/SPARC/parts.ll
+++ b/llvm/test/CodeGen/SPARC/parts.ll
@@ -1,12 +1,11 @@
 ; RUN: llc < %s -mtriple=sparcv9 | FileCheck %s
   
 ; CHECK-LABEL: test
-; CHECK:        srl %i1, 0, %o2
-; CHECK-NEXT:   mov %i2, %o0
-; CHECK-NEXT:   call __ashlti3
-; CHECK-NEXT:   mov %i3, %o1
-; CHECK-NEXT:   mov %o0, %i0
-  
+; CHECK: mov %i3, %o1
+; CHECK-NEXT: mov %i2, %o0
+; CHECK-NEXT: call __ashlti3
+; CHECK-NEXT: srl %i1, 0, %o2
+
 define i128 @test(i128 %a, i128 %b) {
 entry:
     %tmp = shl i128 %b, %a
diff --git a/llvm/test/CodeGen/SPARC/tailcall.ll b/llvm/test/CodeGen/SPARC/tailcall.ll
index 45612c51ee133..e9955aea19908 100644
--- a/llvm/test/CodeGen/SPARC/tailcall.ll
+++ b/llvm/test/CodeGen/SPARC/tailcall.ll
@@ -253,13 +253,13 @@ define void @ret_large_struct(ptr noalias sret(%struct.big) %agg.result) #0 {
 ; V9-LABEL: ret_large_struct:
 ; V9:       ! %bb.0: ! %entry
 ; V9-NEXT:    save %sp, -176, %sp
-; V9-NEXT:    sethi %h44(bigstruct), %i1
-; V9-NEXT:    add %i1, %m44(bigstruct), %i1
-; V9-NEXT:    sllx %i1, 12, %i1
-; V9-NEXT:    add %i1, %l44(bigstruct), %o1
-; V9-NEXT:    mov 400, %o2
-; V9-NEXT:    call memcpy
 ; V9-NEXT:    mov %i0, %o0
+; V9-NEXT:    sethi %h44(bigstruct), %i0
+; V9-NEXT:    add %i0, %m44(bigstruct), %i0
+; V9-NEXT:    sllx %i0, 12, %i0
+; V9-NEXT:    add %i0, %l44(bigstruct), %o1
+; V9-NEXT:    call memcpy
+; V9-NEXT:    mov 400, %o2
 ; V9-NEXT:    ret
 ; V9-NEXT:    restore
 entry:
diff --git a/llvm/test/CodeGen/SPARC/umulo-128-legalisation-lowering.ll b/llvm/test/CodeGen/SPARC/umulo-128-legalisation-lowering.ll
index 01383a00c2619..f3835790210a0 100644
--- a/llvm/test/CodeGen/SPARC/umulo-128-legalisation-lowering.ll
+++ b/llvm/test/CodeGen/SPARC/umulo-128-legalisation-lowering.ll
@@ -160,6 +160,7 @@ define { i128, i8 } @muloti_test(i128 %l, i128 %r) nounwind {
 ; SPARC64-NEXT:    .register %g3, #scratch
 ; SPARC64-NEXT:  ! %bb.0: ! %start
 ; SPARC64-NEXT:    save %sp, -176, %sp
+; SPARC64-NEXT:    mov %i0, %l1
 ; SPARC64-NEXT:    mov %g0, %o0
 ; SPARC64-NEXT:    mov %i2, %o1
 ; SPARC64-NEXT:    mov %g0, %o2
@@ -173,30 +174,29 @@ define { i128, i8 } @muloti_test(i128 %l, i128 %r) nounwind {
 ; SPARC64-NEXT:    call __multi3
 ; SPARC64-NEXT:    mov %i3, %o3
 ; SPARC64-NEXT:    mov %o0, %l0
-; SPARC64-NEXT:    add %o1, %i5, %i5
+; SPARC64-NEXT:    add %o1, %i5, %i0
 ; SPARC64-NEXT:    mov %g0, %o0
 ; SPARC64-NEXT:    mov %i1, %o1
 ; SPARC64-NEXT:    mov %g0, %o2
 ; SPARC64-NEXT:    call __multi3
 ; SPARC64-NEXT:    mov %i3, %o3
+; SPARC64-NEXT:    mov %g0, %i1
 ; SPARC64-NEXT:    mov %g0, %i3
+; SPARC64-NEXT:    mov %g0, %i5
 ; SPARC64-NEXT:    mov %g0, %g2
 ; SPARC64-NEXT:    mov %g0, %g3
-; SPARC64-NEXT:    mov %g0, %g4
-; SPARC64-NEXT:    mov %g0, %g5
-; SPARC64-NEXT:    add %o0, %i5, %i1
-; ...
[truncated]

michaelmaitland · 2025-03-13T00:11:11Z

llvm/test/CodeGen/SPARC/2011-01-19-DelaySlot.ll

@@ -60,7 +60,7 @@ entry:
 ;CHECK:      sethi
 ;CHECK:      !NO_APP
 ;CHECK-NEXT: ble
-;CHECK-NEXT: mov
+;CHECK-NEXT: nop


This test is still correct. The mov got hoisted and isn't relevant to this test anymore. There is a nop here now which is the point of this test (an instruction is placed in the delay slot).

arsenm

Do we actually see many contexts where registers have multiple hints?

Targeted MIR test for this would be nice

michaelmaitland · 2025-03-13T13:43:11Z

> Do we actually see many contexts where registers have multiple hints?

@arsenm I instrumented the code like this:

static int I = 0;

weightCalcHelper() {
  ...
  // INSTRUMENT HERE: count calls that sort more than one hint.
  if (RegHints.size() > 1)
    errs() << "I:" << I++ << "\n";
  // END INSTRUMENT
  sort(RegHints);
  ...
}

I ran llc on perlbench_r for RISC-V, and there were 18k+ cases where a register had multiple hints. I hope that helps to answer your question.
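
The same measurement can be expressed as a small standalone function, shown here only as a sketch. `countMultiHintLists` is a hypothetical helper, and each inner vector stands in for the `RegHints` list built for one live interval.

```cpp
#include <cstddef>
#include <vector>

// Counts how many hint lists contain more than one entry, i.e. how many
// times the sort (and therefore any tie breaker) can actually matter.
std::size_t
countMultiHintLists(const std::vector<std::vector<unsigned>> &HintLists) {
  std::size_t Count = 0;
  for (const auto &Hints : HintLists)
    if (Hints.size() > 1)
      ++Count;
  return Count;
}
```

Note that this only gives an upper bound on how often the new tie breaker applies: a list with several hints still needs two physical-register hints of equal weight before the CSR check is reached.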

> Targeted MIR test for this would be nice

Working on it

michaelmaitland · 2025-03-24T19:49:04Z

@arsenm I'm having some trouble coming up with a test case that isn't an exact clone of one of the existing tests, even after running through llvm-reduce. I also tried looking for a RISC-V example in spec2017, but there are none. I don't think I have the ability to easily build spec for another target. Any suggestions?

arsenm · 2025-03-25T00:57:19Z

> @arsenm I'm having some trouble coming up with a test case that isn't an exact clone of one of the existing tests, even after running through llvm-reduce.

Which case is this? Were you trying llvm-reduce on the MIR? I can try to see if my out-of-tree patches help reduce it at all.

michaelmaitland · 2025-03-25T01:01:19Z

> > @arsenm I'm having some trouble coming up with a test case that isn't an exact clone of one of the existing tests, even after running through llvm-reduce.
>
> Which case is this? You were trying llvm-reduce on the MIR? I can try to see if my out of tree patches help reduce it any

The AArch64 cases were good candidates because they reduced spillage; however, they also required a specific mattr (pauth) to get it. I was hoping for something more general. When we reduce them, though, they end up being the same test.

The AVR case foo2 has less spillage, but the test case is pretty simple already. Is it worth adding a copy of this test in its own file to test this?

The X86 test cases do not reduce spillage, so I didn't think they were good candidates.

michaelmaitland · 2025-03-28T02:16:35Z

llvm/test/CodeGen/AArch64/csr-copy-hint.mir

+    ; CHECK-NEXT: RET undef $lr, implicit killed $x0
+    %0:gpr64 = COPY killed $lr
+    %1:gpr64 = XPACI killed %0
+    $x0 = COPY killed %1


@arsenm how is this?

arsenm · 2025-03-28T03:22:37Z

llvm/test/CodeGen/AArch64/csr-copy-hint.mir

@@ -0,0 +1,22 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=arm64-eabi -mattr=v8.3a \ -stop-after=virtregmap -o - %s | FileCheck %s


Suggested change

-# RUN: llc -mtriple=arm64-eabi -mattr=v8.3a \ -stop-after=virtregmap -o - %s | FileCheck %s
+# RUN: llc -mtriple=arm64-eabi -mattr=v8.3a -stop-after=virtregmap -o - %s | FileCheck %s

I'm not sure what this run line does, but it probably should be an error. What is the starting point?

michaelmaitland · 2025-03-28

Sorry, that was an artifact I forgot to remove from when the run line was split across multiple lines. Removed.

michaelmaitland added the llvm:regalloc label Mar 13, 2025

michaelmaitland requested review from jayfoad, arsenm, qcolombet and topperc March 13, 2025 00:09

llvmbot added backend:AArch64 backend:X86 labels Mar 13, 2025

michaelmaitland requested a review from bevin-hansson March 13, 2025 00:09

michaelmaitland force-pushed the csr-copy-hint branch from 2421d8c to a567100 Compare March 13, 2025 00:13

arsenm approved these changes Mar 13, 2025

Precommit test case

78dc8f8

fixup! add test

0d0ad25

michaelmaitland added 2 commits March 28, 2025 06:25

fixup! test case change

25deaf7

fixup! cleanup

6ef87cd

michaelmaitland requested a review from arsenm March 31, 2025 14:23

fixup! fix test run line

76f0138

arsenm approved these changes Apr 13, 2025

michaelmaitland merged commit 74e8f29 into llvm:main Apr 14, 2025
11 checks passed
