[AMDGPU] Ensure non-reserved CSR spilled regs are live-in #146427

macurtis-amd · 2025-06-30T21:57:35Z

Fixes:

*** Bad machine code: Using an undefined physical register ***
- function:    widget
- basic block: %bb.0 bb (0x564092cbe140)
- instruction: $vgpr63 = V_ACCVGPR_READ_B32_e64 killed $agpr13, implicit $exec
- operand 1:   killed $agpr13
LLVM ERROR: Found 1 machine code errors.

The detailed sequence of events that led to this assert:

MachineVerifier fails because $agpr13 is not defined on line 19 below:

 2:   successors: %bb.1(0x80000000); %bb.1(100.00%)
 3:   liveins: $agpr14, $agpr15, $sgpr12, $sgpr13, $sgpr14, \
 4:       $sgpr15, $sgpr30, $sgpr31, $sgpr34, $sgpr35, \
 5:       $sgpr36, $sgpr37, $sgpr38, $sgpr39, $sgpr48, \
 6:       $sgpr49, $sgpr50, $sgpr51, $sgpr52, $sgpr53, \
 7:       $sgpr54, $sgpr55, $sgpr64, $sgpr65, $sgpr66, \
 8:       $sgpr67, $sgpr68, $sgpr69, $sgpr70, $sgpr71, \
 9:       $sgpr80, $sgpr81, $sgpr82, $sgpr83, $sgpr84, \
10:       $sgpr85, $sgpr86, $sgpr87, $sgpr96, $sgpr97, \
11:       $sgpr98, $sgpr99, $vgpr0, $vgpr31, $vgpr40, $vgpr41, \
12:       $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, \
13:       $sgpr10_sgpr11
14:   $sgpr16 = COPY $sgpr33
15:   $sgpr33 = frame-setup COPY $sgpr32
16:   $sgpr18_sgpr19 = S_XOR_SAVEEXEC_B64 -1, \
17:       implicit-def $exec, implicit-def dead $scc, \
18:       implicit $exec
19:   $vgpr63 = V_ACCVGPR_READ_B32_e64 killed $agpr13, \
20:       implicit $exec
21:   BUFFER_STORE_DWORD_OFFSET killed $vgpr63, \
22:       $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, \
23:       implicit $exec :: (store (s32) into %stack.38, \
24:       addrspace 5)
25:   ...
26:   $vgpr43 = IMPLICIT_DEF
27:   $vgpr43 = SI_SPILL_S32_TO_VGPR $sgpr15, 0, \
28:       killed $vgpr43(tied-def 0)
29:   $vgpr43 = SI_SPILL_S32_TO_VGPR $sgpr14, 1, \
30:       killed $vgpr43(tied-def 0)
31:   $sgpr100_sgpr101 = S_OR_SAVEEXEC_B64 -1, \
32:       implicit-def $exec, implicit-def dead $scc, \
33:       implicit $exec
34:   renamable $agpr13 = COPY killed $vgpr43, implicit $exec

That instruction is created by emitCSRSpillStores (called by SIFrameLowering::emitPrologue) because $agpr13 is in WWMSpills.

See lines 982, 998, and 993 below.

977:   // Spill Whole-Wave Mode VGPRs. Save only the inactive lanes of the scratch
978:   // registers. However, save all lanes of callee-saved VGPRs. Due to this, we
979:   // might end up flipping the EXEC bits twice.
980:   Register ScratchExecCopy;
981:   SmallVector<std::pair<Register, int>, 2> WWMCalleeSavedRegs, WWMScratchRegs;
982:   FuncInfo->splitWWMSpillRegisters(MF, WWMCalleeSavedRegs, WWMScratchRegs);
983:   if (!WWMScratchRegs.empty())
984:     ScratchExecCopy =
985:         buildScratchExecCopy(LiveUnits, MF, MBB, MBBI, DL,
986:                              /*IsProlog*/ true, /*EnableInactiveLanes*/ true);
987:
988:   auto StoreWWMRegisters =
989:       [&](SmallVectorImpl<std::pair<Register, int>> &WWMRegs) {
990:         for (const auto &Reg : WWMRegs) {
991:           Register VGPR = Reg.first;
992:           int FI = Reg.second;
993:           buildPrologSpill(ST, TRI, *FuncInfo, LiveUnits, MF, MBB, MBBI, DL,
994:                            VGPR, FI, FrameReg);
995:         }
996:       };
997:
998:   StoreWWMRegisters(WWMScratchRegs);

$agpr13 got added to WWMSpills by SILowerWWMCopies::run as it processed the WWM_COPY on line 3 below (corresponds to line 34 above in point #1):

1: %45:vgpr_32 = SI_SPILL_S32_TO_VGPR $sgpr15, 0, %45:vgpr_32(tied-def 0)
2: %45:vgpr_32 = SI_SPILL_S32_TO_VGPR $sgpr14, 1, %45:vgpr_32(tied-def 0)
3: %44:av_32 = WWM_COPY %45:vgpr_32

llvmbot · 2025-06-30T21:58:17Z

@llvm/pr-subscribers-backend-amdgpu

Author: None (macurtis-amd)

Changes

Fixes:

*** Bad machine code: Using an undefined physical register ***
- function:    widget
- basic block: %bb.0 bb (0x564092cbe140)
- instruction: $vgpr63 = V_ACCVGPR_READ_B32_e64 killed $agpr13, implicit $exec
- operand 1:   killed $agpr13
LLVM ERROR: Found 1 machine code errors.

The detailed sequence of events that led to this assert:

MachineVerifier fails because $agpr13 is not defined on line 19 below:

 2:   successors: %bb.1(0x80000000); %bb.1(100.00%)
 3:   liveins: $agpr14, $agpr15, $sgpr12, $sgpr13, $sgpr14, \
 4:       $sgpr15, $sgpr30, $sgpr31, $sgpr34, $sgpr35, \
 5:       $sgpr36, $sgpr37, $sgpr38, $sgpr39, $sgpr48, \
 6:       $sgpr49, $sgpr50, $sgpr51, $sgpr52, $sgpr53, \
 7:       $sgpr54, $sgpr55, $sgpr64, $sgpr65, $sgpr66, \
 8:       $sgpr67, $sgpr68, $sgpr69, $sgpr70, $sgpr71, \
 9:       $sgpr80, $sgpr81, $sgpr82, $sgpr83, $sgpr84, \
10:       $sgpr85, $sgpr86, $sgpr87, $sgpr96, $sgpr97, \
11:       $sgpr98, $sgpr99, $vgpr0, $vgpr31, $vgpr40, $vgpr41, \
12:       $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, \
13:       $sgpr10_sgpr11
14:   $sgpr16 = COPY $sgpr33
15:   $sgpr33 = frame-setup COPY $sgpr32
16:   $sgpr18_sgpr19 = S_XOR_SAVEEXEC_B64 -1, \
17:       implicit-def $exec, implicit-def dead $scc, \
18:       implicit $exec
19:   $vgpr63 = V_ACCVGPR_READ_B32_e64 killed $agpr13, \
20:       implicit $exec
21:   BUFFER_STORE_DWORD_OFFSET killed $vgpr63, \
22:       $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, \
23:       implicit $exec :: (store (s32) into %stack.38, \
24:       addrspace 5)
25:   ...
26:   $vgpr43 = IMPLICIT_DEF
27:   $vgpr43 = SI_SPILL_S32_TO_VGPR $sgpr15, 0, \
28:       killed $vgpr43(tied-def 0)
29:   $vgpr43 = SI_SPILL_S32_TO_VGPR $sgpr14, 1, \
30:       killed $vgpr43(tied-def 0)
31:   $sgpr100_sgpr101 = S_OR_SAVEEXEC_B64 -1, \
32:       implicit-def $exec, implicit-def dead $scc, \
33:       implicit $exec
34:   renamable $agpr13 = COPY killed $vgpr43, implicit $exec

That instruction is created by emitCSRSpillStores (called by SIFrameLowering::emitPrologue) because $agpr13 is in WWMSpills.

See lines 982, 998, and 993 below.

977:   // Spill Whole-Wave Mode VGPRs. Save only the inactive lanes of the scratch
978:   // registers. However, save all lanes of callee-saved VGPRs. Due to this, we
979:   // might end up flipping the EXEC bits twice.
980:   Register ScratchExecCopy;
981:   SmallVector&lt;std::pair&lt;Register, int&gt;, 2&gt; WWMCalleeSavedRegs, WWMScratchRegs;
982:   FuncInfo-&gt;splitWWMSpillRegisters(MF, WWMCalleeSavedRegs, WWMScratchRegs);
983:   if (!WWMScratchRegs.empty())
984:     ScratchExecCopy =
985:         buildScratchExecCopy(LiveUnits, MF, MBB, MBBI, DL,
986:                              /*IsProlog*/ true, /*EnableInactiveLanes*/ true);
987:
988:   auto StoreWWMRegisters =
989:       [&amp;](SmallVectorImpl&lt;std::pair&lt;Register, int&gt;&gt; &amp;WWMRegs) {
990:         for (const auto &amp;Reg : WWMRegs) {
991:           Register VGPR = Reg.first;
992:           int FI = Reg.second;
993:           buildPrologSpill(ST, TRI, *FuncInfo, LiveUnits, MF, MBB, MBBI, DL,
994:                            VGPR, FI, FrameReg);
995:         }
996:       };
997:
998:   StoreWWMRegisters(WWMScratchRegs);

$agpr13 got added to WWMSpills by SILowerWWMCopies::run as it processed the WWM_COPY on line 3 below (corresponds to line 34 above in point #1):

1: %45:vgpr_32 = SI_SPILL_S32_TO_VGPR $sgpr15, 0, %45:vgpr_32(tied-def 0)
2: %45:vgpr_32 = SI_SPILL_S32_TO_VGPR $sgpr14, 1, %45:vgpr_32(tied-def 0)
3: %44:av_32 = WWM_COPY %45:vgpr_32

Full diff: https://github.com/llvm/llvm-project/pull/146427.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIFrameLowering.cpp (+7)
(added) llvm/test/CodeGen/AMDGPU/bug-undef-spilled-agpr.ll (+126)

diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index 6a3867937d57f..15ad1470036d2 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -973,6 +973,7 @@ void SIFrameLowering::emitCSRSpillStores(
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
   const SIInstrInfo *TII = ST.getInstrInfo();
   const SIRegisterInfo &TRI = TII->getRegisterInfo();
+  MachineRegisterInfo &MRI = MF.getRegInfo();
 
   // Spill Whole-Wave Mode VGPRs. Save only the inactive lanes of the scratch
   // registers. However, save all lanes of callee-saved VGPRs. Due to this, we
@@ -995,6 +996,12 @@ void SIFrameLowering::emitCSRSpillStores(
         }
       };
 
+  for (const auto &Reg : WWMScratchRegs) {
+    if (!MRI.isReserved(Reg.first)) {
+      MRI.addLiveIn(Reg.first);
+      MBB.addLiveIn(Reg.first);
+    }
+  }
   StoreWWMRegisters(WWMScratchRegs);
   if (!WWMCalleeSavedRegs.empty()) {
     if (ScratchExecCopy) {
diff --git a/llvm/test/CodeGen/AMDGPU/bug-undef-spilled-agpr.ll b/llvm/test/CodeGen/AMDGPU/bug-undef-spilled-agpr.ll
new file mode 100644
index 0000000000000..ef2ae5ca5169b
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/bug-undef-spilled-agpr.ll
@@ -0,0 +1,126 @@
+; Just ensure that llc -O1 does not error out
+; RUN: llc -O1 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs %s -o - &>/dev/null
+
+define fastcc void @widget(i1 %arg) {
+bb:
+  br label %bb1
+
+bb1:                                              ; preds = %bb3, %bb
+  br i1 %arg, label %bb3, label %bb2
+
+bb2:                                              ; preds = %bb1
+  ret void
+
+bb3:                                              ; preds = %bb1
+  %call = call fastcc i1 @baz(i1 false, float 0.000000e+00, i1 false, float 0.000000e+00, i1 false, i1 false, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, ptr addrspace(5) null, i1 false, ptr null, ptr null, ptr null, ptr null, ptr null, ptr addrspace(5) null, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null, ptr addrspace(5) null)
+  br label %bb1
+}
+
+define fastcc i1 @baz(i1 %arg, float %arg1, i1 %arg2, float %arg3, i1 %arg4, i1 %arg5, float %arg6, float %arg7, float %arg8, float %arg9, ptr addrspace(5) %arg10, i1 %arg11, ptr %arg12, ptr %arg13, ptr %arg14, ptr %arg15, ptr %arg16, ptr addrspace(5) %arg17, ptr %arg18, ptr %arg19, ptr %arg20, ptr %arg21, ptr %arg22, ptr %arg23, ptr %arg24, ptr addrspace(5) %arg25) #0 {
+bb:
+  br i1 %arg, label %bb26, label %bb27
+
+bb26:                                             ; preds = %bb
+  ret i1 false
+
+bb27:                                             ; preds = %bb
+  br i1 %arg, label %bb29, label %bb28
+
+bb28:                                             ; preds = %bb27
+  unreachable
+
+bb29:                                             ; preds = %bb49, %bb47, %bb46, %bb39, %bb36, %bb27
+  br i1 %arg4, label %bb55, label %bb30
+
+bb30:                                             ; preds = %bb29
+  br i1 %arg5, label %bb31, label %bb32
+
+bb31:                                             ; preds = %bb30
+  store i1 false, ptr addrspace(5) null, align 2147483648
+  br label %bb55
+
+bb32:                                             ; preds = %bb30
+  store float %arg3, ptr addrspace(5) null, align 2147483648
+  store float %arg7, ptr addrspace(5) %arg10, align 2147483648
+  br i1 %arg2, label %bb34, label %bb33
+
+bb33:                                             ; preds = %bb32
+  %fcmp = fcmp ogt float %arg6, 0.000000e+00
+  br i1 %fcmp, label %bb34, label %bb35
+
+bb34:                                             ; preds = %bb33, %bb32
+  br i1 %arg11, label %bb37, label %bb36
+
+bb35:                                             ; preds = %bb33
+  store float 0.000000e+00, ptr addrspace(5) null, align 2147483648
+  ret i1 false
+
+bb36:                                             ; preds = %bb34
+  store i32 1, ptr addrspace(5) null, align 2147483648
+  br label %bb29
+
+bb37:                                             ; preds = %bb34
+  %load = load i8, ptr %arg12, align 2
+  %trunc = trunc i8 %load to i1
+  br i1 %trunc, label %bb38, label %bb54
+
+bb38:                                             ; preds = %bb37
+  br i1 %arg4, label %bb39, label %bb53
+
+bb39:                                             ; preds = %bb38
+  store float %arg1, ptr addrspace(5) null, align 2147483648
+  %load40 = load float, ptr %arg15, align 8
+  call void @llvm.memcpy.p5.p0.i64(ptr addrspace(5) %arg25, ptr %arg24, i64 12, i1 false)
+  %load41 = load float, ptr %arg16, align 4
+  call void @llvm.memcpy.p5.p0.i64(ptr addrspace(5) %arg17, ptr null, i64 36, i1 false)
+  %load42 = load float, ptr %arg18, align 4
+  %load43 = load float, ptr %arg19, align 4
+  store float 0x7FF8000000000000, ptr addrspace(5) null, align 2147483648
+  %load44 = load float, ptr %arg14, align 16
+  store float %load44, ptr addrspace(5) null, align 2147483648
+  %fcmp45 = fcmp ole float %arg9, 0.000000e+00
+  br i1 %fcmp45, label %bb29, label %bb46
+
+bb46:                                             ; preds = %bb39
+  %fsub = fsub float %arg8, %load40
+  store float %fsub, ptr addrspace(5) null, align 2147483648
+  %fadd = fadd float %load42, %load43
+  br i1 %arg, label %bb29, label %bb47
+
+bb47:                                             ; preds = %bb46
+  br i1 %arg, label %bb29, label %bb48
+
+bb48:                                             ; preds = %bb47
+  br i1 %arg, label %bb49, label %bb52
+
+bb49:                                             ; preds = %bb48
+  store float 0.000000e+00, ptr %arg23, align 4
+  store float 0.000000e+00, ptr %arg22, align 8
+  store float %fadd, ptr addrspace(5) null, align 2147483648
+  %load50 = load float, ptr %arg20, align 4
+  %fdiv = fdiv float %load41, %load50
+  store float %fdiv, ptr addrspace(5) null, align 2147483648
+  %load51 = load float, ptr %arg13, align 16
+  store float %load51, ptr addrspace(5) null, align 2147483648
+  store float 1.000000e+00, ptr %arg21, align 4
+  br label %bb29
+
+bb52:                                             ; preds = %bb48
+  unreachable
+
+bb53:                                             ; preds = %bb38
+  ret i1 false
+
+bb54:                                             ; preds = %bb37
+  ret i1 true
+
+bb55:                                             ; preds = %bb31, %bb29
+  %load56 = load i1, ptr addrspace(5) null, align 2147483648
+  ret i1 %load56
+}
+
+; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
+declare void @llvm.memcpy.p5.p0.i64(ptr addrspace(5) noalias writeonly captures(none), ptr noalias readonly captures(none), i64, i1 immarg) #1
+
+attributes #0 = { "denormal-fp-math-f32"="preserve-sign,preserve-sign" }
+attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }

llvm/test/CodeGen/AMDGPU/bug-undef-spilled-agpr.ll

shiltian · 2025-06-30T23:25:47Z

llvm/test/CodeGen/AMDGPU/bug-undef-spilled-agpr.ll

+  br i1 %arg5, label %bb31, label %bb32
+
+bb31:                                             ; preds = %bb30
+  store i1 false, ptr addrspace(5) null, align 2147483648


I understand this IR is from llvm-reduce. Can you try to avoid UB (those load/store null`). The alignment also needs to be fixed as well.

Is there a way to tell llvm-reduce to not generate UB?

Unfortunately no (at this moment). Those UBs are from removing the corresponding instructions. When an instruction is removed, all its uses are replace with "default" value, which is null for pointers. I was in the same shoes before.

Technically by the current IR rules this is not UB but you should fix it anyway

shiltian · 2025-06-30T23:26:10Z

llvm/test/CodeGen/AMDGPU/bug-undef-spilled-agpr.ll

@@ -0,0 +1,126 @@
+; Just ensure that llc -O1 does not error out
+; RUN: llc -O1 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs %s -o - &>/dev/null


wonder if it is possible to reduce the IR further

This was the best I could do with llvm-reduce. Do you have other suggestions?

Not sure if that's gonna work but maybe use the IR right before amdgpu-isel as the starting point, and then just run amdgpu-isel in the test script.

Hmm. Tried that and a few other things, but could not get it to work.

For case like this where you merely need to trigger spilling, it's easier to write an artificial testcase with inline asm to ensure no physical registers are available at some point

Should drop the -verify-machineinstrs, check the output, and still try to make a synthetic test

The failure in the description seems to be only observed in machine instruction verification?

The failure in the description seems to be only observed in machine instruction verification?

That is correct.

Should drop the -verify-machineinstrs, check the output, and still try to make a synthetic test

@arsenm I decided to create an mir test instead, which reduced fairly nicely IMO and is more targeted to the (interdependent) codegen passes conspiring to produce the failure. Is this ok?

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

shiltian

LGTM

arsenm · 2025-07-02T02:44:16Z

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

@@ -995,6 +996,12 @@ void SIFrameLowering::emitCSRSpillStores(
        }
      };

+  for (const Register &Reg : make_first_range(WWMScratchRegs)) {


Suggested change

for (const Register &Reg : make_first_range(WWMScratchRegs)) {

for (const Register Reg : make_first_range(WWMScratchRegs)) {

arsenm · 2025-07-02T02:45:43Z

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

+    if (!MRI.isReserved(Reg)) {
+      MRI.addLiveIn(Reg);
+      MBB.addLiveIn(Reg);
+    }


I thought all WWM registers should be reserved at this point, this is papering over not adding them to reserved?

We initially had the code that adds the wwm-regs conservatively to all Blocks LiveIn sets. We removed them earlier with one of the wwm regalloc/spill patches (I can't recollect the exact reason). I had this impression that we should at least add them to the entry block LiveIn set.

Reg is $agpr13.
Where would you expect it to get reserved? SIMachineFunctionInfo::allocateWWMSpill()?

arsenm · 2025-07-02T02:46:25Z

llvm/test/CodeGen/AMDGPU/bug-undef-spilled-agpr.ll

@@ -0,0 +1,126 @@
+; Just ensure that llc -O1 does not error out
+; RUN: llc -O1 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs %s -o - &>/dev/null


For case like this where you merely need to trigger spilling, it's easier to write an artificial testcase with inline asm to ensure no physical registers are available at some point

cdevadas · 2025-07-02T10:06:16Z

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

+    if (!MRI.isReserved(Reg)) {
+      MRI.addLiveIn(Reg);
+      MBB.addLiveIn(Reg);
+    }


We initially had the code that adds the wwm-regs conservatively to all Blocks LiveIn sets. We removed them earlier with one of the wwm regalloc/spill patches (I can't recollect the exact reason). I had this impression that we should at least add them to the entry block LiveIn set.

Fixes: ``` *** Bad machine code: Using an undefined physical register *** - function: widget - basic block: %bb.0 bb (0x564092cbe140) - instruction: $vgpr63 = V_ACCVGPR_READ_B32_e64 killed $agpr13, implicit $exec - operand 1: killed $agpr13 LLVM ERROR: Found 1 machine code errors. ``` The detailed sequence of events that led to this change: 1. MachineVerifier fails because $agpr13 is not defined on line 19 below: ``` 1: bb.0.bb: 2: successors: %bb.1(0x80000000); %bb.1(100.00%) 3: liveins: $agpr14, $agpr15, $sgpr12, $sgpr13, $sgpr14, \ 4: $sgpr15, $sgpr30, $sgpr31, $sgpr34, $sgpr35, \ 5: $sgpr36, $sgpr37, $sgpr38, $sgpr39, $sgpr48, \ 6: $sgpr49, $sgpr50, $sgpr51, $sgpr52, $sgpr53, \ 7: $sgpr54, $sgpr55, $sgpr64, $sgpr65, $sgpr66, \ 8: $sgpr67, $sgpr68, $sgpr69, $sgpr70, $sgpr71, \ 9: $sgpr80, $sgpr81, $sgpr82, $sgpr83, $sgpr84, \ 10: $sgpr85, $sgpr86, $sgpr87, $sgpr96, $sgpr97, \ 11: $sgpr98, $sgpr99, $vgpr0, $vgpr31, $vgpr40, $vgpr41, \ 12: $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, \ 13: $sgpr10_sgpr11 14: $sgpr16 = COPY $sgpr33 15: $sgpr33 = frame-setup COPY $sgpr32 16: $sgpr18_sgpr19 = S_XOR_SAVEEXEC_B64 -1, \ 17: implicit-def $exec, implicit-def dead $scc, \ 18: implicit $exec 19: $vgpr63 = V_ACCVGPR_READ_B32_e64 killed $agpr13, \ 20: implicit $exec 21: BUFFER_STORE_DWORD_OFFSET killed $vgpr63, \ 22: $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, \ 23: implicit $exec :: (store (s32) into %stack.38, \ 24: addrspace 5) 25: ... 26: $vgpr43 = IMPLICIT_DEF 27: $vgpr43 = SI_SPILL_S32_TO_VGPR $sgpr15, 0, \ 28: killed $vgpr43(tied-def 0) 29: $vgpr43 = SI_SPILL_S32_TO_VGPR $sgpr14, 1, \ 30: killed $vgpr43(tied-def 0) 31: $sgpr100_sgpr101 = S_OR_SAVEEXEC_B64 -1, \ 32: implicit-def $exec, implicit-def dead $scc, \ 33: implicit $exec 34: renamable $agpr13 = COPY killed $vgpr43, implicit $exec ``` 2. That instruction is created by emitCSRSpillStores (called by SIFrameLowering::emitPrologue) because $agpr13 is in WWMSpills. See lines 982, 998, and 993 below. ``` 977: // Spill Whole-Wave Mode VGPRs. Save only the inactive lanes of the scratch 978: // registers. However, save all lanes of callee-saved VGPRs. Due to this, we 979: // might end up flipping the EXEC bits twice. 980: Register ScratchExecCopy; 981: SmallVector<std::pair<Register, int>, 2> WWMCalleeSavedRegs, WWMScratchRegs; 982: FuncInfo->splitWWMSpillRegisters(MF, WWMCalleeSavedRegs, WWMScratchRegs); 983: if (!WWMScratchRegs.empty()) 984: ScratchExecCopy = 985: buildScratchExecCopy(LiveUnits, MF, MBB, MBBI, DL, 986: /*IsProlog*/ true, /*EnableInactiveLanes*/ true); 987: 988: auto StoreWWMRegisters = 989: [&](SmallVectorImpl<std::pair<Register, int>> &WWMRegs) { 990: for (const auto &Reg : WWMRegs) { 991: Register VGPR = Reg.first; 992: int FI = Reg.second; 993: buildPrologSpill(ST, TRI, *FuncInfo, LiveUnits, MF, MBB, MBBI, DL, 994: VGPR, FI, FrameReg); 995: } 996: }; 997: 998: StoreWWMRegisters(WWMScratchRegs); ``` 3. $agpr13 got added to WWMSpills by SILowerWWMCopies::runOnMachineFunction as it processed the WWM_COPY on line 11 below (corresponds to line 34 above in point llvm#1): ``` 1: %45:vgpr_32 = SI_SPILL_S32_TO_VGPR $sgpr15, 0, %45:vgpr_32(tied-def 0) 2: %45:vgpr_32 = SI_SPILL_S32_TO_VGPR $sgpr14, 1, %45:vgpr_32(tied-def 0) 3: %44:av_32 = WWM_COPY %45:vgpr_32 ```

llvmbot added the backend:AMDGPU label Jun 30, 2025

macurtis-amd requested review from shiltian and arsenm June 30, 2025 22:06

shiltian reviewed Jun 30, 2025

View reviewed changes

llvm/test/CodeGen/AMDGPU/bug-undef-spilled-agpr.ll Outdated Show resolved Hide resolved

shiltian reviewed Jun 30, 2025

View reviewed changes

jayfoad reviewed Jul 1, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp Outdated Show resolved Hide resolved

shiltian reviewed Jul 1, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp Outdated Show resolved Hide resolved

shiltian approved these changes Jul 1, 2025

View reviewed changes

arsenm requested a review from cdevadas July 2, 2025 02:44

arsenm reviewed Jul 2, 2025

View reviewed changes

cdevadas approved these changes Jul 2, 2025

View reviewed changes

macurtis-amd force-pushed the bug-undef-spilled-agpr branch from 979830c to b0b218a Compare July 7, 2025 13:53

macurtis-amd added 5 commits July 17, 2025 13:46

fixup! [AMDGPU] Ensure non-reserved CSR spilled regs are live-in

f0d9821

fixup! [AMDGPU] Ensure non-reserved CSR spilled regs are live-in

2a88280

fixup! [AMDGPU] Ensure non-reserved CSR spilled regs are live-in

0050630

fixup! [AMDGPU] Ensure non-reserved CSR spilled regs are live-in

9c457fd

macurtis-amd force-pushed the bug-undef-spilled-agpr branch from b0b218a to 9c457fd Compare July 17, 2025 18:52

		@@ -0,0 +1,126 @@
		; Just ensure that llc -O1 does not error out
		; RUN: llc -O1 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs %s -o - &>/dev/null

	for (const Register &Reg : make_first_range(WWMScratchRegs)) {
	for (const Register Reg : make_first_range(WWMScratchRegs)) {

[AMDGPU] Ensure non-reserved CSR spilled regs are live-in #146427

Are you sure you want to change the base?

[AMDGPU] Ensure non-reserved CSR spilled regs are live-in #146427

Conversation

macurtis-amd commented Jun 30, 2025

Uh oh!

llvmbot commented Jun 30, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

shiltian left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!