Skip to content

Conversation

@broxigarchen
Copy link
Contributor

@broxigarchen broxigarchen commented Jul 28, 2025

D16 pesudo instructions are introduced in true16 mode to represet a D16 load/store. In MC lowering, the pesudo instructions are lowered to the corresponding D16 Lo/Hi MC Inst respecting the register allocation.

However, the pesudo instruction has size 0 and cause an issue in the Inst size estimation. Use D16 Lo when calculating inst size

@broxigarchen broxigarchen force-pushed the main-fix-true16-d16-size branch from 9f56761 to ca04b01 Compare July 28, 2025 21:18
@broxigarchen broxigarchen marked this pull request as ready for review July 28, 2025 21:21
@broxigarchen broxigarchen requested a review from Sisyph July 28, 2025 21:21
@llvmbot
Copy link
Member

llvmbot commented Jul 28, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

D16 pesudo instructions are introduced in true16 mode to represet a D16 load/store. In MC lowering, the pesudo instructions are lowered to the corresponding D16 Lo/Hi MC Inst respecting the register allocation.

However, the pesudo instruction has size 0 and cause an issue in the Inst size estimation. Use D16 Lo when calculating inst size


Full diff: https://github.com/llvm/llvm-project/pull/151042.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+9)
  • (added) llvm/test/CodeGen/AMDGPU/branch-relaxation-inst-size-gfx11.ll (+51)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 2aa6b4e82f9d5..847b2d47c5425 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -9269,6 +9269,15 @@ unsigned SIInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
     return 8 + 4 * ((RSrcIdx - VAddr0Idx + 2) / 4);
   }
 
+  // if D16 Pseudo inst, get correct MC code size
+  const auto *D16Info = AMDGPU::getT16D16Helper(Opc);
+  if (D16Info) {
+    // assume d16_lo/hi inst are always in same size
+    unsigned LoInstOpcode = D16Info->LoOp;
+    const MCInstrDesc &Desc = getMCOpcodeFromPseudo(LoInstOpcode);
+    DescSize = Desc.getSize();
+  }
+
   switch (Opc) {
   case TargetOpcode::BUNDLE:
     return getInstBundleSize(MI);
diff --git a/llvm/test/CodeGen/AMDGPU/branch-relaxation-inst-size-gfx11.ll b/llvm/test/CodeGen/AMDGPU/branch-relaxation-inst-size-gfx11.ll
new file mode 100644
index 0000000000000..dd389375b0d77
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/branch-relaxation-inst-size-gfx11.ll
@@ -0,0 +1,51 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-s-branch-bits=4 -mattr=+real-true16 < %s | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
+
+; Make sure the inst size estimate for D16 pseudo insts are not 0
+
+define amdgpu_kernel void @long_forward_branch_gfx11plus(ptr addrspace(1) %in, ptr addrspace(1) %out, i32 %cnd) #0 {
+; GFX11-LABEL: long_forward_branch_gfx11plus:
+; GFX11:       ; %bb.0: ; %bb0
+; GFX11-NEXT:    s_load_b32 s0, s[4:5], 0x34
+; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX11-NEXT:    s_cmp_eq_u32 s0, 0
+; GFX11-NEXT:    s_cbranch_scc0 .LBB0_1
+; GFX11-NEXT:  ; %bb.3: ; %bb0
+; GFX11-NEXT:    s_getpc_b64 s[6:7]
+; GFX11-NEXT:  .Lpost_getpc0:
+; GFX11-NEXT:    s_add_u32 s6, s6, (.LBB0_2-.Lpost_getpc0)&4294967295
+; GFX11-NEXT:    s_addc_u32 s7, s7, (.LBB0_2-.Lpost_getpc0)>>32
+; GFX11-NEXT:    s_setpc_b64 s[6:7]
+; GFX11-NEXT:  .LBB0_1: ; %bb2
+; GFX11-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX11-NEXT:    v_mov_b32_e32 v1, 0
+; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX11-NEXT:    s_clause 0x1
+; GFX11-NEXT:    global_load_d16_b16 v0, v1, s[0:1]
+; GFX11-NEXT:    global_load_d16_hi_b16 v0, v1, s[0:1] offset:2
+; GFX11-NEXT:    s_waitcnt vmcnt(1)
+; GFX11-NEXT:    global_store_b16 v1, v0, s[2:3]
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
+; GFX11-NEXT:    global_store_d16_hi_b16 v1, v0, s[2:3] offset:2
+; GFX11-NEXT:  .LBB0_2: ; %bb3
+; GFX11-NEXT:    s_endpgm
+bb0:
+  ;%idx = call i32 @llvm.amdgcn.workitem.id.x()
+  %gep0 = getelementptr inbounds i16, ptr addrspace(1) %in, i32 0
+  %gep1 = getelementptr inbounds i16, ptr addrspace(1) %in, i32 1
+  %out0 = getelementptr inbounds i16, ptr addrspace(1) %out, i32 0
+  %out1 = getelementptr inbounds i16, ptr addrspace(1) %out, i32 1
+  %cmp = icmp eq i32 %cnd, 0
+  br i1 %cmp, label %bb3, label %bb2 ; +9 dword branch
+bb2:
+    ; Estimated as 32-bytes on gfx11 (requiring a long branch)
+  %load0 = load i16, ptr addrspace(1) %gep0
+  %load1 = load i16, ptr addrspace(1) %gep1
+  store i16 %load0, ptr addrspace(1) %out0
+  store i16 %load1, ptr addrspace(1) %out1
+  br label %bb3
+bb3:
+  ret void
+}
+
+declare i32 @llvm.amdgcn.workitem.id.x() #1

Comment on lines 9272 to 9279
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this down into the default case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@broxigarchen broxigarchen force-pushed the main-fix-true16-d16-size branch from ca04b01 to c851e5d Compare July 29, 2025 14:26
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should be capitalized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@broxigarchen broxigarchen force-pushed the main-fix-true16-d16-size branch from c851e5d to d7e33a7 Compare July 29, 2025 14:57
@broxigarchen broxigarchen force-pushed the main-fix-true16-d16-size branch from d7e33a7 to 8b0da06 Compare July 29, 2025 15:30
@broxigarchen broxigarchen merged commit 2a3f72e into llvm:main Jul 29, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants