-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[AMDGPU][CodeGen][True16] Correct size calculation for d16 insts #151042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9f56761 to
ca04b01
Compare
|
@llvm/pr-subscribers-backend-amdgpu Author: Brox Chen (broxigarchen) ChangesD16 pesudo instructions are introduced in true16 mode to represet a D16 load/store. In MC lowering, the pesudo instructions are lowered to the corresponding D16 Lo/Hi MC Inst respecting the register allocation. However, the pesudo instruction has size 0 and cause an issue in the Inst size estimation. Use D16 Lo when calculating inst size Full diff: https://github.com/llvm/llvm-project/pull/151042.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 2aa6b4e82f9d5..847b2d47c5425 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -9269,6 +9269,15 @@ unsigned SIInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
return 8 + 4 * ((RSrcIdx - VAddr0Idx + 2) / 4);
}
+ // if D16 Pseudo inst, get correct MC code size
+ const auto *D16Info = AMDGPU::getT16D16Helper(Opc);
+ if (D16Info) {
+ // assume d16_lo/hi inst are always in same size
+ unsigned LoInstOpcode = D16Info->LoOp;
+ const MCInstrDesc &Desc = getMCOpcodeFromPseudo(LoInstOpcode);
+ DescSize = Desc.getSize();
+ }
+
switch (Opc) {
case TargetOpcode::BUNDLE:
return getInstBundleSize(MI);
diff --git a/llvm/test/CodeGen/AMDGPU/branch-relaxation-inst-size-gfx11.ll b/llvm/test/CodeGen/AMDGPU/branch-relaxation-inst-size-gfx11.ll
new file mode 100644
index 0000000000000..dd389375b0d77
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/branch-relaxation-inst-size-gfx11.ll
@@ -0,0 +1,51 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-s-branch-bits=4 -mattr=+real-true16 < %s | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
+
+; Make sure the inst size estimate for D16 pseudo insts are not 0
+
+define amdgpu_kernel void @long_forward_branch_gfx11plus(ptr addrspace(1) %in, ptr addrspace(1) %out, i32 %cnd) #0 {
+; GFX11-LABEL: long_forward_branch_gfx11plus:
+; GFX11: ; %bb.0: ; %bb0
+; GFX11-NEXT: s_load_b32 s0, s[4:5], 0x34
+; GFX11-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-NEXT: s_cmp_eq_u32 s0, 0
+; GFX11-NEXT: s_cbranch_scc0 .LBB0_1
+; GFX11-NEXT: ; %bb.3: ; %bb0
+; GFX11-NEXT: s_getpc_b64 s[6:7]
+; GFX11-NEXT: .Lpost_getpc0:
+; GFX11-NEXT: s_add_u32 s6, s6, (.LBB0_2-.Lpost_getpc0)&4294967295
+; GFX11-NEXT: s_addc_u32 s7, s7, (.LBB0_2-.Lpost_getpc0)>>32
+; GFX11-NEXT: s_setpc_b64 s[6:7]
+; GFX11-NEXT: .LBB0_1: ; %bb2
+; GFX11-NEXT: s_load_b128 s[0:3], s[4:5], 0x24
+; GFX11-NEXT: v_mov_b32_e32 v1, 0
+; GFX11-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-NEXT: s_clause 0x1
+; GFX11-NEXT: global_load_d16_b16 v0, v1, s[0:1]
+; GFX11-NEXT: global_load_d16_hi_b16 v0, v1, s[0:1] offset:2
+; GFX11-NEXT: s_waitcnt vmcnt(1)
+; GFX11-NEXT: global_store_b16 v1, v0, s[2:3]
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: global_store_d16_hi_b16 v1, v0, s[2:3] offset:2
+; GFX11-NEXT: .LBB0_2: ; %bb3
+; GFX11-NEXT: s_endpgm
+bb0:
+ ;%idx = call i32 @llvm.amdgcn.workitem.id.x()
+ %gep0 = getelementptr inbounds i16, ptr addrspace(1) %in, i32 0
+ %gep1 = getelementptr inbounds i16, ptr addrspace(1) %in, i32 1
+ %out0 = getelementptr inbounds i16, ptr addrspace(1) %out, i32 0
+ %out1 = getelementptr inbounds i16, ptr addrspace(1) %out, i32 1
+ %cmp = icmp eq i32 %cnd, 0
+ br i1 %cmp, label %bb3, label %bb2 ; +9 dword branch
+bb2:
+ ; Estimated as 32-bytes on gfx11 (requiring a long branch)
+ %load0 = load i16, ptr addrspace(1) %gep0
+ %load1 = load i16, ptr addrspace(1) %gep1
+ store i16 %load0, ptr addrspace(1) %out0
+ store i16 %load1, ptr addrspace(1) %out1
+ br label %bb3
+bb3:
+ ret void
+}
+
+declare i32 @llvm.amdgcn.workitem.id.x() #1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this down into the default case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ca04b01 to
c851e5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments should be capitalized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
c851e5d to
d7e33a7
Compare
d7e33a7 to
8b0da06
Compare
D16 pesudo instructions are introduced in true16 mode to represet a D16 load/store. In MC lowering, the pesudo instructions are lowered to the corresponding D16 Lo/Hi MC Inst respecting the register allocation.
However, the pesudo instruction has size 0 and cause an issue in the Inst size estimation. Use D16 Lo when calculating inst size