Skip to content

[AMDGPU][True16][CodeGen] legalize 16bit and 32bit use-def chain for moveToVALU in si-fix-sgpr-lowering #138734

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 4, 2025

Conversation

broxigarchen
Copy link
Contributor

@broxigarchen broxigarchen commented May 6, 2025

Two changes in this patch:

  1. Covered another case in legalizeOperandVALUt16 functions and the COPY lowering, when SALU16 is used by SALU32, need to insert a reg_sequence after moved to valu (previously only considered SALU32 used by SALU16 case)
  2. Moved the useMI analysis into addUsersToMoveVALUList. Legalize the targetted operand when needed.

Turn on frem test with true16 mode for gfx1150 which is failing before this patch. A few bitcast tests also impacted by this change with some v_mov being replaced to dual mov

@broxigarchen broxigarchen force-pushed the fix-frem-gfx1150 branch 2 times, most recently from 59fc3e5 to 1135e8b Compare May 6, 2025 21:02
@broxigarchen broxigarchen changed the title check for vgpr16 putting into vgpr32 case in v2s lowering [AMDGPU][True16][CodeGen] mismatched reg size for moveToVALU in si-fix-sgpr-lowering May 6, 2025
@broxigarchen broxigarchen changed the title [AMDGPU][True16][CodeGen] mismatched reg size for moveToVALU in si-fix-sgpr-lowering [AMDGPU][True16][CodeGen] legalize vgpr16 to sreg32 use-def chain for moveToVALU in si-fix-sgpr-lowering May 6, 2025
@broxigarchen broxigarchen changed the title [AMDGPU][True16][CodeGen] legalize vgpr16 to sreg32 use-def chain for moveToVALU in si-fix-sgpr-lowering [AMDGPU][True16][CodeGen] legalize 16bit and 32bit use-def chain for moveToVALU in si-fix-sgpr-lowering May 7, 2025
@broxigarchen broxigarchen marked this pull request as ready for review May 7, 2025 14:10
@llvmbot
Copy link
Member

llvmbot commented May 7, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

Patch is 54.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138734.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+42-11)
  • (modified) llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir (+66)
  • (modified) llvm/test/CodeGen/AMDGPU/frem.ll (+503-248)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index e6d54860df221..0cabf09ec7f21 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -7235,24 +7235,44 @@ bool SIInstrWorklist::isDeferred(MachineInstr *MI) {
   return DeferredList.contains(MI);
 }
 
-// 16bit SALU use sgpr32. If a 16bit SALU get lowered to VALU in true16 mode,
-// sgpr32 is replaced to vgpr32 which is illegal in t16 inst. Need to add
-// subreg access properly. This can be removed after we have sgpr16 in place
-void SIInstrInfo::legalizeOperandsVALUt16(MachineInstr &Inst,
+// legalize operand between 16bit and 32bit registers in v2s copy
+// lowering (change spgr to vgpr).
+// This is mainly caused by 16bit SALU and 16bit VALU using reg with different
+// size. Need to legalize the size of the operands during the vgpr lowering
+// chain. This can be removed after we have sgpr16 in place
+void SIInstrInfo::legalizeOperandsVALUt16(MachineInstr &MI,
                                           MachineRegisterInfo &MRI) const {
-  unsigned Opcode = Inst.getOpcode();
-  if (!AMDGPU::isTrue16Inst(Opcode) || !ST.useRealTrue16Insts())
+  if (!ST.useRealTrue16Insts())
     return;
 
-  for (MachineOperand &Op : Inst.explicit_operands()) {
+  unsigned Opcode = MI.getOpcode();
+  MachineBasicBlock *MBB = MI.getParent();
+
+  // legalize operands and check for size mismatch
+  for (MachineOperand &Op : MI.explicit_operands()) {
     unsigned OpIdx = Op.getOperandNo();
     if (!OpIdx)
       continue;
-    if (Op.isReg() && RI.isVGPR(MRI, Op.getReg())) {
+    if (Op.isReg() && Op.getReg().isVirtual() && RI.isVGPR(MRI, Op.getReg())) {
       unsigned RCID = get(Opcode).operands()[OpIdx].RegClass;
-      const TargetRegisterClass *RC = RI.getRegClass(RCID);
-      if (RI.getRegSizeInBits(*RC) == 16) {
+      const TargetRegisterClass *ExpectedRC = RI.getRegClass(RCID);
+      const TargetRegisterClass *RC = MRI.getRegClass(Op.getReg());
+      if (32 == RI.getRegSizeInBits(*RC) &&
+          16 == RI.getRegSizeInBits(*ExpectedRC)) {
         Op.setSubReg(AMDGPU::lo16);
+      } else if (16 == RI.getRegSizeInBits(*RC) &&
+                 32 == RI.getRegSizeInBits(*ExpectedRC)) {
+        const DebugLoc &DL = MI.getDebugLoc();
+        Register NewDstReg =
+            MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+        Register Undef = MRI.createVirtualRegister(&AMDGPU::VGPR_16RegClass);
+        BuildMI(*MBB, MI, DL, get(AMDGPU::IMPLICIT_DEF), Undef);
+        BuildMI(*MBB, MI, DL, get(AMDGPU::REG_SEQUENCE), NewDstReg)
+            .addReg(Op.getReg())
+            .addImm(AMDGPU::lo16)
+            .addReg(Undef)
+            .addImm(AMDGPU::hi16);
+        Op.setReg(NewDstReg);
       }
     }
   }
@@ -7793,8 +7813,19 @@ void SIInstrInfo::moveToVALUImpl(SIInstrWorklist &Worklist,
             .add(Inst.getOperand(1))
             .add(MachineOperand::CreateImm(AMDGPU::lo16));
         Inst.eraseFromParent();
-
         MRI.replaceRegWith(DstReg, NewDstReg);
+        // legalize useMI with mismatched size
+        for (MachineRegisterInfo::use_iterator I = MRI.use_begin(NewDstReg),
+                                               E = MRI.use_end();
+             I != E; ++I) {
+          MachineInstr &UseMI = *I->getParent();
+          unsigned UseMIOpcode = UseMI.getOpcode();
+          if (AMDGPU::isTrue16Inst(UseMIOpcode) &&
+              (16 ==
+               RI.getRegSizeInBits(*getOpRegClass(UseMI, I.getOperandNo())))) {
+            I->setSubReg(AMDGPU::lo16);
+          }
+        }
         addUsersToMoveToVALUWorklist(NewDstReg, MRI, Worklist);
         return;
       }
diff --git a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
index 6e24d9afa2bbc..518a28ebe6539 100644
--- a/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
+++ b/llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir
@@ -54,6 +54,72 @@ body:             |
     %4:vgpr_16 = V_CVT_F16_U16_t16_e64 0, %3:sreg_32, 0, 0, 0, implicit $mode, implicit $exec
 ...
 
+---
+name:            salu16_usedby_salu32
+body:             |
+  bb.0:
+    ; GCN-LABEL: name: salu16_usedby_salu32
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_TRUNC_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_TRUNC_F16_t16_e64 0, [[DEF]].lo16, 0, 0, 0, implicit $mode, implicit $exec
+    ; GCN-NEXT: [[DEF2:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vgpr_32 = REG_SEQUENCE [[V_TRUNC_F16_t16_e64_]], %subreg.lo16, [[DEF2]], %subreg.hi16
+    ; GCN-NEXT: [[V_XOR_B32_e64_:%[0-9]+]]:vgpr_32 = V_XOR_B32_e64 [[REG_SEQUENCE]], [[DEF]], implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:sreg_32 = COPY %0:vgpr_32
+    %2:sreg_32 = S_TRUNC_F16 %1:sreg_32, implicit $mode
+    %3:sreg_32 = S_XOR_B32 %2:sreg_32, %1:sreg_32, implicit-def $scc
+...
+
+---
+name:            salu32_usedby_salu16
+body:             |
+  bb.0:
+    ; GCN-LABEL: name: salu32_usedby_salu16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[V_XOR_B32_e64_:%[0-9]+]]:vgpr_32 = V_XOR_B32_e64 [[DEF]], [[DEF]], implicit $exec
+    ; GCN-NEXT: [[V_TRUNC_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_TRUNC_F16_t16_e64 0, [[V_XOR_B32_e64_]].lo16, 0, 0, 0, implicit $mode, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:sreg_32 = COPY %0:vgpr_32
+    %2:sreg_32 = S_XOR_B32 %1:sreg_32, %1:sreg_32, implicit-def $scc
+    %3:sreg_32 = S_TRUNC_F16 %2:sreg_32, implicit $mode
+...
+
+---
+name:            sgpr16_to_spgr32
+body:             |
+  bb.0:
+    ; GCN-LABEL: name: sgpr16_to_spgr32
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:sgpr_lo16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[SUBREG_TO_REG:%[0-9]+]]:vgpr_32 = SUBREG_TO_REG 0, [[DEF]], %subreg.lo16
+    ; GCN-NEXT: [[SUBREG_TO_REG1:%[0-9]+]]:vgpr_32 = SUBREG_TO_REG 0, [[DEF]], %subreg.lo16
+    ; GCN-NEXT: [[V_FMAC_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_FMAC_F16_t16_e64 0, [[SUBREG_TO_REG1]].lo16, 0, [[SUBREG_TO_REG1]].lo16, 0, [[SUBREG_TO_REG]].lo16, 0, 0, 0, implicit $mode, implicit $exec
+    %0:vgpr_16 = IMPLICIT_DEF
+    %1:sgpr_lo16 = COPY %0:vgpr_16
+    %2:sreg_32 = COPY %0:vgpr_16
+    %3:sreg_32 = COPY %1:sgpr_lo16
+    %4:sreg_32 = S_FMAC_F16 %3:sreg_32, %3:sreg_32, %2:sreg_32, implicit $mode
+...
+
+---
+name:            sgpr32_to_spgr16
+body:             |
+  bb.0:
+    ; GCN-LABEL: name: sgpr32_to_spgr16
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_16 = IMPLICIT_DEF
+    ; GCN-NEXT: [[SUBREG_TO_REG:%[0-9]+]]:vgpr_32 = SUBREG_TO_REG 0, [[DEF]], %subreg.lo16
+    ; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_16 = COPY [[SUBREG_TO_REG]]
+    ; GCN-NEXT: [[SUBREG_TO_REG1:%[0-9]+]]:vgpr_32 = SUBREG_TO_REG 0, [[COPY]], %subreg.lo16
+    ; GCN-NEXT: [[V_FMAC_F16_t16_e64_:%[0-9]+]]:vgpr_16 = V_FMAC_F16_t16_e64 0, [[SUBREG_TO_REG1]].lo16, 0, [[SUBREG_TO_REG1]].lo16, 0, [[SUBREG_TO_REG]].lo16, 0, 0, 0, implicit $mode, implicit $exec
+    %0:vgpr_16 = IMPLICIT_DEF
+    %1:sreg_32 = COPY %0:vgpr_16
+    %2:sgpr_lo16 = COPY %1:sreg_32
+    %3:sreg_32 = COPY %2:sgpr_lo16
+    %4:sreg_32 = S_FMAC_F16 %3:sreg_32, %3:sreg_32, %1:sreg_32, implicit $mode
+...
+
 ---
 name:            vgpr16_to_spgr32
 body:             |
diff --git a/llvm/test/CodeGen/AMDGPU/frem.ll b/llvm/test/CodeGen/AMDGPU/frem.ll
index 125d009429cbf..872b3afb4fa70 100644
--- a/llvm/test/CodeGen/AMDGPU/frem.ll
+++ b/llvm/test/CodeGen/AMDGPU/frem.ll
@@ -6,7 +6,8 @@
 ; RUN:  llc -amdgpu-scalarize-global-loads=false -enable-misched=0 -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10 %s
 ; RUN:  llc -amdgpu-scalarize-global-loads=false -enable-misched=0 -mtriple=amdgcn -mcpu=gfx1100 -mattr=+real-true16 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX11,GFX11-TRUE16 %s
 ; RUN:  llc -amdgpu-scalarize-global-loads=false -enable-misched=0 -mtriple=amdgcn -mcpu=gfx1100 -mattr=-real-true16 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX11,GFX11-FAKE16 %s
-; RUN:  llc -amdgpu-scalarize-global-loads=false -enable-misched=0 -mtriple=amdgcn -mcpu=gfx1150 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX1150 %s
+; RUN:  llc -amdgpu-scalarize-global-loads=false -enable-misched=0 -mtriple=amdgcn -mcpu=gfx1150 -mattr=+real-true16 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX1150,GFX1150-TRUE16 %s
+; RUN:  llc -amdgpu-scalarize-global-loads=false -enable-misched=0 -mtriple=amdgcn -mcpu=gfx1150 -mattr=-real-true16 -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX1150,GFX1150-FAKE16 %s
 
 define amdgpu_kernel void @frem_f16(ptr addrspace(1) %out, ptr addrspace(1) %in1,
 ; SI-LABEL: frem_f16:
@@ -255,42 +256,81 @@ define amdgpu_kernel void @frem_f16(ptr addrspace(1) %out, ptr addrspace(1) %in1
 ; GFX11-FAKE16-NEXT:    global_store_b16 v0, v1, s[0:1]
 ; GFX11-FAKE16-NEXT:    s_endpgm
 ;
-; GFX1150-LABEL: frem_f16:
-; GFX1150:       ; %bb.0:
-; GFX1150-NEXT:    s_clause 0x1
-; GFX1150-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
-; GFX1150-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
-; GFX1150-NEXT:    v_mov_b32_e32 v0, 0
-; GFX1150-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1150-NEXT:    s_clause 0x1
-; GFX1150-NEXT:    global_load_u16 v1, v0, s[2:3]
-; GFX1150-NEXT:    global_load_u16 v2, v0, s[4:5] offset:8
-; GFX1150-NEXT:    s_waitcnt vmcnt(1)
-; GFX1150-NEXT:    v_cvt_f32_f16_e32 v3, v1
-; GFX1150-NEXT:    s_waitcnt vmcnt(0)
-; GFX1150-NEXT:    v_cvt_f32_f16_e32 v4, v2
-; GFX1150-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(TRANS32_DEP_1)
-; GFX1150-NEXT:    v_rcp_f32_e32 v4, v4
-; GFX1150-NEXT:    v_mul_f32_e32 v3, v3, v4
-; GFX1150-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1150-NEXT:    v_fma_mix_f32 v5, -v2, v3, v1 op_sel_hi:[1,0,1]
-; GFX1150-NEXT:    v_fmac_f32_e32 v3, v5, v4
-; GFX1150-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1150-NEXT:    v_fma_mix_f32 v5, -v2, v3, v1 op_sel_hi:[1,0,1]
-; GFX1150-NEXT:    v_mul_f32_e32 v4, v5, v4
-; GFX1150-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1150-NEXT:    v_and_b32_e32 v4, 0xff800000, v4
-; GFX1150-NEXT:    v_add_f32_e32 v3, v4, v3
-; GFX1150-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1150-NEXT:    v_cvt_f16_f32_e32 v3, v3
-; GFX1150-NEXT:    v_div_fixup_f16 v3, v3, v2, v1
-; GFX1150-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1150-NEXT:    v_trunc_f16_e32 v3, v3
-; GFX1150-NEXT:    v_xor_b32_e32 v3, 0x8000, v3
-; GFX1150-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1150-NEXT:    v_fmac_f16_e32 v1, v3, v2
-; GFX1150-NEXT:    global_store_b16 v0, v1, s[0:1]
-; GFX1150-NEXT:    s_endpgm
+; GFX1150-TRUE16-LABEL: frem_f16:
+; GFX1150-TRUE16:       ; %bb.0:
+; GFX1150-TRUE16-NEXT:    s_clause 0x1
+; GFX1150-TRUE16-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX1150-TRUE16-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
+; GFX1150-TRUE16-NEXT:    v_mov_b32_e32 v2, 0
+; GFX1150-TRUE16-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1150-TRUE16-NEXT:    s_clause 0x1
+; GFX1150-TRUE16-NEXT:    global_load_d16_b16 v0, v2, s[2:3]
+; GFX1150-TRUE16-NEXT:    global_load_d16_b16 v1, v2, s[4:5] offset:8
+; GFX1150-TRUE16-NEXT:    s_waitcnt vmcnt(1)
+; GFX1150-TRUE16-NEXT:    v_cvt_f32_f16_e32 v3, v0.l
+; GFX1150-TRUE16-NEXT:    s_waitcnt vmcnt(0)
+; GFX1150-TRUE16-NEXT:    v_cvt_f32_f16_e32 v4, v1.l
+; GFX1150-TRUE16-NEXT:    v_mov_b16_e32 v5.l, v1.l
+; GFX1150-TRUE16-NEXT:    v_mov_b16_e32 v6.l, v0.l
+; GFX1150-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(TRANS32_DEP_1)
+; GFX1150-TRUE16-NEXT:    v_rcp_f32_e32 v4, v4
+; GFX1150-TRUE16-NEXT:    v_mul_f32_e32 v3, v3, v4
+; GFX1150-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-TRUE16-NEXT:    v_fma_mix_f32 v7, -v5, v3, v6 op_sel_hi:[1,0,1]
+; GFX1150-TRUE16-NEXT:    v_fmac_f32_e32 v3, v7, v4
+; GFX1150-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-TRUE16-NEXT:    v_fma_mix_f32 v5, -v5, v3, v6 op_sel_hi:[1,0,1]
+; GFX1150-TRUE16-NEXT:    v_mul_f32_e32 v4, v5, v4
+; GFX1150-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-TRUE16-NEXT:    v_and_b32_e32 v4, 0xff800000, v4
+; GFX1150-TRUE16-NEXT:    v_add_f32_e32 v3, v4, v3
+; GFX1150-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-TRUE16-NEXT:    v_cvt_f16_f32_e32 v0.h, v3
+; GFX1150-TRUE16-NEXT:    v_div_fixup_f16 v0.h, v0.h, v1.l, v0.l
+; GFX1150-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-TRUE16-NEXT:    v_trunc_f16_e32 v3.l, v0.h
+; GFX1150-TRUE16-NEXT:    v_xor_b32_e32 v3, 0x8000, v3
+; GFX1150-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1150-TRUE16-NEXT:    v_fmac_f16_e32 v0.l, v3.l, v1.l
+; GFX1150-TRUE16-NEXT:    global_store_b16 v2, v0, s[0:1]
+; GFX1150-TRUE16-NEXT:    s_endpgm
+;
+; GFX1150-FAKE16-LABEL: frem_f16:
+; GFX1150-FAKE16:       ; %bb.0:
+; GFX1150-FAKE16-NEXT:    s_clause 0x1
+; GFX1150-FAKE16-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX1150-FAKE16-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
+; GFX1150-FAKE16-NEXT:    v_mov_b32_e32 v0, 0
+; GFX1150-FAKE16-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1150-FAKE16-NEXT:    s_clause 0x1
+; GFX1150-FAKE16-NEXT:    global_load_u16 v1, v0, s[2:3]
+; GFX1150-FAKE16-NEXT:    global_load_u16 v2, v0, s[4:5] offset:8
+; GFX1150-FAKE16-NEXT:    s_waitcnt vmcnt(1)
+; GFX1150-FAKE16-NEXT:    v_cvt_f32_f16_e32 v3, v1
+; GFX1150-FAKE16-NEXT:    s_waitcnt vmcnt(0)
+; GFX1150-FAKE16-NEXT:    v_cvt_f32_f16_e32 v4, v2
+; GFX1150-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(TRANS32_DEP_1)
+; GFX1150-FAKE16-NEXT:    v_rcp_f32_e32 v4, v4
+; GFX1150-FAKE16-NEXT:    v_mul_f32_e32 v3, v3, v4
+; GFX1150-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-FAKE16-NEXT:    v_fma_mix_f32 v5, -v2, v3, v1 op_sel_hi:[1,0,1]
+; GFX1150-FAKE16-NEXT:    v_fmac_f32_e32 v3, v5, v4
+; GFX1150-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-FAKE16-NEXT:    v_fma_mix_f32 v5, -v2, v3, v1 op_sel_hi:[1,0,1]
+; GFX1150-FAKE16-NEXT:    v_mul_f32_e32 v4, v5, v4
+; GFX1150-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-FAKE16-NEXT:    v_and_b32_e32 v4, 0xff800000, v4
+; GFX1150-FAKE16-NEXT:    v_add_f32_e32 v3, v4, v3
+; GFX1150-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-FAKE16-NEXT:    v_cvt_f16_f32_e32 v3, v3
+; GFX1150-FAKE16-NEXT:    v_div_fixup_f16 v3, v3, v2, v1
+; GFX1150-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-FAKE16-NEXT:    v_trunc_f16_e32 v3, v3
+; GFX1150-FAKE16-NEXT:    v_xor_b32_e32 v3, 0x8000, v3
+; GFX1150-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1150-FAKE16-NEXT:    v_fmac_f16_e32 v1, v3, v2
+; GFX1150-FAKE16-NEXT:    global_store_b16 v0, v1, s[0:1]
+; GFX1150-FAKE16-NEXT:    s_endpgm
                       ptr addrspace(1) %in2) #0 {
    %gep2 = getelementptr half, ptr addrspace(1) %in2, i32 4
    %r0 = load half, ptr addrspace(1) %in1, align 4
@@ -456,26 +496,47 @@ define amdgpu_kernel void @fast_frem_f16(ptr addrspace(1) %out, ptr addrspace(1)
 ; GFX11-FAKE16-NEXT:    global_store_b16 v0, v1, s[0:1]
 ; GFX11-FAKE16-NEXT:    s_endpgm
 ;
-; GFX1150-LABEL: fast_frem_f16:
-; GFX1150:       ; %bb.0:
-; GFX1150-NEXT:    s_clause 0x1
-; GFX1150-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
-; GFX1150-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
-; GFX1150-NEXT:    v_mov_b32_e32 v0, 0
-; GFX1150-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1150-NEXT:    s_clause 0x1
-; GFX1150-NEXT:    global_load_u16 v1, v0, s[2:3]
-; GFX1150-NEXT:    global_load_u16 v2, v0, s[4:5] offset:8
-; GFX1150-NEXT:    s_waitcnt vmcnt(0)
-; GFX1150-NEXT:    v_rcp_f16_e32 v3, v2
-; GFX1150-NEXT:    s_delay_alu instid0(TRANS32_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1150-NEXT:    v_mul_f16_e32 v3, v1, v3
-; GFX1150-NEXT:    v_trunc_f16_e32 v3, v3
-; GFX1150-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1150-NEXT:    v_xor_b32_e32 v3, 0x8000, v3
-; GFX1150-NEXT:    v_fmac_f16_e32 v1, v3, v2
-; GFX1150-NEXT:    global_store_b16 v0, v1, s[0:1]
-; GFX1150-NEXT:    s_endpgm
+; GFX1150-TRUE16-LABEL: fast_frem_f16:
+; GFX1150-TRUE16:       ; %bb.0:
+; GFX1150-TRUE16-NEXT:    s_clause 0x1
+; GFX1150-TRUE16-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX1150-TRUE16-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
+; GFX1150-TRUE16-NEXT:    v_mov_b32_e32 v2, 0
+; GFX1150-TRUE16-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1150-TRUE16-NEXT:    s_clause 0x1
+; GFX1150-TRUE16-NEXT:    global_load_d16_b16 v0, v2, s[2:3]
+; GFX1150-TRUE16-NEXT:    global_load_d16_hi_b16 v0, v2, s[4:5] offset:8
+; GFX1150-TRUE16-NEXT:    s_waitcnt vmcnt(0)
+; GFX1150-TRUE16-NEXT:    v_rcp_f16_e32 v1.l, v0.h
+; GFX1150-TRUE16-NEXT:    s_delay_alu instid0(TRANS32_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-TRUE16-NEXT:    v_mul_f16_e32 v1.l, v0.l, v1.l
+; GFX1150-TRUE16-NEXT:    v_trunc_f16_e32 v1.l, v1.l
+; GFX1150-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-TRUE16-NEXT:    v_xor_b32_e32 v1, 0x8000, v1
+; GFX1150-TRUE16-NEXT:    v_fmac_f16_e32 v0.l, v1.l, v0.h
+; GFX1150-TRUE16-NEXT:    global_store_b16 v2, v0, s[0:1]
+; GFX1150-TRUE16-NEXT:    s_endpgm
+;
+; GFX1150-FAKE16-LABEL: fast_frem_f16:
+; GFX1150-FAKE16:       ; %bb.0:
+; GFX1150-FAKE16-NEXT:    s_clause 0x1
+; GFX1150-FAKE16-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX1150-FAKE16-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
+; GFX1150-FAKE16-NEXT:    v_mov_b32_e32 v0, 0
+; GFX1150-FAKE16-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1150-FAKE16-NEXT:    s_clause 0x1
+; GFX1150-FAKE16-NEXT:    global_load_u16 v1, v0, s[2:3]
+; GFX1150-FAKE16-NEXT:    global_load_u16 v2, v0, s[4:5] offset:8
+; GFX1150-FAKE16-NEXT:    s_waitcnt vmcnt(0)
+; GFX1150-FAKE16-NEXT:    v_rcp_f16_e32 v3, v2
+; GFX1150-FAKE16-NEXT:    s_delay_alu instid0(TRANS32_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-FAKE16-NEXT:    v_mul_f16_e32 v3, v1, v3
+; GFX1150-FAKE16-NEXT:    v_trunc_f16_e32 v3, v3
+; GFX1150-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1150-FAKE16-NEXT:    v_xor_b32_e32 v3, 0x8000, v3
+; GFX1150-FAKE16-NEXT:    v_fmac_f16_e32 v1, v3, v2
+; GFX1150-FAKE16-NEXT:    global_store_b16 v0, v1, s[0:1]
+; GFX1150-FAKE16-NEXT:    s_endpgm
                       ptr addrspace(1) %in2) #0 {
    %gep2 = getelementptr half, ptr addrspace(1) %in2, i32 4
    %r0 = load half, ptr addrspace(1) %in1, align 4
@@ -641,26 +702,47 @@ define amdgpu_kernel void @unsafe_frem_f16(ptr addrspace(1) %out, ptr addrspace(
 ; GFX11-FAKE16-NEXT:    global_store_b16 v0, v1, s[0:1]
 ; GFX11-FAKE16-NEXT:    s_endpgm
 ;
-; GFX1150-LABEL: unsafe_frem_f16:
-; GFX1150:       ; %bb.0:
-; GFX1150-NEXT:    s_clause 0x1
-; GFX1150-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
-; GFX1150-NEXT:    s_load_b64 s[4:5], s[4:5], 0x34
-; GFX1150-NEXT:    v_mov_b32_e32 v0, 0
-; GFX1150-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1150-NEXT:    s_clause 0x1
-; GFX1150-NEXT:    global_load_u16 v1, v0, s[2:3]
-; GFX1150-NEXT:    global_load_u16 v2, v0, s[4:5] offset:8
-; GFX1150-NEXT:    s_waitcnt vmcnt(0)
-; GFX1150-NEXT:    v_rcp_f16_e32 v3, v2
-; GFX1150-NEXT:    s_delay_alu instid0(TRANS32_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1150-NEXT:    v_mul_f16_e32 v3, v1, v3
-; GFX1150-NEXT:    v_trunc_f16_e32 v3, v3
-; GFX1150-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1150-NEXT:    v_xor_b32_e32 v3, 0x8000, v3
-; GFX1150-NEXT:    v_fmac_f16_e32 v1, v3, v2
-; GFX1150-NEXT:    global_store_b16 v0, v1, s[0:1]
-; GFX1150-NEXT:    s_endpgm
+; GFX1150-TRUE16-LABEL: unsafe_frem_f16:
+; GFX1150-TRUE16:       ; %bb.0:
+; GFX1150-TRUE16-NEXT:    s_clause 0x1
+; GFX1150-TRUE16-NEXT:    s_load_b128 s[0:3], s[4:5], 0x24
+; GFX1150-TRUE16-NEXT:    s_load_...
[truncated]

@broxigarchen broxigarchen marked this pull request as draft May 7, 2025 14:15
@broxigarchen broxigarchen marked this pull request as ready for review May 7, 2025 15:47
@broxigarchen broxigarchen requested review from Sisyph, kosarev and arsenm May 7, 2025 15:47
@broxigarchen
Copy link
Contributor Author

CI error is not related

@broxigarchen
Copy link
Contributor Author

rebased

Comment on lines 7811 to 7818
// legalize useMI with mismatched size
for (MachineRegisterInfo::use_iterator I = MRI.use_begin(NewDstReg),
E = MRI.use_end();
I != E; ++I) {
MachineInstr &UseMI = *I->getParent();
unsigned UseMIOpcode = UseMI.getOpcode();
if (AMDGPU::isTrue16Inst(UseMIOpcode) &&
(16 ==
RI.getRegSizeInBits(*getOpRegClass(UseMI, I.getOperandNo())))) {
I->setSubReg(AMDGPU::lo16);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this, shouldn't really need to have an extra use list scan?

Copy link
Contributor Author

@broxigarchen broxigarchen May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. We mainly have problem with the replaceRegWith() call. If we don't have size mismatch issue, we can simply replace the reg to Equivalent VGPR class. But when we have size mismatch issue, this is a problem because we might replace a 16bit reg into a 32bit reg and vice versa. and thus we need to transverse user list and fix size accordingly.
  2. We replace COPY like inst first and then process useMI in sequence. However for multi-operand t16 inst, we might have useMI being lowered before all its operands being processed. i.e.
(1) %0:vgpr_16 = IMPLICIT_DEF
(2) %1:sgpr_lo16 = COPY %0:vgpr_16
(3) %2:sreg_32 = COPY %0:vgpr_16
(4) %3:sreg_32 = COPY %1:sgpr_lo16
(5) %4:sreg_32 = S_FMAC_F16 %3:sreg_32, %3:sreg_32, %2:sreg_32, implicit $mode

The order of lowering goes from (3)->(2)->(5)->(4). And thus, this hit a problem as lowering (4) putting a VGPR32 to a t16 insts. Thus we need to check useMI again when we lower (4).

There are multiple ways to fix this. What I am currently doing is to add use list check when we process mismatch size inst or t16 insts in moveToVALU process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of trying to hack up the uses, just insert whatever copies are necessary to make the types match. You shouldn't need to do any use analysis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matt. I am not quite understand how the copies help here. These mismatch are mainly caused by salu16 used by salu32, or the reverse. In sgpr they are fine, but when moved to valu it become a problem. Thus I think a use analysis is necessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did the same use list scan where we process general inst. In this place we needed an additional one is because this is a early return routine which process mismatched size copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to fix this with iteration order? If (5) is indeed processed by moveToVALUImpl before all of it's operands, we need to do the use list scan when we replace any register that could be used in a true16 instruction. But if we could guarantee the arguments would be processed before the useMI, then I think the call to legalizeOperandsVALUt16 would fix up these cases.

Copy link
Contributor Author

@broxigarchen broxigarchen Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the iteration order requires to rewrite the fix-sgpr-copy sequences and it might takes a while to address all the side-effects.

I think the user analysis is necesssary, and I found that there is a cleaner solution as we can just move the use analysis into the addUserToMoveVALU to avoid all the redundant scan and code

Comment on lines 7811 to 7818
// legalize useMI with mismatched size
for (MachineRegisterInfo::use_iterator I = MRI.use_begin(NewDstReg),
E = MRI.use_end();
I != E; ++I) {
MachineInstr &UseMI = *I->getParent();
unsigned UseMIOpcode = UseMI.getOpcode();
if (AMDGPU::isTrue16Inst(UseMIOpcode) &&
(16 ==
RI.getRegSizeInBits(*getOpRegClass(UseMI, I.getOperandNo())))) {
I->setSubReg(AMDGPU::lo16);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to fix this with iteration order? If (5) is indeed processed by moveToVALUImpl before all of it's operands, we need to do the use list scan when we replace any register that could be used in a true16 instruction. But if we could guarantee the arguments would be processed before the useMI, then I think the call to legalizeOperandsVALUt16 would fix up these cases.

Copy link

github-actions bot commented Jun 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@broxigarchen broxigarchen requested a review from Sisyph June 3, 2025 13:36
Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it looks a lot better! Besides my small new comments LGTM.

@broxigarchen broxigarchen merged commit b668b64 into llvm:main Jun 4, 2025
11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 4, 2025

LLVM Buildbot has detected a new failure on builder hip-third-party-libs-test running on ext_buildbot_hw_05-hip-docker while building llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/206/builds/1304

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py --jobs=32' (failure)
...
[6776/7846] Creating library symlink lib/libMLIRLinalgTransformOps.so
[6777/7846] Linking CXX shared library lib/libMLIRCAPISparseTensor.so.21.0git
[6778/7846] Creating library symlink lib/libMLIRCAPISparseTensor.so
[6779/7846] Linking CXX shared library lib/libMLIRSparseTensorTransformOps.so.21.0git
[6780/7846] Creating library symlink lib/libMLIRSparseTensorTransformOps.so
[6781/7846] Linking CXX shared library lib/libMLIRTestDialect.so.21.0git
[6782/7846] Creating library symlink lib/libMLIRTestDialect.so
[6783/7846] Linking CXX shared library lib/libMLIRMemRefTestPasses.so.21.0git
[6784/7846] Creating library symlink lib/libMLIRMemRefTestPasses.so
[6785/7846] Linking CXX shared library lib/libMLIRTestMemRefToLLVMWithTransforms.so.21.0git
FAILED: lib/libMLIRTestMemRefToLLVMWithTransforms.so.21.0git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -Wno-deprecated-copy -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/botworker/bbot/hip-third-party-libs-test/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRTestMemRefToLLVMWithTransforms.so.21.0git -o lib/libMLIRTestMemRefToLLVMWithTransforms.so.21.0git tools/mlir/test/lib/Conversion/MemRefToLLVM/CMakeFiles/MLIRTestMemRefToLLVMWithTransforms.dir/TestMemRefToLLVMWithTransforms.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/hip-third-party-libs-test/build/lib:"  lib/libMLIRTestDialect.so.21.0git  lib/libMLIRDerivedAttributeOpInterface.so.21.0git  lib/libMLIRLinalgTransforms.so.21.0git  lib/libMLIRIndexDialect.so.21.0git  lib/libMLIRMemRefTransforms.so.21.0git  lib/libMLIRArithTransforms.so.21.0git  lib/libMLIRFuncTransforms.so.21.0git  lib/libMLIRNVGPUDialect.so.21.0git  lib/libMLIRMeshTransforms.so.21.0git  lib/libMLIRTosaShardingInterfaceImpl.so.21.0git  lib/libMLIRShardingInterface.so.21.0git  lib/libMLIRMeshDialect.so.21.0git  lib/libMLIRTosaDialect.so.21.0git  lib/libMLIRQuantUtils.so.21.0git  lib/libMLIRQuantDialect.so.21.0git  lib/libMLIRSCFTransforms.so.21.0git  lib/libMLIRBufferizationTransforms.so.21.0git  lib/libMLIRTensorTransforms.so.21.0git  lib/libMLIRAffineTransforms.so.21.0git  lib/libMLIRSCFUtils.so.21.0git  lib/libMLIRTensorTilingInterfaceImpl.so.21.0git  lib/libMLIRLinalgUtils.so.21.0git  lib/libMLIRTensorUtils.so.21.0git  lib/libMLIRTilingInterface.so.21.0git  lib/libMLIRVectorToSCF.so.21.0git  lib/libMLIRVectorTransforms.so.21.0git  lib/libMLIRLinalgDialect.so.21.0git  lib/libMLIRParser.so.21.0git  lib/libMLIRBytecodeReader.so.21.0git  lib/libMLIRAsmParser.so.21.0git  lib/libMLIRMathDialect.so.21.0git  lib/libMLIRAffineUtils.so.21.0git  lib/libMLIRVectorUtils.so.21.0git  lib/libMLIRVectorDialect.so.21.0git  lib/libMLIRMaskableOpInterface.so.21.0git  lib/libMLIRMaskingOpInterface.so.21.0git  lib/libMLIRAffineAnalysis.so.21.0git  lib/libMLIRSCFDialect.so.21.0git  lib/libMLIRControlFlowDialect.so.21.0git  lib/libMLIRMemRefUtils.so.21.0git  lib/libMLIRVectorInterfaces.so.21.0git  lib/libMLIRGPUUtils.so.21.0git  lib/libMLIRPtrDialect.so.21.0git  lib/libMLIRNVVMDialect.so.21.0git  lib/libMLIRLLVMDialect.so.21.0git  lib/libLLVMBitWriter.so.21.0git  lib/libLLVMBitReader.so.21.0git  lib/libLLVMAsmParser.so.21.0git  lib/libLLVMCore.so.21.0git  lib/libLLVMBinaryFormat.so.21.0git  lib/libMLIRGPUDialect.so.21.0git  lib/libMLIRDLTIDialect.so.21.0git  lib/libMLIRReduce.so.21.0git  lib/libMLIRTransforms.so.21.0git  lib/libMLIRTransformUtils.so.21.0git  lib/libMLIRRewrite.so.21.0git  lib/libMLIRRewritePDL.so.21.0git  lib/libMLIRPDLToPDLInterp.so.21.0git  lib/libMLIRPass.so.21.0git  lib/libMLIRPDLInterpDialect.so.21.0git  lib/libMLIRPDLDialect.so.21.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.21.0git  lib/libMLIRBufferizationDialect.so.21.0git  lib/libMLIRFuncDialect.so.21.0git  lib/libMLIRTensorDialect.so.21.0git  lib/libMLIRParallelCombiningOpInterface.so.21.0git  lib/libMLIRAffineDialect.so.21.0git  lib/libMLIRMemRefDialect.so.21.0git  lib/libMLIRArithUtils.so.21.0git  lib/libMLIRMemorySlotInterfaces.so.21.0git  lib/libMLIRSparseTensorDialect.so.21.0git  lib/libMLIRDialectUtils.so.21.0git  lib/libMLIRComplexDialect.so.21.0git  lib/libMLIRArithDialect.so.21.0git  lib/libMLIRDialect.so.21.0git  lib/libMLIRCastInterfaces.so.21.0git  lib/libMLIRInferIntRangeCommon.so.2
ce.so.21.0git  lib/libMLIRDestinationStyleOpInterface.so.21.0git  lib/libMLIRAnalysis.so.21.0git  lib/libMLIRControlFlowInterfaces.so.21.0git  lib/libMLIRDataLayoutInterfaces.so.21.0git  lib/libMLIRInferIntRangeInterface.so.21.0git  lib/libMLIRInferTypeOpInterface.so.21.0git  lib/libMLIRSideEffectInterfaces.so.21.0git  lib/libMLIRViewLikeInterface.so.21.0git  lib/libMLIRLoopLikeInterface.so.21.0git  lib/libMLIRFunctionInterfaces.so.21.0git  lib/libMLIRCallInterfaces.so.21.0git  lib/libMLIRIR.so.21.0git  lib/libMLIRSupport.so.21.0git  lib/libMLIRPresburger.so.21.0git  lib/libLLVMSupport.so.21.0git  -Wl,-rpath-link,/home/botworker/bbot/hip-third-party-libs-test/build/lib && :
/usr/bin/ld: tools/mlir/test/lib/Conversion/MemRefToLLVM/CMakeFiles/MLIRTestMemRefToLLVMWithTransforms.dir/TestMemRefToLLVMWithTransforms.cpp.o: in function `(anonymous namespace)::TestMemRefToLLVMWithTransforms::runOnOperation()':
TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x68): undefined reference to `mlir::LowerToLLVMOptions::LowerToLLVMOptions(mlir::MLIRContext*)'
/usr/bin/ld: TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x80): undefined reference to `mlir::LLVMTypeConverter::LLVMTypeConverter(mlir::MLIRContext*, mlir::LowerToLLVMOptions const&, mlir::DataLayoutAnalysis const*)'
/usr/bin/ld: TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x143): undefined reference to `mlir::populateFuncToLLVMConversionPatterns(mlir::LLVMTypeConverter const&, mlir::RewritePatternSet&, mlir::SymbolTable const*)'
/usr/bin/ld: TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x174): undefined reference to `mlir::LLVMConversionTarget::LLVMConversionTarget(mlir::MLIRContext&)'
collect2: error: ld returned 1 exit status
[6786/7846] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/cmake_pch.hxx.gch
[6787/7846] Linking CXX shared library lib/libMLIRBufferizationTestPasses.so.21.0git
[6788/7846] Linking CXX shared library lib/libMLIRDLTITestPasses.so.21.0git
[6789/7846] Linking CXX shared library lib/libMLIRTestPDLL.so.21.0git
[6790/7846] Linking CXX shared library lib/libMLIRFuncTestPasses.so.21.0git
[6791/7846] Linking CXX shared library lib/libMLIRTestToLLVMIRTranslation.so.21.0git
[6792/7846] Linking CXX shared library lib/libMLIRTestIR.so.21.0git
[6793/7846] Linking CXX shared library lib/libMLIRTestTransforms.so.21.0git
[6794/7846] Linking CXX shared library lib/libMLIRTestAnalysis.so.21.0git
[6795/7846] Linking CXX shared library lib/libMLIRTestFromLLVMIRTranslation.so.21.0git
[6796/7846] Linking CXX shared library lib/libMLIRAffineTransformsTestPasses.so.21.0git
[6797/7846] Linking CXX shared library lib/libMLIRTestFuncToLLVM.so.21.0git
[6798/7846] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/intrinsics.test.dir/intrinsics.cpp.o
[6799/7846] Building InstCombineTables.inc...
[6800/7846] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/folding.test.dir/folding.cpp.o
[6801/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/logical.cpp.o
[6802/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/static-data.cpp.o
[6803/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/common.cpp.o
[6804/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/host.cpp.o
[6805/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/target.cpp.o
[6806/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/complex.cpp.o
[6807/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/integer.cpp.o
[6808/7846] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/expression.test.dir/expression.cpp.o
[6809/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/real.cpp.o
[6810/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/constant.cpp.o
[6811/7846] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/cmake_pch.hxx.gch
[6812/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/call.cpp.o
[6813/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/type.cpp.o
[6814/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/initial-image.cpp.o
[6815/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/variable.cpp.o
Step 7 (build cmake config) failure: build cmake config (failure)
...
[6776/7846] Creating library symlink lib/libMLIRLinalgTransformOps.so
[6777/7846] Linking CXX shared library lib/libMLIRCAPISparseTensor.so.21.0git
[6778/7846] Creating library symlink lib/libMLIRCAPISparseTensor.so
[6779/7846] Linking CXX shared library lib/libMLIRSparseTensorTransformOps.so.21.0git
[6780/7846] Creating library symlink lib/libMLIRSparseTensorTransformOps.so
[6781/7846] Linking CXX shared library lib/libMLIRTestDialect.so.21.0git
[6782/7846] Creating library symlink lib/libMLIRTestDialect.so
[6783/7846] Linking CXX shared library lib/libMLIRMemRefTestPasses.so.21.0git
[6784/7846] Creating library symlink lib/libMLIRMemRefTestPasses.so
[6785/7846] Linking CXX shared library lib/libMLIRTestMemRefToLLVMWithTransforms.so.21.0git
FAILED: lib/libMLIRTestMemRefToLLVMWithTransforms.so.21.0git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -Wno-deprecated-copy -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/botworker/bbot/hip-third-party-libs-test/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRTestMemRefToLLVMWithTransforms.so.21.0git -o lib/libMLIRTestMemRefToLLVMWithTransforms.so.21.0git tools/mlir/test/lib/Conversion/MemRefToLLVM/CMakeFiles/MLIRTestMemRefToLLVMWithTransforms.dir/TestMemRefToLLVMWithTransforms.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/hip-third-party-libs-test/build/lib:"  lib/libMLIRTestDialect.so.21.0git  lib/libMLIRDerivedAttributeOpInterface.so.21.0git  lib/libMLIRLinalgTransforms.so.21.0git  lib/libMLIRIndexDialect.so.21.0git  lib/libMLIRMemRefTransforms.so.21.0git  lib/libMLIRArithTransforms.so.21.0git  lib/libMLIRFuncTransforms.so.21.0git  lib/libMLIRNVGPUDialect.so.21.0git  lib/libMLIRMeshTransforms.so.21.0git  lib/libMLIRTosaShardingInterfaceImpl.so.21.0git  lib/libMLIRShardingInterface.so.21.0git  lib/libMLIRMeshDialect.so.21.0git  lib/libMLIRTosaDialect.so.21.0git  lib/libMLIRQuantUtils.so.21.0git  lib/libMLIRQuantDialect.so.21.0git  lib/libMLIRSCFTransforms.so.21.0git  lib/libMLIRBufferizationTransforms.so.21.0git  lib/libMLIRTensorTransforms.so.21.0git  lib/libMLIRAffineTransforms.so.21.0git  lib/libMLIRSCFUtils.so.21.0git  lib/libMLIRTensorTilingInterfaceImpl.so.21.0git  lib/libMLIRLinalgUtils.so.21.0git  lib/libMLIRTensorUtils.so.21.0git  lib/libMLIRTilingInterface.so.21.0git  lib/libMLIRVectorToSCF.so.21.0git  lib/libMLIRVectorTransforms.so.21.0git  lib/libMLIRLinalgDialect.so.21.0git  lib/libMLIRParser.so.21.0git  lib/libMLIRBytecodeReader.so.21.0git  lib/libMLIRAsmParser.so.21.0git  lib/libMLIRMathDialect.so.21.0git  lib/libMLIRAffineUtils.so.21.0git  lib/libMLIRVectorUtils.so.21.0git  lib/libMLIRVectorDialect.so.21.0git  lib/libMLIRMaskableOpInterface.so.21.0git  lib/libMLIRMaskingOpInterface.so.21.0git  lib/libMLIRAffineAnalysis.so.21.0git  lib/libMLIRSCFDialect.so.21.0git  lib/libMLIRControlFlowDialect.so.21.0git  lib/libMLIRMemRefUtils.so.21.0git  lib/libMLIRVectorInterfaces.so.21.0git  lib/libMLIRGPUUtils.so.21.0git  lib/libMLIRPtrDialect.so.21.0git  lib/libMLIRNVVMDialect.so.21.0git  lib/libMLIRLLVMDialect.so.21.0git  lib/libLLVMBitWriter.so.21.0git  lib/libLLVMBitReader.so.21.0git  lib/libLLVMAsmParser.so.21.0git  lib/libLLVMCore.so.21.0git  lib/libLLVMBinaryFormat.so.21.0git  lib/libMLIRGPUDialect.so.21.0git  lib/libMLIRDLTIDialect.so.21.0git  lib/libMLIRReduce.so.21.0git  lib/libMLIRTransforms.so.21.0git  lib/libMLIRTransformUtils.so.21.0git  lib/libMLIRRewrite.so.21.0git  lib/libMLIRRewritePDL.so.21.0git  lib/libMLIRPDLToPDLInterp.so.21.0git  lib/libMLIRPass.so.21.0git  lib/libMLIRPDLInterpDialect.so.21.0git  lib/libMLIRPDLDialect.so.21.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.21.0git  lib/libMLIRBufferizationDialect.so.21.0git  lib/libMLIRFuncDialect.so.21.0git  lib/libMLIRTensorDialect.so.21.0git  lib/libMLIRParallelCombiningOpInterface.so.21.0git  lib/libMLIRAffineDialect.so.21.0git  lib/libMLIRMemRefDialect.so.21.0git  lib/libMLIRArithUtils.so.21.0git  lib/libMLIRMemorySlotInterfaces.so.21.0git  lib/libMLIRSparseTensorDialect.so.21.0git  lib/libMLIRDialectUtils.so.21.0git  lib/libMLIRComplexDialect.so.21.0git  lib/libMLIRArithDialect.so.21.0git  lib/libMLIRDialect.so.21.0git  lib/libMLIRCastInterfaces.so.21.0git  lib/libMLIRInferIntRangeCommon.so.2
ce.so.21.0git  lib/libMLIRDestinationStyleOpInterface.so.21.0git  lib/libMLIRAnalysis.so.21.0git  lib/libMLIRControlFlowInterfaces.so.21.0git  lib/libMLIRDataLayoutInterfaces.so.21.0git  lib/libMLIRInferIntRangeInterface.so.21.0git  lib/libMLIRInferTypeOpInterface.so.21.0git  lib/libMLIRSideEffectInterfaces.so.21.0git  lib/libMLIRViewLikeInterface.so.21.0git  lib/libMLIRLoopLikeInterface.so.21.0git  lib/libMLIRFunctionInterfaces.so.21.0git  lib/libMLIRCallInterfaces.so.21.0git  lib/libMLIRIR.so.21.0git  lib/libMLIRSupport.so.21.0git  lib/libMLIRPresburger.so.21.0git  lib/libLLVMSupport.so.21.0git  -Wl,-rpath-link,/home/botworker/bbot/hip-third-party-libs-test/build/lib && :
/usr/bin/ld: tools/mlir/test/lib/Conversion/MemRefToLLVM/CMakeFiles/MLIRTestMemRefToLLVMWithTransforms.dir/TestMemRefToLLVMWithTransforms.cpp.o: in function `(anonymous namespace)::TestMemRefToLLVMWithTransforms::runOnOperation()':
TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x68): undefined reference to `mlir::LowerToLLVMOptions::LowerToLLVMOptions(mlir::MLIRContext*)'
/usr/bin/ld: TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x80): undefined reference to `mlir::LLVMTypeConverter::LLVMTypeConverter(mlir::MLIRContext*, mlir::LowerToLLVMOptions const&, mlir::DataLayoutAnalysis const*)'
/usr/bin/ld: TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x143): undefined reference to `mlir::populateFuncToLLVMConversionPatterns(mlir::LLVMTypeConverter const&, mlir::RewritePatternSet&, mlir::SymbolTable const*)'
/usr/bin/ld: TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x174): undefined reference to `mlir::LLVMConversionTarget::LLVMConversionTarget(mlir::MLIRContext&)'
collect2: error: ld returned 1 exit status
[6786/7846] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/cmake_pch.hxx.gch
[6787/7846] Linking CXX shared library lib/libMLIRBufferizationTestPasses.so.21.0git
[6788/7846] Linking CXX shared library lib/libMLIRDLTITestPasses.so.21.0git
[6789/7846] Linking CXX shared library lib/libMLIRTestPDLL.so.21.0git
[6790/7846] Linking CXX shared library lib/libMLIRFuncTestPasses.so.21.0git
[6791/7846] Linking CXX shared library lib/libMLIRTestToLLVMIRTranslation.so.21.0git
[6792/7846] Linking CXX shared library lib/libMLIRTestIR.so.21.0git
[6793/7846] Linking CXX shared library lib/libMLIRTestTransforms.so.21.0git
[6794/7846] Linking CXX shared library lib/libMLIRTestAnalysis.so.21.0git
[6795/7846] Linking CXX shared library lib/libMLIRTestFromLLVMIRTranslation.so.21.0git
[6796/7846] Linking CXX shared library lib/libMLIRAffineTransformsTestPasses.so.21.0git
[6797/7846] Linking CXX shared library lib/libMLIRTestFuncToLLVM.so.21.0git
[6798/7846] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/intrinsics.test.dir/intrinsics.cpp.o
[6799/7846] Building InstCombineTables.inc...
[6800/7846] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/folding.test.dir/folding.cpp.o
[6801/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/logical.cpp.o
[6802/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/static-data.cpp.o
[6803/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/common.cpp.o
[6804/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/host.cpp.o
[6805/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/target.cpp.o
[6806/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/complex.cpp.o
[6807/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/integer.cpp.o
[6808/7846] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/expression.test.dir/expression.cpp.o
[6809/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/real.cpp.o
[6810/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/constant.cpp.o
[6811/7846] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/cmake_pch.hxx.gch
[6812/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/call.cpp.o
[6813/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/type.cpp.o
[6814/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/initial-image.cpp.o
[6815/7846] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/variable.cpp.o

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants