[AMDGPU] Remove redundant s_cmp_lg_* sX, 0 #162352

LU-JOHN · 2025-10-07T19:43:31Z

Remove a redundant s_cmp_lg_* sX, 0 when the SALU instruction that defines sX already sets SCC to (sX != 0).
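The idea can be sketched outside of LLVM with a toy model. Everything below is hypothetical: `ToyInst`, its fields, and `removeRedundantCmpLg` are illustrative names, not LLVM's `MachineInstr` API, and the sketch only searches a single block, much as the patch only considers a definition in the same basic block as the compare.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction.
// This is NOT LLVM's MachineInstr.
struct ToyInst {
  std::string Opcode;   // e.g. "s_and_b32", "s_cmp_lg_u32"
  int Dst;              // register written, or -1 if none
  int Src;              // first register read, or -1 if none
  long Imm;             // immediate operand (used by the compare)
  bool WritesSCC;       // instruction writes SCC in any way
  bool SCCIsDstNonZero; // SCC is written as (Dst != 0)
};

// Erases Block[CmpIdx] if it is "s_cmp_lg_* sX, 0" and the closest earlier
// definition of sX already left SCC == (sX != 0), with no other SCC write in
// between. Returns true when the compare was erased.
bool removeRedundantCmpLg(std::vector<ToyInst> &Block, std::size_t CmpIdx) {
  assert(CmpIdx < Block.size() && "compare index out of range");
  const ToyInst &Cmp = Block[CmpIdx];
  if (Cmp.Opcode.rfind("s_cmp_lg_", 0) != 0 || Cmp.Imm != 0 || Cmp.Src < 0)
    return false;

  // Walk backwards from the compare to the definition of the compared register.
  for (std::size_t I = CmpIdx; I-- > 0;) {
    const ToyInst &Def = Block[I];
    if (Def.Dst == Cmp.Src) {
      if (!Def.SCCIsDstNonZero)
        return false; // e.g. an add, where SCC is the carry-out
      Block.erase(Block.begin() + static_cast<std::ptrdiff_t>(CmpIdx));
      return true;
    }
    // Any other SCC write between the definition and the compare blocks it.
    if (Def.WritesSCC)
      return false;
  }
  return false; // definition is not in this block
}
```

In this model, an `s_and_b32` immediately followed by `s_cmp_lg_u32` on its result loses the compare, while an intervening `s_add_u32` keeps it.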

llvmbot · 2025-10-07T19:44:04Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: None (LU-JOHN)

Changes

Remove redundant s_cmp_lg_* sX, 0 if SALU instruction already sets SCC if sX!=0.

Patch is 402.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162352.diff

29 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+69-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll (-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll (-2)
(modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll (+243-270)
(modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll (+57-78)
(modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll (+94-116)
(modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll (+225-360)
(modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll (+38-52)
(modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll (+38-52)
(modified) llvm/test/CodeGen/AMDGPU/carryout-selection.ll (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/ctlz_zero_undef.ll (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/ctpop16.ll (-2)
(modified) llvm/test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll (-2)
(modified) llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll (+10-25)
(modified) llvm/test/CodeGen/AMDGPU/fptrunc.f16.ll (+40-88)
(modified) llvm/test/CodeGen/AMDGPU/fptrunc.ll (+13-23)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+30-85)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+24-57)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+24-57)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+30-85)
(modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+9-11)
(modified) llvm/test/CodeGen/AMDGPU/local-atomicrmw-fadd.ll (+56-72)
(modified) llvm/test/CodeGen/AMDGPU/optimize-compare.mir (+1-2)
(added) llvm/test/CodeGen/AMDGPU/s_cmp_0.ll (+558)
(modified) llvm/test/CodeGen/AMDGPU/sdiv64.ll (+67-75)
(modified) llvm/test/CodeGen/AMDGPU/srem64.ll (+93-108)
(modified) llvm/test/CodeGen/AMDGPU/udiv64.ll (+35-39)
(modified) llvm/test/CodeGen/AMDGPU/urem64.ll (+65-75)
(modified) llvm/test/CodeGen/AMDGPU/workitem-intrinsic-opts.ll (-8)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 46757cf5fe90c..6090f84a4cde8 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -10608,6 +10608,73 @@ bool SIInstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
   if (SrcReg2 && !getFoldableImm(SrcReg2, *MRI, CmpValue))
     return false;
 
+  const auto optimizeCmpSelect = [&CmpInstr, SrcReg, CmpValue, MRI,
+                                  this]() -> bool {
+    if (CmpValue != 0)
+      return false;
+
+    MachineInstr *Def = MRI->getUniqueVRegDef(SrcReg);
+    if (!Def || Def->getParent() != CmpInstr.getParent())
+      return false;
+
+    if (!(Def->getOpcode() == AMDGPU::S_LSHL_B32 ||
+          Def->getOpcode() == AMDGPU::S_LSHL_B64 ||
+          Def->getOpcode() == AMDGPU::S_LSHR_B32 ||
+          Def->getOpcode() == AMDGPU::S_LSHR_B64 ||
+          Def->getOpcode() == AMDGPU::S_AND_B32 ||
+          Def->getOpcode() == AMDGPU::S_AND_B64 ||
+          Def->getOpcode() == AMDGPU::S_OR_B32 ||
+          Def->getOpcode() == AMDGPU::S_OR_B64 ||
+          Def->getOpcode() == AMDGPU::S_XOR_B32 ||
+          Def->getOpcode() == AMDGPU::S_XOR_B64 ||
+          Def->getOpcode() == AMDGPU::S_NAND_B32 ||
+          Def->getOpcode() == AMDGPU::S_NAND_B64 ||
+          Def->getOpcode() == AMDGPU::S_NOR_B32 ||
+          Def->getOpcode() == AMDGPU::S_NOR_B64 ||
+          Def->getOpcode() == AMDGPU::S_XNOR_B32 ||
+          Def->getOpcode() == AMDGPU::S_XNOR_B64 ||
+          Def->getOpcode() == AMDGPU::S_ANDN2_B32 ||
+          Def->getOpcode() == AMDGPU::S_ANDN2_B64 ||
+          Def->getOpcode() == AMDGPU::S_ORN2_B32 ||
+          Def->getOpcode() == AMDGPU::S_ORN2_B64 ||
+          Def->getOpcode() == AMDGPU::S_BFE_I32 ||
+          Def->getOpcode() == AMDGPU::S_BFE_I64 ||
+          Def->getOpcode() == AMDGPU::S_BFE_U32 ||
+          Def->getOpcode() == AMDGPU::S_BFE_U64 ||
+          Def->getOpcode() == AMDGPU::S_BCNT0_I32_B32 ||
+          Def->getOpcode() == AMDGPU::S_BCNT0_I32_B64 ||
+          Def->getOpcode() == AMDGPU::S_BCNT1_I32_B32 ||
+          Def->getOpcode() == AMDGPU::S_BCNT1_I32_B64 ||
+          Def->getOpcode() == AMDGPU::S_QUADMASK_B32 ||
+          Def->getOpcode() == AMDGPU::S_QUADMASK_B64 ||
+          Def->getOpcode() == AMDGPU::S_NOT_B32 ||
+          Def->getOpcode() == AMDGPU::S_NOT_B64 ||
+
+          ((Def->getOpcode() == AMDGPU::S_CSELECT_B32 ||
+            Def->getOpcode() == AMDGPU::S_CSELECT_B64) &&
+           Def->getOperand(1).isImm() && Def->getOperand(1).getImm() &&
+           Def->getOperand(2).isImm() && !Def->getOperand(2).getImm())))
+      return false;
+
+    for (auto I = std::next(Def->getIterator()), E = CmpInstr.getIterator();
+         I != E; ++I) {
+      if (I->modifiesRegister(AMDGPU::SCC, &RI) ||
+          I->killsRegister(AMDGPU::SCC, &RI))
+        return false;
+    }
+
+    if (!(Def->getOpcode() == AMDGPU::S_CSELECT_B32 ||
+          Def->getOpcode() == AMDGPU::S_CSELECT_B64)) {
+      MachineOperand *SccDef =
+          Def->findRegisterDefOperand(AMDGPU::SCC, /*TRI=*/nullptr);
+      assert(SccDef && "Def instruction must define SCC");
+      SccDef->setIsDead(false);
+    }
+
+    CmpInstr.eraseFromParent();
+    return true;
+  };
+
   const auto optimizeCmpAnd = [&CmpInstr, SrcReg, CmpValue, MRI,
                                this](int64_t ExpectedValue, unsigned SrcSize,
                                      bool IsReversible, bool IsSigned) -> bool {
@@ -10735,7 +10802,7 @@ bool SIInstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
   case AMDGPU::S_CMP_LG_I32:
   case AMDGPU::S_CMPK_LG_U32:
   case AMDGPU::S_CMPK_LG_I32:
-    return optimizeCmpAnd(0, 32, true, false);
+    return optimizeCmpAnd(0, 32, true, false) || optimizeCmpSelect();
   case AMDGPU::S_CMP_GT_U32:
   case AMDGPU::S_CMPK_GT_U32:
     return optimizeCmpAnd(0, 32, false, false);
@@ -10743,7 +10810,7 @@ bool SIInstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
   case AMDGPU::S_CMPK_GT_I32:
     return optimizeCmpAnd(0, 32, false, true);
   case AMDGPU::S_CMP_LG_U64:
-    return optimizeCmpAnd(0, 64, true, false);
+    return optimizeCmpAnd(0, 64, true, false) || optimizeCmpSelect();
   }
 
   return false;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll
index 51714035352a3..7714c032d1737 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll
@@ -140,7 +140,6 @@ define amdgpu_cs i32 @branch_divergent_ballot_eq_zero_non_compare(i32 %v) {
 ; CHECK-NEXT:    v_and_b32_e32 v0, 1, v0
 ; CHECK-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v0
 ; CHECK-NEXT:    s_and_b32 s0, vcc_lo, exec_lo
-; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
 ; CHECK-NEXT:    s_cbranch_scc0 .LBB9_2
 ; CHECK-NEXT:  ; %bb.1: ; %false
 ; CHECK-NEXT:    s_mov_b32 s0, 33
@@ -345,7 +344,6 @@ define amdgpu_cs i32 @branch_divergent_ballot_eq_zero_and(i32 %v1, i32 %v2) {
 ; CHECK-NEXT:    v_cmp_gt_u32_e32 vcc_lo, 12, v0
 ; CHECK-NEXT:    v_cmp_lt_u32_e64 s0, 34, v1
 ; CHECK-NEXT:    s_and_b32 s0, vcc_lo, s0
-; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
 ; CHECK-NEXT:    s_cbranch_scc0 .LBB17_2
 ; CHECK-NEXT:  ; %bb.1: ; %false
 ; CHECK-NEXT:    s_mov_b32 s0, 33
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
index 7b01f13b9ef1c..7b8166948610b 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
@@ -143,7 +143,6 @@ define amdgpu_cs i32 @branch_divergent_ballot_eq_zero_non_compare(i32 %v) {
 ; CHECK-NEXT:    v_and_b32_e32 v0, 1, v0
 ; CHECK-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v0
 ; CHECK-NEXT:    s_and_b64 s[0:1], vcc, exec
-; CHECK-NEXT:    s_cmp_lg_u64 s[0:1], 0
 ; CHECK-NEXT:    s_cbranch_scc0 .LBB9_2
 ; CHECK-NEXT:  ; %bb.1: ; %false
 ; CHECK-NEXT:    s_mov_b32 s0, 33
@@ -348,7 +347,6 @@ define amdgpu_cs i32 @branch_divergent_ballot_eq_zero_and(i32 %v1, i32 %v2) {
 ; CHECK-NEXT:    v_cmp_gt_u32_e32 vcc, 12, v0
 ; CHECK-NEXT:    v_cmp_lt_u32_e64 s[0:1], 34, v1
 ; CHECK-NEXT:    s_and_b64 s[0:1], vcc, s[0:1]
-; CHECK-NEXT:    s_cmp_lg_u64 s[0:1], 0
 ; CHECK-NEXT:    s_cbranch_scc0 .LBB17_2
 ; CHECK-NEXT:  ; %bb.1: ; %false
 ; CHECK-NEXT:    s_mov_b32 s0, 33
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
index e27164c2d6d69..262bb24e089da 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
@@ -7831,10 +7831,9 @@ define amdgpu_kernel void @sdiv_i64_pow2_shl_denom(ptr addrspace(1) %out, i64 %x
 ; GFX6-NEXT:    s_addc_u32 s15, 0, s16
 ; GFX6-NEXT:    s_add_u32 s16, s0, s1
 ; GFX6-NEXT:    v_mov_b32_e32 v0, s16
-; GFX6-NEXT:    s_cselect_b64 s[0:1], -1, 0
 ; GFX6-NEXT:    v_mul_hi_u32 v0, s12, v0
+; GFX6-NEXT:    s_cselect_b64 s[0:1], -1, 0
 ; GFX6-NEXT:    s_or_b32 s0, s0, s1
-; GFX6-NEXT:    s_cmp_lg_u32 s0, 0
 ; GFX6-NEXT:    s_addc_u32 s14, s14, s15
 ; GFX6-NEXT:    s_mul_i32 s0, s12, s14
 ; GFX6-NEXT:    v_readfirstlane_b32 s1, v0
@@ -7865,7 +7864,6 @@ define amdgpu_kernel void @sdiv_i64_pow2_shl_denom(ptr addrspace(1) %out, i64 %x
 ; GFX6-NEXT:    s_add_u32 s15, s16, s0
 ; GFX6-NEXT:    s_cselect_b64 s[0:1], -1, 0
 ; GFX6-NEXT:    s_or_b32 s0, s0, s1
-; GFX6-NEXT:    s_cmp_lg_u32 s0, 0
 ; GFX6-NEXT:    s_addc_u32 s14, s14, s12
 ; GFX6-NEXT:    s_ashr_i32 s12, s7, 31
 ; GFX6-NEXT:    s_add_u32 s0, s6, s12
@@ -7891,52 +7889,50 @@ define amdgpu_kernel void @sdiv_i64_pow2_shl_denom(ptr addrspace(1) %out, i64 %x
 ; GFX6-NEXT:    v_readfirstlane_b32 s4, v0
 ; GFX6-NEXT:    s_addc_u32 s4, s4, 0
 ; GFX6-NEXT:    s_mul_i32 s14, s7, s14
-; GFX6-NEXT:    s_add_u32 s14, s1, s14
-; GFX6-NEXT:    v_mov_b32_e32 v0, s14
+; GFX6-NEXT:    s_add_u32 s16, s1, s14
+; GFX6-NEXT:    v_mov_b32_e32 v0, s16
 ; GFX6-NEXT:    v_mul_hi_u32 v0, s10, v0
-; GFX6-NEXT:    s_addc_u32 s15, 0, s4
+; GFX6-NEXT:    s_addc_u32 s17, 0, s4
 ; GFX6-NEXT:    s_mov_b32 s1, s5
-; GFX6-NEXT:    s_mul_i32 s4, s10, s15
+; GFX6-NEXT:    s_mul_i32 s4, s10, s17
 ; GFX6-NEXT:    v_readfirstlane_b32 s5, v0
 ; GFX6-NEXT:    s_add_i32 s4, s5, s4
-; GFX6-NEXT:    s_mul_i32 s5, s11, s14
-; GFX6-NEXT:    s_add_i32 s16, s4, s5
-; GFX6-NEXT:    s_sub_i32 s17, s7, s16
-; GFX6-NEXT:    s_mul_i32 s4, s10, s14
+; GFX6-NEXT:    s_mul_i32 s5, s11, s16
+; GFX6-NEXT:    s_add_i32 s18, s4, s5
+; GFX6-NEXT:    s_sub_i32 s14, s7, s18
+; GFX6-NEXT:    s_mul_i32 s4, s10, s16
 ; GFX6-NEXT:    s_sub_u32 s6, s6, s4
 ; GFX6-NEXT:    s_cselect_b64 s[4:5], -1, 0
-; GFX6-NEXT:    s_or_b32 s18, s4, s5
-; GFX6-NEXT:    s_cmp_lg_u32 s18, 0
-; GFX6-NEXT:    s_subb_u32 s17, s17, s11
-; GFX6-NEXT:    s_sub_u32 s19, s6, s10
-; GFX6-NEXT:    s_cselect_b64 s[4:5], -1, 0
+; GFX6-NEXT:    s_or_b32 s15, s4, s5
+; GFX6-NEXT:    s_subb_u32 s19, s14, s11
+; GFX6-NEXT:    s_sub_u32 s20, s6, s10
+; GFX6-NEXT:    s_cselect_b64 s[14:15], -1, 0
+; GFX6-NEXT:    s_or_b32 s14, s14, s15
+; GFX6-NEXT:    s_subb_u32 s14, s19, 0
+; GFX6-NEXT:    s_cmp_ge_u32 s14, s11
+; GFX6-NEXT:    s_cselect_b32 s15, -1, 0
+; GFX6-NEXT:    s_cmp_ge_u32 s20, s10
+; GFX6-NEXT:    s_cselect_b32 s19, -1, 0
+; GFX6-NEXT:    s_cmp_eq_u32 s14, s11
+; GFX6-NEXT:    s_cselect_b32 s14, s19, s15
+; GFX6-NEXT:    s_add_u32 s15, s16, 1
+; GFX6-NEXT:    s_addc_u32 s19, s17, 0
+; GFX6-NEXT:    s_add_u32 s20, s16, 2
+; GFX6-NEXT:    s_addc_u32 s21, s17, 0
+; GFX6-NEXT:    s_cmp_lg_u32 s14, 0
+; GFX6-NEXT:    s_cselect_b32 s14, s20, s15
+; GFX6-NEXT:    s_cselect_b32 s15, s21, s19
 ; GFX6-NEXT:    s_or_b32 s4, s4, s5
-; GFX6-NEXT:    s_cmp_lg_u32 s4, 0
-; GFX6-NEXT:    s_subb_u32 s4, s17, 0
+; GFX6-NEXT:    s_subb_u32 s4, s7, s18
 ; GFX6-NEXT:    s_cmp_ge_u32 s4, s11
 ; GFX6-NEXT:    s_cselect_b32 s5, -1, 0
-; GFX6-NEXT:    s_cmp_ge_u32 s19, s10
-; GFX6-NEXT:    s_cselect_b32 s17, -1, 0
-; GFX6-NEXT:    s_cmp_eq_u32 s4, s11
-; GFX6-NEXT:    s_cselect_b32 s4, s17, s5
-; GFX6-NEXT:    s_add_u32 s5, s14, 1
-; GFX6-NEXT:    s_addc_u32 s17, s15, 0
-; GFX6-NEXT:    s_add_u32 s19, s14, 2
-; GFX6-NEXT:    s_addc_u32 s20, s15, 0
-; GFX6-NEXT:    s_cmp_lg_u32 s4, 0
-; GFX6-NEXT:    s_cselect_b32 s4, s19, s5
-; GFX6-NEXT:    s_cselect_b32 s5, s20, s17
-; GFX6-NEXT:    s_cmp_lg_u32 s18, 0
-; GFX6-NEXT:    s_subb_u32 s7, s7, s16
-; GFX6-NEXT:    s_cmp_ge_u32 s7, s11
-; GFX6-NEXT:    s_cselect_b32 s16, -1, 0
 ; GFX6-NEXT:    s_cmp_ge_u32 s6, s10
 ; GFX6-NEXT:    s_cselect_b32 s6, -1, 0
-; GFX6-NEXT:    s_cmp_eq_u32 s7, s11
-; GFX6-NEXT:    s_cselect_b32 s6, s6, s16
-; GFX6-NEXT:    s_cmp_lg_u32 s6, 0
-; GFX6-NEXT:    s_cselect_b32 s5, s5, s15
-; GFX6-NEXT:    s_cselect_b32 s4, s4, s14
+; GFX6-NEXT:    s_cmp_eq_u32 s4, s11
+; GFX6-NEXT:    s_cselect_b32 s4, s6, s5
+; GFX6-NEXT:    s_cmp_lg_u32 s4, 0
+; GFX6-NEXT:    s_cselect_b32 s5, s15, s17
+; GFX6-NEXT:    s_cselect_b32 s4, s14, s16
 ; GFX6-NEXT:    s_xor_b64 s[6:7], s[12:13], s[8:9]
 ; GFX6-NEXT:    s_xor_b64 s[4:5], s[4:5], s[6:7]
 ; GFX6-NEXT:    s_sub_u32 s4, s4, s6
@@ -8338,10 +8334,9 @@ define amdgpu_kernel void @sdiv_v2i64_pow2_shl_denom(ptr addrspace(1) %out, <2 x
 ; GFX6-NEXT:    s_addc_u32 s17, 0, s18
 ; GFX6-NEXT:    s_add_u32 s18, s12, s13
 ; GFX6-NEXT:    v_mov_b32_e32 v0, s18
-; GFX6-NEXT:    s_cselect_b64 s[12:13], -1, 0
 ; GFX6-NEXT:    v_mul_hi_u32 v0, s14, v0
+; GFX6-NEXT:    s_cselect_b64 s[12:13], -1, 0
 ; GFX6-NEXT:    s_or_b32 s12, s12, s13
-; GFX6-NEXT:    s_cmp_lg_u32 s12, 0
 ; GFX6-NEXT:    s_addc_u32 s16, s16, s17
 ; GFX6-NEXT:    s_mul_i32 s12, s14, s16
 ; GFX6-NEXT:    v_readfirstlane_b32 s13, v0
@@ -8372,7 +8367,6 @@ define amdgpu_kernel void @sdiv_v2i64_pow2_shl_denom(ptr addrspace(1) %out, <2 x
 ; GFX6-NEXT:    s_add_u32 s15, s18, s12
 ; GFX6-NEXT:    s_cselect_b64 s[12:13], -1, 0
 ; GFX6-NEXT:    s_or_b32 s12, s12, s13
-; GFX6-NEXT:    s_cmp_lg_u32 s12, 0
 ; GFX6-NEXT:    s_addc_u32 s14, s16, s14
 ; GFX6-NEXT:    s_ashr_i32 s12, s9, 31
 ; GFX6-NEXT:    s_add_u32 s8, s8, s12
@@ -8397,55 +8391,53 @@ define amdgpu_kernel void @sdiv_v2i64_pow2_shl_denom(ptr addrspace(1) %out, <2 x
 ; GFX6-NEXT:    v_readfirstlane_b32 s16, v0
 ; GFX6-NEXT:    s_addc_u32 s16, s16, 0
 ; GFX6-NEXT:    s_mul_i32 s14, s9, s14
-; GFX6-NEXT:    s_add_u32 s17, s15, s14
-; GFX6-NEXT:    v_mov_b32_e32 v0, s17
+; GFX6-NEXT:    s_add_u32 s18, s15, s14
+; GFX6-NEXT:    v_mov_b32_e32 v0, s18
 ; GFX6-NEXT:    v_mul_hi_u32 v0, s6, v0
-; GFX6-NEXT:    s_addc_u32 s16, 0, s16
-; GFX6-NEXT:    s_mul_i32 s14, s6, s16
+; GFX6-NEXT:    s_addc_u32 s19, 0, s16
+; GFX6-NEXT:    s_mul_i32 s14, s6, s19
 ; GFX6-NEXT:    v_readfirstlane_b32 s15, v0
 ; GFX6-NEXT:    s_add_i32 s14, s15, s14
-; GFX6-NEXT:    s_mul_i32 s15, s7, s17
-; GFX6-NEXT:    s_add_i32 s18, s14, s15
-; GFX6-NEXT:    s_sub_i32 s19, s9, s18
-; GFX6-NEXT:    s_mul_i32 s14, s6, s17
+; GFX6-NEXT:    s_mul_i32 s15, s7, s18
+; GFX6-NEXT:    s_add_i32 s20, s14, s15
+; GFX6-NEXT:    s_sub_i32 s16, s9, s20
+; GFX6-NEXT:    s_mul_i32 s14, s6, s18
 ; GFX6-NEXT:    s_sub_u32 s8, s8, s14
 ; GFX6-NEXT:    s_cselect_b64 s[14:15], -1, 0
-; GFX6-NEXT:    s_or_b32 s20, s14, s15
-; GFX6-NEXT:    s_cmp_lg_u32 s20, 0
-; GFX6-NEXT:    s_subb_u32 s19, s19, s7
-; GFX6-NEXT:    s_sub_u32 s21, s8, s6
-; GFX6-NEXT:    s_cselect_b64 s[14:15], -1, 0
+; GFX6-NEXT:    s_or_b32 s17, s14, s15
+; GFX6-NEXT:    s_subb_u32 s21, s16, s7
+; GFX6-NEXT:    s_sub_u32 s22, s8, s6
+; GFX6-NEXT:    s_cselect_b64 s[16:17], -1, 0
+; GFX6-NEXT:    s_or_b32 s16, s16, s17
+; GFX6-NEXT:    s_subb_u32 s16, s21, 0
+; GFX6-NEXT:    s_cmp_ge_u32 s16, s7
+; GFX6-NEXT:    s_cselect_b32 s17, -1, 0
+; GFX6-NEXT:    s_cmp_ge_u32 s22, s6
+; GFX6-NEXT:    s_cselect_b32 s21, -1, 0
+; GFX6-NEXT:    s_cmp_eq_u32 s16, s7
+; GFX6-NEXT:    s_cselect_b32 s16, s21, s17
+; GFX6-NEXT:    s_add_u32 s17, s18, 1
+; GFX6-NEXT:    s_addc_u32 s21, s19, 0
+; GFX6-NEXT:    s_add_u32 s22, s18, 2
+; GFX6-NEXT:    s_addc_u32 s23, s19, 0
+; GFX6-NEXT:    s_cmp_lg_u32 s16, 0
+; GFX6-NEXT:    s_cselect_b32 s16, s22, s17
+; GFX6-NEXT:    s_cselect_b32 s17, s23, s21
 ; GFX6-NEXT:    s_or_b32 s14, s14, s15
-; GFX6-NEXT:    s_cmp_lg_u32 s14, 0
-; GFX6-NEXT:    s_subb_u32 s14, s19, 0
-; GFX6-NEXT:    s_cmp_ge_u32 s14, s7
-; GFX6-NEXT:    s_cselect_b32 s15, -1, 0
-; GFX6-NEXT:    s_cmp_ge_u32 s21, s6
-; GFX6-NEXT:    s_cselect_b32 s19, -1, 0
-; GFX6-NEXT:    s_cmp_eq_u32 s14, s7
-; GFX6-NEXT:    s_cselect_b32 s14, s19, s15
-; GFX6-NEXT:    s_add_u32 s15, s17, 1
-; GFX6-NEXT:    s_addc_u32 s19, s16, 0
-; GFX6-NEXT:    s_add_u32 s21, s17, 2
-; GFX6-NEXT:    s_addc_u32 s22, s16, 0
-; GFX6-NEXT:    s_cmp_lg_u32 s14, 0
-; GFX6-NEXT:    s_cselect_b32 s14, s21, s15
-; GFX6-NEXT:    s_cselect_b32 s15, s22, s19
-; GFX6-NEXT:    s_cmp_lg_u32 s20, 0
-; GFX6-NEXT:    s_subb_u32 s9, s9, s18
+; GFX6-NEXT:    s_subb_u32 s9, s9, s20
 ; GFX6-NEXT:    s_cmp_ge_u32 s9, s7
-; GFX6-NEXT:    s_cselect_b32 s18, -1, 0
+; GFX6-NEXT:    s_cselect_b32 s14, -1, 0
 ; GFX6-NEXT:    s_cmp_ge_u32 s8, s6
 ; GFX6-NEXT:    s_cselect_b32 s6, -1, 0
 ; GFX6-NEXT:    s_cmp_eq_u32 s9, s7
-; GFX6-NEXT:    s_cselect_b32 s6, s6, s18
+; GFX6-NEXT:    s_cselect_b32 s6, s6, s14
 ; GFX6-NEXT:    s_cmp_lg_u32 s6, 0
-; GFX6-NEXT:    s_cselect_b32 s7, s15, s16
-; GFX6-NEXT:    s_cselect_b32 s6, s14, s17
+; GFX6-NEXT:    s_cselect_b32 s7, s17, s19
+; GFX6-NEXT:    s_cselect_b32 s6, s16, s18
 ; GFX6-NEXT:    s_xor_b64 s[2:3], s[12:13], s[2:3]
 ; GFX6-NEXT:    s_xor_b64 s[6:7], s[6:7], s[2:3]
-; GFX6-NEXT:    s_sub_u32 s14, s6, s2
-; GFX6-NEXT:    s_subb_u32 s15, s7, s3
+; GFX6-NEXT:    s_sub_u32 s16, s6, s2
+; GFX6-NEXT:    s_subb_u32 s17, s7, s3
 ; GFX6-NEXT:    s_ashr_i32 s6, s1, 31
 ; GFX6-NEXT:    s_add_u32 s0, s0, s6
 ; GFX6-NEXT:    s_mov_b32 s7, s6
@@ -8464,40 +8456,39 @@ define amdgpu_kernel void @sdiv_v2i64_pow2_shl_denom(ptr addrspace(1) %out, <2 x
 ; GFX6-NEXT:    v_cvt_u32_f32_e32 v0, v0
 ; GFX6-NEXT:    v_cvt_u32_f32_e32 v1, v1
 ; GFX6-NEXT:    v_mul_hi_u32 v2, s12, v0
-; GFX6-NEXT:    v_readfirstlane_b32 s16, v1
+; GFX6-NEXT:    v_readfirstlane_b32 s14, v1
 ; GFX6-NEXT:    v_readfirstlane_b32 s2, v0
-; GFX6-NEXT:    s_mul_i32 s1, s12, s16
+; GFX6-NEXT:    s_mul_i32 s1, s12, s14
 ; GFX6-NEXT:    v_readfirstlane_b32 s3, v2
 ; GFX6-NEXT:    s_mul_i32 s0, s13, s2
 ; GFX6-NEXT:    s_add_i32 s1, s3, s1
 ; GFX6-NEXT:    s_add_i32 s3, s1, s0
-; GFX6-NEXT:    s_mul_i32 s17, s12, s2
+; GFX6-NEXT:    s_mul_i32 s15, s12, s2
 ; GFX6-NEXT:    v_mul_hi_u32 v2, v0, s3
-; GFX6-NEXT:    v_mul_hi_u32 v0, v0, s17
+; GFX6-NEXT:    v_mul_hi_u32 v0, v0, s15
 ; GFX6-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
 ; GFX6-NEXT:    s_mul_i32 s4, s2, s3
 ; GFX6-NEXT:    v_readfirstlane_b32 s5, v2
 ; GFX6-NEXT:    v_readfirstlane_b32 s18, v0
-; GFX6-NEXT:    v_mul_hi_u32 v0, v1, s17
+; GFX6-NEXT:    v_mul_hi_u32 v0, v1, s15
 ; GFX6-NEXT:    v_mul_hi_u32 v1, v1, s3
 ; GFX6-NEXT:    s_add_u32 s4, s18, s4
 ; GFX6-NEXT:    s_addc_u32 s5, 0, s5
-; GFX6-NEXT:    s_mul_i32 s17, s16, s17
+; GFX6-NEXT:    s_mul_i32 s15, s14, s15
 ; GFX6-NEXT:    v_readfirstlane_b32 s18, v0
-; GFX6-NEXT:    s_add_u32 s4, s4, s17
+; GFX6-NEXT:    s_add_u32 s4, s4, s15
 ; GFX6-NEXT:    s_addc_u32 s4, s5, s18
 ; GFX6-NEXT:    v_readfirstlane_b32 s5, v1
 ; GFX6-NEXT:    s_addc_u32 s5, s5, 0
-; GFX6-NEXT:    s_mul_i32 s3, s16, s3
+; GFX6-NEXT:    s_mul_i32 s3, s14, s3
 ; GFX6-NEXT:    s_add_u32 s3, s4, s3
 ; GFX6-NEXT:    s_addc_u32 s4, 0, s5
 ; GFX6-NEXT:    s_add_u32 s5, s2, s3
 ; GFX6-NEXT:    v_mov_b32_e32 v0, s5
-; GFX6-NEXT:    s_cselect_b64 s[2:3], -1, 0
 ; GFX6-NEXT:    v_mul_hi_u32 v0, s12, v0
+; GFX6-NEXT:    s_cselect_b64 s[2:3], -1, 0
 ; GFX6-NEXT:    s_or_b32 s2, s2, s3
-; GFX6-NEXT:    s_cmp_lg_u32 s2, 0
-; GFX6-NEXT:    s_addc_u32 s4, s16, s4
+; GFX6-NEXT:    s_addc_u32 s4, s14, s4
 ; GFX6-NEXT:    s_mul_i32 s2, s12, s4
 ; GFX6-NEXT:    v_readfirstlane_b32 s3, v0
 ; GFX6-NEXT:    s_add_i32 s2, s3, s2
@@ -8511,14 +8502,14 @@ define amdgpu_kernel void @sdiv_v2i64_pow2_shl_denom(ptr addrspace(1) %out, <2 x
 ; GFX6-NEXT:    v_mul_hi_u32 v1, s4, v0
 ; GFX6-NEXT:    v_mul_hi_u32 v0, s5, v0
 ; GFX6-NEXT:    s_mul_i32 s13, s5, s2
-; GFX6-NEXT:    v_readfirstlane_b32 s17, v2
-; GFX6-NEXT:    s_add_u32 s13, s17, s13
-; GFX6-NEXT:    v_readfirstlane_b32 s16, v0
+; GFX6-NEXT:    v_readfirstlane_b32 s15, v2
+; GFX6-NEXT:    s_add_u32 s13, s15, s13
+; GFX6-NEXT:    v_readfirstlane_b32 s14, v0
 ; GFX6-NEXT:    s_mul_i32 s3, s4, s3
-; GFX6-NEXT:    s_addc_u32 s16, 0, s16
+; GFX6-NEXT:    s_addc_u32 s14, 0, s14
 ; GFX6-NEXT:    v_readfirstlane_b32 s12, v3
 ; GFX6-NEXT:    s_add_u32 s3, s13, s3
-; GFX6-NEXT:    s_addc_u32 s3, s16, s12
+; GFX6-NEXT:    s_addc_u32 s3, s14, s12
 ; GFX6-NEXT:    v_readfirstlane_b32 s12, v1
 ; GFX6-NEXT:    s_addc_u32 s12, s12, 0
 ; GFX6-NEXT:    s_mul_i32 s2, s4, s2
@@ -8527,7 +8518,6 @@ define amdgpu_kernel void @sdiv_v2i64_pow2_shl_denom(ptr addrspace(1) %out, <2 x
 ; GFX6-NEXT:    s_add_u32 s13, s5, s2
 ; GFX6-NEXT:    s_cselect_b64 s[2:3], -1, 0
 ; GFX6-NEXT:    s_or_b32 s2, s2, s3
-; GFX6-NEXT:    s_cmp_lg_u32 s2, 0
 ; GFX6-NEXT:    s_addc_u32 s12, s4, s12
 ; GFX6-NEXT:    s_ashr_i32 s4, s11, 31
 ; GFX6-NEXT:    s_add_u32 s2, s10, s4
@@ -8539,72 +8529,70 @@ define amdgpu_kernel void @sdiv_v2i64_pow2_shl_denom(ptr addrspace(1) %out, <2 x
 ; GFX6-NEXT:    v_mov_b32_e32 v2, s13
 ; GFX6-NEXT:    v_mul_hi_u32 v3, s10, v2
 ; GFX6-NEXT:    s_mul_i32 s2, s10, s12
-; GFX6-NEXT:    v_readfirstlane_b32 s16, v1
+; GFX6-NEXT:    v_readfirstlane_b32 s14, v1
 ; GFX6-NEXT:    v_mul_hi_u32 v1, s11, v2
-; GFX6-NEXT:    v_readfirstlane_b32 s17, v3
+; GFX6-NEXT:    v_readfirstlane_b32 s15, v3
 ; GFX6-NEXT:    v_mul_hi_u32 v0, s11, v0
-; GFX6-NEXT:    s_add_u32 s2, s17, s2
-; GFX6-NEXT:    s_addc_u32 s16, 0, s16
+; GFX6-NEXT:    s_add_u32 s2, s15, s2
+; GFX6-NEXT:    s_addc_u32 s14, 0, s14
 ; GFX6-NEXT:    s_mul_i32 s13, s11, s13
-; GFX6-NEXT:    v_readfirstlane_b32 s17, v1
+; GFX6-NEXT:    v_readfirstlane_b32 s15, v1
 ; GFX6-NEXT:    s_add_u32 s2, s2, s13
-; GFX6-NEXT:    s_addc_u32 s2, s16, s17
+; GFX6-NEXT:    s_addc_u32 s2, s14, s15
 ; GFX6-NEXT:    v_readfirstlane_b32 s13, v0
 ; GFX6-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX6-NEXT:    s_mul_i32 s12, s11, s12
-; GFX6-NEXT:    s_add_u32 s16, s2, s12
-; GFX6-NEXT:    v_mov_b32...
[truncated]

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Signed-off-by: John Lu <John.Lu@amd.com>

jayfoad

Question for future improvement: is there any way to extend this to handle s_cmp_eq_* sX, 0? This would invert scc, but maybe you could adjust all users to account for that? Or maybe there is some way to get codegen to prefer _lg_ comparisons in the first place? Do other targets have a way to handle this?

jayfoad · 2025-10-15T09:33:06Z

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

+    for (MachineInstr &MI :
+         make_range(std::next(Def->getIterator()), CmpInstr.getIterator())) {
+      if (MI.modifiesRegister(AMDGPU::SCC, &RI) ||
+          MI.killsRegister(AMDGPU::SCC, &RI))


As an alternative could you ignore kill flags here, and then later remove them if you proceed with the optimization, at the same time as you call setIsDead(false) on the scc def?

A kill flag on SCC no longer blocks either compare optimization. Kill handling for both optimizations is tested in optimize-compare.mir.
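The approach discussed here (ignore kill flags while checking, then clear them once the optimization commits) might look roughly like the sketch below. `ToySCCInfo` and `keepSCCLiveUntil` are hypothetical names used only for illustration; the real change operates on LLVM machine instructions and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instruction summary of how SCC is touched.
struct ToySCCInfo {
  bool DefinesSCC; // instruction writes SCC
  bool DefIsDead;  // that write is currently marked dead
  bool KillsSCC;   // instruction reads SCC and carries a kill flag
};

// Checks whether the SCC value written at DefIdx can stay live up to CmpIdx
// (the compare about to be removed). On success the kill flags in between are
// cleared and the defining write is marked live. On failure nothing changes.
bool keepSCCLiveUntil(std::vector<ToySCCInfo> &Block, std::size_t DefIdx,
                      std::size_t CmpIdx) {
  assert(DefIdx < CmpIdx && CmpIdx < Block.size() && "bad indices");
  if (!Block[DefIdx].DefinesSCC)
    return false;

  // Pass 1: only a redefinition of SCC is a blocker; kill flags are ignored.
  for (std::size_t I = DefIdx + 1; I < CmpIdx; ++I)
    if (Block[I].DefinesSCC)
      return false;

  // Pass 2: commit. SCC must now survive past every earlier "last use".
  for (std::size_t I = DefIdx + 1; I < CmpIdx; ++I)
    Block[I].KillsSCC = false;
  Block[DefIdx].DefIsDead = false;
  return true;
}
```

Splitting the work into a checking pass and a committing pass keeps the block untouched when the optimization is abandoned.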

jayfoad · 2025-10-15T09:34:09Z

llvm/test/CodeGen/AMDGPU/addsub64_carry.ll

-; CHECK-NEXT:    s_cselect_b64 s[6:7], -1, 0
-; CHECK-NEXT:    s_cmp_lg_u64 s[6:7], 0


This is a nice improvement. But I really wish we could generate better code in the first place, instead of generating horrible code and cleaning it up later.

Our lowering for carryout puts it into a virtual 32/64-bit SGPR. SCC is reconstructed from the virtual 32/64-bit SGPR as needed by users of the carryout. If we lowered directly into SCC we could have problems with other definitions of SCC clobbering the carryout. We could/did somewhat avoid this problem by lowering 64-bit adds as a unit. The internal carryout for the low 32-bit part could be generated cleanly, but the carryout for the high 32-bits was a problem. Note that issue #152992 asked for good code for 64-bit carryout.
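To illustrate why the compare in this pattern carries no new information, here is a small model of the three instructions involved. The function names are hypothetical and only mirror the documented behaviour of `s_add_u32` (SCC = unsigned carry-out), `s_cselect_b64` (D = SCC ? S0 : S1) and `s_cmp_lg_u64` (SCC = S0 != S1); this is a sketch, not compiler code.

```cpp
#include <cassert>
#include <cstdint>

struct AddResult {
  std::uint32_t Sum;
  bool SCC; // unsigned carry-out
};

// Model of s_add_u32: 32-bit add that leaves the carry-out in SCC.
constexpr AddResult sAddU32(std::uint32_t A, std::uint32_t B) {
  return AddResult{static_cast<std::uint32_t>(A + B),
                   ((static_cast<std::uint64_t>(A) + B) >> 32) != 0};
}

// Model of s_cselect_b64: pick S0 when SCC is set, otherwise S1.
constexpr std::uint64_t sCSelectB64(bool SCC, std::uint64_t S0,
                                    std::uint64_t S1) {
  return SCC ? S0 : S1;
}

// Model of s_cmp_lg_u64: SCC = (S0 != S1).
constexpr bool sCmpLgU64(std::uint64_t S0, std::uint64_t S1) {
  return S0 != S1;
}

// The carry-out is materialized as -1/0 in an SGPR pair and SCC is then
// rebuilt with a compare against 0. The rebuilt SCC equals the original one.
constexpr bool cmpIsRedundant(std::uint32_t A, std::uint32_t B) {
  return sCmpLgU64(sCSelectB64(sAddU32(A, B).SCC, ~std::uint64_t{0}, 0), 0) ==
         sAddU32(A, B).SCC;
}

// Spot-checks one carrying and one non-carrying operand pair at run time.
inline void checkCarryExamples() {
  assert(cmpIsRedundant(0xFFFFFFFFu, 1u)); // carry out
  assert(cmpIsRedundant(1u, 2u));          // no carry
}
```

The same reasoning is what allows the `S_CSELECT_*` case with a non-zero first immediate and a zero second immediate: SCC going into the select already equals the result of comparing the selected value with 0.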

Signed-off-by: John Lu <John.Lu@amd.com>

LU-JOHN · 2025-10-15T14:42:25Z

Question for future improvement: is there any way to extend this to handle s_cmp_eq_* sX, 0? This would invert scc, but maybe you could adjust all users to account for that? Or maybe there is some way to get codegen to prefer _lg_ comparisons in the first place? Do other targets have a way to handle this?

I would guess that s_cmp_eq_* is frequently (perhaps almost always) followed by a single use by s_cbranch_* which can easily be reversed. Will investigate in a subsequent PR.

jayfoad · 2025-10-15T15:28:41Z

I would guess that s_cmp_eq_* is frequently (perhaps almost always) followed by a single use by s_cbranch_* which can easily be reversed.

Right, or an s_cselect where you can easily swap the operands.
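The two rewrites mentioned for a possible s_cmp_eq_* follow-up (reversing a branch, or swapping the operands of a select) can be checked with a tiny model. This is only an illustration of the equivalences under the usual semantics of `s_cselect_b32` (D = SCC ? S0 : S1) and `s_cbranch_scc0`/`s_cbranch_scc1`; the helper names are hypothetical and nothing here is taken from a subsequent patch.

```cpp
#include <cassert>
#include <cstdint>

// SCC produced by "s_cmp_eq_u32 x, 0" and by "s_cmp_lg_u32 x, 0".
constexpr bool sccAfterCmpEq0(std::uint32_t X) { return X == 0; }
constexpr bool sccAfterCmpLg0(std::uint32_t X) { return X != 0; }

// Model of s_cselect_b32: pick A when SCC is set, otherwise B.
constexpr std::uint32_t sCSelectB32(bool SCC, std::uint32_t A,
                                    std::uint32_t B) {
  return SCC ? A : B;
}

// A select fed by the "eq" compare equals a select with swapped operands fed
// by the inverted ("lg") condition.
constexpr bool selectSwapIsEquivalent(std::uint32_t X, std::uint32_t A,
                                      std::uint32_t B) {
  return sCSelectB32(sccAfterCmpEq0(X), A, B) ==
         sCSelectB32(sccAfterCmpLg0(X), B, A);
}

// s_cbranch_scc1 after the "eq" compare is taken exactly when s_cbranch_scc0
// would be taken after the "lg" compare.
constexpr bool branchReversalIsEquivalent(std::uint32_t X) {
  return sccAfterCmpEq0(X) == !sccAfterCmpLg0(X);
}

// Spot-checks a few representative values at run time.
inline void checkInversionExamples() {
  const std::uint32_t Values[] = {0u, 1u, 0x80000000u, 0xFFFFFFFFu};
  for (std::uint32_t X : Values) {
    assert(selectSwapIsEquivalent(X, 7u, 9u));
    assert(branchReversalIsEquivalent(X));
  }
}
```

Both rewrites only stay this simple when every user of the inverted SCC can be adjusted, which is why the single-user case is the natural starting point.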

jayfoad

LGTM, thanks.

llvm-ci · 2025-10-18T14:45:17Z

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 10 "Add check check-libc-amdgcn-amd-amdhsa".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/15674

Here is the relevant piece of the build log for the reference

Step 10 (Add check check-libc-amdgcn-amd-amdhsa) failure: test (failure)
...
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcVASPrintfTest.ManyReAlloc
[       OK ] LlvmLibcVASPrintfTest.ManyReAlloc (867 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[3158/3299] Running hermetic test libc.test.src.stdlib.lldiv_test.__hermetic__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcDivTest.SimpleTestlldiv_t
[       OK ] LlvmLibcDivTest.SimpleTestlldiv_t (9 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[3159/3299] Running hermetic test libc.test.src.stdlib.heap_sort_test.__hermetic__
FAILED: libc/test/src/stdlib/libc.test.src.stdlib.heap_sort_test.__hermetic__.__cmd__ /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-amdgcn-amd-amdhsa-bins/libc/test/src/stdlib/libc.test.src.stdlib.heap_sort_test.__hermetic__.__cmd__ 
cd /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-amdgcn-amd-amdhsa-bins/libc/test/src/stdlib && /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/amdhsa-loader /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-amdgcn-amd-amdhsa-bins/libc/test/src/stdlib/libc.test.src.stdlib.heap_sort_test.__hermetic__.__build__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcHeapSortTest.DifferentElemSizeArray
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/test/src/stdlib/SortingTest.h:345: FAILURE
      Expected: buf_val
      Which is: 42
To be equal to: expected_elem_val
      Which is: 0
elem_size: 1 buf_i: 0
[  FAILED  ] LlvmLibcHeapSortTest.DifferentElemSizeArray
Ran 1 tests.  PASS: 0  FAIL: 1
[3160/3299] Running hermetic test libc.test.src.stdlib.bsearch_test.__hermetic__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcBsearchTest.SameKeyAndArray
[       OK ] LlvmLibcBsearchTest.SameKeyAndArray (9 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[3161/3299] Running hermetic test libc.test.src.stdlib.qsort_r_test.__hermetic__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcQsortRTest.SafeTypeErasure
[       OK ] LlvmLibcQsortRTest.SafeTypeErasure (43 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[3162/3299] Linking CXX executable libc/test/src/inttypes/libc.test.src.inttypes.strtoumax_test.__hermetic__.__build__
[3163/3299] Linking CXX executable libc/test/src/inttypes/libc.test.src.inttypes.strtoimax_test.__hermetic__.__build__
[3164/3299] Linking CXX executable libc/test/shared/libc.test.shared.shared_math_test.__hermetic__.__build__
[3165/3299] Running hermetic test libc.test.src.stdlib.quick_sort_test.__hermetic__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcQsortTest.DifferentElemSizeArray
[       OK ] LlvmLibcQsortTest.DifferentElemSizeArray (379 ms)
Ran 1 tests.  PASS: 1  FAIL: 0
[3166/3299] Linking CXX executable libc/test/src/stdlib/libc.test.src.stdlib.strtof_test.__hermetic__.__build__
[3167/3299] Linking CXX executable libc/test/src/time/libc.test.src.time.strftime_test.__hermetic__.__build__
[3168/3299] Linking CXX executable libc/test/src/math/smoke/libc.test.src.math.smoke.fromfpf16_test.__hermetic__.__build__
[3169/3299] Linking CXX executable libc/test/src/math/smoke/libc.test.src.math.smoke.fromfpx_test.__hermetic__.__build__
[3170/3299] Linking CXX executable libc/test/src/math/smoke/libc.test.src.math.smoke.ufromfpf16_test.__hermetic__.__build__
[3171/3299] Linking CXX executable libc/test/src/math/smoke/libc.test.src.math.smoke.ufromfpxf16_test.__hermetic__.__build__
[3172/3299] Linking CXX executable libc/test/src/math/smoke/libc.test.src.math.smoke.ufromfpxl_test.__hermetic__.__build__
[3173/3299] Linking CXX executable libc/test/src/math/smoke/libc.test.src.math.smoke.ufromfpl_test.__hermetic__.__build__
[3174/3299] Linking CXX executable libc/test/src/math/smoke/libc.test.src.math.smoke.ufromfp_test.__hermetic__.__build__

jplehr · 2025-10-18T18:44:59Z

Hi @LU-JOHN, this broke one of our buildbots, and reverting this brings the tests back to working.
Can you look into the breakage?

You can reproduce the issue (at least on gfx90a) with the following commands:

cd llvm-project
cmake -S llvm -B thebuild -C offload/cmake/caches/AMDGPULibcBot.cmake -GNinja
cd thebuild
ninja
ninja check-libc-amdgcn-amd-amdhsa

This reverts commit 8e5f6dd.

Reverts #162352

Broke our buildbot: https://lab.llvm.org/buildbot/#/builders/10/builds/15674

To reproduce

cd llvm-project
cmake -S llvm -B thebuild -C offload/cmake/caches/AMDGPULibcBot.cmake -GNinja
cd thebuild
ninja
ninja check-libc-amdgcn-amd-amdhsa

Reland PR #162352. Fix by excluding SI_PC_ADD_REL_OFFSET from instructions that set SCC = DST!=0. Passes check-libc-amdgcn-amd-amdhsa now.

Distribution of instructions that allowed a redundant S_CMP to be deleted in check-libc-amdgcn-amd-amdhsa test:

```
S_AND_B32        485
S_AND_B64         47
S_ANDN2_B32       42
S_ANDN2_B64   277492
S_CSELECT_B64  17631
S_LSHL_B32         6
S_OR_B64          11
```

---------

Signed-off-by: John Lu <John.Lu@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>

llvmbot added backend:AMDGPU llvm:globalisel labels Oct 7, 2025

LU-JOHN mentioned this pull request Oct 7, 2025

[AMDGPU][NFC] Pre-commit test for redundant s_cmp_lg_* sX, 0 removal #162351

Merged

arsenm reviewed Oct 8, 2025

jmmartinez reviewed Oct 8, 2025

jmmartinez reviewed Oct 9, 2025

LU-JOHN force-pushed the remove_s_cmp branch from 1f57ce2 to 39380bb Compare October 9, 2025 16:49

arsenm reviewed Oct 10, 2025

LU-JOHN force-pushed the remove_s_cmp branch from cae0fb8 to 54f952c Compare October 10, 2025 14:25

LU-JOHN added 9 commits October 10, 2025 16:47

Pre-commit test for redundant s_cmp sX, 0 removal

c5a15e5

Signed-off-by: John Lu <John.Lu@amd.com>

Simplify extra use

250302a

Signed-off-by: John Lu <John.Lu@amd.com>

Delete s_cmp sX, 0 if it is redundant

613a06d

Signed-off-by: John Lu <John.Lu@amd.com>

Streamline code and handle more opcodes

68cbb6e

Signed-off-by: John Lu <John.Lu@amd.com>

Update rebased test results

b4628ba

Signed-off-by: John Lu <John.Lu@amd.com>

Fix typo blocking S_CSELECT* handling

a2e6ad4

Signed-off-by: John Lu <John.Lu@amd.com>

Streamline condition

7b84872

Signed-off-by: John Lu <John.Lu@amd.com>

Use make_range

88fdd67

Signed-off-by: John Lu <John.Lu@amd.com>

Use setsSCCifResultIsNonZero

13d73b2

Signed-off-by: John Lu <John.Lu@amd.com>

LU-JOHN force-pushed the remove_s_cmp branch from fba3ee2 to 13d73b2 Compare October 10, 2025 21:59

LU-JOHN requested review from arsenm and jmmartinez October 14, 2025 15:09

jayfoad reviewed Oct 15, 2025

Do not SCC kill to block optimization

2682020

Signed-off-by: John Lu <John.Lu@amd.com>

LU-JOHN requested a review from jayfoad October 15, 2025 14:43

jayfoad approved these changes Oct 17, 2025

LU-JOHN merged commit 8e5f6dd into llvm:main Oct 18, 2025
10 checks passed

jplehr added a commit that referenced this pull request Oct 18, 2025

Revert "[AMDGPU] Remove redundant s_cmp_lg_* sX, 0 (#162352)"

ea797b7

This reverts commit 8e5f6dd.

jplehr mentioned this pull request Oct 18, 2025

Revert "[AMDGPU] Remove redundant s_cmp_lg_* sX, 0 " #164116

Merged

LU-JOHN mentioned this pull request Oct 20, 2025

[AMDGPU] Reland "Remove redundant s_cmp_lg_* sX, 0" #164201

Merged

		; CHECK-NEXT: s_cselect_b64 s[6:7], -1, 0
		; CHECK-NEXT: s_cmp_lg_u64 s[6:7], 0


