[DAGCombiner][X86] Correctly clean up high bits in `combinei64TruncSrlAdd` #128353

dtcxzyw · 2025-02-22T12:14:03Z

A counterexample for the original implementation: https://alive2.llvm.org/ce/z/7ieYLg
This patch uses zext instead of anyext to fix the original issue.
In addition, we should keep the low `64 - shamt` bits instead of `shamt - 32`: https://alive2.llvm.org/ce/z/ruQP_Z
Some of the code is simplified to avoid confusion.
Proof: https://alive2.llvm.org/ce/z/z_jdHD

Closes #128309.
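
As a rough illustration of what the fixed fold computes, the following sketch models it on plain 64-bit integers instead of SelectionDAG nodes. The function names are hypothetical and the result type is fixed to i32; this is only an arithmetic model under the combine's preconditions (`shamt` strictly between 32 and 64, low `shamt` bits of the constant all zero), not the code in `X86ISelLowering.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: trunc_i32(lshr_i64(add_i64(x, c), shamt)).
uint32_t referenceTruncSrlAdd(uint64_t x, uint64_t c, unsigned shamt) {
  return static_cast<uint32_t>((x + c) >> shamt);
}

// Arithmetic model of the patched fold (hypothetical helper).
// Assumes 32 < shamt < 64 and that the low `shamt` bits of c are zero.
uint32_t foldedTruncSrlAdd(uint64_t x, uint64_t c, unsigned shamt) {
  assert(shamt > 32 && shamt < 64);
  assert((c & ((uint64_t{1} << shamt) - 1)) == 0);

  uint32_t trunc = static_cast<uint32_t>(x >> shamt); // srl, then truncate
  uint32_t newC = static_cast<uint32_t>(c >> shamt);  // shifted add constant
  uint32_t sum = trunc + newC;                        // narrow 32-bit add

  // Keep only the low (64 - shamt) bits: this plays the role of the
  // zero-extend-in-register cleanup and drops any carry out of that field.
  unsigned keep = 64 - shamt;                         // between 1 and 31
  uint32_t mask = (uint32_t{1} << keep) - 1;
  return sum & mask;
}
```

For inputs that satisfy the two asserted preconditions, `foldedTruncSrlAdd` and `referenceTruncSrlAdd` should agree, because the wide add can only carry out of the top `64 - shamt` bits and the mask discards exactly that carry.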

llvmbot · 2025-02-22T12:14:37Z

@llvm/pr-subscribers-backend-x86

Author: Yingwei Zheng (dtcxzyw)

Changes

A counterexample for the original implementation: https://alive2.llvm.org/ce/z/YowPZY
We should keep the low 64 - shamt bits instead of shamt - 32.
Proof: https://alive2.llvm.org/ce/z/z_jdHD

Full diff: https://github.com/llvm/llvm-project/pull/128353.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+4-7)
(modified) llvm/test/CodeGen/X86/combine-i64-trunc-srl-add.ll (+28-7)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 95fd7f9b94282..0bd0dbeac2087 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -53712,23 +53712,20 @@ static SDValue combinei64TruncSrlAdd(SDValue N, EVT VT, SelectionDAG &DAG,
                                  m_ConstInt(SrlConst)))))
     return SDValue();
 
-  if (SrlConst.ule(32) || AddConst.lshr(SrlConst).shl(SrlConst) != AddConst)
+  if (SrlConst.ule(32) || AddConst.countr_zero() < SrlConst.getZExtValue())
     return SDValue();
 
   SDValue AddLHSSrl =
       DAG.getNode(ISD::SRL, DL, MVT::i64, AddLhs, N.getOperand(1));
   SDValue Trunc = DAG.getNode(ISD::TRUNCATE, DL, VT, AddLHSSrl);
 
-  APInt NewAddConstVal =
-      (~((~AddConst).lshr(SrlConst))).trunc(VT.getSizeInBits());
+  APInt NewAddConstVal = AddConst.lshr(SrlConst).trunc(VT.getSizeInBits());
   SDValue NewAddConst = DAG.getConstant(NewAddConstVal, DL, VT);
   SDValue NewAddNode = DAG.getNode(ISD::ADD, DL, VT, Trunc, NewAddConst);
 
-  APInt CleanupSizeConstVal = (SrlConst - 32).zextOrTrunc(VT.getSizeInBits());
   EVT CleanUpVT =
-      EVT::getIntegerVT(*DAG.getContext(), CleanupSizeConstVal.getZExtValue());
-  SDValue CleanUp = DAG.getAnyExtOrTrunc(NewAddNode, DL, CleanUpVT);
-  return DAG.getAnyExtOrTrunc(CleanUp, DL, VT);
+      EVT::getIntegerVT(*DAG.getContext(), 64 - SrlConst.getZExtValue());
+  return DAG.getZeroExtendInReg(NewAddNode, DL, CleanUpVT);
 }
 
 /// Attempt to pre-truncate inputs to arithmetic ops if it will simplify
diff --git a/llvm/test/CodeGen/X86/combine-i64-trunc-srl-add.ll b/llvm/test/CodeGen/X86/combine-i64-trunc-srl-add.ll
index 41e1a24b239a6..14992ca5bf488 100644
--- a/llvm/test/CodeGen/X86/combine-i64-trunc-srl-add.ll
+++ b/llvm/test/CodeGen/X86/combine-i64-trunc-srl-add.ll
@@ -7,8 +7,9 @@ define i1 @test_ult_trunc_add(i64 %x) {
 ; X64-LABEL: test_ult_trunc_add:
 ; X64:       # %bb.0:
 ; X64-NEXT:    shrq $48, %rdi
-; X64-NEXT:    addl $-65522, %edi # imm = 0xFFFF000E
-; X64-NEXT:    cmpl $3, %edi
+; X64-NEXT:    addl $14, %edi
+; X64-NEXT:    movzwl %di, %eax
+; X64-NEXT:    cmpl $3, %eax
 ; X64-NEXT:    setb %al
 ; X64-NEXT:    retq
   %add = add i64 %x, 3940649673949184
@@ -22,8 +23,9 @@ define i1 @test_ult_add(i64 %x) {
 ; X64-LABEL: test_ult_add:
 ; X64:       # %bb.0:
 ; X64-NEXT:    shrq $48, %rdi
-; X64-NEXT:    addl $-65522, %edi # imm = 0xFFFF000E
-; X64-NEXT:    cmpl $3, %edi
+; X64-NEXT:    addl $14, %edi
+; X64-NEXT:    movzwl %di, %eax
+; X64-NEXT:    cmpl $3, %eax
 ; X64-NEXT:    setb %al
 ; X64-NEXT:    retq
   %add = add i64 3940649673949184, %x
@@ -35,8 +37,9 @@ define i1 @test_ugt_trunc_add(i64 %x) {
 ; X64-LABEL: test_ugt_trunc_add:
 ; X64:       # %bb.0:
 ; X64-NEXT:    shrq $48, %rdi
-; X64-NEXT:    addl $-65522, %edi # imm = 0xFFFF000E
-; X64-NEXT:    cmpl $4, %edi
+; X64-NEXT:    addl $14, %edi
+; X64-NEXT:    movzwl %di, %eax
+; X64-NEXT:    cmpl $4, %eax
 ; X64-NEXT:    setae %al
 ; X64-NEXT:    retq
   %add = add i64 %x, 3940649673949184
@@ -116,7 +119,8 @@ define i32 @test_trunc_add(i64 %x) {
 ; X64-LABEL: test_trunc_add:
 ; X64:       # %bb.0:
 ; X64-NEXT:    shrq $48, %rdi
-; X64-NEXT:    leal -65522(%rdi), %eax
+; X64-NEXT:    addl $14, %edi
+; X64-NEXT:    movzwl %di, %eax
 ; X64-NEXT:    retq
   %add = add i64 %x, 3940649673949184
   %shr = lshr i64 %add, 48
@@ -151,3 +155,20 @@ for.body:
 exit:
   ret i32 0
 }
+
+define i64 @pr128309(i64 %x) {
+; X64-LABEL: pr128309:
+; X64:       # %bb.0: # %entry
+; X64-NEXT:    movl %edi, %eax
+; X64-NEXT:    andl $18114, %eax # imm = 0x46C2
+; X64-NEXT:    addl $6, %eax
+; X64-NEXT:    andl %edi, %eax
+; X64-NEXT:    retq
+entry:
+  %shl = shl i64 %x, 48
+  %and = and i64 %shl, 5098637728136822784
+  %add = add i64 %and, 1688849860263936
+  %lshr = lshr i64 %add, 48
+  %res = and i64 %lshr, %x
+  ret i64 %res
+}
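
The first two hunks of the diff change how the precondition is tested and how the new add constant is derived. The sketch below restates both on `uint64_t` values so the difference is easy to check by hand. All helper names are hypothetical, and `trailingZeros` is a hand-written stand-in for `APInt::countr_zero`; none of this is LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for APInt::countr_zero on a 64-bit value (returns 64 for zero).
unsigned trailingZeros(uint64_t v) {
  if (v == 0)
    return 64;
  unsigned n = 0;
  while ((v & 1) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Old precondition: shifting right and back left must reproduce c.
bool passesOldCheck(uint64_t c, unsigned shamt) {
  assert(shamt < 64);
  return ((c >> shamt) << shamt) == c;
}

// New precondition: c has at least `shamt` trailing zero bits.
bool passesNewCheck(uint64_t c, unsigned shamt) {
  assert(shamt < 64);
  return trailingZeros(c) >= shamt;
}

// Old constant: ~((~c) >> shamt), truncated to 32 bits. This fills every
// bit above the shifted value with ones.
uint32_t oldAddConstant(uint64_t c, unsigned shamt) {
  assert(shamt < 64);
  return static_cast<uint32_t>(~(~c >> shamt));
}

// Patched constant: c >> shamt, truncated to 32 bits.
uint32_t patchedAddConstant(uint64_t c, unsigned shamt) {
  assert(shamt < 64);
  return static_cast<uint32_t>(c >> shamt);
}
```

With the constant used in the tests (3940649673949184, which is `14 << 48`) and a shift of 48, the old formula yields `0xFFFF000E` and the patched one yields `14`, matching the `addl` immediates before and after in the updated CHECK lines. The two precondition checks are meant to accept the same constants; only the spelling changed.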

phoebewang · 2025-02-22T13:34:43Z

> A counterexample for the original implementation: https://alive2.llvm.org/ce/z/YowPZY
> We should keep the low 64 - shamt bits instead of shamt - 32.
> Proof: https://alive2.llvm.org/ce/z/z_jdHD

The alive2 proof only demonstrates the problem in the calculation of NewAddConstVal. Is the problem in #128309 actually caused by zext vs. anyext?

phoebewang · 2025-02-22T13:41:45Z

> A counterexample for the original implementation: https://alive2.llvm.org/ce/z/YowPZY
> We should keep the low 64 - shamt bits instead of shamt - 32.
> Proof: https://alive2.llvm.org/ce/z/z_jdHD
>
> The alive2 proof only demonstrates the problem in the calculation of NewAddConstVal. Is the problem in #128309 actually caused by zext vs. anyext?

We had discussions about eliminating the movz in #126448 (comment).
I was worried about the correctness but didn't know how to prove it. Do you have any idea how to prove the problem in #128309 with alive2?

phoebewang · 2025-02-22T14:48:52Z

> A counterexample for the original implementation: https://alive2.llvm.org/ce/z/YowPZY
> We should keep the low 64 - shamt bits instead of shamt - 32.
> Proof: https://alive2.llvm.org/ce/z/z_jdHD
>
> The alive2 proof only demonstrates the problem in the calculation of NewAddConstVal. Is the problem in #128309 actually caused by zext vs. anyext?
>
> We had discussions about eliminating the movz in #126448 (comment). I was worried about the correctness but didn't know how to prove it. Do you have any idea how to prove the problem in #128309 with alive2?

One proof: https://alive2.llvm.org/ce/z/xMsYxr
If we clamp c2 to 48, the value at which 64 - shamt and shamt - 32 happen to be equal and which happens to be used in the test cases and in #128309, alive2 (after correcting a mistake in your example) considers the transform correct.
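
The observation about a shift amount of 48 can be checked with a few lines of arithmetic. The sketch below uses hypothetical helper names and simply compares the number of low bits kept by the original code (`shamt - 32`) with the number kept after this patch (`64 - shamt`) over the shift amounts the combine accepts.

```cpp
#include <cassert>
#include <vector>

// Low bits kept by the original implementation.
unsigned oldCleanupBits(unsigned shamt) {
  assert(shamt > 32 && shamt < 64);
  return shamt - 32;
}

// Low bits kept after this patch.
unsigned newCleanupBits(unsigned shamt) {
  assert(shamt > 32 && shamt < 64);
  return 64 - shamt;
}

// Collect every accepted shift amount where both formulas give the same width.
std::vector<unsigned> agreeingShiftAmounts() {
  std::vector<unsigned> result;
  for (unsigned shamt = 33; shamt < 64; ++shamt)
    if (oldCleanupBits(shamt) == newCleanupBits(shamt))
      result.push_back(shamt);
  return result;
}
```

Solving `shamt - 32 == 64 - shamt` gives `shamt == 48`, so that is the only entry `agreeingShiftAmounts` can return. This is consistent with the existing tests, which all shift by 48, not exposing the width mismatch.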

dtcxzyw · 2025-02-22T16:45:22Z

> A counterexample for the original implementation: https://alive2.llvm.org/ce/z/YowPZY
> We should keep the low 64 - shamt bits instead of shamt - 32.
> Proof: https://alive2.llvm.org/ce/z/z_jdHD
>
> The alive2 proof only demonstrates the problem in the calculation of NewAddConstVal. Is the problem in #128309 actually caused by zext vs. anyext?

Yeah, the original issue can be fixed by using zext instead of anyext. But alive2 still complains about correctness, so I had to fix the number of low bits that should be kept.

> I was worried about the correctness but didn't know how to prove it. Do you have any idea how to prove the problem in #128309 with alive2?

We can hardcode c2 into the type and split the anyext into sext+zext (2 pairs): https://alive2.llvm.org/ce/z/7ieYLg

phoebewang · 2025-02-23T02:58:05Z

> I was worried about the correctness but didn't know how to prove it. Do you have any idea how to prove the problem in #128309 with alive2?
>
> We can hardcode c2 into the type and split the anyext into sext+zext (2 pairs): https://alive2.llvm.org/ce/z/7ieYLg

Thanks for the pointer! I finally proved that we must use zext whenever c1 >= 1 << shamt: https://alive2.llvm.org/ce/z/7sTBS7. So we cannot save the movz instruction.
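
A single input is enough to see why the movz cannot be dropped. The sketch below hardcodes the shift amount (48) and the add constant from the tests, and uses hypothetical function names. It assumes one particular outcome for the unspecified high bits of an any-extend, namely that they keep whatever the 32-bit add produced; other outcomes are possible, which is the point.

```cpp
#include <cassert>
#include <cstdint>

// Add constant from the tests: 3940649673949184 == 14 << 48.
constexpr uint64_t kAddConst = 3940649673949184ULL;

// Original pattern: trunc_i32(lshr_i64(add_i64(x, kAddConst), 48)).
uint32_t originalPattern(uint64_t x) {
  return static_cast<uint32_t>((x + kAddConst) >> 48);
}

// Fold without cleanup: models the high bits being left as the 32-bit add
// produced them (an assumption about what "any-extend" ends up doing).
uint32_t foldWithoutMovz(uint64_t x) {
  return static_cast<uint32_t>(x >> 48) + 14u;
}

// Fold with cleanup: zero-extend from 16 bits, as movzwl does.
uint32_t foldWithMovz(uint64_t x) {
  return static_cast<uint16_t>(foldWithoutMovz(x));
}
```

For `x = 0xFFF2000000000000`, the 64-bit add wraps to zero, so `originalPattern` returns 0, while `foldWithoutMovz` returns `0x10000` because the carry out of the 16-bit field survives in the 32-bit register. `foldWithMovz` clears it and returns 0 again. An unsigned comparison against 3, like the `cmpl $3` in the updated tests, would therefore give different answers with and without the cleanup.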

phoebewang

LGTM, thanks!

llvm-ci · 2025-02-23T05:03:00Z

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/16286

Here is the relevant piece of the build log for the reference

Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/pgo1.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate      -Xclang "-fprofile-instrument=llvm"
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate -Xclang -fprofile-instrument=llvm
# note: command had no output on stdout or stderr
# RUN: at line 3
env LLVM_PROFILE_FILE=llvm.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=llvm.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp
# note: command had no output on stdout or stderr
# RUN: at line 4
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.llvm.profraw | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c      --check-prefix="LLVM-PGO"
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c --check-prefix=LLVM-PGO
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c:38:14: error: LLVM-PGO: expected string not found in input
# | // LLVM-PGO: Block counts: [20, 10, 2, 1]
# |              ^
# | <stdin>:4:13: note: scanning from here
# |  Counters: 4
# |             ^
# | <stdin>:5:2: note: possible intended match here
# |  Block counts: [20, 10, 3, 1]
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Counters: 
# |             2:  __omp_offloading_802_b388217_main_l27: 
# |             3:  Hash: 0x03fd5b902019ff2d 
# |             4:  Counters: 4 
# | check:38'0                 X error: no match found
# |             5:  Block counts: [20, 10, 3, 1] 
# | check:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | check:38'1      ?                             possible intended match
# |             6:  test1: 
# | check:38'0     ~~~~~~~
# |             7:  Hash: 0x0a4d0ad3efffffff 
# |             8:  Counters: 1 
# |             9:  Block counts: [10] 
# |            10:  test2: 
...

llvm-ci · 2025-02-23T05:11:56Z

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/13479

Here is the relevant piece of the build log for the reference

Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/pgo1.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate      -Xclang "-fprofile-instrument=llvm"
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate -Xclang -fprofile-instrument=llvm
# RUN: at line 3
env LLVM_PROFILE_FILE=llvm.profraw /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=llvm.profraw /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp
# RUN: at line 4
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.llvm.profraw | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c      --check-prefix="LLVM-PGO"
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.llvm.profraw
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c --check-prefix=LLVM-PGO
# RUN: at line 8
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-instr-generate      -Xclang "-fprofile-instrument=clang"
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-instr-generate -Xclang -fprofile-instrument=clang
# RUN: at line 10
env LLVM_PROFILE_FILE=clang.profraw /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=clang.profraw /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp
# RUN: at line 11
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.clang.profraw | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c      --check-prefix="CLANG-PGO"
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.clang.profraw
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c --check-prefix=CLANG-PGO
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c:54:15: error: CLANG-PGO: expected string not found in input
# | // CLANG-PGO: Block counts: [11, 20]
# |               ^
# | <stdin>:5:19: note: scanning from here
# |  Function count: 0
# |                   ^
# | <stdin>:6:2: note: possible intended match here
# |  Block counts: [12, 20]
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Counters: 
# |             2:  pgo1.c:__omp_offloading_802_d8283c4_main_l27: 
# |             3:  Hash: 0x000000011b11b451 
# |             4:  Counters: 3 
# |             5:  Function count: 0 
# | check:54'0                       X error: no match found
# |             6:  Block counts: [12, 20] 
...

@phoebewang

… `xor` (#128435)

As discussed in #126448, the fold implemented by #126448 / #128353 can be extended to operations other than `add`. This patch extends the fold performed by `combinei64TruncSrlAdd` to include `or` and `xor` (proof: https://alive2.llvm.org/ce/z/AXuaQu).

There's no need to extend it to `sub` and `and`, as similar folds are already being performed for those operations.

CC: @phoebewang @RKSimon

dtcxzyw added 2 commits February 22, 2025 19:59

[DAGCombiner][X86] Add pre-commit tests. NFC.

270fc0e

[DAGCombiner][X86] Correctly clean up high bits in `combinei64TruncSrlAdd`

6c4f7bf

dtcxzyw requested review from RKSimon and phoebewang February 22, 2025 12:14

llvmbot added the backend:X86 label Feb 22, 2025

phoebewang approved these changes Feb 23, 2025

dtcxzyw merged commit dbd219a into llvm:main Feb 23, 2025
13 checks passed

dtcxzyw deleted the fix-pr128309 branch February 23, 2025 04:57

joaotgouveia mentioned this pull request Feb 23, 2025

[X86] Extend combinei64TruncSrlAdd to handle patterns with or and xor #128435

Merged
