
[CGP] Consider arguments and ret values in dupRetToEnableTailCallOpts #76613


Merged

Conversation

antoniofrighetto
Contributor

@antoniofrighetto antoniofrighetto commented Dec 30, 2023

Hint further tail call optimization opportunities when the examined returned value is either the return value of a call instruction or a function argument. Moreover, take into account cases in which phi-node incoming values that are not directly tail call instructions may still be simplified.

Fixes: #75455.
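For context, here is a minimal C++ sketch (illustrative only, not taken from the patch; the function name is made up) of the source pattern this change targets, mirroring the memset_tailc test added below: the returned pointer is either the argument itself or the destination passed to a libc-style call, so once the ret is duplicated into each predecessor the call can be emitted as a tail call.

```cpp
// Illustrative sketch: with this change, the memset call below can be lowered
// as "jmp memset" (a tail call) instead of "call memset; ret", because the
// value the function returns is exactly memset's first argument.
#include <cstddef>
#include <cstring>

void *zero_buffer(void *p, std::size_t n) {
  if (p == nullptr)
    return p;                    // one phi incoming value: the argument itself
  return std::memset(p, 0, n);   // the other: the call result, which equals p
}
```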

@llvmbot
Member

llvmbot commented Dec 30, 2023

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-backend-x86

Author: Antonio Frighetto (antoniofrighetto)

Changes

Unless the returned value is a constant (which may result in duplicated code later in codegen), there should be no reason to refrain from further tail call optimization opportunities. Also consider cases in which phi-node incoming values that do not come directly from call instructions may still be folded.

Fixes: #75455.


Full diff: https://github.com/llvm/llvm-project/pull/76613.diff

6 Files Affected:

  • (modified) llvm/lib/CodeGen/CodeGenPrepare.cpp (+10-2)
  • (modified) llvm/test/CodeGen/Thumb2/constant-islands-cbz.ll (+8-10)
  • (modified) llvm/test/CodeGen/X86/tail-dup-merge-loop-headers.ll (+63-66)
  • (modified) llvm/test/CodeGen/X86/tailcall-cgp-dup.ll (+51)
  • (modified) llvm/test/CodeGen/X86/x86-shrink-wrapping.ll (+2-2)
  • (modified) llvm/test/DebugInfo/X86/live-debug-values-expr-conflict.ll (+8-8)
diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index 6e99fb133e26a9..8c28333ac532d9 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -2532,7 +2532,8 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
     }
 
     PN = dyn_cast<PHINode>(V);
-    if (!PN)
+
+    if (isa<Constant>(V))
       return false;
   }
 
@@ -2569,8 +2570,15 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
       Value *IncomingVal = PN->getIncomingValue(I)->stripPointerCasts();
       CallInst *CI = dyn_cast<CallInst>(IncomingVal);
       BasicBlock *PredBB = PN->getIncomingBlock(I);
+      // Do we have a call whose return value may have been optimized out (e.g.
+      // __builtin_memset) and an unconditional jump as terminator eligible to
+      // be folded?
+      if (!CI && PredBB && PredBB->getSingleSuccessor() == BB)
+        CI = dyn_cast_or_null<CallInst>(
+            PredBB->getTerminator()->getPrevNonDebugInstruction(true));
+
       // Make sure the phi value is indeed produced by the tail call.
-      if (CI && CI->hasOneUse() && CI->getParent() == PredBB &&
+      if (CI && CI->getParent() == PredBB && CI->getNumUses() <= 1 &&
           TLI->mayBeEmittedAsTailCall(CI) &&
           attributesPermitTailCall(F, CI, RetI, *TLI))
         TailCallBBs.push_back(PredBB);
diff --git a/llvm/test/CodeGen/Thumb2/constant-islands-cbz.ll b/llvm/test/CodeGen/Thumb2/constant-islands-cbz.ll
index fc42d4e0e4d8c0..6e5cf4d335d94e 100644
--- a/llvm/test/CodeGen/Thumb2/constant-islands-cbz.ll
+++ b/llvm/test/CodeGen/Thumb2/constant-islands-cbz.ll
@@ -8,12 +8,11 @@ define ptr @test(ptr returned %this, i32 %event_size, ptr %event_pointer) {
 ; CHECK-T1-NEXT:    .save {r4, lr}
 ; CHECK-T1-NEXT:    push {r4, lr}
 ; CHECK-T1-NEXT:    mov r4, r0
-; CHECK-T1-NEXT:    movs r0, #0
-; CHECK-T1-NEXT:    str r0, [r4, #4]
-; CHECK-T1-NEXT:    str r0, [r4, #8]
-; CHECK-T1-NEXT:    str r0, [r4, #12]
-; CHECK-T1-NEXT:    str r0, [r4, #16]
-; CHECK-T1-NEXT:    mov r0, r4
+; CHECK-T1-NEXT:    movs r3, #0
+; CHECK-T1-NEXT:    str r3, [r0, #4]
+; CHECK-T1-NEXT:    str r3, [r0, #8]
+; CHECK-T1-NEXT:    str r3, [r0, #12]
+; CHECK-T1-NEXT:    str r3, [r0, #16]
 ; CHECK-T1-NEXT:    cbz r2, .LBB0_2
 ; CHECK-T1-NEXT:  @ %bb.1: @ %if.else
 ; CHECK-T1-NEXT:    bl equeue_create_inplace
@@ -28,11 +27,10 @@ define ptr @test(ptr returned %this, i32 %event_size, ptr %event_pointer) {
 ; CHECK-T2:       @ %bb.0: @ %entry
 ; CHECK-T2-NEXT:    .save {r4, lr}
 ; CHECK-T2-NEXT:    push {r4, lr}
+; CHECK-T2-NEXT:    movs r3, #0
 ; CHECK-T2-NEXT:    mov r4, r0
-; CHECK-T2-NEXT:    movs r0, #0
-; CHECK-T2-NEXT:    strd r0, r0, [r4, #4]
-; CHECK-T2-NEXT:    strd r0, r0, [r4, #12]
-; CHECK-T2-NEXT:    mov r0, r4
+; CHECK-T2-NEXT:    strd r3, r3, [r0, #4]
+; CHECK-T2-NEXT:    strd r3, r3, [r0, #12]
 ; CHECK-T2-NEXT:    cbz r2, .LBB0_2
 ; CHECK-T2-NEXT:  @ %bb.1: @ %if.else
 ; CHECK-T2-NEXT:    bl equeue_create_inplace
diff --git a/llvm/test/CodeGen/X86/tail-dup-merge-loop-headers.ll b/llvm/test/CodeGen/X86/tail-dup-merge-loop-headers.ll
index 937cc173e7faef..404b8b13328de7 100644
--- a/llvm/test/CodeGen/X86/tail-dup-merge-loop-headers.ll
+++ b/llvm/test/CodeGen/X86/tail-dup-merge-loop-headers.ll
@@ -95,102 +95,92 @@ define i32 @loop_shared_header(i8* %exe, i32 %exesz, i32 %headsize, i32 %min, i3
 ; CHECK-NEXT:    pushq %r12
 ; CHECK-NEXT:    pushq %rbx
 ; CHECK-NEXT:    pushq %rax
-; CHECK-NEXT:    movl $1, %ebx
 ; CHECK-NEXT:    xorl %eax, %eax
 ; CHECK-NEXT:    testb %al, %al
-; CHECK-NEXT:    jne .LBB1_24
+; CHECK-NEXT:    jne .LBB1_22
 ; CHECK-NEXT:  # %bb.1: # %if.end19
 ; CHECK-NEXT:    movl %esi, %ebp
-; CHECK-NEXT:    movq %rdi, %r15
-; CHECK-NEXT:    movl (%rax), %r13d
-; CHECK-NEXT:    leal (,%r13,4), %ebx
-; CHECK-NEXT:    movl %ebx, %r12d
+; CHECK-NEXT:    movq %rdi, %r14
+; CHECK-NEXT:    movl (%rax), %r12d
+; CHECK-NEXT:    leal (,%r12,4), %r13d
+; CHECK-NEXT:    movl %r13d, %r15d
 ; CHECK-NEXT:    movl $1, %esi
-; CHECK-NEXT:    movq %r12, %rdi
+; CHECK-NEXT:    movq %r15, %rdi
 ; CHECK-NEXT:    callq cli_calloc@PLT
 ; CHECK-NEXT:    testl %ebp, %ebp
-; CHECK-NEXT:    je .LBB1_23
+; CHECK-NEXT:    je .LBB1_22
 ; CHECK-NEXT:  # %bb.2: # %if.end19
-; CHECK-NEXT:    testl %r13d, %r13d
-; CHECK-NEXT:    je .LBB1_23
+; CHECK-NEXT:    testl %r12d, %r12d
+; CHECK-NEXT:    je .LBB1_22
 ; CHECK-NEXT:  # %bb.3: # %if.end19
-; CHECK-NEXT:    movq %rax, %r14
+; CHECK-NEXT:    movq %rax, %rbx
 ; CHECK-NEXT:    xorl %eax, %eax
 ; CHECK-NEXT:    testb %al, %al
-; CHECK-NEXT:    jne .LBB1_23
+; CHECK-NEXT:    jne .LBB1_22
 ; CHECK-NEXT:  # %bb.4: # %if.end19
-; CHECK-NEXT:    cmpq %r15, %r14
-; CHECK-NEXT:    jb .LBB1_23
+; CHECK-NEXT:    cmpq %r14, %rbx
+; CHECK-NEXT:    jb .LBB1_22
 ; CHECK-NEXT:  # %bb.5: # %if.end50
-; CHECK-NEXT:    movq %r14, %rdi
-; CHECK-NEXT:    movq %r12, %rdx
+; CHECK-NEXT:    movq %rbx, %rdi
+; CHECK-NEXT:    movq %r15, %rdx
 ; CHECK-NEXT:    callq memcpy@PLT
-; CHECK-NEXT:    cmpl $4, %ebx
-; CHECK-NEXT:    jb .LBB1_26
+; CHECK-NEXT:    cmpl $4, %r13d
+; CHECK-NEXT:    jb .LBB1_21
 ; CHECK-NEXT:  # %bb.6: # %shared_preheader
 ; CHECK-NEXT:    movb $32, %cl
 ; CHECK-NEXT:    xorl %eax, %eax
-; CHECK-NEXT:    jmp .LBB1_8
+; CHECK-NEXT:    jmp .LBB1_7
 ; CHECK-NEXT:    .p2align 4, 0x90
-; CHECK-NEXT:  .LBB1_7: # %merge_predecessor_split
-; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
+; CHECK-NEXT:  .LBB1_17: # %merge_predecessor_split
+; CHECK-NEXT:    # in Loop: Header=BB1_7 Depth=1
 ; CHECK-NEXT:    movb $32, %cl
-; CHECK-NEXT:  .LBB1_8: # %outer_loop_header
+; CHECK-NEXT:  .LBB1_7: # %outer_loop_header
 ; CHECK-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-NEXT:    # Child Loop BB1_9 Depth 2
-; CHECK-NEXT:    testl %r13d, %r13d
-; CHECK-NEXT:    je .LBB1_16
+; CHECK-NEXT:    # Child Loop BB1_11 Depth 2
+; CHECK-NEXT:    testl %r12d, %r12d
+; CHECK-NEXT:    je .LBB1_8
 ; CHECK-NEXT:    .p2align 4, 0x90
-; CHECK-NEXT:  .LBB1_9: # %shared_loop_header
-; CHECK-NEXT:    # Parent Loop BB1_8 Depth=1
+; CHECK-NEXT:  .LBB1_11: # %shared_loop_header
+; CHECK-NEXT:    # Parent Loop BB1_7 Depth=1
 ; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
-; CHECK-NEXT:    testq %r14, %r14
-; CHECK-NEXT:    jne .LBB1_25
-; CHECK-NEXT:  # %bb.10: # %inner_loop_body
-; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=2
+; CHECK-NEXT:    testq %rbx, %rbx
+; CHECK-NEXT:    jne .LBB1_20
+; CHECK-NEXT:  # %bb.12: # %inner_loop_body
+; CHECK-NEXT:    # in Loop: Header=BB1_11 Depth=2
 ; CHECK-NEXT:    testb %al, %al
-; CHECK-NEXT:    je .LBB1_9
-; CHECK-NEXT:  # %bb.11: # %if.end96.i
-; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
-; CHECK-NEXT:    cmpl $3, %r13d
-; CHECK-NEXT:    jae .LBB1_20
-; CHECK-NEXT:  # %bb.12: # %if.end287.i
-; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
+; CHECK-NEXT:    je .LBB1_11
+; CHECK-NEXT:  # %bb.13: # %if.end96.i
+; CHECK-NEXT:    # in Loop: Header=BB1_7 Depth=1
+; CHECK-NEXT:    cmpl $3, %r12d
+; CHECK-NEXT:    jae .LBB1_14
+; CHECK-NEXT:  # %bb.15: # %if.end287.i
+; CHECK-NEXT:    # in Loop: Header=BB1_7 Depth=1
 ; CHECK-NEXT:    testb %al, %al
 ; CHECK-NEXT:    # implicit-def: $cl
-; CHECK-NEXT:    jne .LBB1_8
-; CHECK-NEXT:  # %bb.13: # %if.end308.i
-; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
+; CHECK-NEXT:    jne .LBB1_7
+; CHECK-NEXT:  # %bb.16: # %if.end308.i
+; CHECK-NEXT:    # in Loop: Header=BB1_7 Depth=1
 ; CHECK-NEXT:    testb %al, %al
-; CHECK-NEXT:    je .LBB1_7
-; CHECK-NEXT:  # %bb.14: # %if.end335.i
-; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
+; CHECK-NEXT:    je .LBB1_17
+; CHECK-NEXT:  # %bb.18: # %if.end335.i
+; CHECK-NEXT:    # in Loop: Header=BB1_7 Depth=1
 ; CHECK-NEXT:    xorl %ecx, %ecx
 ; CHECK-NEXT:    testb %cl, %cl
-; CHECK-NEXT:    jne .LBB1_8
-; CHECK-NEXT:  # %bb.15: # %merge_other
-; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
+; CHECK-NEXT:    jne .LBB1_7
+; CHECK-NEXT:  # %bb.19: # %merge_other
+; CHECK-NEXT:    # in Loop: Header=BB1_7 Depth=1
 ; CHECK-NEXT:    # implicit-def: $cl
-; CHECK-NEXT:    jmp .LBB1_8
-; CHECK-NEXT:  .LBB1_23:
-; CHECK-NEXT:    movl $1, %ebx
-; CHECK-NEXT:    jmp .LBB1_24
-; CHECK-NEXT:  .LBB1_16: # %while.cond.us1412.i
+; CHECK-NEXT:    jmp .LBB1_7
+; CHECK-NEXT:  .LBB1_8: # %while.cond.us1412.i
 ; CHECK-NEXT:    xorl %eax, %eax
 ; CHECK-NEXT:    testb %al, %al
-; CHECK-NEXT:    movl $1, %ebx
-; CHECK-NEXT:    jne .LBB1_18
-; CHECK-NEXT:  # %bb.17: # %while.cond.us1412.i
+; CHECK-NEXT:    jne .LBB1_10
+; CHECK-NEXT:  # %bb.9: # %while.cond.us1412.i
 ; CHECK-NEXT:    decb %cl
-; CHECK-NEXT:    jne .LBB1_24
-; CHECK-NEXT:  .LBB1_18: # %if.end41.us1436.i
-; CHECK-NEXT:  .LBB1_20: # %if.then99.i
-; CHECK-NEXT:    movq .str.6@GOTPCREL(%rip), %rdi
-; CHECK-NEXT:    xorl %ebx, %ebx
-; CHECK-NEXT:    xorl %eax, %eax
-; CHECK-NEXT:    callq cli_dbgmsg@PLT
-; CHECK-NEXT:  .LBB1_24: # %cleanup
-; CHECK-NEXT:    movl %ebx, %eax
+; CHECK-NEXT:    je .LBB1_10
+; CHECK-NEXT:  .LBB1_22: # %cleanup
+; CHECK-NEXT:    movl $1, %eax
+; CHECK-NEXT:  .LBB1_23: # %cleanup
 ; CHECK-NEXT:    addq $8, %rsp
 ; CHECK-NEXT:    popq %rbx
 ; CHECK-NEXT:    popq %r12
@@ -199,8 +189,15 @@ define i32 @loop_shared_header(i8* %exe, i32 %exesz, i32 %headsize, i32 %min, i3
 ; CHECK-NEXT:    popq %r15
 ; CHECK-NEXT:    popq %rbp
 ; CHECK-NEXT:    retq
-; CHECK-NEXT:  .LBB1_25: # %wunpsect.exit.thread.loopexit389
-; CHECK-NEXT:  .LBB1_26: # %wunpsect.exit.thread.loopexit391
+; CHECK-NEXT:  .LBB1_14: # %if.then99.i
+; CHECK-NEXT:    movq .str.6@GOTPCREL(%rip), %rdi
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    callq cli_dbgmsg@PLT
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    jmp .LBB1_23
+; CHECK-NEXT:  .LBB1_20: # %wunpsect.exit.thread.loopexit389
+; CHECK-NEXT:  .LBB1_10: # %if.end41.us1436.i
+; CHECK-NEXT:  .LBB1_21: # %wunpsect.exit.thread.loopexit391
 entry:
   %0 = load i32, i32* undef, align 4
   %mul = shl nsw i32 %0, 2
diff --git a/llvm/test/CodeGen/X86/tailcall-cgp-dup.ll b/llvm/test/CodeGen/X86/tailcall-cgp-dup.ll
index 48440558283d45..a52bd0b305bd3f 100644
--- a/llvm/test/CodeGen/X86/tailcall-cgp-dup.ll
+++ b/llvm/test/CodeGen/X86/tailcall-cgp-dup.ll
@@ -184,3 +184,54 @@ return:
   %retval = phi ptr [ %ptr, %if.then ], [ %obj, %entry ]
   ret ptr %retval
 }
+
+define noundef ptr @memset_tailc(ptr noundef %0, i64 noundef %1) {
+; CHECK-LABEL: memset_tailc:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    testq %rdi, %rdi
+; CHECK-NEXT:    je LBB4_1
+; CHECK-NEXT:  ## %bb.2:
+; CHECK-NEXT:    movq %rsi, %rdx
+; CHECK-NEXT:    xorl %esi, %esi
+; CHECK-NEXT:    jmp _memset ## TAILCALL
+; CHECK-NEXT:  LBB4_1:
+; CHECK-NEXT:    movq %rdi, %rax
+; CHECK-NEXT:    retq
+  %3 = icmp eq ptr %0, null
+  br i1 %3, label %5, label %4
+
+4:
+  tail call void @llvm.memset.p0.i64(ptr nonnull align 1 %0, i8 0, i64 %1, i1 false)
+  br label %5
+
+5:
+  ret ptr %0
+}
+
+define noundef ptr @memcpy_tailc(ptr noundef %0, i64 noundef %1, ptr noundef %2) {
+; CHECK-LABEL: memcpy_tailc:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    testq %rsi, %rsi
+; CHECK-NEXT:    je LBB5_1
+; CHECK-NEXT:  ## %bb.2:
+; CHECK-NEXT:    movq %rsi, %rax
+; CHECK-NEXT:    movq %rdx, %rsi
+; CHECK-NEXT:    movq %rax, %rdx
+; CHECK-NEXT:    jmp _memcpy ## TAILCALL
+; CHECK-NEXT:  LBB5_1:
+; CHECK-NEXT:    movq %rdx, %rax
+; CHECK-NEXT:    retq
+  %4 = icmp eq i64 %1, 0
+  br i1 %4, label %6, label %5
+
+5:
+  tail call void @llvm.memcpy.p0.p0.i64(ptr align 1 %0, ptr align 1 %2, i64 %1, i1 false)
+  br label %6
+
+6:
+  %7 = phi ptr [ %0, %5 ], [ %2, %3 ]
+  ret ptr %7
+}
+
+declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #1
+declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)
diff --git a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
index fab3847b3a2c51..eb2e00fb90429c 100644
--- a/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/X86/x86-shrink-wrapping.ll
@@ -477,8 +477,8 @@ define i32 @inlineAsm(i32 %cond, i32 %N) {
 ; ENABLE-NEXT:    popq %rbx
 ; ENABLE-NEXT:    retq
 ; ENABLE-NEXT:  LBB6_4: ## %if.else
+; ENABLE-NEXT:    addl %esi, %esi
 ; ENABLE-NEXT:    movl %esi, %eax
-; ENABLE-NEXT:    addl %esi, %eax
 ; ENABLE-NEXT:    retq
 ;
 ; DISABLE-LABEL: inlineAsm:
@@ -509,8 +509,8 @@ define i32 @inlineAsm(i32 %cond, i32 %N) {
 ; DISABLE-NEXT:    popq %rbx
 ; DISABLE-NEXT:    retq
 ; DISABLE-NEXT:  LBB6_4: ## %if.else
+; DISABLE-NEXT:    addl %esi, %esi
 ; DISABLE-NEXT:    movl %esi, %eax
-; DISABLE-NEXT:    addl %esi, %eax
 ; DISABLE-NEXT:    popq %rbx
 ; DISABLE-NEXT:    retq
 entry:
diff --git a/llvm/test/DebugInfo/X86/live-debug-values-expr-conflict.ll b/llvm/test/DebugInfo/X86/live-debug-values-expr-conflict.ll
index afce27ee4abb9b..2769d39fa54aa2 100644
--- a/llvm/test/DebugInfo/X86/live-debug-values-expr-conflict.ll
+++ b/llvm/test/DebugInfo/X86/live-debug-values-expr-conflict.ll
@@ -24,18 +24,18 @@
 ; one in the block two, and none in block three.
 ; CHECK:       ![[BAZVAR:[0-9]+]] = !DILocalVariable(name: "baz",
 ; CHECK-LABEL: bb.0.entry:
-; CHECK:       DBG_VALUE {{[0-9a-zA-Z$%_]*}}, $noreg, ![[BAZVAR]], 
+; CHECK:       DBG_VALUE {{[0-9a-zA-Z$%_]*}}, $noreg, ![[BAZVAR]], 
 ; CHECK-SAME:     !DIExpression()
 ; CHECK-LABEL: bb.1.if.then:
-; CHECK-LABEL: bb.2.if.else:
-; CHECK:       DBG_VALUE {{[0-9a-zA-Z$%_]*}}, $noreg, ![[BAZVAR]], 
+; CHECK-LABEL: bb.3.if.else:
+; CHECK:       DBG_VALUE {{[0-9a-zA-Z$%_]*}}, $noreg, ![[BAZVAR]], 
 ; CHECK-SAME:     !DIExpression()
-; CHECK:       DBG_VALUE_LIST ![[BAZVAR]], 
-; CHECK-SAME:     !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_plus_uconst, 1, DW_OP_stack_value)
-; CHECK-SAME:     {{[0-9a-zA-Z$%_]*}}
-; CHECK-LABEL: bb.3.if.end:
+; CHECK:       DBG_VALUE_LIST ![[BAZVAR]], 
+; CHECK-SAME:     !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_plus_uconst, 1, DW_OP_stack_value)
+; CHECK-SAME:     {{[0-9a-zA-Z$%_]*}}
+; CHECK-LABEL: bb.2.if.then:
 ; CHECK-NOT:   DBG_VALUE
-; CHECK-NOT:   DBG_VALUE_LIST
+; CHECK-NOT:   DBG_VALUE_LIST
 
 declare void @escape1(i32)
 declare void @escape2(i32)

@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch from 1f07415 to 08abfe9 on December 30, 2023 13:03
dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Dec 30, 2023
@dtcxzyw dtcxzyw requested review from RKSimon and topperc December 30, 2023 13:41
@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch from 08abfe9 to 12fe22e on December 30, 2023 14:57
@dtcxzyw
Member

dtcxzyw commented Dec 30, 2023

@antoniofrighetto Could you please take a look at llvm-test-suite/MultiSource/Benchmarks/Olden/health/health? This patch seems to cause a significant regression (~22%) in this application, which is compiled with -Os.
Artifacts: https://github.com/dtcxzyw/llvm-ci/actions/runs/7364440817/artifacts/1140327255

MultiSource/Benchmarks/Olden/health/health
Baseline: c2971aedfcbfb8b5c16c2c2ed1ea82cb
Current: 055ea400b5ab283b03146c87666b7ef7

@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch from 12fe22e to 46ba8eb on January 3, 2024 21:44
@antoniofrighetto antoniofrighetto changed the title from "[CGP] Do not fold ret value if constant in dupRetToEnableTailCallOpts" to "[CGP] Consider arguments and ret values in dupRetToEnableTailCallOpts" on Jan 3, 2024
@antoniofrighetto
Contributor Author

@dtcxzyw, thanks, restricted the folding to return values of function calls and to function arguments, now considering essentially only libc functions (whose first argument coincides with the return value).
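A rough sketch of the check being described (an assumption on my part, simplified and not the committed CodeGenPrepare code; returnsItsFirstArgument is a hypothetical helper name): the folding is only considered when the value being returned is the call's first argument, which is the property memset/memcpy/memmove/strcpy-style routines have.

```cpp
// Simplified sketch (not the actual patch): accept a tail-call candidate only
// when the returned value is the call's first argument, after stripping
// pointer casts, as is the case for memset/memcpy/memmove/strcpy.
#include "llvm/IR/Instructions.h"

static bool returnsItsFirstArgument(const llvm::CallInst *CI,
                                    const llvm::Value *RetVal) {
  if (!CI || CI->arg_size() == 0)
    return false;
  return CI->getArgOperand(0)->stripPointerCasts() ==
         RetVal->stripPointerCasts();
}
```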

@dtcxzyw
Member

dtcxzyw commented Jan 4, 2024

@dtcxzyw, thanks, restricted the folding to return values of function calls and to function arguments, now considering essentially only libc functions (whose first argument coincides with the return value).

The regression has been fixed. Thank you!

@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch 2 times, most recently from 54ec30a to 3758b79 on January 5, 2024 11:16
@RKSimon RKSimon removed their request for review January 5, 2024 11:32
@antoniofrighetto
Contributor Author

Kind ping.

@dvyukov
Collaborator

dvyukov commented Jan 15, 2024

cc @david-xl @teresajohnson can you please find somebody to review this?
This is extracted from our production code, see #75455.

@dvyukov
Collaborator

dvyukov commented Jan 15, 2024

@antoniofrighetto do you have any intuition regarding how much code this can affect in real applications?

@antoniofrighetto
Contributor Author

@dvyukov, hard to assess the impact of the change qualitatively. Looking for calls in tail position w/ ret value used over glibc (grep -r -P 'return.*\b(strcpy|strncpy|memset|memcpy|memmove).*\(.*\).*;' .) reveals 52 instances of tail call opportunities, which possibly doesn't seem a huge number (out of total # functions?), but they all seem within highly recurring functions (e.g., malloc, strdup – as already noted in the issue).

@dtcxzyw
Member

dtcxzyw commented Jan 16, 2024

@dvyukov, hard to assess the impact of the change qualitatively. Looking for calls in tail position w/ ret value used over glibc (grep -r -P 'return.*\b(strcpy|strncpy|memset|memcpy|memmove).*\(.*\).*;' .) reveals 52 instances of tail call opportunities, which possibly doesn't seem a huge number (out of total # functions?), but they all seem within highly recurring functions (e.g., malloc, strdup – as already noted in the issue).

I found 342 potential tail call opportunities in my benchmark. But most of them are only associated with llvm.memcpy/llvm.memset.

@antoniofrighetto
Contributor Author

Kind ping (cc/ @david-xl).

@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch from 6608bac to 757a012 on January 25, 2024 20:36
@dtcxzyw
Member

dtcxzyw commented Jan 30, 2024

Ping.

@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch from 757a012 to 7cb15b2 on January 30, 2024 18:13
@antoniofrighetto
Contributor Author

Updated this to explicitly look for candidate intrinsics / strcpy, as I don't think looking only at the first argument was robust enough to avoid folding the phi into the ret in all cases (now evaluating it on the compile-time tracker for regressions).

ret ptr %phi
}

define ptr @strcpy_tailc(i64 %0, ptr %src) {
Contributor


Perhaps add a test case with diamond-shaped control flow. In one arm, add a positive test, and in the other arm, add a negative test where a tail call is not legal.
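A possible shape for such a test, sketched here in C++ source form rather than IR (hypothetical, not the test that was eventually committed): both arms join at a single return; the memset arm is the positive case, while the other arm is negative because the function returns a pointer that is not the value the call produces.

```cpp
// Hypothetical diamond-shaped example: control flow splits and both arms reach
// one return. The memset arm can become a tail call (the returned pointer is
// memset's first argument); the memcpy arm cannot, because the function
// returns src, which is not what memcpy returns.
#include <cstddef>
#include <cstring>

void *diamond(void *dst, void *src, std::size_t n, bool zero) {
  if (zero)
    return std::memset(dst, 0, n);  // positive: eligible for "jmp memset"
  std::memcpy(dst, src, n);
  return src;                       // negative: no tail call is legal here
}
```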


@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch 2 times, most recently from 72e2a6d to b89b00f on January 30, 2024 22:27
@antoniofrighetto
Contributor Author

Ping.

@david-xl
Contributor

david-xl commented Feb 6, 2024

Can you provide any performance numbers (real applications, benchmark suite) etc?

@dvyukov
Collaborator

dvyukov commented Feb 6, 2024

Can you provide any performance numbers (real applications, benchmark suite) etc?

What will depend on these numbers? Do you mean that we don't merge this if it's not possible to demonstrate the effect? Obtaining realistic performance data for small improvements is usually very expensive, and not feasible for an improvement of less than 0.1%.

@david-xl
Contributor

david-xl commented Feb 6, 2024

Can you provide any performance numbers (real applications, benchmark suite) etc?

What will depend on these numbers? Do you mean that we don't merge this if it's not possible to demonstrate the effect? Obtaining realistic performance data for small improvements is usually very expensive, and not feasible for an improvement of less than 0.1%.

It is the other way around. Having some positive numbers will help speed up the approval. There are always trade-offs between runtime performance and compile time overhead (probably not a concern for this case).

There is also a chance of triggering unexpected regressions. Having some test coverage gives people more confidence.

Having said that, this patch LGTM.

@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch from b89b00f to c310316 on February 7, 2024 16:56
@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch 2 times, most recently from 84b2753 to 7a600ae on February 8, 2024 11:33
@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch from 7a600ae to 8df3ae3 on February 12, 2024 06:07
Contributor

@nikic nikic left a comment


LGTM

Hint further tail call optimization opportunities when the examined
returned value is the return value of a known intrinsic or library
function, and it appears as the first function argument.

Fixes: llvm#75455.
@antoniofrighetto antoniofrighetto force-pushed the feature/tailcall_missed_opt branch from 8df3ae3 to 8373cee on February 12, 2024 13:17
@antoniofrighetto antoniofrighetto merged commit 8373cee into llvm:main Feb 12, 2024
@antoniofrighetto
Contributor Author

Thanks for reviewing this! @david-xl, @nikic

Successfully merging this pull request may close this issue: Missed optimization: not tail calling memset (#75455).