LLVM IR for generic functions sometimes fails to inline post MergeFuncs with LTO #97552

Closed
@Nils-TUD


For some reason, the compiler no longer inlines Vec::deref when LTO is enabled, which ruins performance in my case. I tried to force inlining via -Znew-llvm-pass-manager=no -Cinline-threshold=N, but even N=10000 (resulting in insane compile times and binary sizes) does not get the function inlined. This behavior changed between rustc commits 76d770a (inlined) and 6af09d2 (not inlined).

Code

I tried to create a more or less minimal example that reproduces the problem, which resulted in the following Rust program:

#[inline(never)]
fn do_something(args: &Vec<String>) -> bool {
    args[0] == "test"
}

fn main() {
    println!("{}", do_something(&std::env::args().collect::<Vec<_>>()));
}

I expected to see this happen: the Deref implementation of Vec is inlined regardless of whether LTO is used or not.

Instead, this happened: If LTO is enabled, the Deref implementation of Vec is not inlined. If LTO is disabled, it is inlined. That is, with LTO, the generated assembly code looks like this:

0000000000037bd0 <vec_deref_inline::do_something>:
   37bd0:       50                      push   %rax
   37bd1:       48 83 7f 10 00          cmpq   $0x0,0x10(%rdi)
   37bd6:       74 1d                   je     37bf5 <vec_deref_inline::do_something+0x25>
   37bd8:       48 8b 3f                mov    (%rdi),%rdi
   37bdb:       e8 30 00 00 00          call   37c10 <<alloc::vec::Vec<T,A> as core::ops::deref::Deref>::deref>
   37be0:       48 83 fa 04             cmp    $0x4,%rdx
   37be4:       75 0b                   jne    37bf1 <vec_deref_inline::do_something+0x21>
   37be6:       81 38 74 65 73 74       cmpl   $0x74736574,(%rax)
   37bec:       0f 94 c0                sete   %al
   37bef:       59                      pop    %rcx
   37bf0:       c3                      ret    
   37bf1:       31 c0                   xor    %eax,%eax
   37bf3:       59                      pop    %rcx
   37bf4:       c3                      ret    
   37bf5:       48 8d 15 c4 df 00 00    lea    0xdfc4(%rip),%rdx        # 45bc0 <__do_global_dtors_aux_fini_array_entry+0x1b68>
   37bfc:       31 ff                   xor    %edi,%edi
   37bfe:       31 f6                   xor    %esi,%esi
   37c00:       e8 7b c5 fc ff          call   4180 <core::panicking::panic_bounds_check>
   37c05:       0f 0b                   ud2    
   37c07:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
   37c0e:       00 00 

0000000000037c10 <<alloc::vec::Vec<T,A> as core::ops::deref::Deref>::deref>:
   37c10:       48 8b 07                mov    (%rdi),%rax
   37c13:       48 8b 57 10             mov    0x10(%rdi),%rdx
   37c17:       c3                      ret    
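Reading the listing above: the bounds check on the vector length (the compare against 0x10(%rdi)) is inlined, and the remaining call is made on the first element, after which the returned length in %rdx is compared against 4 and the bytes against the constant 0x74736574 ("test" in little-endian order). A rough source-level sketch of those steps follows. The name `first_is_test_explicit` is hypothetical, and `as_bytes` is only a stand-in for whatever internal path the standard library takes to reach the byte slice; this is an illustration of the listing, not the actual library implementation.

```rust
// Same check as `do_something` in the report, using the usual indexing sugar.
fn first_is_test(args: &Vec<String>) -> bool {
    args[0] == "test"
}

// Hypothetical spelled-out variant of the same check (illustration only).
fn first_is_test_explicit(args: &Vec<String>) -> bool {
    // Bounds-checked access to the first element; panics on an empty
    // vector, just like `args[0]` does.
    let first: &String = &args[0];
    // View the string contents as a (pointer, length) byte slice.
    // Assumption: `as_bytes` stands in for the internal deref path.
    let bytes: &[u8] = first.as_bytes();
    // Slice equality compares the lengths first, then the contents.
    bytes == "test".as_bytes()
}

fn main() {
    let args = vec![String::from("test"), String::from("extra")];
    println!("{}", first_is_test(&args));
    println!("{}", first_is_test_explicit(&args));
}
```

Both functions should agree on every non-empty input; the second only makes visible the slice view that the out-of-line call in the listing produces.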

Version it worked on

rustc --version --verbose:

rustc 1.61.0-nightly (76d770ac2 2022-04-02)
binary: rustc
commit-hash: 76d770ac21d9521db6a92a48c7b3d5b2cc535941
commit-date: 2022-04-02
host: x86_64-unknown-linux-gnu
release: 1.61.0-nightly
LLVM version: 14.0.0

In this version, Vec::deref is inlined in every Rust program I have (LTO=on). In the example above, it is also inlined regardless of whether LTO is used or not.

Version with regression

rustc --version --verbose:

rustc 1.61.0-nightly (6af09d250 2022-04-03)
binary: rustc
commit-hash: 6af09d2505f38e4f1df291df56d497fb2ad935ed
commit-date: 2022-04-03
host: x86_64-unknown-linux-gnu
release: 1.61.0-nightly
LLVM version: 14.0.0

In this version, Vec::deref never seems to be inlined (LTO=on). In the example above, it is inlined only if LTO is disabled.

Metadata

Labels

A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-bug (Category: This is a bug.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
P-medium (Medium priority)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
regression-untriaged (Untriaged performance or correctness regression.)
