Description
For some reason, the compiler no longer inlines Vec::deref when LTO is enabled, which ruins performance in my case. I tried to force inlining via -Znew-llvm-pass-manager=no -Cinline-threshold=N, but even N=10000 (resulting in insane compile times and binary sizes) doesn't convince the compiler to inline the function. This behavior changed between nightly commits 76d770a (inlined) and 6af09d2 (not inlined).
Code
I tried to create a more or less minimal example that reproduces the problem, which resulted in the following Rust program:
#[inline(never)]
fn do_something(args: &Vec<String>) -> bool {
    args[0] == "test"
}

fn main() {
    println!("{}", do_something(&std::env::args().collect::<Vec<_>>()));
}
I expected to see this happen: the Deref implementation of Vec is inlined regardless of whether LTO is used or not.
Instead, this happened: If LTO is enabled, the Deref implementation of Vec is not inlined. If LTO is disabled, it is inlined. That is, with LTO, the generated assembly code looks like this:
0000000000037bd0 <vec_deref_inline::do_something>:
37bd0: 50 push %rax
37bd1: 48 83 7f 10 00 cmpq $0x0,0x10(%rdi)
37bd6: 74 1d je 37bf5 <vec_deref_inline::do_something+0x25>
37bd8: 48 8b 3f mov (%rdi),%rdi
37bdb: e8 30 00 00 00 call 37c10 <<alloc::vec::Vec<T,A> as core::ops::deref::Deref>::deref>
37be0: 48 83 fa 04 cmp $0x4,%rdx
37be4: 75 0b jne 37bf1 <vec_deref_inline::do_something+0x21>
37be6: 81 38 74 65 73 74 cmpl $0x74736574,(%rax)
37bec: 0f 94 c0 sete %al
37bef: 59 pop %rcx
37bf0: c3 ret
37bf1: 31 c0 xor %eax,%eax
37bf3: 59 pop %rcx
37bf4: c3 ret
37bf5: 48 8d 15 c4 df 00 00 lea 0xdfc4(%rip),%rdx # 45bc0 <__do_global_dtors_aux_fini_array_entry+0x1b68>
37bfc: 31 ff xor %edi,%edi
37bfe: 31 f6 xor %esi,%esi
37c00: e8 7b c5 fc ff call 4180 <core::panicking::panic_bounds_check>
37c05: 0f 0b ud2
37c07: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
37c0e: 00 00
0000000000037c10 <<alloc::vec::Vec<T,A> as core::ops::deref::Deref>::deref>:
37c10: 48 8b 07 mov (%rdi),%rax
37c13: 48 8b 57 10 mov 0x10(%rdi),%rdx
37c17: c3 ret
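The out-of-line function at 37c10 consists of two loads and a ret: it reads the buffer pointer and the length out of the Vec and returns them as a slice. The following is a minimal sketch of that behavior, not the standard library's implementation; deref_parts is a hypothetical helper introduced only for illustration.

```rust
use std::ops::Deref;

// Hypothetical helper (not part of the original report): returns the
// (pointer, length) pair of the slice that Vec::deref produces.
fn deref_parts(v: &Vec<String>) -> (*const String, usize) {
    // Explicit call to the Deref impl that the assembly shows as a separate function.
    let s: &[String] = v.deref();
    (s.as_ptr(), s.len())
}

fn main() {
    let v = vec![String::from("test"), String::from("other")];
    let (ptr, len) = deref_parts(&v);
    // The slice points at the Vec's own buffer and has the Vec's length.
    assert_eq!(ptr, v.as_ptr());
    assert_eq!(len, v.len());
}
```

Since the function only forwards two fields, it would normally be expected to be inlined at every call site.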
Version it worked on
rustc --version --verbose:
rustc 1.61.0-nightly (76d770ac2 2022-04-02)
binary: rustc
commit-hash: 76d770ac21d9521db6a92a48c7b3d5b2cc535941
commit-date: 2022-04-02
host: x86_64-unknown-linux-gnu
release: 1.61.0-nightly
LLVM version: 14.0.0
In this version, Vec::deref is inlined in every Rust program I have (LTO=on). In the example above, it is also inlined regardless of whether LTO is used or not.
Version with regression
rustc --version --verbose:
rustc 1.61.0-nightly (6af09d250 2022-04-03)
binary: rustc
commit-hash: 6af09d2505f38e4f1df291df56d497fb2ad935ed
commit-date: 2022-04-03
host: x86_64-unknown-linux-gnu
release: 1.61.0-nightly
LLVM version: 14.0.0
In this version, Vec::deref apparently is never inlined when LTO is enabled. In the example above, it is inlined only if LTO is disabled.
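One possible workaround to try while the regression persists is to have the hot function take a slice instead of &Vec<String>, so that the deref happens once in the caller and the function body contains no Vec::deref call. This is only a sketch: do_something_slice is a hypothetical variant of the function above, and whether it recovers the lost performance in a real program is an assumption that has not been measured here.

```rust
// Hypothetical variant of do_something that takes a slice.
// The body indexes the slice directly, so no Vec::deref call is needed here.
#[inline(never)]
fn do_something_slice(args: &[String]) -> bool {
    // Like the original, this panics if the slice is empty.
    args[0] == "test"
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    // &Vec<String> coerces to &[String] at the call site (deref coercion),
    // so the deref is performed once by the caller.
    println!("{}", do_something_slice(&args));
}
```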