Description
I'll just start with some reproduction steps that I'm hoping someone else will be able to reproduce. This assumes you've compiled ripgrep with Rust 1.50 to a binary named rg-stable_1.50 and also compiled ripgrep with Rust nightly 2021-03-09 to a binary named rg-nightly_2021-03-09 (alternatively, compile with the beta release, as I've reproduced the problem there in a subsequent comment):
$ curl -LO 'https://burntsushi.net/stuff/subtitles2016-sample.en.gz'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 265M 100 265M 0 0 32.1M 0 0:00:08 0:00:08 --:--:-- 33.4M
$ gunzip subtitles2016-sample.en.gz
$ time rg-stable_1.50 -c --no-mmap -a '[a-z]' subtitles2016-sample.en
31813587
real 1.601
user 1.467
sys 0.133
maxmem 7 MB
faults 0
$ time rg-nightly_2021-03-09 -c --no-mmap -a '[a-z]' subtitles2016-sample.en
31813587
real 3.973
user 3.837
sys 0.133
maxmem 7 MB
faults 0
Here is the relevant part of the profile I extracted by running the ripgrep compiled with nightly under perf:
The key difference between Rust nightly and stable is the fact that it looks like i8x32::new isn't being inlined. But it's not the only one. There are other functions showing up in the profile, like core::core_arch::x86::m256iExt::as_i32x8, that aren't being inlined either. These are trivial cast functions, and them not being inlined is likely a bug. (So an alternative title for this issue might be, "some trivial functions aren't getting inlined in hot code paths." But I figured I'd start with the actual problem I'm seeing in case my analysis is wrong.)
Initially I assumed that maybe something had changed in stdarch recently related to these code paths, but I don't see anything. So I'm a bit worried that perhaps something else changed that impacted inlining decisions, and this is an indirect effect. Alas, I'm stuck at this point and would love some help getting to the bottom of it.
It's possible, perhaps even likely, that this is related to #60637. I note that it is used to justify some inline(always) annotations, but fn new is left at just #[inline].
Perhaps there is a quick fix where we need to go over some of the lower level SIMD routines and make sure they're tagged with inline(always). But it seems to me like these functions really should be inlined automatically. I note that this doesn't look like a cross crate problem that might typically be a reason for preventing inlining. In particular, _mm256_setr_epi8 is being inlined (as one would expect), but the call to i8x32 in its implementation is the thing not being inlined. So this seems pretty suspicious to me.
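To make the distinction between the two annotations concrete, here is a minimal sketch. It is not the stdarch code: I8x4, new_hint, new_always, sum_lanes and hot_loop are hypothetical names invented for illustration, standing in for a trivial constructor like i8x32::new called from a hot loop. Both attributes are hints to the compiler; #[inline(always)] is just a much stronger one.

```rust
// Hypothetical stand-in for a SIMD vector type such as i8x32.
// This is NOT the stdarch definition; it only illustrates the attributes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I8x4(pub [i8; 4]);

impl I8x4 {
    /// `#[inline]` only suggests inlining; the optimizer is free to
    /// emit a real call here if its heuristics decide against it.
    #[inline]
    pub fn new_hint(a: i8, b: i8, c: i8, d: i8) -> Self {
        I8x4([a, b, c, d])
    }

    /// `#[inline(always)]` asks much more strongly for inlining,
    /// which is the kind of quick fix discussed above.
    #[inline(always)]
    pub fn new_always(a: i8, b: i8, c: i8, d: i8) -> Self {
        I8x4([a, b, c, d])
    }
}

/// Adds up the lanes of a vector as i32 values.
#[inline(always)]
pub fn sum_lanes(v: I8x4) -> i32 {
    v.0.iter().map(|&x| i32::from(x)).sum()
}

/// Hypothetical hot loop: builds one vector per iteration, so an
/// un-inlined constructor would cost a function call every time.
pub fn hot_loop(n: i8) -> i32 {
    let mut total = 0i32;
    for i in 0..n {
        total += sum_lanes(I8x4::new_always(i, i, i, i));
    }
    total
}
```

Both constructors compute the same value, so swapping one for the other changes only what the optimizer is asked to do, not the result. Whether a call actually survives would have to be checked in the generated assembly or in a profile, as with the perf output above.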
Apologies for not narrowing this down more. A good next step might be to find the specific version of nightly that introduced this problem.