
major performance regression between Rust 1.50 and beta when using target-cpu=native #83027

Closed

Description

I'll just start with some reproduction steps that I'm hoping someone else will be able to reproduce. This assumes you've compiled ripgrep with Rust 1.50 to a binary named rg-stable_1.50 and also compiled ripgrep with Rust nightly 2021-03-09 to a binary named rg-nightly_2021-03-09 (alternatively, compile with the beta release, as I've reproduced the problem there in a subsequent comment):

$ curl -LO 'https://burntsushi.net/stuff/subtitles2016-sample.en.gz'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  265M  100  265M    0     0  32.1M      0  0:00:08  0:00:08 --:--:-- 33.4M

$ gunzip subtitles2016-sample.en.gz

$ time rg-stable_1.50 -c --no-mmap -a '[a-z]' subtitles2016-sample.en
31813587

real    1.601
user    1.467
sys     0.133
maxmem  7 MB
faults  0

$ time rg-nightly_2021-03-09 -c --no-mmap -a '[a-z]' subtitles2016-sample.en
31813587

real    3.973
user    3.837
sys     0.133
maxmem  7 MB
faults  0

Here is the relevant part of the profile I extracted by running the ripgrep compiled with nightly under perf:

[Image: simd-funs-not-inlined]

The key difference between Rust nightly and stable appears to be that i8x32::new isn't being inlined on nightly. But it's not the only one. There are other functions showing up in the profile, like core::core_arch::x86::m256iExt::as_i32x8, that aren't being inlined either. These are trivial cast functions, and their not being inlined is likely a bug. (So an alternative title for this issue might be, "some trivial functions aren't getting inlined in hot code paths." But I figured I'd start with the actual problem I'm seeing in case my analysis is wrong.)

Initially I assumed that maybe something had changed in stdarch recently related to these code paths, but I don't see anything. So I'm a bit worried that perhaps something else changed that impacted inlining decisions, and this is an indirect effect. Alas, I'm stuck at this point and would love some help getting to the bottom of it.

It's possible, perhaps even likely, that this is related to #60637. I note that #60637 is used to justify some inline(always) annotations, but fn new is left at just #[inline].

Perhaps there is a quick fix where we go over some of the lower level SIMD routines and make sure they're tagged with inline(always). But it seems to me that these functions should be inlined automatically. I note that this doesn't look like a cross-crate problem, which might typically be a reason for preventing inlining. In particular, _mm256_setr_epi8 is being inlined (as one would expect), but the call to i8x32::new in its implementation is the thing not being inlined. So this seems pretty suspicious to me.
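To make the quick fix concrete, here is a minimal sketch of what tagging a trivial constructor and a trivial cast with inline(always) looks like. Everything in it is a hypothetical stand-in: I8x4, as_u32 and setr_i8x4 are made-up names that only mirror the shape of i8x32::new, as_i32x8 and _mm256_setr_epi8, and this is not how stdarch actually defines those items.

```rust
// Hypothetical stand-in for a SIMD vector type such as i8x32.
// This is NOT stdarch's real definition; it only mirrors the shape.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I8x4(i8, i8, i8, i8);

impl I8x4 {
    // Trivial constructor. Plain `#[inline]` is only a hint to the
    // compiler; `#[inline(always)]` is a much stronger request.
    #[inline(always)]
    pub const fn new(a: i8, b: i8, c: i8, d: i8) -> Self {
        I8x4(a, b, c, d)
    }

    // Trivial cast-like helper (hypothetical), similar in spirit to
    // the as_i32x8 cast mentioned above.
    #[inline(always)]
    pub fn as_u32(self) -> u32 {
        u32::from_le_bytes([self.0 as u8, self.1 as u8, self.2 as u8, self.3 as u8])
    }
}

// Public wrapper (hypothetical), analogous in shape to _mm256_setr_epi8:
// it does nothing except forward to the constructor.
#[inline]
pub fn setr_i8x4(a: i8, b: i8, c: i8, d: i8) -> I8x4 {
    I8x4::new(a, b, c, d)
}

fn main() {
    let v = setr_i8x4(1, 2, 3, 4);
    assert_eq!(v, I8x4::new(1, 2, 3, 4));
    assert_eq!(v.as_u32(), 0x0403_0201);
    println!("{:?} -> {:#010x}", v, v.as_u32());
}
```

Whether annotations like these would actually resolve the regression described here is an assumption on my part; they only paper over whatever changed in the inlining decisions.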

Apologies for not narrowing this down more. A good next step might be to find the specific version of nightly that introduced this problem.


Labels

A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
P-high (High priority)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
regression-from-stable-to-beta (Performance or correctness regression from stable to beta.)
