Optimize `IntRange::from_pat`, then shrink `ParamEnv` #77257
Conversation
Previously, this method called the more general `pat_constructor` function, which can return pattern constructors other than `IntRange`, and then threw away any non-`IntRange` variants. Specialize it so work is only done when it could result in an `IntRange`.
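As a rough sketch of the idea (the types and names below are simplified, hypothetical stand-ins, not the actual rustc definitions), the change amounts to replacing a "build a general constructor, then discard everything that isn't a range" conversion with a direct match on the pattern:

```rust
// Hypothetical, simplified stand-ins for the real rustc types.
#[derive(Clone, Copy)]
enum Pat {
    Int(i64),
    Range(i64, i64),
    Wildcard,
}

#[derive(Debug)]
enum Constructor {
    IntRange(IntRange),
    Wildcard,
}

#[derive(Debug)]
struct IntRange {
    lo: i64,
    hi: i64,
}

// The general routine: classifies *any* pattern into a constructor.
fn pat_constructor(pat: Pat) -> Constructor {
    match pat {
        Pat::Int(n) => Constructor::IntRange(IntRange { lo: n, hi: n }),
        Pat::Range(lo, hi) => Constructor::IntRange(IntRange { lo, hi }),
        Pat::Wildcard => Constructor::Wildcard,
    }
}

impl IntRange {
    // Before: go through the general routine, then throw away
    // anything that isn't an `IntRange`.
    fn from_pat_via_constructor(pat: Pat) -> Option<IntRange> {
        match pat_constructor(pat) {
            Constructor::IntRange(range) => Some(range),
            _ => None,
        }
    }

    // After: specialized, so work is only done for patterns that
    // could actually produce an `IntRange`.
    fn from_pat(pat: Pat) -> Option<IntRange> {
        match pat {
            Pat::Int(n) => Some(IntRange { lo: n, hi: n }),
            Pat::Range(lo, hi) => Some(IntRange { lo, hi }),
            _ => None,
        }
    }
}

fn main() {
    assert!(IntRange::from_pat(Pat::Wildcard).is_none());
    println!("{:?}", IntRange::from_pat(Pat::Range(1, 10)));
    println!("{:?}", IntRange::from_pat_via_constructor(Pat::Int(3)));
}
```

The point is that patterns which can never yield an `IntRange` no longer pay for constructing (and then dropping) an unrelated constructor value.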
…uct." This reverts commit ab83d37.
@bors try
Awaiting bors try build completion
⌛ Trying commit c4d8089 with merge baa01e8429de2c041f0095c01d4e8ca99c0b11a7...
☀️ Try build successful - checks-actions, checks-azure
Queued baa01e8429de2c041f0095c01d4e8ca99c0b11a7 with parent 1ec980d, future comparison URL.
Finished benchmarking try commit (baa01e8429de2c041f0095c01d4e8ca99c0b11a7): comparison URL. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the rollup. @bors rollup=never
(moved to summary)
LGTM :) Super weird that this was the cause of the slowdown. (A large one at that!)
I'm not quite sure what you mean @vandenheuvel. I think I've gone as far as I care to. If you'd like to investigate further, feel free to keep #77058 open. Perhaps you can do even better.

Here's the "slow" assembly. Excerpt:

```asm
 0.00 → callq rustc_mir_build::thir::pattern::_match::pat_constructor
 0.00   mov 0xb0(%rsp),%cl
 0.02   mov 0xc0(%rsp),%r9
 0.20   mov 0xb8(%rsp),%rbp
 0.01   mov 0xd0(%rsp),%r13
 0.01   mov 0xc8(%rsp),%r8
 0.01   mov 0xd8(%rsp),%al
 0.01   mov 0xd9(%rsp),%edx
32.37   mov %edx,0x40(%rsp)
 0.00   mov 0xdc(%rsp),%edx
 0.00   mov %edx,0x43(%rsp)
        cmp $0x7,%cl
      ↓ jne 5a1
```

Excerpt from (I think) `5a1`:

```asm
        mov 0x40(%rsp),%edx
20.80   mov 0x43(%rsp),%esi
 0.03   mov %esi,0x183(%rsp)
        mov %edx,0x180(%rsp)
        mov $0x1,%dl
        cmp $0x3,%cl
      ↓ jne 5d9
        mov 0x180(%rsp),%ecx
 0.01   mov 0x183(%rsp),%edx
23.64   mov %edx,0x23(%rsp)
 0.12   mov %ecx,0x20(%rsp)
        cmp $0x2,%al
        sete %dl
 0.07 5d9: mov 0x20(%rsp),%eax
 0.01   mov 0x23(%rsp),%ecx
18.20   mov %ecx,0xb3(%rsp)
 0.01   mov %eax,0xb0(%rsp)
        test %dl,%dl
 0.00 ↓ jne 1546
        mov %r14,0x18(%rsp)
 0.00   mov 0x218(%rsp),%rax
 0.00   mov (%rax),%rdi
 0.01   mov 0x30(%rbx),%rax
 0.01   mov 0x8(%rbx),%r14
 0.14   mov 0x10(%rbx),%rsi
        mov 0x20(%rbx),%rdx
        mov 0x18(%rbx),%r10
        movzbl (%rax),%ecx
        cmp $0x2,%rcx
      ↓ je 629
```

We're repeatedly copying two words into local stack variables, then branching, then reading them again. This sequence has a lot of data hazards, which is clearly slowing down the pipeline significantly. I can't say why LLVM chose to generate code this way, or what causes the data hazards to disappear with my changes to `IntRange::from_pat`.
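Purely for intuition (this is not the code LLVM was compiling; the function and types below are hypothetical), the shape being described is roughly "move a two-word value through successive stack temporaries, branch on a separate tag, then read the value back":

```rust
// Hypothetical illustration of the store/branch/load shape described above.
// When the optimizer keeps the temporaries in memory (as in the annotated
// assembly), each `let` below becomes a pair of stack stores, and each later
// use becomes loads of those same slots, creating store-to-load dependencies.
#[derive(Clone, Copy)]
struct TwoWords {
    a: u64,
    b: u64,
}

fn shuffle(tag: u8, v: TwoWords) -> TwoWords {
    let first = v; // copy the two words into a stack slot
    if tag == 7 {
        return first;
    }
    let second = first; // read them back, copy into another slot
    if tag == 3 {
        return second;
    }
    second // read them back once more
}

fn main() {
    let v = TwoWords { a: 1, b: 2 };
    println!("{}", shuffle(3, v).a + shuffle(7, v).b);
}
```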
I think the tag was meant for @vandenheuvel
Sorry!
r=me unless you're waiting on something further, not quite sure.
@bors r=Mark-Simulacrum
📌 Commit c4d8089 has been approved by `Mark-Simulacrum`
☀️ Test successful - checks-actions, checks-azure
@ecstatic-morse I don't have the expertise for further investigation either, unfortunately. That is why I asked whether #77058 would remain open: it seems the root cause has not yet been found.
To document that, it seems useful to keep #77058 open, perhaps after renaming it and adding another comment.
I don't think there's anything actionable left in #77058. If you disagree, you're free to reopen it (you should have the requisite permissions).
Final perf results are in. Looks to be an improvement everywhere except
Resolves #77058.
r? @Mark-Simulacrum
cc @vandenheuvel
Looking at the output of `perf report` for #76244, the hot instructions seemed to be around the call to `pat_constructor` in `IntRange::from_pat`. I carried out an obvious optimization, but it actually made the instruction count higher (see #77075). However, it seems to have mitigated whatever was causing the pipeline stalls, so when combined with #76244, it's a net win.

As you can see below, the regression in #76244 seems to have originated from something measured by `stalled-cycles-backend`. I'll try to collect some finer-grained stats to see if I can isolate it. I wish I had a better idea of what was going on here. I'd like to prevent the regression from reappearing in the future due to small changes in unrelated code.

Current `master`:
Shrink `ParamEnv` without changing `IntRange::from_pat`:
Shrink `ParamEnv` and change `IntRange::from_pat`: