Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIR-OPT: Pass to deduplicate blocks #77551

Merged
merged 3 commits into from
Feb 22, 2021

Conversation

simonvandel
Copy link
Contributor

@simonvandel simonvandel commented Oct 4, 2020

This pass finds basic blocks that are completely equal,
and replaces all uses with just one of them.

$ RUSTC_LOG=rustc_mir::transform::deduplicate_blocks ./x.py build --stage 2 | grep "SUCCESS: Replacing: " > log
...
$ cat log | wc -l
23875

@rust-highfive
Copy link
Collaborator

r? @oli-obk

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 4, 2020
@jonas-schievink
Copy link
Contributor

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Oct 4, 2020

⌛ Trying commit 6d10adca99e598a96de8dc438de2b2bbfe974fc7 with merge e8bab90e8bb7eaee4e142e99267ec4dacdfd3985...

@bors
Copy link
Contributor

bors commented Oct 4, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: e8bab90e8bb7eaee4e142e99267ec4dacdfd3985 (e8bab90e8bb7eaee4e142e99267ec4dacdfd3985)

@rust-timer
Copy link
Collaborator

Queued e8bab90e8bb7eaee4e142e99267ec4dacdfd3985 with parent 4ccf5f7, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (e8bab90e8bb7eaee4e142e99267ec4dacdfd3985): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@simonvandel
Copy link
Contributor Author

Performance does not look good. I'll setup a profiler in the coming days to see if it can be improved.

fn rvalue_eq(lhs: &Rvalue<'tcx>, rhs: &Rvalue<'tcx>) -> bool {
let res = match (lhs, rhs) {
(
Rvalue::Use(Operand::Constant(box Constant { user_ty: _, literal, span: _ })),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason for not using == is to not compare span. See #77549 (comment)
Should we have a compare_without_user_info? Not comparing span is an issue that most optimization passes would like to solve I guess

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a riddiculous footgun. I'll open an issue to discuss this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonvandel
Copy link
Contributor Author

Performance should be quite a bit better with the latest commit. Unnecessary tuple combinations are still being created though. I'll see if this can be removed.
If the perf queue is empty, it would be interesting to see how it fares now.

@simonvandel
Copy link
Contributor Author

Unnecessary tuple combinations should be gone now. Can I get a perf run again?

@oli-obk
Copy link
Contributor

oli-obk commented Oct 6, 2020

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Oct 6, 2020

⌛ Trying commit 44b87ce06d5582a00b5b21688834e8d03f7b5600 with merge 50438f458aff5610389abb267c5b5b89fbb68b04...

@bors
Copy link
Contributor

bors commented Oct 6, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 50438f458aff5610389abb267c5b5b89fbb68b04 (50438f458aff5610389abb267c5b5b89fbb68b04)

@rust-timer
Copy link
Collaborator

Queued 50438f458aff5610389abb267c5b5b89fbb68b04 with parent a1dfd24, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (50438f458aff5610389abb267c5b5b89fbb68b04): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@simonvandel
Copy link
Contributor Author

Based on the perf results, there is absolutely still room for improvement. I'll see what can be done after some profiling.

The regressions in optimized_mir make sense as we are doing more work (and maybe too much), but it's not intuitive to mee why the LLVM passes seem to regress. Intuitively, when handing fewer basic blocks to LLVM, it itself has to do less work.

Any ideas? I guess we can verify that we are indeed handing simpler IR to LLVM using cargo-lines. If it turns out we are not, the problem could be that deduplicating blocks in Mir confuses some of the existing Mir opt passes.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 7, 2020

You can basically ignore these llvm regressions as they are very small and only during incremental compilation afaict. They happen because the number of codegen units changes when a function gets optimized too well in MIR.

One thing that I found curious is that the ctfe stress tests are regressing during mir interpretation. Shouldn't we get less code and thus do less work?

About the optimized_mir regressions, maybe it's time to include optimization fuel so that the optimization is aborted if it becomes too expensive?

debug!("tuple_iter: {:?}", tuple_iter);

let statementkinds1 = body.basic_blocks()[item1].statements.iter().map(|x| &x.kind);
let statementkinds2 = body.basic_blocks()[item2].statements.iter().map(|x| &x.kind);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can gain some performance by moving these two lines out of the while loop, so we're not evaluating them many times?

// - This is needed for itertools::group_by which only assigns consecutive elements to the same group
// 2. Group by (length of statements, TerminatorKind)
// 3. Compare StatementKinds pairwise inside the group
// - This is technically O(n²), but `n` should be small in most cases

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the reason why the unicode_normalization benchmark is slow is that n is actually large, I wonder if we know why that is? Maybe a large table of constants or something?


let mut tuple_iter = TupleCombinationsWithSkip::new(group);
debug!("tuple_iter: {:?}", tuple_iter);
while let Some((item1, item2)) = tuple_iter.next() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely the most common case are (len statements, terminator kind) groups of size one, right? Could it be beneficial to filter them out early, since there obviously can't be any duplicates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I have added a filtering group size > 1 in the latest commits

@simonvandel
Copy link
Contributor Author

The pass should be a lot faster with the latest changes. Can I get another perf run?

@spastorino
Copy link
Member

Adding T-compiler label so this can be picked up by our automated prioritization tool and added to the following T-compiler weekly meeting agenda.

@oli-obk oli-obk removed I-needs-decision Issue: In need of a decision. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 31, 2020
@oli-obk
Copy link
Contributor

oli-obk commented Dec 31, 2020

We discussed it at the compiler team meeting. A few points came up

  • The metric of "how many times is it run" is not as useful as "how many statements were deduplicated because of this"
  • Do we get any runtime benefit out of it, like are subsequent (or LLVM) optimizations more effective now?
  • Can it be more effective if we allow it to ignore spans and other debug info for finding duplicates? (actual deduplication should merge, not drop such info)
  • Should we just put it into mir-opt-level=3 for now?

@simonvandel
Copy link
Contributor Author

  • The metric of "how many times is it run" is not as useful as "how many statements were deduplicated because of this"

Yeah, I agree.

  • Do we get any runtime benefit out of it, like are subsequent (or LLVM) optimizations more effective now?

This would be interesting to check, yeah. The runtime benchmarks would have to be cherry-picked though to actually have duplicated blocks. rustc-perf Webrender triggered the deduplication removed ~2% of statements last i tried, but i don't see Webrender having any benchmarks that can easily be run. Any ideas?

  • Can it be more effective if we allow it to ignore spans and other debug info for finding duplicates? (actual deduplication should merge, not drop such info)

Ignoring spans and debug info would be a great improvement to finding duplicates for sure. I'm not sure how to do deduplication without throwing some span away. How would the pass know which of multiple spans to preserve?

  • Should we just put it into mir-opt-level=3 for now?

Until clear compile-time wins (or runtime) improvements are seen, yeah, it should. I guess next step would be to figure out how to ignore spans, while not throwing information away.

For what is Span used for if a local is not used for debuginfo? Constants, for instance, seem to get different spans quite often. See https://godbolt.org/z/YK6j8W .

@oli-obk
Copy link
Contributor

oli-obk commented Jan 3, 2021

Ignoring spans and debug info would be a great improvement to finding duplicates for sure. I'm not sure how to do deduplication without throwing some span away. How would the pass know which of multiple spans to preserve?

I had a similar issue with MIR inlining. All the spans were pointing into the inlined MIR body, and all I could do was rewrite them to point at the callsite where it was inlined. Neither solution was good. So I implemented another expansion variant. Look at the ui test diff (all the way at the bottom if the github link doesn't work) to see how such a span could be rendered.

I think in this situation, since there is no hierarchy of spans, I would create a span that maybe points to whatever the two merged spans have in common, and if they do not have this, make an arbitrary choice for the root span. Then the two spans can be put into an ExpansionInfo variant and the span rendering can do something similar to the inlining span thing.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 3, 2021

So for the quickest way forward, let's put it into mir-opt-level 3 and merge it, then figure out the span and source_info handling (which we should probably do for more optimizations anyway).

@camelid camelid added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 15, 2021
@pnkfelix pnkfelix added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Jan 21, 2021
@JohnCSimon
Copy link
Member

Ping from triage: @simonvandel can you post your status on this PR and resolve the merge conflicts? Thank you

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 14, 2021
@simonvandel
Copy link
Contributor Author

@oli-obk rebased on master, and gated the deduplication behind mir-opt-level >= 3

@oli-obk
Copy link
Contributor

oli-obk commented Feb 22, 2021

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Feb 22, 2021

📌 Commit 2d1e0ad has been approved by oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2021
@bors
Copy link
Contributor

bors commented Feb 22, 2021

⌛ Testing commit 2d1e0ad with merge 15598a8...

@bors
Copy link
Contributor

bors commented Feb 22, 2021

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 15598a8 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 22, 2021
@bors bors merged commit 15598a8 into rust-lang:master Feb 22, 2021
@rustbot rustbot added this to the 1.52.0 milestone Feb 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir-opt Area: MIR optimizations merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.