Skip to content

Conversation

@rylev
Copy link
Member

@rylev rylev commented Dec 15, 2020

This adds the ability to tell whether the user is deriving PartialEq and if so only derives partial_cmp when deriving PartialOrd. This builds on previous special casing of built-in derive macros which change how deriving Clone is handled when the user is also deriving Copy. Because the full implementation of PartialOrd is expensive to compile, this leaner PartialOrd might see some perf gains in code bases that derive PartialOrd a lot. For instance, this increases the derive perf benchmark from rustc-perf by ~45% on my machine.

This has the aided side effect of making some diagnostics a little less noisy.

r? @petrochenkov

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 15, 2020
@davidtwco
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Collaborator

bors commented Dec 15, 2020

⌛ Trying commit 824df7b with merge 8844bde20c440d85a1729e12c9131a1d032cd3f2...

@bors
Copy link
Collaborator

bors commented Dec 15, 2020

☀️ Try build successful - checks-actions
Build commit: 8844bde20c440d85a1729e12c9131a1d032cd3f2 (8844bde20c440d85a1729e12c9131a1d032cd3f2)

@rust-timer
Copy link
Collaborator

Queued 8844bde20c440d85a1729e12c9131a1d032cd3f2 with parent e261649, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (8844bde20c440d85a1729e12c9131a1d032cd3f2): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

@bjorn3
Copy link
Member

bjorn3 commented Dec 15, 2020

Clap has up to 2.2% regressions. Derive has up to ~35% improvements.

@rylev
Copy link
Member Author

rylev commented Dec 15, 2020

Hmm... big wins in the derive benchmark as expected but a small regression in clap benchmarks which I didn't expect. I'll have to investigate.

pub enum BuiltinDerive {
Copy,
PartialEq,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually had this enum in the past, but it was represented as bitflags.
See #65892 where it was removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, I didn't realize! I was considering using bitflags, but I didn't want to add a dependency to rustc_expand. I can make this change if we don't close the PR.

@petrochenkov
Copy link
Contributor

Is this a legal thing to do for types that have fields with "weird" implementations of PartialOrd?
Could you document the possible cases a bit more?

@rylev
Copy link
Member Author

rylev commented Dec 15, 2020

@petrochenkov You're correct. Beyond that, the semantics of deriving PartialOrd right not are that the respective PartialOrd implementations for each field will be called which this would break. We can close this, or I can keep going and try to implement mir shims for PartialOrd just like we do for Copy. What do you think? The goal here is to improve compile times.

@petrochenkov
Copy link
Contributor

From the zulip thread it looks like the extra time (compared to manual implementation) is spent on type checking the generated methods.
MIR shims won't help with that, if methods are generated, then they will be type checked even if not used in codegen.
This means in addition to shims #[derive(PartialOrd)] will also have to generate an impl of PartialOrd that is entirely dummy, but recognizable during MIR processing.

@petrochenkov
Copy link
Contributor

Also, MIR shims will be generated for all methods individually, without relying on partial_cmp only, right?
Otherwise the question about weird implementations of PartialOrd apply to them as well.

@petrochenkov
Copy link
Contributor

It's pretty sad that we have to pull things that are perfectly implementable as a (macro) library into the compiler proper due to performance.
What if serde turns out to be slow next, pull it into the compiler as well?

C++ is going into the same direction though.
"Derived" comparison operators in C++20 all go through shims, and the upcoming reflection library facilities (serving the same goals as our derives) will also have access to type information and not just tokens.

@petrochenkov
Copy link
Contributor

By the way, does derive(PartialOrd) have the highest overhead compared to manual implementations?
What are overheads of other derives in percents?

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 16, 2020
@rylev
Copy link
Member Author

rylev commented Dec 17, 2020

@petrochenkov I totally agree that this is less than ideal, and it would be nice if derives were just fast enough, but due to derives needing to handle so many different inputs correctly, they're bound to generate code that is more expensive to compile than hand written versions.

By the way, does derive(PartialOrd) have the highest overhead compared to manual implementations? What are overheads of other derives in percents?

PartialOrd is by far the most expensive to derive of the traits in std. The derive benchmark with #[derive(PartialEq, PartialOrd)] takes 9.1s on my machine while just #[derive(PartialEq)] takes 1.9s. Here's percentages based on a benchmark I did where I created a struct with one field 10,000 times and benchmarked what happened when the structs had various derives on them. I generated the code using the following ruby script:

File.open('src/lib.rs', 'w') do |file|
  0..10_000.times do |n|
      file.write("pub struct MyType#{n} { pub field: i32 }\n")
  end
end
  • Base (i.e., no derives): 061.s
  • #[derive(Debug)]: 13.33s
  • #[derive(PartialEq)]: 9.78s
  • #[derive(PartialEq, PartialOrd)]: 47.46s
  • #[derive(Clone)]: 6.32s
  • #[derive(Clone, Copy)]: 5.32s
  • #[derive(PartialEq, Eq)]: 14.13s
  • #[derive(Default)]: 4.33s
  • #[derive(Hash)]: 6.52s

There's definitely some wiggle room beyond "just making compilation in general faster, will make derives faster". I manually implemented Debug and ran the test again, and it compiles 10% faster.

@petrochenkov since this PR's proposed mechanism won't work, I propose we close this PR and open an issue to track built-in derive performance. Thoughts?

@petrochenkov
Copy link
Contributor

Ok, closing then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants