-
Notifications
You must be signed in to change notification settings - Fork 547
Document the usage of cargo-llvm-lines and -Ztimings. #905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,5 +12,83 @@ Depending on what you're trying to measure, there are several different approach | |
See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information. | ||
|
||
- If you want function level performance data or even just more details than the above approaches: | ||
- Consider using a native code profiler such as [perf](profiling/with_perf.html). | ||
- Consider using a native code profiler such as [perf](profiling/with_perf.html) | ||
- or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision, | ||
full-featured graphical interface. | ||
|
||
- If you want a nice visual representation of the compile times of your crate graph, | ||
you can use [cargo's `-Ztimings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings), | ||
eg. `cargo -Ztimings build`. | ||
You can use this flag on the compiler itself with `CARGOFLAGS="-Ztimings" ./x.py build` | ||
|
||
## Optimizing rustc's bootstrap times with `cargo-llvm-lines` | ||
|
||
Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the | ||
number of lines of LLVM IR across all instantiations of a generic function. | ||
Since most of the time compiling rustc is spent in LLVM, the idea is that by | ||
reducing the amount of code passed to LLVM, compiling rustc gets faster. | ||
|
||
Example usage: | ||
``` | ||
cargo install cargo-llvm-lines | ||
# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P | ||
|
||
# Do a clean before every run, to not mix in the results from previous runs. | ||
./x.py clean | ||
RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc | ||
|
||
# Single crate, eg. rustc_middle | ||
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/rustc_middle-a539a639bdab6513.ll > llvm-lines-middle.txt | ||
# Specify all crates of the compiler. (Relies on the glob support of your shell.) | ||
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines.txt | ||
``` | ||
|
||
Example output: | ||
``` | ||
Lines Copies Function name | ||
----- ------ ------------- | ||
11802479 (100%) 52848 (100%) (TOTAL) | ||
1663902 (14.1%) 400 (0.8%) rustc_query_system::query::plumbing::get_query_impl::{{closure}} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems like another good place to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably. I haven't had much time yet to look into it, but it's tricky because the relevant traits aren't dyn safe. |
||
683526 (5.8%) 10579 (20.0%) core::ptr::drop_in_place | ||
568523 (4.8%) 528 (1.0%) rustc_query_system::query::plumbing::get_query_impl | ||
472715 (4.0%) 1134 (2.1%) hashbrown::raw::RawTable<T>::reserve_rehash | ||
306782 (2.6%) 1320 (2.5%) rustc_middle::ty::query::plumbing::<impl rustc_query_system::query::QueryContext for rustc_middle::ty::context::TyCtxt>::start_query::{{closure}}::{{closure}}::{{closure}} | ||
212800 (1.8%) 514 (1.0%) rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl | ||
194813 (1.7%) 124 (0.2%) rustc_query_system::query::plumbing::force_query_impl | ||
158488 (1.3%) 1 (0.0%) rustc_middle::ty::query::<impl rustc_middle::ty::context::TyCtxt>::alloc_self_profile_query_strings | ||
119768 (1.0%) 418 (0.8%) core::ops::function::FnOnce::call_once | ||
119644 (1.0%) 1 (0.0%) rustc_target::spec::load_specific | ||
104153 (0.9%) 7 (0.0%) rustc_middle::ty::context::_DERIVE_rustc_serialize_Decodable_D_FOR_TypeckResults::<impl rustc_serialize::serialize::Decodable<__D> for rustc_middle::ty::context::TypeckResults>::decode::{{closure}} | ||
81173 (0.7%) 1 (0.0%) rustc_middle::ty::query::stats::query_stats | ||
80306 (0.7%) 2029 (3.8%) core::ops::function::FnOnce::call_once{{vtable.shim}} | ||
78019 (0.7%) 1611 (3.0%) stacker::grow::{{closure}} | ||
69720 (0.6%) 3286 (6.2%) <&T as core::fmt::Debug>::fmt | ||
56327 (0.5%) 186 (0.4%) rustc_query_system::query::plumbing::incremental_verify_ich | ||
49714 (0.4%) 14 (0.0%) rustc_mir::dataflow::framework::graphviz::BlockFormatter<A>::write_node_label | ||
``` | ||
|
||
Since this doesn't seem to work with incremental compilation or `x.py check`, | ||
you will be compiling rustc _a lot_. | ||
I recommend changing a few settings in `config.toml` to make it bearable: | ||
``` | ||
[rust] | ||
# A debug build takes _a third_ as long on my machine, | ||
# but compiling more than stage0 rustc becomes unbearably slow. | ||
optimize = false | ||
|
||
# We can't use incremental anyway, so we disable it for a little speed boost. | ||
incremental = false | ||
# We won't be running it, so no point in compiling debug checks. | ||
debug = false | ||
|
||
# Using a single codegen unit gives less output, but is slower to compile. | ||
codegen-units = 0 # num_cpus | ||
``` | ||
|
||
The llvm-lines output is affected by several options. | ||
`optimize = false` increases it from 2.1GB to 3.5GB and `codegen-units = 0` to 4.1GB. | ||
|
||
MIR optimizations have little impact. Compared to the default `RUSTFLAGS="-Zmir-opt-level=1"`, | ||
level 0 adds 0.3GB and level 2 removes 0.2GB. | ||
Inlining currently only happens in LLVM, but this might change in the future. | ||
|
Uh oh!
There was an error while loading. Please reload this page.