Description
helloworld5000
is the name I've given to the benchmark that is helloworld
with the println!("Hello world");
repeated 5,000 times. It's an interesting stress test for the compiler.
On my machine, a debug build takes 4.5 seconds and an opt build takes 62(!) seconds.
In the debug build, execution time is dominated by take_and_reset_data
. Cachegrind measures these instruction counts:
28,047,781,500 PROGRAM TOTALS
7,501,239,976 /home/njn/moz/rust0/src/libcore/slice/mod.rs:rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data
5,625,275,003 /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/ena-0.9.3/src/snapshot_vec.rs:rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data
The reset_unifications
call within take_and_reset_data
is the expensive part. It all boils down to set_all
within the ena
crate:
. pub fn set_all(&mut self, mut new_elems: impl FnMut(usize) -> D::Value) {
185,003 if !self.in_snapshot() {
. for (slot, index) in self.values.iter_mut().zip(0..) {
5,625,090,000 *slot = new_elems(index);
. }
. } else {
. for i in 0..self.values.len() {
. self.set(i, new_elems(i));
. }
. }
. }
and iterator code (called from set_all
):
. fn next(&mut self) -> Option<$elem> {
. // could be implemented with slices, but this avoids bounds checks
. unsafe {
. if mem::size_of::<T>() != 0 {
1,067,186 assume(!self.ptr.is_null());
20,013 assume(!self.end.is_null());
. }
7,621,400,804 if self.ptr == self.end {
. None
. } else {
. Some($mkref!(self.ptr.post_inc()))
. }
. }
. }
I did some measurement and found that, in the vast majority of cases, reset_unification
is a no-op -- it overwrites the the unification table with the same values that it already has. I wonder if we could do better somehow. It's a shame we have to keep the unbound variables around rather than just clearing them like we do with the other data in take_and_reset_data
. I know that this is an extreme example, but profiles indicate that reset_unifications
is somewhat hot on more normal programs too. @nikomatsakis, any ideas?
In the opt builds, these are the top functions according to Cachegrind:
555,228,191,235 PROGRAM TOTALS
87,515,336,354 /home/njn/moz/rust0/src/llvm/include/llvm/ADT/SmallPtrSet.h:llvm::PointerMayBeCaptured(llvm::Value const*, llvm::CaptureTracker*)
41,641,142,772 /home/njn/moz/rust0/src/llvm/lib/Analysis/CaptureTracking.cpp:llvm::PointerMayBeCaptured(llvm::Value const*, llvm::CaptureTracker*) [/home/njn/moz/rust0/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_codegen_llvm-llvm.so]
35,396,524,596 /home/njn/moz/rust0/src/llvm/include/llvm/ADT/DenseMap.h:llvm::SmallDenseMap<llvm::Instruction const*, unsigned int, 32u, llvm::DenseMapInfo<llvm::Instruction const*>, llvm::detail::DenseMapPair<llvm::Instruction const*, unsigned int> >::grow(unsigned int) [/home/njn/moz/rust0/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_codegen_llvm-llvm.so]
33,096,763,980 /home/njn/moz/rust0/src/llvm/lib/IR/Attributes.cpp:llvm::AttributeList::getAttributes(unsigned int) const [/home/njn/moz/rust0/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_codegen_llvm-llvm.so]
30,008,031,294 /home/njn/moz/rust0/src/llvm/include/llvm/ADT/DenseMap.h:llvm::OrderedBasicBlock::comesBefore(llvm::Instruction const*, llvm::Instruction const*)
29,931,802,152 /home/njn/moz/rust0/src/llvm/lib/IR/Attributes.cpp:llvm::AttributeList::hasAttribute(unsigned int, llvm::Attribute::AttrKind) const [/home/njn/moz/rust0/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_codegen_llvm-llvm.so]
That's a lot of time in PointerMayBeCaptured
.