Description
Code
I tried this code:
use std::collections::HashMap;

pub fn main() {
    let mut table = HashMap::new();
    for a in 0..0xff {
        for b in 0..0xff {
            for c in 0..0xff {
                for d in 0..0xff {
                    let hash = 5u64;
                    let trunc = hash & 0xffffffffffffff00;
                    table.insert(trunc, (a, b, c, d));
                }
            }
        }
    }
    for x in 0..0xff {
        for y in 0..0xff {
            for z in 0..0xff {
                let hash = 5u64;
                let trunc = hash & 0xffffffffffffff00;
                if let Some(orig) = table.get(&trunc) {
                    println!("Original {orig:?}");
                    println!("New ({x}, {y}, {z})")
                }
            }
        }
    }
}
This slowdown is only visible with some target CPUs (not the default x86 target). For me this was -Ctarget-cpu=znver3, but it also happens with target-cpu=native on godbolt:
https://godbolt.org/z/xqnbfdxKb
I expected to see this happen: It compiles in some reasonable amount of time
Instead, this happened: It takes over 7 minutes to compile on my Ryzen 5900X.
I'm thinking that the compiler is aggressively trying to unroll the loops and then inline the formatting code (the compile speeds up quite a bit when I remove the prints), but that is just speculation.
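One way to probe that guess is to keep the formatting code out of the loop body and see whether the compile time changes. The following is only a sketch under that assumption: `truncate`, `report`, and `lookup_phase` are hypothetical helpers that are not part of the original reproduction, and the loop bound is a parameter here (the reproduction uses the constant 0xff), which may itself affect unrolling. Whether this changes the compile time has not been measured.

```rust
use std::collections::HashMap;

/// Same mask as in the reproduction: clears the low 8 bits of the hash.
fn truncate(hash: u64) -> u64 {
    hash & 0xffff_ffff_ffff_ff00
}

/// Hypothetical helper, not in the original report. `#[inline(never)]`
/// asks the compiler not to inline the formatting code into the caller.
#[inline(never)]
fn report(orig: (i32, i32, i32, i32), x: i32, y: i32, z: i32) -> String {
    format!("Original {orig:?}\nNew ({x}, {y}, {z})")
}

/// Lookup phase of the reproduction with the loop bound passed in
/// (an assumption made for experimentation). Returns the number of hits.
fn lookup_phase(table: &HashMap<u64, (i32, i32, i32, i32)>, bound: i32) -> usize {
    let mut hits = 0;
    for x in 0..bound {
        for y in 0..bound {
            for z in 0..bound {
                // The hash is the constant 5, as in the reproduction.
                if let Some(orig) = table.get(&truncate(5)) {
                    println!("{}", report(*orig, x, y, z));
                    hits += 1;
                }
            }
        }
    }
    hits
}
```

Calling `lookup_phase(&table, 0xff)` after filling the table matches the second loop nest of the reproduction. If the build is still slow with `report` kept out of line, the inlining part of the guess is less likely to be the cause.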
Version it worked on
It's hard to track down exactly where this regression happened, but it seems to work on at least 1.64 (takes ~20s to compile on godbolt); at 1.65 it starts timing out with target-cpu=znver3.
Version with regression
It seems that the regression happened somewhere between 1.65 and 1.72; however, I am using nightly.
rustc --version --verbose:
rustc 1.74.0-nightly (2f5df8a94 2023-08-31)
binary: rustc
commit-hash: 2f5df8a94bb3c5fae4e3fcbfc8ef20f1f976cb19
commit-date: 2023-08-31
host: x86_64-unknown-linux-gnu
release: 1.74.0-nightly
LLVM version: 17.0.0
Timings
time: 0.000; rss: 99MB -> 99MB ( +0MB) module_lints
time: 0.000; rss: 99MB -> 99MB ( +0MB) lint_checking
time: 0.000; rss: 99MB -> 99MB ( +0MB) check_lint_expectations
time: 0.000; rss: 98MB -> 99MB ( +1MB) misc_checking_3
time: 0.000; rss: 99MB -> 100MB ( +0MB) monomorphization_collector_root_collections
time: 0.001; rss: 100MB -> 102MB ( +2MB) Inline
time: 0.000; rss: 102MB -> 102MB ( +0MB) ReferencePropagation
time: 0.001; rss: 102MB -> 103MB ( +1MB) ConstProp
time: 0.012; rss: 100MB -> 116MB ( +16MB) monomorphization_collector_graph_walk
time: 0.001; rss: 116MB -> 117MB ( +2MB) partition_and_assert_distinct_symbols
time: 0.000; rss: 122MB -> 125MB ( +3MB) write_allocator_module
time: 0.008; rss: 128MB -> 148MB ( +20MB) codegen_to_LLVM_IR
time: 0.022; rss: 99MB -> 148MB ( +48MB) codegen_crate
time: 0.002; rss: 148MB -> 119MB ( -28MB) free_global_ctxt
time: 0.004; rss: 116MB -> 118MB ( +3MB) LLVM_lto_optimize(fnv.fb177d82fe6a19f5-cgu.2)
time: 0.020; rss: 116MB -> 127MB ( +11MB) LLVM_lto_optimize(fnv.fb177d82fe6a19f5-cgu.0)
time: 450.155; rss: 116MB -> 135MB ( +19MB) LLVM_lto_optimize(fnv.fb177d82fe6a19f5-cgu.1)
time: 450.215; rss: 137MB -> 136MB ( -2MB) LLVM_passes
time: 0.000; rss: 136MB -> 129MB ( -7MB) join_worker_thread
time: 450.207; rss: 119MB -> 129MB ( +9MB) finish_ongoing_codegen
time: 0.000; rss: 129MB -> 128MB ( -1MB) link_binary_check_files_are_writeable
time: 0.042; rss: 120MB -> 117MB ( -3MB) run_linker
time: 0.043; rss: 129MB -> 117MB ( -12MB) link_binary
time: 0.043; rss: 129MB -> 117MB ( -12MB) link_crate
time: 450.250; rss: 119MB -> 117MB ( -3MB) link
time: 450.292; rss: 33MB -> 100MB ( +67MB) total