Description
Opened on Apr 20, 2022
Spawned off of rust-lang/rust#95958 (comment) and discussion at https://zulip-archive.rust-lang.org/stream/247081-t-compiler/performance/topic/triage.202022.2004.2019.html#279492421
Following the hints at https://perf.rust-lang.org/detailed-query.html?commit=dc4bfcbdfff651c82eff4bdd311d28e54d1513c4&base_commit=0d13f6afeba4935499abe0c9a07426c94492c94e&benchmark=unicode-normalization-0.1.19-debug&scenario=full, I proceeded to do local cachegrind runs in order to understand where the regression was coming from.
- I.e. I was invoking this command:
./target/release/collector profile_local cachegrind +0d13f6afeba4935499abe0c9a07426c94492c94e --rustc2 +dc4bfcbdfff651c82eff4bdd311d28e54d1513c4 --include unicode-normalization-0.1.19 --profiles Debug --scenarios Full
A problem with this was significant variation from run to run: more variation than the thresholds we treat as "significant" in our own reporting (a sketch for quantifying the spread follows the examples below).
- E.g. one run produced this output: https://gist.github.com/pnkfelix/b7ec15e783ba47e7f21f9db4112c141b : a delta of -1,420,813, i.e. fewer instructions executed in the newer build.
- But another run produced this output: https://gist.github.com/pnkfelix/9882f305cfbdfc11fac32ddd6db499e9 : a delta of +2,987,301, i.e. more instructions executed.
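To quantify the spread, one can bypass the collector and run cachegrind directly on a fixed compilation several times, then compare the instruction totals across runs. Below is a minimal sketch; the toolchain path, source file, and run count are placeholders made up for illustration, not values from the actual triage setup:

```bash
#!/usr/bin/env bash
# Run the same compilation under cachegrind N times and print the total
# event counts reported by each run (the first field is Ir, i.e. the
# number of instructions executed).
set -euo pipefail

# Hypothetical paths: a rustc artifact installed by rustc-perf, and some
# fixed crate root to compile.
RUSTC="$HOME/.rustup/toolchains/dc4bfcbdfff651c82eff4bdd311d28e54d1513c4/bin/rustc"
SRC=src/lib.rs
N=5

for i in $(seq 1 "$N"); do
  valgrind --tool=cachegrind --cachegrind-out-file="cgout.$i" \
    "$RUSTC" --crate-type lib "$SRC" >/dev/null 2>&1
  # Each cachegrind output file ends with a "summary:" line holding the
  # event totals for that run.
  grep '^summary:' "cgout.$i"
done
```

If the summary lines differ from iteration to iteration even though the input is identical, that difference is a lower bound on the noise a single-run diff is exposed to.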
How much noise/variance/nondeterminism is acceptable, given our current reporting thresholds?
(Can one do multiple cachegrind runs, accumulate their results, and diff the accumulation? Would that help to counter the variance here?)
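For what it's worth, classic Valgrind ships tools for exactly this accumulate-then-diff workflow: cg_merge sums the counts from several cachegrind output files, and cg_diff produces a cachegrind-format profile of the difference between two files (newer Valgrind releases fold this functionality into cg_annotate). A sketch, assuming cgout.old.* and cgout.new.* were produced by N runs per compiler, e.g. via the loop above:

```bash
# Sum the N profiles for each compiler into one merged profile apiece.
cg_merge -o merged.old cgout.old.*
cg_merge -o merged.new cgout.new.*

# cg_diff emits a cachegrind-format "difference" profile on stdout;
# cg_annotate renders it with per-function event deltas.
cg_diff merged.old merged.new > diff.out
cg_annotate diff.out
```

Dividing the merged totals by N gives an average per-run delta, and if the noise is independent across runs, the variance of that average should shrink roughly as 1/N, so this could indeed help, at the cost of N-times-longer profiling runs.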