Description
Describe the bug
A benchmark whose body contains only a sleep_for(10s) gets its time reported divided by the number of threads.
System
Which OS, compiler, and compiler version are you using:
- OS: Linux / Ubuntu 22.04
- Compiler and version: clang-14
To reproduce
Steps to reproduce the behavior:
Create a benchmark whose body only has a std::this_thread::sleep_for(10s) call.
Execute it using ->Threads(2).
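The steps above can be sketched as follows (a minimal sketch; the BM_SleepOnly name and registration details are illustrative, not taken from the original report):

```cpp
#include <chrono>
#include <thread>

#include <benchmark/benchmark.h>

// Minimal reproduction sketch: the benchmark body does nothing but sleep.
static void BM_SleepOnly(benchmark::State& state) {
  for (auto _ : state) {
    std::this_thread::sleep_for(std::chrono::seconds(10));
  }
}
// With Threads(2), the reported per-iteration time comes out halved.
BENCHMARK(BM_SleepOnly)->Threads(2);

BENCHMARK_MAIN();
```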
Expected behavior
It should report 10s, but it reports only 5s.
Additional context
I understand that this issue may include several changes downstream.
However, I don't find it right that code which takes 1s to run on 1 thread (as per sleep_for(1s)) gets reported as 0.5s with 2 threads, 0.25s with 4 threads, and so on.
A question for you: would you agree that code that just has a wait time of 1s should be reported as taking 1s to run, regardless of the number of threads?
Digging deeper into the code:
The following function DoNIterations() will call RunInThread():
BenchmarkRunner::IterationResults BenchmarkRunner::DoNIterations() {
...
RunInThread(&b, iters, 0, manager.get(), perf_counters_measurement_ptr);

And RunInThread() will eventually post its results:
State st =
b->Run(iters, thread_id, &timer, manager, perf_counters_measurement);
BM_CHECK(st.skipped() || st.iterations() >= st.max_iterations)
<< "Benchmark returned before State::KeepRunning() returned false!";
{
MutexLock l(manager->GetBenchmarkMutex());
internal::ThreadManager::Result& results = manager->results;
results.iterations += st.iterations();
results.cpu_time_used += timer.cpu_time_used();
results.real_time_used += timer.real_time_used();
results.manual_time_used += timer.manual_time_used();
results.complexity_n += st.complexity_length_n();
internal::Increment(&results.counters, st.counters);
}

But then further down in DoNIterations() you have:
// Adjust real/manual time stats since they were reported per thread.
i.results.real_time_used /= b.threads();
i.results.manual_time_used /= b.threads();
// If we were measuring whole-process CPU usage, adjust the CPU time too.
if (b.measure_process_cpu_time()) i.results.cpu_time_used /= b.threads();

So, you're averaging the times by the number of threads, but the number of iterations doesn't get averaged, causing the discrepancy in measurements that use the ->Threads([greater than 1]) functionality.
Since the averaging happens later with the total number of iterations, eliminating the time adjustment (all of the /= b.threads(); divisions) from the code would be the best solution. Alternatively, this could be controlled by an optional parameter (for initial testing purposes).
Again, this would only be relevant for benchmarks that use the ->Threads function.