
[BUG] Threads time taken report. #1834

@manuelgustavo

Description

Describe the bug
A benchmark that does nothing but sleep_for(10s) has its reported time divided by the number of threads.

System
Which OS, compiler, and compiler version are you using:

  • OS: Linux / Ubuntu 22.04
  • Compiler and version: clang-14

To reproduce
Steps to reproduce the behavior:

Create a benchmark whose only work is a std::this_thread::sleep_for(10s) call.
Execute it using ->Threads(2). (A minimal reproduction is sketched below.)
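
A minimal sketch of such a benchmark, using the standard Google Benchmark macros (the name BM_SleepOnly is invented here for illustration):

#include <benchmark/benchmark.h>

#include <chrono>
#include <thread>

// Each iteration only sleeps, so the real time per iteration should be
// ~10s regardless of how many threads run the benchmark.
static void BM_SleepOnly(benchmark::State& state) {
  for (auto _ : state) {
    std::this_thread::sleep_for(std::chrono::seconds(10));
  }
}
// With Threads(2), the reported real time comes out as ~5s, not ~10s.
BENCHMARK(BM_SleepOnly)->Threads(2);

BENCHMARK_MAIN();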

Expected behavior
It should report 10s, but only reports 5s.

Additional context

I understand that this issue may include several changes downstream.

However, I don't find it right that code which takes 1s to run on 1 thread (as per sleep_for(1s)) gets reported as 0.5s with 2 threads, 0.25s with 4 threads, and so on.

"A question for you: Would you agree that a code that just has a wait time of 1s should be reported that it took 1s to run, regardless of the number of threads?"

Digging deeper into the code:

The following function, BenchmarkRunner::DoNIterations(), calls RunInThread():

BenchmarkRunner::IterationResults BenchmarkRunner::DoNIterations() {
...
RunInThread(&b, iters, 0, manager.get(), perf_counters_measurement_ptr);

And RunInThread() eventually posts its results:

State st =
      b->Run(iters, thread_id, &timer, manager, perf_counters_measurement);
  BM_CHECK(st.skipped() || st.iterations() >= st.max_iterations)
      << "Benchmark returned before State::KeepRunning() returned false!";
  {
    MutexLock l(manager->GetBenchmarkMutex());
    internal::ThreadManager::Result& results = manager->results;
    results.iterations += st.iterations();
    results.cpu_time_used += timer.cpu_time_used();
    results.real_time_used += timer.real_time_used();
    results.manual_time_used += timer.manual_time_used();
    results.complexity_n += st.complexity_length_n();
    internal::Increment(&results.counters, st.counters);
  }

But then further down, back in DoNIterations() (note the i.results), you have:

  // Adjust real/manual time stats since they were reported per thread.
  i.results.real_time_used /= b.threads();
  i.results.manual_time_used /= b.threads();
  // If we were measuring whole-process CPU usage, adjust the CPU time too.
  if (b.measure_process_cpu_time()) i.results.cpu_time_used /= b.threads();

So the times are divided by the number of threads, but the iteration count is not, which causes the discrepancy in measurements that use the ->Threads(n) functionality with n greater than 1.
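
To see where the thread factor gets applied twice, here is a standalone sketch of the arithmetic with illustrative numbers (2 threads, 1 iteration each, 10s of sleep); this is not library code:

#include <cstdio>

int main() {
  const int threads = 2;
  const int iters_per_thread = 1;
  const double sleep_s = 10.0;

  // RunInThread() sums each thread's wall time into the shared results,
  // and likewise sums the per-thread iteration counts:
  double real_time_used = threads * iters_per_thread * sleep_s;  // 20s
  const int total_iterations = threads * iters_per_thread;       // 2

  // DoNIterations() then divides the time by the thread count...
  real_time_used /= threads;  // 10s

  // ...but the per-iteration figure divides by the TOTAL iteration count,
  // which already contains the thread factor, so it is applied twice:
  std::printf("reported real time per iteration: %.1fs\n",
              real_time_used / total_iterations);  // prints 5.0s, not 10.0s
  return 0;
}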

Since the averaging happens later using the total number of iterations, eliminating the time adjustment (all of the /= b.threads(); lines) from the code would be the best solution. Alternatively, this could be gated behind an optional parameter (for initial testing purposes), as sketched below.
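
A minimal sketch of that opt-in guard around the existing adjustment; divide_time_by_threads() is a hypothetical accessor invented for illustration, not an existing library API:

// Hypothetical: keep the current behavior only when explicitly requested.
// divide_time_by_threads() is an invented name, not a real API.
if (b.divide_time_by_threads()) {
  // Adjust real/manual time stats since they were reported per thread.
  i.results.real_time_used /= b.threads();
  i.results.manual_time_used /= b.threads();
  // If we were measuring whole-process CPU usage, adjust the CPU time too.
  if (b.measure_process_cpu_time()) i.results.cpu_time_used /= b.threads();
}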

Again, this would only be relevant for benchmarks that use the ->Threads function.
