// Overview / Examples / API / FAQ / Resources

perf: C++23 Performance library


Performance is not a number!

Overview

Single header/module performance library that combines the power of:
c++23, linux/perf, llvm/mca, gnuplot/sixel, ...

Features

Profiling, Tracing, Analyzing, Plotting, Testing, Benchmarking
| namespace | description | API |
| --------- | ----------- | --- |
| info   | hardware/software info           | compiler, cpu, memory, sys, proc, bin |
| core   | low-level                        | code, compiler, cpu, memory |
| mc     | disassembling (llvm)             | assembly, address, encoding, size, uops, latency, rthroughput, may_load, may_store, has_side_effects, ..., source |
| mca    | analyzing (llvm/mca)             | cycles, instructions, uops, timeline, resource_pressure, bottleneck |
| time   | timing (rdtsc/clock/chrono)      | tsc, cpu, thread, real, monotonic, steady_clock, high_resolution_clock |
| stat   | counting (linux/perf)            | instructions, cycles, ..., top_down |
| record | sampling (linux/perf)            | instructions, cycles, ..., mem_loads, mem_stores, top_down |
| trace  | tracing (linux/intel_pt)         | traces, cycles, tsc |
| bench  | benchmarking                     | baseline, latency, throughput |
| io     | logging/plotting (gnuplot/sixel) | log, spec, json, report, annotate, plot (hist, box, bar, line, ecdf) |
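
A minimal end-to-end sketch tying these namespaces together (a sketch only: the `perf.hpp` header name is an assumption, and the `runner`/`report` calls follow the Benchmarking examples below):

    #include "perf.hpp" // single header (name assumed); `import perf;` when built as a module

    int main() {
      auto square = [](int n) { return n * n; }; // hypothetical code under test

      perf::runner bench{perf::bench::latency{}}; // what and how
      bench(square, 42);

      perf::report(bench[perf::time::steady_clock, perf::bench::operations, perf::bench::samples]);
    }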

Requirements

Minimal
Optimal

Dockerfile / setup

Auxiliary
  • gh - apt-get install gh
  • prof - https://github.com/qlibs/prof
  • ut - https://github.com/qlibs/ut
  • uefi - https://github.com/qlibs/uefi


Examples

[Tuning]

See FAQ/Setup for more details

Info/Core
  • info::compiler::name

    assert(perf::info::compiler::name() == "clang"s);
  • info::compiler::version

    assert(perf::info::compiler::version() ==
           perf::info::sem_ver{.major = 20, .minor = 0, .patch = 0});
  • info::cpu::name

    assert(perf::info::cpu::name() == "12th Gen Intel(R) Core(TM) i7-12650"s);
  • info::cpu::code_name

    assert(perf::info::cpu::code_name() == "alderlake"s);
  • info::cpu::version

    assert(perf::info::cpu::version() ==
           perf::info::cpu::cpu_ver{.family = 6, .model = 154, .stepping = 3});
  • info::cpu::features

    assert(perf::info::cpu::features() == std::vector{"avx", "avx2", "bmi", ...});
  • info/memory

    assert(perf::info::memory::icache() ==
           std::map{{level::L1, {.size = 32768, .line_size = 64, .assoc = 8}}});
    
    assert(perf::info::memory::dcache() ==
           std::map{{level::L1, {.size = 49152, .line_size = 64, .assoc = 12}}, ...});
  • info/sys

    assert(perf::info::sys::name() == "linux"s);
    assert(perf::info::sys::triple() == "x86_64-pc-linux-gnu"s);
    assert(perf::info::sys::page_size() == 4096u);
  • info/proc

    assert(perf::info::proc::self::name() == "/tmp/perf.out"s);
    assert(perf::info::proc::self::base_address() > 0u);
  • info/bin

    static auto fn = [] {};
    auto&& fn_name = perf::info::bin::addr_to_fn_name(
      perf::info::proc::self::name(),
      std::uint64_t(&fn) - perf::info::proc::self::base_address()
    );
    assert(fn_name.has_value() and *fn_name == "fn"s);
    static auto var = 0;
    auto&& var_name = perf::info::bin::addr_to_name(
      perf::info::proc::self::name(),
      std::uint64_t(&var) - perf::info::proc::self::base_address()
    );
    assert(var_name.has_value() and *var_name == "var"s);
    // addr_to_line # requires debug symbols (-g)
    label:; auto&& source = perf::info::bin::addr_to_line(
      perf::info::proc::self::name(),
      std::uint64_t(&&label) - perf::info::proc::self::base_address()
    );
    assert(source.has_value() and source->contains("source ="));
  • core/memory

    const std::array add{ // x86-64
      0x89, 0xf8,         // mov eax, edi
      0x01, 0xf0,         // add eax, esi
      0xc3                // ret
    };
    
    memory::protect(std::span(add), memory::protection::read |
                                    memory::protection::write |
                                    memory::protection::exec);
    assert(invoke(add, 1, 2) == 3);
    assert(invoke(add, 2, 3) == 5);
  • core/compiler

    // prevent_elision/is_elided
    assert(perf::compiler::is_elided([] { }));
    assert(perf::compiler::is_elided([] {
      int i{};
      i++;
    }));
    assert(not perf::compiler::is_elided([] {
      int i{};
      perf::compiler::prevent_elision(i++);
    }));
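
    In benchmarks, `prevent_elision` is what keeps the measured work from being optimized away. A sketch, reusing the `runner` API from the Benchmarking section below:

    perf::runner bench{perf::bench::latency{}};

    // without prevent_elision the optimizer may remove the multiply entirely
    bench([](int n) { perf::compiler::prevent_elision(n * n); }, 42);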
Analyzing
  • backend/analyzer

    API

Profiling
  • prof/timer

    perf::time::timer t{perf::time::steady_clock};
    
    t.start();
    fn();
    t.stop();
    
    assert(t[perf::time::steady_clock] > perf::time::duration<perf::time::ns>(0.));

    // multiple clocks
    perf::time::timer t{perf::time::steady_clock, perf::time::cpu};
    
    t.start();
    fn();
    t.stop();
    
    assert(t[perf::time::steady_clock] > perf::time::duration<perf::time::ns>(0.));
    assert(t[perf::time::cpu] > perf::time::duration<perf::time::ns>(0.));
    
    // `t[]` - returns std::tuple of all timers
    assert(std::get<0u>(t[]) > perf::time::duration<perf::time::ns>(0.)); // steady_clock
    assert(std::get<1u>(t[]) > perf::time::duration<perf::time::ns>(0.)); // time_cpu

    API

    perf::time::steady_clock          - monotonic-time
    perf::time::high_resolution_clock - highest available resolution clock
    perf::time::cpu                   - user-time + sys-time
    perf::time::thread                - cpu-time for the current thread
    perf::time::real                  - wall-time
    perf::time::monotonic             - guaranteed to be always increasing
    perf::time::tsc                   - time-stamp-counter
  • prof/counter

    // metrics/dsl
    // top_down
  • prof/sampler

  • https://github.com/qlibs/prof

Tracing
  • prof/tracer

  • prof/trace

Plotting
  • plot/gnuplot

    Note: See Benchmarking section for bench related plotting

Testing
  • utility/verify

    perf::verify(fn...assembly);
    perf::verify(fn...cycles);
  • test

    perf::test();
    perf::test({.verbose = true});

    -DNTEST - disables compile-time/run-time tests

  • https://github.com/qlibs/ut

    import ut;
    
    "benchmark"_test = [] {
      // ...
    };
Benchmarking
  • runner
    auto fizz_buzz = [](int n) {
      if (n % 15 == 0) {
        return "FizzBuzz";
      } else if (n % 3 == 0) {
        return "Fizz";
      } else if (n % 5 == 0) {
        return "Buzz";
      } else {
        return "Unknown";
      }
    };
    
    perf::runner bench{perf::bench::latency{}}; // what and how
    
    bench(fizz_buzz, 15);
    bench(fizz_buzz, 3);
    bench(fizz_buzz, 5);
    
    perf::report(bench[perf::time::steady_clock, perf::bench::operations, perf::bench::samples]); // total time

^^bench[...] == ^^std::vector<perf::name, std::tuple<named<Name, std::vector<Ts>...>>>

  • data

    bench(fizz_buzz, perf::data::sequence<int>{{3,5,15}});
    bench(fizz_buzz, perf::data::uniform<int>{.min = 0, .max = 15});
    // choice

    API

  • latency vs. throughput

    auto add  = [](int a, int b) { return a + b; };
    auto sub  = [](int a, int b) { return a - b; };
    auto mult = [](int a, int b) { return a * b; };
    auto div  = [](int a, int b) { return a / b; };
    
    perf::runner bench{perf::bench::latency{}};
    
    bench(add,  0, 0); // invoke(add, 0, 0)
    bench(sub,  0, 0); // invoke(sub, 0, 0)
    bench(mult, 0, 0); // invoke(mult, 0, 0)
    bench(div,  0, 1); // invoke(div, 0, 1), avoids division by zero
    
    using perf::dsl::operator/;
    perf::report(bench[perf::time::tsc / perf::bench::operations,     // ns/op
                       perf::stat::cycles / perf::bench::operations]  // cyc/op
    );
    inline constexpr auto latency = perf::time::steady_clock / perf::bench::operations;
    inline constexpr auto throughput = perf::bench::operations / perf::time::steady_clock;
    inline constexpr auto inverse_throughput = perf::time::steady_clock / perf::bench::operations;
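
    These named expressions can be passed to `report` like any other metric (a sketch following the `report` call above; output labels are up to the library):

    perf::report(bench[latency, throughput]);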
  • bench/policy

  • bench/baseline

  • bench/debug

  • io/report

  • io/annotate

  • io/plot

    // perf::plot::hist
    // perf::plot::bar
    // perf::plot::box
    // perf::plot::line
    // perf::plot::ecdf
    // complexity
Exporting/Sharing

API

Configuration
/**
 * PERF version # https://semver.org
 */
#define PERF (MAJOR, MINOR, PATCH) // ex. (1, 0, 0)

/**
 * GNU # default: deduced based on `__GNUC__`
 * - 0 not compatible
 * - 1 compatible
 */
#define PERF_GNU 0/1

/**
 * Linux # default: deduced based on `__linux__`
 * - 0 not supported
 * - 1 supported
 */
#define PERF_LINUX 0/1

/**
 * LLVM # default: deduced based on `llvm-dev` headers
 * - 0 not supported
 * - 1 supported
 */
#define PERF_LLVM 0/1

/**
 * Intel Processor Trace # default: deduced based on `intel_pt` headers
 * - 0 not supported
 * - 1 supported
 */
#define PERF_INTEL 0/1

/**
 * I/O support # default: 1
 * - 0 not compiled in
 * - 1 supported (`log, json, report, annotate, plot`)
 */
#define PERF_IO 0/1

/**
 * tests # default: not-defined
 * - defined:     disables all compile-time, run-time tests
 * - not-defined: compile-time tests executed,
 *                run-time tests available by `perf::test()` API
 */
#define NTEST
/**
 * gnuplot terminal # see `gnuplot -> set terminal` # default: 'sixel'
 * - 'sixel'                  # console image # https://www.arewesixelyet.com
 * - 'wxt'                    # popup window
 * - 'dumb size 150,25 ansi'  # console with colors
 * - 'dumb size 80,25'        # console
 */
ENV:PERF_IO_PLOT_TERM

/**
 * style # default: dark
 * - light
 * - dark
 */
ENV:PERF_IO_PLOT_STYLE
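
The configuration macros are typically set before the header is pulled in (a sketch; the `perf.hpp` header name is an assumption):

#define PERF_IO 0   // compile out log/json/report/annotate/plot
#define NTEST       // do not compile in compile-time/run-time tests
#include "perf.hpp" // header name assumed; see the FAQ for module usage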
Synopsis

FAQ

Setup
  • How to setup perf docker?

    Dockerfile

    docker build -t perf .
    docker run \
      -it \
      --privileged \
      --network=host \
      -e DISPLAY=${DISPLAY} \
      -v ${PWD}:${PWD} \
      -w ${PWD} \
      perf
  • How to install perf dependencies?

    apt-get install linux-tools-common # linux-perf (perf::stat/perf::record)
    apt-get install llvm-dev           # llvm (perf::mc/perf::mca)
    apt-get install libipt-dev         # libipt (perf::trace)
    apt-get install gnuplot            # (perf::plot)
  • How to setup linux performance counters?

    scripts/setup.sh

    .github/scripts/setup.sh --perf # --rdpmc --max-sample-rate 10000

    linux

    sudo mount -o remount,mode=755 /sys/kernel/debug
    sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
    sudo chown `whoami` /sys/kernel/debug/tracing/uprobe_events
    sudo chmod a+rw /sys/kernel/debug/tracing/uprobe_events
    echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
    echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
    echo 1000 | sudo tee /proc/sys/kernel/perf_event_max_sample_rate
    echo 2 | sudo tee /sys/devices/cpu_core/rdpmc
  • How to reduce execution variability?

    scripts/tune.sh

    .github/scripts/tune.sh

    pyperf - pip3 install pyperf

    sudo pyperf system tune
    sudo pyperf system show
    sudo pyperf system reset

    linux

    # Set Process CPU Affinity (apt install util-linux)
    taskset -c 0 ./a.out
    
    # Set Process Scheduling Priority (apt install coreutils)
    nice -n -20 taskset -c 0 ./a.out # -20..19 (most..less favorable to the process)
    
    # Disable CPU Frequency Scaling (apt install cpufrequtils)
    sudo cpupower frequency-set --governor performance
    # cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
    # Disable Address Space Randomization
    echo 0 > /proc/sys/kernel/randomize_va_space
    
    # Disable Processor Boosting
    echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost
    
    # Disable Turbo Mode
    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
    
    # Disable Hyperthreading/SMT
    echo off | sudo tee /sys/devices/system/cpu/smt/control
    
    # Restrict memory to a single socket
    numactl -m 0 -N 0 ./a.out
    
    # Enable Huge Pages
    sudo numactl --cpunodebind=1 --membind=1 hugeadm \
      --obey-mempolicy --pool-pages-min=1G:64
    sudo hugeadm --create-mounts

    bootloader

    # Enable Kernel Mode Task-Isolation (https://lwn.net/Articles/816298)
    # cat /sys/devices/system/cpu/isolated
    isolcpus=<cpu number>,...,<cpu number>
    
    # Disable P-states and C-states
    # cat /sys/devices/system/cpu/intel_pstate/status
    idle=poll intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=1
    
    # Disable NMI watchdog
    # cat /proc/sys/kernel/nmi_watchdog
    nmi_watchdog=0

    uefi - https://github.com/qlibs/uefi

Usage Guide
  • How to use perf with modules?

    clang

    clang++ -std=c++23 -O3 -I. --precompile perf.cppm
    clang++ -std=c++23 -O3 -fprebuilt-module-path=. perf.pcm *.cpp -lLLVM -lipt
    import perf;
  • How to change assembly syntax?

    perf::llvm llvm{
      {.syntax = perf::arch::syntax::att} // default: intel
    };
  • How to disassemble for a different platform?

    perf::llvm llvm{
      .triple = "x86_64-pc-linux-gnu" // see `llvm-llc` for details
    };
  • How to write custom profiler?

    struct profiler {
      // starts profiling
      constexpr auto start();
    
      // stops profiling
      constexpr auto stop();
    
      // filter results
      [[nodiscard]] constexpr auto operator[](Ts...) const;
    };
    static_assert(perf::profiler_like<profiler>);

    See https://github.com/qlibs/prof
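
    A minimal sketch of such a profiler, assuming the concept requires only the three members above (the chrono-based timing here is illustrative, not part of perf):

    #include <chrono>

    struct my_profiler {
      std::chrono::steady_clock::time_point begin{};
      std::chrono::steady_clock::duration elapsed{};

      constexpr auto start() { begin = std::chrono::steady_clock::now(); }
      constexpr auto stop() { elapsed = std::chrono::steady_clock::now() - begin; }

      // filter results; here it simply returns the measured duration
      [[nodiscard]] constexpr auto operator[](auto&&...) const { return elapsed; }
    };
    static_assert(perf::profiler_like<my_profiler>);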

  • How to integrate with unit-testing framework?

    import perf;
    import ut; // https://github.com/qlibs/ut
    
    int main() {
      "benchmark"_test = [] {
        // ...
      };
    }
  • Which terminal can display images?

    Any terminal with sixel support - https://www.arewesixelyet.com

  • How to plot with popup windows?

    PERF_IO_PLOT_TERM='wxt' ./a.out
  • How to plot without sixel?

    PERF_IO_PLOT_TERM='dumb' ./a.out
    PERF_IO_PLOT_TERM='dumb size 80,25' ./a.out
    PERF_IO_PLOT_TERM='dumb size 150,25 ansi' ./a.out
  • How to change plot style?

    PERF_IO_PLOT_STYLE='dark' ./perf # default
    PERF_IO_PLOT_STYLE='light' ./perf
  • How to save plot?

    perf::plot::gnuplot plt{{.term = "png"}};
    plt.send("set output 'output.png'");
    perf::plot::bar(plt, ...);
  • How to export results?

    scripts/export.sh

    ./a.out 2>&1 | .github/scripts/export.sh markdown > results.md
    ./a.out 2>&1 | .github/scripts/export.sh notebook > results.ipynb
    ./a.out 2>&1 | .github/scripts/export.sh html > results.html
  • How to share results?

    gh - apt-get install gh

    # https://jbt.github.io/markdown-editor
    gh gist create --public --web results.md
    # https://jupyter.org
    gh gist create --public --web results.ipynb
    # https://htmlpreview.github.io
    gh gist create --public --web results.html
  • How to integrate with jupyter?

    jupyter can be used for data analysis (python)

    perf::json(...); // save to json file
    # apt install jupyter
    jupyter notebook --ip 0.0.0.0 --no-browser notebook.ipynb # read from saved json
  • How do perf tests work?

    compile-time tests are executed upon include/import (enabled by default)
    run-time/sanity check tests can be executed at run-time

    int main() {
      perf::test({.verbose = true}); // run-time/sanity check tests
    }

    -DNTEST can be used to disable tests (not recommended)

    $CXX -DNTEST ... # tests will NOT be compiled in

    perf tests execution model

    #ifndef NTEST
    "perf"_suite = [] {
      "run-time and compile-time"_test = [] constexpr {
        expect(3 == accumulate({1, 2, 3}, 0));
      };
    
      "run-time"_test = [] mutable {
        expect(std::rand() >= 0);
      };
    
      "compile-time"_test = [] consteval {
        expect(sizeof(int) == sizeof(0));
      };
    };
    #endif
Performance

Resources

Specification
Multimedia
Workbench
Tooling

License

MIT/Apache2+LLVM
| license      | namespace    | guard          | description                        |
| ------------ | ------------ | -------------- | ---------------------------------- |
| MIT          | perf::*      | -              | https://opensource.org/license/mit |
| Apache2+LLVM | perf::mca::* | PERF_LLVM == 1 | https://llvm.org/LICENSE.txt       |

LICENSE
