Measures the execution and data-transfer times of calculations on both the CPU and the GPU (CUDA).
Simply build with make; there are no external dependencies apart from the toolchain (see Requirements).
The options are straightforward but they may change with future updates.
Run cuda-benchmark with no arguments for help.
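
For readers unfamiliar with how execution and transfer times are usually separated, the following is a minimal sketch. It is not taken from the cuda-benchmark source. The kernel name scale, the CHECK macro, and the buffer size are all hypothetical and chosen only for illustration. It assumes the common approach of timing CPU work with std::chrono and GPU work with CUDA events.

```cuda
// Illustrative sketch only: NOT the cuda-benchmark implementation.
// All names here (scale, CHECK, n) are hypothetical.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Multiply every element by a constant factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;  // arbitrary problem size
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 1.0f);

    // CPU: wall-clock time of the same calculation on the host.
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) host[i] *= 2.0f;
    auto t1 = std::chrono::steady_clock::now();
    double cpu_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    float *dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));

    cudaEvent_t start, after_copy, after_kernel;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&after_copy));
    CHECK(cudaEventCreate(&after_kernel));

    // GPU: events bracket the host-to-device copy and the kernel separately.
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(after_copy));
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(after_kernel));
    CHECK(cudaEventSynchronize(after_kernel));  // wait until all work is done

    float copy_ms = 0.0f, kernel_ms = 0.0f;
    CHECK(cudaEventElapsedTime(&copy_ms, start, after_copy));
    CHECK(cudaEventElapsedTime(&kernel_ms, after_copy, after_kernel));

    std::printf("CPU execution:  %.3f ms\n", cpu_ms);
    std::printf("GPU transfer:   %.3f ms\n", copy_ms);
    std::printf("GPU execution:  %.3f ms\n", kernel_ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(after_copy));
    CHECK(cudaEventDestroy(after_kernel));
    CHECK(cudaFree(dev));
    return 0;
}
```

A sketch like this can be compiled with nvcc (for example, nvcc sketch.cu -o sketch, where the file name is arbitrary). The first kernel launch typically includes one-time startup overhead, so a careful measurement would usually run a warm-up launch first and average over several repetitions.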