Iris uses an Airspeed Velocity (ASV) setup to benchmark performance. This is primarily designed to check for performance shifts between commits using statistical analysis, but can also be easily repurposed for manual comparative and scalability analyses.
The benchmarks are automatically run overnight by a GitHub Action, with any notable shifts in performance being flagged in a new GitHub issue.
On GitHub: a Pull Request can be benchmarked by adding the `benchmark_this` label to the PR. This triggers a run comparing the PR's `HEAD` against its merge-base with the PR's base branch, thus showing performance differences introduced by the PR. (This run is managed by the aforementioned GitHub Action.)
To run locally: the benchmark runner provides conveniences for common benchmark setup and run tasks, including replicating the automated overnight run locally. This is accessed via the Nox `benchmarks` session - see `nox -s benchmarks -- --help` for detail (see also: `bm_runner.py`). Alternatively you can directly run `asv ...` commands from this directory (you will still need Nox installed - see Benchmark environments).
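For orientation, a minimal sketch in Python of the two routes above, assuming Nox (and, for the second call, ASV) is installed and the working directory is this benchmarks directory; normally you would just type the equivalent commands in a shell:

```python
# A minimal sketch: shell out to the same commands you would type by hand.
import subprocess

# Show the benchmark runner's own help text via the Nox session.
subprocess.run(["nox", "-s", "benchmarks", "--", "--help"], check=True)

# Or call ASV directly (Nox is still needed for the delegated
# environment management).
subprocess.run(["asv", "--help"], check=True)
```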
A significant portion of benchmark run time is environment management. Run-time can be reduced by placing the benchmark environment on the same file system as your Conda package cache, if it is not already. You can achieve this in any of the following ways:

- Temporarily reconfiguring `ENV_PARENT` in `delegated_env_commands` in `asv.conf.json` to reference a location on the same file system as the Conda package cache.
- Using an alternative Conda package cache location during the benchmark run, e.g. via the `$CONDA_PKGS_DIRS` environment variable (see the sketch after this list).
- Moving your Iris repo to the same file system as the Conda package cache.
The following environment variables influence benchmark runs:

- `OVERRIDE_TEST_DATA_REPOSITORY` - required - some benchmarks use `iris-test-data` content, and your local `site.cfg` is not available for benchmark scripts. The benchmark runner defers to any value already set in the shell, but will otherwise download `iris-test-data` and set the variable accordingly.
- `DATA_GEN_PYTHON` - required - path to a Python executable that can be used to generate benchmark test objects/files; see Data generation. The benchmark runner sets this automatically, but will defer to any value already set in the shell. Note that Mule will be automatically installed into this environment, and sometimes `iris-test-data` (see `OVERRIDE_TEST_DATA_REPOSITORY`).
- `BENCHMARK_DATA` - optional - path to a directory for benchmark synthetic test data, which the benchmark scripts will create if it doesn't already exist. Defaults to `<root>/benchmarks/.data/` if not set. Note that some of the generated files, especially in the 'SPerf' suite, are many GB in size, so plan accordingly.
- `ON_DEMAND_BENCHMARKS` - optional - when set (to any value): benchmarks decorated with `@on_demand_benchmark` are included in the ASV run. Usually coupled with the ASV `--bench` argument to only run the benchmark(s) of interest. Is set during the benchmark runner `cperf` and `sperf` sub-commands.
- `ASV_COMMIT_ENVS` - optional - when set (to any value), instructs the delegated environment management to create a dedicated environment for each commit being benchmarked. This means that benchmarking commits with different environment requirements will not be delayed by repeated environment setup - especially relevant given the benchmark runner's use of `--interleave-rounds`, or any time you know you will repeatedly benchmark the same commit. NOTE: Iris environments are large, so this option can consume a lot of disk space.
See the ASV docs for full detail.
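For instance, a minimal sketch of combining `ON_DEMAND_BENCHMARKS` with a direct ASV call; the benchmark name is illustrative only:

```python
# A minimal sketch: include on-demand benchmarks and ask ASV to run only
# those matching a regex. "MyOnDemandSuite" is a hypothetical name.
import os
import subprocess

env = dict(os.environ, ON_DEMAND_BENCHMARKS="1")
subprocess.run(
    ["asv", "run", "--bench", "MyOnDemandSuite", "HEAD^!"],
    check=True,
    env=env,
)
```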
It is not possible to maintain a full suite of 'unit style' benchmarks:
- Benchmarks take longer to run than tests.
- Small benchmarks are more vulnerable to noise - they report a lot of false positive regressions.
We therefore recommend writing benchmarks representing scripts or single operations that are likely to be run at the user level.
The drawback of this approach: a reported regression is less likely to reveal the root cause (e.g. a commit that slows coordinate creation might only show up in a file-loading benchmark). Be prepared for manual investigations, and consider committing any useful benchmarks as on-demand benchmarks for future developers to use.
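As a shape for such 'user level' benchmarks, a minimal sketch follows; the class and file path are hypothetical, and real suites create their data via the data-generation helpers described below rather than relying on a pre-existing file:

```python
# A minimal sketch: time a whole operation a user would actually run,
# rather than one internal routine. The NetCDF path is hypothetical -
# real benchmarks generate their own synthetic files (see Data generation).
import iris


class FileLoading:
    def setup(self):
        self.path = "/path/to/benchmark_data.nc"  # hypothetical

    def time_load_cube(self):
        iris.load_cube(self.path)
```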
Important: be sure not to use the benchmarking environment to generate any test objects/files, as this environment changes with each commit being benchmarked, creating inconsistent benchmark 'conditions'. The generate_data module offers a solution; read more detail there.
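To illustrate the pattern only (prefer the real helpers in generate_data): data creation is delegated to the separate interpreter pointed to by `DATA_GEN_PYTHON`, so the benchmark environment itself never builds the test files. The inline script below is a placeholder.

```python
# A minimal sketch of delegated data generation. The real generate_data
# helpers should be used instead; this only shows the underlying idea.
import os
import subprocess

data_gen_python = os.environ["DATA_GEN_PYTHON"]

# Placeholder logic - a real call would create a synthetic benchmark file
# in the BENCHMARK_DATA directory.
script = "print('create synthetic benchmark data here')"
subprocess.run([data_gen_python, "-c", script], check=True)
```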
Note that ASV re-runs a benchmark multiple times between calls to its `setup()` routine. This is a problem for benchmarking certain Iris operations such as data realisation, since the data will no longer be lazy after the first run. Consider writing extra steps to restore objects' original state within the benchmark itself.

If adding steps to the benchmark will skew the result too much then re-running can be disabled by setting an attribute on the benchmark: `number = 1`. To maintain result accuracy this should be accompanied by increasing the number of repeats between `setup()` calls using the `repeat` attribute. `warmup_time = 0` is also advisable since ASV performs independent re-runs to estimate run-time, and these will still be subject to the original problem.
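A minimal sketch of such a benchmark, assuming the data-realisation case; the cube construction is illustrative:

```python
# A minimal sketch: benchmark data realisation without re-run problems.
import dask.array as da
from iris.cube import Cube


class DataRealisation:
    # Time the routine only once per setup() call ...
    number = 1
    # ... and take more samples to compensate for the single run.
    repeat = 10
    # Skip ASV's warm-up runs, which would realise the data before timing.
    warmup_time = 0

    def setup(self):
        # A lazy cube backed by a Dask array - realisation is what we time.
        self.cube = Cube(da.zeros((2000, 2000), chunks=(500, 500)))

    def time_realise_data(self):
        # Accessing .data realises the lazy array.
        self.cube.data
```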
Iris benchmarking implements custom benchmark types, such as a `tracemalloc` benchmark to measure memory growth. See `custom_bms/` for more detail.
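As a rough illustration only: the sketch below assumes the custom type discovers methods by a `tracemalloc_` name prefix, mirroring ASV's `time_`/`mem_` conventions - check `custom_bms/` for the actual discovery rules and reporting units before relying on this.

```python
# A rough sketch only - the "tracemalloc_" prefix is an assumption;
# see custom_bms/ for the real convention.
import numpy as np


class MemoryGrowth:
    def tracemalloc_create_array(self):
        # The custom benchmark type measures memory growth across this call.
        np.zeros((1000, 1000), dtype=np.float64)
```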
(We no longer advocate the approach below for benchmarks run during CI, given the limited available runtime and the risk of false positives. It remains useful for manual investigations.)
When comparing performance between commits/file-types/whatever it can be helpful to know if the differences exist in scaling or non-scaling parts of the Iris functionality in question. This can be done using a size parameter, setting one value to be as small as possible (e.g. a scalar `Cube`), and the other to be significantly larger (e.g. a 1000x1000 `Cube`). Performance differences might only be seen for the larger value, or the smaller, or both, getting you closer to the root cause.
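A minimal sketch of the idea, using ASV parametrisation; the operation and sizes are illustrative:

```python
# A minimal sketch: one near-trivial size and one much larger size, so a
# regression can be attributed to scaling or non-scaling code paths.
import numpy as np
from iris.cube import Cube


class CubeAddition:
    params = [1, 1000]
    param_names = ["side_length"]

    def setup(self, side_length):
        shape = (side_length, side_length)
        self.cube = Cube(np.zeros(shape, dtype=np.float32))

    def time_add_scalar(self, side_length):
        # Differences seen only at the larger size point to scaling code.
        self.cube + 1
```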
Some benchmarks provide useful insight but are inappropriate to be included in a benchmark run by default, e.g. those with long run-times or requiring a local file. These benchmarks should be decorated with `@on_demand_benchmark` (see benchmarks init), which sets the benchmark to only be included in a run when the `ON_DEMAND_BENCHMARKS` environment variable is set. Examples include the CPerf and SPerf benchmark suites for the UK Met Office NG-VAT project.
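A minimal sketch; the import path is an assumption based on the benchmarks package layout referenced above, and the benchmark body is illustrative:

```python
# A minimal sketch - only included in ASV runs when ON_DEMAND_BENCHMARKS
# is set. The import path is an assumption; see the benchmarks package
# __init__ for where the decorator actually lives.
from benchmarks import on_demand_benchmark


@on_demand_benchmark
class LongRunningSuite:
    def setup(self):
        self.data = list(range(1_000_000))

    def time_sum(self):
        sum(self.data)
```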
We have disabled ASV's standard environment management, instead using an environment built with the same Nox scripts as Iris' test environments. This is done using ASV's plugin architecture - see `asv_delegated_conda.py` and the extra config items in `asv.conf.json`.

(ASV is written to control the environment(s) that benchmarks are run in - minimising external factors and also allowing it to compare between a matrix of dependencies, each in a separate environment. We have chosen to sacrifice these features in favour of testing each commit with its intended dependencies, controlled by Nox + lock-files.)