Simple Fortran parallel IO benchmark for teaching and benchmarking purposes.
Benchio builds on a one-dimensional parallel IO benchmark previously developed under the EU-funded EUFORIA project. See "High Performance I/O", Adrian Jackson, Fiona Reid, Joachim Hein, Alejandro Soba and Xavier Saez; https://ieeexplore.ieee.org/document/5739034/.
ADIOS2 functionality was added by Stephen Farr under the EU-funded EuroCC project.
Note that, before running the benchmark, you must set the Lustre striping on the three directories unstriped, striped and fullstriped.
- Set unstriped to have a single stripe: lfs setstripe -c 1 unstriped
- Set fullstriped to use the maximum number of stripes: lfs setstripe -c -1 fullstriped
- Set striped to use an intermediate number of stripes, e.g. for 4 stripes: lfs setstripe -c 4 striped
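The three striping commands can also be scripted. The following is a minimal sketch, assuming the three directories already exist in the current working directory and that `lfs` is on the `PATH`; the function names `striping_commands` and `apply_striping` are hypothetical and not part of benchio.

```python
import subprocess


def striping_commands(intermediate_stripes=4):
    """Return the three lfs commands as argument lists (nothing is executed)."""
    return [
        ["lfs", "setstripe", "-c", "1", "unstriped"],     # single stripe
        ["lfs", "setstripe", "-c", "-1", "fullstriped"],  # maximum number of stripes
        ["lfs", "setstripe", "-c", str(intermediate_stripes), "striped"],
    ]


def apply_striping(intermediate_stripes=4, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in striping_commands(intermediate_stripes):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if lfs reports a failure
            subprocess.run(cmd, check=True)
```

Calling `apply_striping()` only prints the commands; pass `dry_run=False` on a Lustre file system to apply them.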
The program has a very basic set of command-line options. The first three arguments must be the dimensions of the dataset; the fourth argument specifies if these are local sizes (i.e. weak scaling), or global sizes (strong scaling).
For example, to run using a 256 x 256 x 256 data array on every process (i.e. weak scaling):
benchio 256 256 256 local
In this case, the total file size will scale with the number of processes. Each process contributes 256 x 256 x 256 double precision values (128 MiB), so a run on 8 processes would produce a total file size of 1 GiB.
To run using a 256 x 256 x 256 global array (i.e. strong scaling):
benchio 256 256 256 global
In this case, the file size will be 128 MiB regardless of the number of processes.
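The file sizes quoted above follow from 8 bytes per double precision value. A small sketch of the arithmetic, where the helper name `total_file_bytes` is hypothetical and any metadata added by formats such as HDF5 or NetCDF is ignored:

```python
BYTES_PER_DOUBLE = 8  # size of one Fortran double precision value


def total_file_bytes(n1, n2, n3, scaling, nprocs):
    """Raw data size in bytes for a run with the given array dimensions."""
    elements = n1 * n2 * n3
    if scaling == "local":
        # weak scaling: every process holds its own n1 x n2 x n3 block
        elements *= nprocs
    elif scaling != "global":
        raise ValueError("scaling must be 'local' or 'global'")
    return elements * BYTES_PER_DOUBLE


print(total_file_bytes(256, 256, 256, "local", 8) / 2**30)   # size in GiB
print(total_file_bytes(256, 256, 256, "global", 8) / 2**20)  # size in MiB
```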
If the local array size is n1 x n2 x n3, then the double precision arrays are defined with halos as:
double precision :: iodata(0:n1+1, 0:n2+1, 0:n3+1)
A 3D Cartesian topology p1 x p2 x p3 is created with dimensions suggested by MPI_Dims_create() to create a global 3D array of size l1 x l2 x l3 where l1 = p1 x n1 etc.
The entries of the distributed IO array are set to globally unique values 1, 2, ..., l1 x l2 x l3 using the normal Fortran ordering; the halo values are set to -1. When writing to file, the halos are omitted.
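To illustrate the numbering, here is a small Python sketch of how one process's halo-padded block could be filled. It is not the benchmark's Fortran code: the function name `local_block`, the 0-based process coordinates, and the assumption that the process at coordinate c1 owns global indices c1 x n1 + 1 to (c1 + 1) x n1 (and likewise in the other dimensions) are all illustrative choices.

```python
def local_block(n, coords, dims):
    """Fill one process's block, halos included.

    n      = (n1, n2, n3)  local array size
    coords = (c1, c2, c3)  0-based position in the process grid (assumed layout)
    dims   = (p1, p2, p3)  process grid size
    Returns a dict keyed by (i, j, k) with i in 0..n1+1, and so on.
    """
    n1, n2, n3 = n
    l1 = dims[0] * n1  # global extent in the first dimension
    l2 = dims[1] * n2  # global extent in the second dimension
    block = {}
    for k in range(n3 + 2):
        for j in range(n2 + 2):
            for i in range(n1 + 2):
                if 1 <= i <= n1 and 1 <= j <= n2 and 1 <= k <= n3:
                    gi = coords[0] * n1 + i
                    gj = coords[1] * n2 + j
                    gk = coords[2] * n3 + k
                    # Fortran ordering: the first index varies fastest
                    block[(i, j, k)] = gi + (gj - 1) * l1 + (gk - 1) * l1 * l2
                else:
                    block[(i, j, k)] = -1  # halo cell
    return block
```

With a 2 x 2 x 2 local array on a 2 x 1 x 1 process grid, the interior values of the two blocks together cover 1 to 16 exactly once, and every halo cell holds -1.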
The code can use seven IO methods, and for each of them can use up to three directories with different stripings.
All files are deleted immediately after being written to avoid excess disk usage.
The full set of options is:
benchio (n1, n2, n3) (local|global)
[serial] [proc] [node] [mpiio] [hdf5] [netcdf] [adios]
[unstriped] [striped] [fullstriped]
If only the first four mandatory arguments are specified then all seven IO methods and all three stripings are used. However, you can pick subsets by setting additional optional command-line options.
- serial: Serial IO from one controller process to a single file serial.dat using Fortran binary unformatted write with access = stream
- proc: File-per-process with multiple serial IO to P files rankXXXXXX.dat using Fortran binary unformatted write with access = stream
- node: File-per-node with multiple serial IO to Nnode files nodeXXXXXX.dat using Fortran binary unformatted write with access = stream
- mpiio: MPI-IO collective IO to a single file mpiio.dat using native (i.e. binary) format
- hdf5: HDF5 collective IO to a single file hdf5.dat
- netcdf: NetCDF collective IO to a single file netcdf.dat
- adios: ADIOS2 collective IO to a BP5 directory adios.dat
  - ADIOS2 aggregator settings can be changed in the adios2.xml file
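The option handling can be pictured with the following Python sketch. It is not the benchmark's own Fortran argument parser, and the name `parse_benchio_args` is hypothetical. It also assumes that the IO methods and the stripings default to "all" independently of each other, which the description above only confirms for the case where no optional arguments are given.

```python
METHODS = ("serial", "proc", "node", "mpiio", "hdf5", "netcdf", "adios")
STRIPINGS = ("unstriped", "striped", "fullstriped")


def parse_benchio_args(argv):
    """Split an argument list into dimensions, scaling mode, methods, stripings."""
    if len(argv) < 4:
        raise ValueError("expected: n1 n2 n3 (local|global) [options]")
    n1, n2, n3 = (int(arg) for arg in argv[:3])
    scaling = argv[3]
    if scaling not in ("local", "global"):
        raise ValueError("fourth argument must be 'local' or 'global'")
    options = argv[4:]
    unknown = [opt for opt in options if opt not in METHODS + STRIPINGS]
    if unknown:
        raise ValueError("unknown options: " + " ".join(unknown))
    # Assumption: an empty selection means "use everything" for that category
    methods = [m for m in METHODS if m in options] or list(METHODS)
    stripings = [s for s in STRIPINGS if s in options] or list(STRIPINGS)
    return (n1, n2, n3), scaling, methods, stripings
```

For example, `parse_benchio_args("128 128 128 global mpiio hdf5 striped".split())` selects the two methods `mpiio` and `hdf5` on the `striped` directory only.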
Note that the serial part is designed to give a baseline IO rate. For simplicity, and to ensure we write the same amount of data as for the parallel methods, rank 0 writes out its own local array size times in succession, where size is the number of processes. Unlike the parallel IO formats, the contents of the file will therefore not be a linearly increasing set of values 1, 2, 3, ..., l1 x l2 x l3.
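The effect of this shortcut can be sketched in Python as follows. This is an illustration rather than the benchmark's Fortran code: the helper names are hypothetical, the local array is stood in for by a short list of values, and an in-memory buffer replaces serial.dat.

```python
import io
from array import array


def write_serial_baseline(stream, local_values, nprocs):
    """Write the same local block nprocs times; return the bytes written."""
    block = array("d", local_values).tobytes()  # raw 8-byte doubles
    for _ in range(nprocs):
        stream.write(block)
    return len(block) * nprocs


def read_doubles(raw):
    """Decode a byte string of raw doubles back into a list of floats."""
    values = array("d")
    values.frombytes(raw)
    return values.tolist()


buffer = io.BytesIO()
written = write_serial_baseline(buffer, [1.0, 2.0, 3.0, 4.0], nprocs=3)
contents = read_doubles(buffer.getvalue())
```

Here `contents` is the four local values repeated three times, so the file has the same size as a parallel write of 12 values but does not contain the sequence 1 to 12.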