___
(OvO)
< . >
--"-"---
OvO is a collection of OpenMP Offloading test functions for C++ and Fortran.
$ git clone git@github.com:TApplencourt/OvO.git ; cd OvO
# Set your env and run
$ OMP_TARGET_OFFLOAD=mandatory CXX="g++" CXXFLAGS="-fopenmp" FC="gfortran" FFLAGS="-fopenmp" ./ovo.sh run
$ ./ovo.sh report
>> Overall result for test_result/1957-04-01_19-02_CDC6600.lanl.gov
  pass rate(%)    test(#)    success(#)    compilation error(#)    runtime error(#)    wrong value(#)    hang(#)
--------------  ---------  ------------  ----------------------  ------------------  ----------------  ---------
           68%        310           212                       0                   0                96          2
OvO focuses on extensively testing hierarchical parallelism and mathematical functions.
Slides from some of our OvO presentations are available in the documentation folder.
$ ./ovo.sh -h
OvO an OpenMP test generator.
Usage:
ovo.sh gen
ovo.sh gen tiers [1|2|3]
ovo.sh gen hierarchical_parallelism [--test_type [atomic|reduction|memcopy|ordered]...]
[--data_type [float|'complex<float>'|
double|'complex<double>'|
REAL|COMPLEX|
'DOUBLE PRECISION'|'DOUBLE COMPLEX']... ]
[--loop_pragma [True|False] ]
[--paired_pragmas [True|False] ]
[--multi_devices [True|False] ]
[--host_threaded [True|False] ]
[--intermediate_result [True|False] ]
[--no_user_defined_reduction [True|False] ]
[--collapse [N]...]
[--append]
ovo.sh gen mathematical_function [--standard [cpp11|cpp17|cpp20|gnu|f77]... ]
[--complex [True|False] ]
[--long [True|False] ]
[--simdize [N]...]
[--append]
ovo.sh run [<test_folder>...]
ovo.sh report [ --summary | --failed | --passed ] [--tablefmt (github|tsv|jira)] [<result_folder>...]
ovo.sh clean
ovo.sh (-h | --help)
Bug reports and PRs are more than welcome! The OpenMP specification can be tricky, and we use metaprogramming to generate the tests, so, to pastiche Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use
~~regular expressions~~ metaprogramming." Now they have two problems.
Requirements:
- python3
- OpenMP compiler (obviously). We recommend an OpenMP 5.0 spec-compliant compiler, since some tests use a variable in both the `map` and `reduction` clauses of the same combined construct.
- C++11 compiler
- jinja (optional, only needed if you want to generate tests that are not in the repo):
  conda install --file requirements.txt
  or
  pip install -r requirements.txt
- make >= 4.0. Make 4.0 introduced the `--output-sync` option to serialize make output. If you use a version of make older than 4.0, the tests will be compiled serially.
Related projects:
- DOE ECP SOLLVE_VV (https://github.com/SOLLVE/sollve_vv)
- IBM OpenmpTest (https://github.com/clang-ykt/omptests)
- LLVM Test Suite (https://github.com/llvm/llvm-test-suite)
For hierarchical parallelism, we generate three types of kernels: kernels that perform a reduction using `atomic`, kernels that perform a reduction using the OpenMP `reduction` clause, and memory-copy kernels. For each type of kernel, we generate all possible OpenMP loop nests containing any combination of `target`, `teams`, `distribute`, and `parallel for`, including combined pragmas. By default, single-precision real and double-precision complex datatypes are used. More datatypes are available if needed.
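As a rough illustration of the third kernel type, here is a hypothetical memory-copy kernel that uses the fully combined `target teams distribute parallel for` pragma. It is a sketch, not OvO's generated code: the function name `memcopy_kernel` and the use of `std::vector` are assumptions made for the example.

```cpp
#include <vector>

// Hypothetical memcopy kernel (not OvO's generated code): copy src into dst
// on the device using a single fully combined construct. Without -fopenmp the
// pragma is ignored; without an offload device the region runs on the host.
std::vector<float> memcopy_kernel(const std::vector<float>& src) {
  const int N0 = static_cast<int>(src.size());
  std::vector<float> dst(src.size());
  const float* in = src.data();
  float* out = dst.data();
  #pragma omp target teams distribute parallel for map(to: in[0:N0]) map(from: out[0:N0])
  for (int i0 = 0; i0 < N0; i0++) {
    out[i0] = in[i0];
  }
  return dst;
}
```

Since a copy involves no reduction, the result can be checked exactly: the output vector must compare equal to the input.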
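The code below is an example of what OvO can generate: a kernel that uses `atomic` to perform a float reduction. Note the absence of `for` in the `parallel` region: every thread of the region executes the update once and adds 1/omp_get_num_threads(), so each outer iteration contributes 1 in total.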
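#include <cassert>
#include <cmath>
#include <omp.h>

int main() {
  const int N0 = 64; // problem size (illustrative value)
  float counter_N0 {};
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp teams distribute
  for (int i0 = 0 ; i0 < N0 ; i0++ )
  {
    // No `for`: every thread of the parallel region runs this block once
    #pragma omp parallel
    {
      #pragma omp atomic update
      counter_N0 = counter_N0 + 1./omp_get_num_threads();
    }
  }
  // Each iteration adds 1 in total, so the counter should be close to N0
  assert (std::fabs(counter_N0 - N0) < 0.1f);
}
The actual generated code can be found in the repository's test_src folder.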
For mathematical functions, we test whether every function of a given standard can be offloaded. The offloaded result should match the CPU result to within 4 ULP.
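A minimal sketch of a 4-ULP check for `float`, assuming the distance is measured by counting the representable floats between the two values through their bit patterns. The names `ordered_bits`, `ulp_distance`, and `almost_equal_ulp` are hypothetical, and OvO's actual comparison may be implemented differently.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a float's bit pattern onto an integer scale where adjacent floats
// differ by exactly 1; negative values are mirrored so order is preserved
// and +0.0f / -0.0f both map to 0.
std::int64_t ordered_bits(float x) {
  std::int32_t i;
  std::memcpy(&i, &x, sizeof i);
  const std::int64_t min32 = std::numeric_limits<std::int32_t>::min();
  return i < 0 ? min32 - i : i;
}

// Number of representable floats separating a and b (NaN never matches).
std::int64_t ulp_distance(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<std::int64_t>::max();
  }
  const std::int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}

// Hypothetical check: offloaded result within max_ulp of the CPU reference.
bool almost_equal_ulp(float offloaded, float reference, std::int64_t max_ulp = 4) {
  return ulp_distance(offloaded, reference) <= max_ulp;
}
```

Counting ULPs rather than using a fixed absolute epsilon keeps the tolerance proportional to the magnitude of the result, which matters for math functions whose outputs span many orders of magnitude.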
To run OvO, simply type ./ovo.sh run. Log files will be saved in the newly created test_result folder.
OvO honors the usual environment variables provided by the user (e.g. CXX, CXXFLAGS, FC, FFLAGS, OMP_TARGET_OFFLOAD).
OvO also honors the special OVO_TIMEOUT environment variable, which sets the timeout after which long-running tests are killed (15s by default).
Commonly used flags for various compilers are listed in /documentation/README.md. PRs adding flags for new compilers or compiler versions are welcome.
Below is a simple run using the GCC compiler:
$ OMP_TARGET_OFFLOAD=mandatory CXX="g++" CXXFLAGS="-fopenmp" FC="gfortran" FFLAGS="-fopenmp" ./ovo.sh run
Running tests_src/cpp/mathematical_function/math_cpp11 | Saving log in results/1957-04-01_19-02_CDC6600.lanl.gov/cpp/mathematical_function/math_cpp11
g++ -fopenmp isgreater_bool_float_float.cpp -o isgreater_bool_float_float.exe
[...]
A summary of the results can be obtained with ./ovo.sh report. Example output with --summary:
./ovo.sh report --summary --tablefmt github
>> Overall result for test_result/1957-04-01_19-02_CDC6600.lanl.gov
| pass rate(%) | test(#) | success(#) | compilation error(#) | runtime error(#) | wrong value(#) | hang(#) |
|----------------|-----------|--------------|------------------------|--------------------|------------------|-----------|
| 57% | 828 | 471 | 198 | 41 | 98 | 20 |
>> Summary
| language | category | name | pass rate(%) | test(#) | success(#) | compilation error(#) | runtime error(#) | wrong value(#) | hang(#) |
|------------|--------------------------|--------------------------|----------------|-----------|--------------|------------------------|--------------------|------------------|-----------|
| cpp | hierarchical_parallelism | reduction-float | 34% | 74 | 25 | 2 | 1 | 44 | 2 |
| cpp | hierarchical_parallelism | reduction-complex_double | 47% | 74 | 35 | 2 | 1 | 28 | 8 |
| cpp | hierarchical_parallelism | atomic-float | 58% | 33 | 19 | 0 | 0 | 4 | 10 |
| cpp | hierarchical_parallelism | memcopy-complex_double | 93% | 45 | 42 | 2 | 1 | 0 | 0 |
| cpp | hierarchical_parallelism | memcopy-float | 93% | 45 | 42 | 2 | 1 | 0 | 0 |
| cpp | mathematical_function | cpp11 | 92% | 177 | 163 | 6 | 4 | 4 | 0 |
| cpp | mathematical_function | cpp11-complex | 100% | 34 | 34 | 0 | 0 | 0 | 0 |
| fortran | hierarchical_parallelism | reduction-double_complex | 7% | 74 | 5 | 49 | 14 | 6 | 0 |
| fortran | hierarchical_parallelism | reduction-real | 8% | 74 | 6 | 48 | 14 | 6 | 0 |
| fortran | hierarchical_parallelism | memcopy-real | 22% | 45 | 10 | 35 | 0 | 0 | 0 |
| fortran | hierarchical_parallelism | memcopy-double_complex | 24% | 45 | 11 | 34 | 0 | 0 | 0 |
| fortran | hierarchical_parallelism | atomic-real | 39% | 33 | 13 | 18 | 0 | 2 | 0 |
| fortran | mathematical_function | F77-complex | 71% | 14 | 10 | 0 | 0 | 4 | 0 |
| fortran | mathematical_function | F77 | 92% | 61 | 56 | 0 | 5 | 0 | 0 |
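The columns of these tables are related by simple arithmetic. In every row above, the five outcome columns add up to the test count, and the pass rate equals the success count divided by the test count, rounded to the nearest percent. For the overall row:

```latex
\begin{aligned}
\text{test} &= \text{success} + \text{compilation error} + \text{runtime error} + \text{wrong value} + \text{hang} \\
828 &= 471 + 198 + 41 + 98 + 20 \\
\text{pass rate} &= \frac{\text{success}}{\text{test}} = \frac{471}{828} \approx 0.569 \approx 57\%
\end{aligned}
```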
You can also use ./ovo.sh report --failed to get a list of the tests that failed, for closer investigation.
All information on the execution of the tests is available in the subfolder of test_result corresponding to your run (for example, ./test_result/1957-04-01_19-02_CDC6600.lanl.gov/fortran/hierarchical_parallelism/memcopy-real).
The environment used to run the test is saved in env.log.
Two log files are also created: one for the compilation (compilation.log), and one for the runtime (runtime.log).
- Error code `112` corresponds to an incorrect result.
- Error code `124` or `137` corresponds to a test that was hanging and was killed by `timeout`.
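When going through runtime.log by hand or with a script, these codes can be mapped onto the outcome columns of `ovo.sh report`. A minimal sketch, assuming a hypothetical `Outcome` enum and `classify_exit_code` helper (neither is part of OvO), and assuming that any other nonzero code counts as a runtime error; compilation failures are recorded separately in compilation.log and are not covered here.

```cpp
// Hypothetical mapping from a test executable's exit code to the outcome
// columns of `ovo.sh report` (compilation errors come from compilation.log).
enum class Outcome { Success, WrongValue, Hang, RuntimeError };

Outcome classify_exit_code(int code) {
  switch (code) {
    case 0:   return Outcome::Success;
    case 112: return Outcome::WrongValue;   // incorrect result
    case 124:                               // `timeout` reports the test timed out
    case 137: return Outcome::Hang;         // 128 + 9: killed by SIGKILL
    default:  return Outcome::RuntimeError; // assumption: any other failure
  }
}
```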
More than 100,000 tests are available. For convenience, we bundle them into tiers.
To generate new tests, use ovo.sh gen. By default, it generates tier 1 tests. If you feel adventurous, you can use ovo.sh gen tiers 3. See the section below for more information.
`ovo.sh gen`
`ovo.sh gen tiers <1|2|3>`
gen Generate tests corresponding to tier 1
gen tiers Generate tests corresponding to the specified tier.
Tier 1 list of tests (`ovo.sh gen` or `ovo.sh gen tiers 1`):
hierarchical_parallelism cpp:
atomic-float,
memcopy-complex_double, memcopy-float,
reduction-complex_double, reduction-float
hierarchical_parallelism fortran:
atomic-real,
memcopy-double_complex, memcopy-real,
reduction-double_complex, reduction-real
mathematical_function cpp
cpp11, cpp11-complex
mathematical_function fortran
F77, F77-complex
Tier 2 (`ovo.sh gen tiers 2`):
hierarchical_parallelism cpp:
atomic-float, ordered_real, atomic-float-host_threaded, atomic-float-intermediate_result,
memcopy-complex_double, memcopy-float, memcopy-float-collapse_n2, memcopy-float-loop_pragma
reduction-complex_double, reduction-float, reduction-float-multiple_devices
hierarchical_parallelism fortran:
atomic-real, ordered_real, atomic-real-host_threaded, atomic-real-intermediate_result
memcopy-double_complex, memcopy-real, memcopy-real-collapse_n2, memcopy-real-loop_pragma
memcopy-real-paired_pragmas, reduction-double_complex
reduction-real, reduction-real-multiple_devices
Tier 3 (`ovo.sh gen tiers 3`):
All possible combinations
Options you can pass:
For hierarchical_parallelism:
--test_type Choose the kind of tests you want to generate.
- atomic tests will use OpenMP `atomic` construct to perform a reduction.
- reduction tests will use OpenMP `reduction` construct to perform a reduction.
- memcopy tests perform a memory copy.
--data_type Data types used in the tests. Uppercase types correspond to Fortran datatypes.
--loop_pragma Trigger to use the OpenMP 5.0 "loop" construct.
--paired_pragmas Fortran only. Will generate tests that use the optional "$OMP END" constructs.
--multi_devices Tests will be offloaded to all available GPUs.
--host_threaded Tests will be offloaded by multiple host threads.
--intermediate_result
Reduction and atomic tests will use intermediate results to perform their reduction.
--collapse
All the loops will be duplicated N times, and the `collapse` clause will be used.
--no_user_defined_reduction
Only affects reductions with C++ complex datatypes. Tests will not use the `omp declare reduction` construct.
For mathematical_function:
--standard Standard (cpp11, cpp17, etc.) whose math functions will be tested.
--complex Trigger for complex math functions
--long Trigger to use C++ long datatype if possible
--simdize Trigger to put math function inside a 'simd' region
Examples:
Generate the `tiers 2` set of tests:
./ovo.sh gen tiers 2
Generate hierarchical_parallelism reduction tests with REAL (fortran) and complex<float>(c++) datatype with and without multi-devices support:
./ovo.sh gen hierarchical_parallelism --test_type reduction --data_type REAL "complex<float>" --multiple_devices True False
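To give an idea of what the `--collapse` option describes, here is a hypothetical memcopy kernel whose loop has been duplicated into a two-level nest that is fused with the `collapse(2)` clause. This is a sketch under assumptions (the name `memcopy_collapse2`, the sizes `N0` and `N1`, and the row-major layout are illustrative), not OvO's generated code; it expects `src.size() == N0 * N1`.

```cpp
#include <vector>

// Hypothetical `collapse(2)` memcopy: the i0/i1 nest is fused into a single
// iteration space of N0 * N1 iterations shared by all teams and threads.
std::vector<double> memcopy_collapse2(const std::vector<double>& src, int N0, int N1) {
  std::vector<double> dst(src.size());
  const double* in = src.data();
  double* out = dst.data();
  const int size = N0 * N1;
  #pragma omp target teams distribute parallel for collapse(2) map(to: in[0:size]) map(from: out[0:size])
  for (int i0 = 0; i0 < N0; i0++) {
    for (int i1 = 0; i1 < N1; i1++) {
      out[i0 * N1 + i1] = in[i0 * N1 + i1];
    }
  }
  return dst;
}
```

Collapsing exposes N0 * N1 iterations to the work distribution instead of only N0, which is what makes these loop nests interesting to test when the outer loop is short.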
`ovo.sh run [test_folder]`:
[test_folder] List of test folders. OvO will recurse into those folders to execute tests.
By default, all the tests in test_src are run; this option lets you restrict the run to specific folders.
Example:
Run only the Fortran tests
./ovo.sh run ./test_src/fortran
`ovo.sh report`
Options you can pass:
--summary Print the pass rate for each group of tests
--failed Print all the tests that failed
--passed Print all the tests that passed
--tablefmt Change the formatting of the table
(useful for copy/pasting into Excel, for example)
Example:
Print a summary of the pass rate for each group of tests:
./ovo.sh report --summary