Skip to content

Vectorization of the Jacobi-type methods for the SVD and the EVD

License

Notifications You must be signed in to change notification settings

venovako/VecJac

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VecJac

Vectorization of the Jacobi-type methods for the SVD and the 2x2 EVD

This software is a supplementary material for the paper doi:10.1137/22M1478847 (arXiv:2202.08361 [math.NA]).

Building

Prerequisites

A recent Intel CPU (with at least the AVX512F instruction subset) and oneAPI Base and HPC Toolkits (e.g., 2021.2 or newer) on a 64-bit Linux (e.g., CentOS 7.9) are required.

First, clone and build JACSD repository, with the same parent directory as this one has (e.g., venovako/JACSD and venovako/VecJac). In fact, only the jstrat library is required to be built there, with the same compilers and the (no-)debug mode as it is meant to be used here.

Make options

Run make in the src subdirectory as follows:

make [COMPILER=x64x|x200|gnu] [MARCH=...] [NDEBUG=optimization_level] [ABI=ilp64|lp64] [FPU=precise|strict] [WP=q|l] [MKL=sequential|intel_thread] [SLEEF=/path/to/sleef] [CR_MATH=/path/to/core-math] [all|clean|help]

where COMPILER should be set to x64x for Xeon processors or to x200 for Xeon Phi KNLs, and NDEBUG to the desired optimization level (3 is a sensible choice).

The option gnu selects the GCC family of compilers, for which the SLEEF-related parts cannot yet be built, and SLEEF has to remain unset. This option might cause the code to miscompile and should not be used at the moment! Neither x64x is guaranteed to be safe, since the testing was performed only with the obsolete non-LLVM Intel compilers (see src/x64.mk).

As a hack, setting MKL explicitly to sequential turns off OpenMP (otherwise it is turned on, unless debugging). The MARCH option might be set to a particular CPU generation (e.g., ICELAKE-SERVER, or its lowercase variant for gnu) or left undefined to take the default value of Host (or common-avx512 for x200). Other options should not be used unless their consequences are fully understood. For example, make COMPILER=x64x NDEBUG=3 clean all will trigger a full, release-mode rebuild for Xeons.

Running

The etc subdirectory contains env.sh, the script that should be sourced before launching the executables. The script should be modified for any Linux distribution different from RedHat/CentOS 7. On newer Intel Xeon CPUs it might be sufficient to query the CPUID leaf 0x15 for the TSC frequency in run-time, but since it is not guaranteed to work, a safer timing procedure, using CLOCK_MONOTONIC_RAW, is default there. The OpenMP-enabled executables require setting OMP_NUM_THREADS environment variable for the upper bound on the number of threads to be used.

This work has been supported in part by Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).