The vectorized (AVX-512) batched singular value decomposition algorithm for matrices of order two.
This software is supplementary material for the paper doi:10.1142/S0129626420500152 (arXiv:2005.07403 [cs.MS]).
A recent Intel C compiler on a 64-bit Linux (e.g., CentOS 7.8) is required.
The Intel MKL (Math Kernel Library) is recommended, but another LAPACK library could work with some tweaking.
The libpvn repository has to be cloned into a directory parallel to this one and built with the desired compiler and with the OPENMP make variable set to at least 0.
Run make in the src subdirectory as follows:
make [TEST=0..15] [all|clean|help]
For testing, TEST=0 builds the vectorized code, and TEST=4 builds the pointwise code.
Adding two to TEST enables the optional backscaling, while adding one enables the step-by-step printouts.
Adding eight to TEST turns on tracking of IA32_MPERF and IA32_APERF MSRs (requires running the executables as root).
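The TEST values described above combine additively, so a value in 0..15 can be read as a 4-bit mask. The following sketch decodes a TEST value into the options it implies. The function name describe_test is hypothetical and is not part of this repository, and the bit-mask reading is an inference from the description above rather than something taken from the Makefile.

```shell
# Hypothetical helper (not part of the repository): print the options
# implied by a TEST value, assuming TEST is a 4-bit mask with
# 1 = printouts, 2 = backscaling, 4 = pointwise code, 8 = MSR tracking.
describe_test() {
    t=$1
    # Reject empty or non-numeric input (this also rejects negatives).
    case $t in
        ''|*[!0-9]*) echo "TEST must be an integer in 0..15" >&2; return 1 ;;
    esac
    if [ "$t" -gt 15 ]; then
        echo "TEST must be an integer in 0..15" >&2
        return 1
    fi
    if [ $(( $t & 4 )) -ne 0 ]; then
        echo "pointwise code"
    else
        echo "vectorized code"
    fi
    if [ $(( $t & 1 )) -ne 0 ]; then echo "step-by-step printouts"; fi
    if [ $(( $t & 2 )) -ne 0 ]; then echo "backscaling"; fi
    if [ $(( $t & 8 )) -ne 0 ]; then echo "MPERF/APERF tracking (run as root)"; fi
    return 0
}

# TEST=6 is 4+2: prints "pointwise code", then "backscaling".
describe_test 6
```

Under this reading, for example, TEST=2 would select the vectorized code with backscaling and no printouts.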
To write N finite pseudorandom doubles into the file FileName, run:
./src/rndgen.exe N FileName
To test the real (or the complex, in the second line) algorithm T, where T=TEST, on N vectors from FileName, run:
./src/d8svd2tT.exe N FileName
./src/z8svd2tT.exe N FileName
To test the real (or the complex, in the second line) algorithm T, where T=TEST, on #batches batches, each with n matrices read from infile, run:
./src/dbatchT.exe n #batches infile
./src/zbatchT.exe n #batches infile
For now, n has to be a power of two (not a constraint on the algorithm itself, but only on the error testing procedure).
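Since the batch testers currently expect n to be a power of two, it may be convenient to check n before launching a run. A minimal sketch follows. The function name is_pow2 is hypothetical and not part of this repository, and whether the testers accept very small values such as n=1 is not stated here.

```shell
# Hypothetical helper (not part of the repository): succeed if and only if
# the argument is a positive integer power of two.
is_pow2() {
    n=$1
    # Reject empty or non-numeric input.
    case $n in
        ''|*[!0-9]*) return 1 ;;
    esac
    # A positive integer is a power of two exactly when n & (n - 1) is zero.
    [ "$n" -gt 0 ] && [ $(( $n & ($n - 1) )) -eq 0 ]
}

n=1024
if is_pow2 "$n"; then
    echo "n=$n is a power of two"
else
    echo "n=$n is not a power of two" >&2
fi
```

The check only validates the argument; it does not run any of the executables.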
This work has been supported in part by the Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).