OpenCL is a framework for programming highly parallel devices such as graphics cards (GPUs) and multicore CPUs. At present, the implementation works on modern GPUs. The main limitation is that the device must support double-precision floating-point operations.
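Whether a device reports double-precision support can be checked before installing anything else. The following is a minimal sketch, assuming the separate `clinfo` utility is installed (it is not part of ADDA) and that the device advertises the standard Khronos extension name `cl_khr_fp64`; some older AMD devices may report `cl_amd_fp64` instead, which this sketch does not look for.

```shell
# Succeed if the text on stdin mentions the cl_khr_fp64 extension
# (the Khronos name for double-precision floating-point support).
has_fp64() {
    grep -qw 'cl_khr_fp64'
}

# Assumption: clinfo is available; it lists the extensions of every device.
if command -v clinfo >/dev/null 2>&1; then
    if clinfo 2>/dev/null | has_fp64; then
        echo "double precision: reported by at least one device"
    else
        echo "double precision: not reported"
    fi
else
    echo "clinfo not found; check the vendor documentation instead"
fi
```

The check is deliberately coarse: it only tells whether any listed device reports the extension, not which one.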
There are two options for obtaining a recent GPU driver:
- Download and install the driver from the corresponding vendor (Nvidia or AMD).
- On Unix systems, driver packages may be available. Using them is recommended, since they are more convenient to install and update and cause less trouble with kernel updates. The drawback is that they lag behind the original vendor drivers. Moreover, some systems have no official packages, since the driver is not open-source; in that case, search unofficial repositories such as RPM Fusion. On Ubuntu, search for the packages fglrx-updates (AMD) or nvidia-###-updates (Nvidia).
A recent driver alone is sufficient to run ADDA OpenCL executables, e.g. those provided for Windows.
OpenCL headers and libraries are required to compile the OpenCL version of ADDA. There are several alternative approaches:
- On Unix a special package may be available, e.g. ocl-icd-opencl-dev for Ubuntu. It is based on an open-source implementation of libOpenCL.so, which in turn links to vendor-specific libraries.
- Install the full SDK from the corresponding vendor (Nvidia CUDA toolkit or AMD APP SDK). In addition to headers and libraries you will get some nice development tools, such as a visual profiler. We have, however, encountered several problems with this approach:
  - The Intel compiler (version 11.0) has problems with the OpenCL headers from CUDA toolkit 3.2. This was fixed in later versions of the toolkit.
  - OpenCL.lib provided by both the CUDA toolkit and the AMD APP SDK on 64-bit Windows is incompatible with MinGW64. Compilation goes fine, but ADDA breaks down as soon as it calls the first OpenCL function. This is probably due to some limitations of MinGW64 itself. Fortunately, the AMD APP SDK also provides libOpenCL.a, which links and works fine. To compile ADDA on 64-bit Windows with Nvidia GPUs, one should either use libOpenCL.a from the AMD APP SDK or resort to the manual approach described below.
- Supply missing headers and/or libraries manually:
  - Since OpenCL is an open standard, the headers can be obtained from the official website. Ideally, one should use the same version of the OpenCL headers as that of the library. They may also be available as a package, e.g. opencl-headers.
  - The installed GPU driver always provides libraries (DLLs), which are used at runtime. On Unix systems the compiler may link to these libraries directly. Similarly, on Windows GCC can link to OpenCL.dll (located in C:\Windows\System32). However, on 64-bit Windows it is obligatory to use a 64-bit application (Windows Explorer or a file manager) to move this DLL to a non-system folder, because Windows will supply the 32-bit instead of the 64-bit version of OpenCL.dll to any 32-bit application (such as the compiler).
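Whichever approach is used, it can help to confirm that the compiler actually finds the OpenCL header and library before building ADDA. The following is a rough sketch, not part of ADDA: the function name `probe_opencl` is hypothetical, `gcc` as the default compiler is an assumption, and the probe only compiles and links a tiny program without running it.

```shell
# Compiler to probe; gcc is an assumption, override with CC=...
CC="${CC:-gcc}"

# Try to compile and link a tiny OpenCL program. Extra arguments
# (e.g. -I and -L options) are passed through to the compiler.
probe_opencl() {
    tmpdir=$(mktemp -d) || return 2
    cat > "$tmpdir/probe.c" <<'EOF'
#include <stddef.h>
#include <CL/cl.h>
int main(void) {
    cl_uint n = 0;
    /* Only asks how many platforms exist; nothing runs on a device */
    return clGetPlatformIDs(0, NULL, &n) == CL_SUCCESS ? 0 : 1;
}
EOF
    "$CC" "$@" "$tmpdir/probe.c" -o "$tmpdir/probe" -lOpenCL >/dev/null 2>&1
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

if probe_opencl; then
    echo "OpenCL header and library found"
else
    echo "OpenCL header or library not found by $CC"
fi
```

If the probe fails, passing the locations explicitly, e.g. `probe_opencl -I/some/include -L/some/lib` (placeholder paths), shows whether the problem is only one of search paths.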
If you obtain the headers and libraries as a package (on Unix), the package should also set the paths appropriately (to make them available to the compiler). Otherwise, you should either
- set these paths yourself:
  - on Unix, add the headers location to the environment variables C_INCLUDE_PATH and CPLUS_INCLUDE_PATH, and the libraries location to LIBRARY_PATH and LD_LIBRARY_PATH (for linking and runtime, respectively). Setting LD_LIBRARY_PATH can be replaced by modifying /etc/ld.so.conf, which is sometimes done automatically during driver installation.
  - on Windows, copy the contents of the corresponding folders to the include and lib folders of the MinGW/MSYS environment.
- or specify the paths to the corresponding include and lib folders in the file src/ocl/Makefile, as described in CompilingADDA.
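For the Unix variant of setting the paths yourself, the exports might look as follows. This is a sketch under stated assumptions: `OPENCL_ROOT` and its default value are hypothetical placeholders, and the subfolder names `include` and `lib` depend on the actual SDK layout (some SDKs use `lib64` or an architecture-specific subfolder).

```shell
# Hypothetical install prefix; replace with the real location of the
# OpenCL headers and libraries on your system.
OPENCL_ROOT="${OPENCL_ROOT:-/opt/opencl-sdk}"

# Headers: searched by the C and C++ compilers.
# ${VAR:+:$VAR} appends the old value only if it is non-empty, which
# avoids an empty path entry (treated as the current directory).
export C_INCLUDE_PATH="$OPENCL_ROOT/include${C_INCLUDE_PATH:+:$C_INCLUDE_PATH}"
export CPLUS_INCLUDE_PATH="$OPENCL_ROOT/include${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}"

# Libraries: LIBRARY_PATH is used at link time, LD_LIBRARY_PATH at runtime.
export LIBRARY_PATH="$OPENCL_ROOT/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$OPENCL_ROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

These settings last only for the current shell session; to make them persistent, the same lines can be placed in a shell startup file.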
In general the whole matrix-vector multiplication is done on the GPU.
For details and benchmarks see J. Comput. Sci. 2, 262-271 (2011). Currently, ADDA on high-end gaming GPUs outperforms that on high-end processors by at least several times (sometimes by a factor of 10 or more). This factor is significantly limited by the relatively slow double-precision computations on current gaming GPUs and the very modest performance of OpenCL FFT routines (compared to the sophisticated FFTW3). Recently, we switched (in the default mode) to clAmdFft, which further increased the speed by a factor of a few.
Although the OpenCL version of ADDA is fully operational, there are a number of limitations. In particular, the maximum problem size is limited by the available GPU memory, more specifically, by the part of it available to ADDA.
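To get a rough idea of how much GPU memory there is to work with, the device report can be filtered for its memory-related lines. This is a sketch that assumes the separate `clinfo` utility is installed; the labels it searches for are those printed by common `clinfo` builds and may differ in other versions. The function name `gpu_memory_lines` is hypothetical.

```shell
# Keep only the memory-related lines of a device report read from stdin.
# Assumption: the report uses the labels "Global memory size" and
# "Max memory allocation" (matched case-insensitively).
gpu_memory_lines() {
    grep -i -E 'global memory size|max memory allocation'
}

if command -v clinfo >/dev/null 2>&1; then
    clinfo 2>/dev/null | gpu_memory_lines ||
        echo "no memory lines found in clinfo output"
else
    echo "clinfo not found"
fi
```

The reported total is an upper bound, since part of the memory may already be in use by the display or other applications.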