This is a basic example of the OpenCL programming model, intended as a reference for readers who are new to OpenCL. For simplicity, most OpenCL API calls are wrapped in C++ wrapper functions. Read the main
function to follow the programming flow, then trace into each wrapper function to see exactly how the OpenCL host-side API works.
The repository contains two example programs, "Vector Add" and "Dot Product", each with independent host code (.cpp) and kernel code (.cl).
Vector Add performs a very simple element-wise vector operation: C = A + B.
Since it is not a compute-intensive operation and OpenCL has some setup overhead, the OpenCL kernel may run slower than sequential C++ code for small vector sizes and global work sizes.
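To make the relationship between vector size and global work size concrete, here is a minimal host-only sketch in plain C++ that emulates how work-items could split the additions. It does not call OpenCL. The function names and the strided partitioning (work-item gid handles elements gid, gid + G, gid + 2G, ...) are assumptions for illustration; the kernel in this repository may divide the work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference: C = A + B, one element at a time.
std::vector<float> vector_add_sequential(const std::vector<float>& a,
                                         const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Host-side emulation of a kernel launch (hypothetical partitioning).
// The outer loop stands in for the work-items that a device would run in
// parallel; the inner loop is the body each work-item would execute.
std::vector<float> vector_add_emulated(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t global_work_size) {
    assert(a.size() == b.size());
    assert(global_work_size > 0);
    const std::size_t n = a.size();
    std::vector<float> c(n);
    for (std::size_t gid = 0; gid < global_work_size; ++gid) {
        // Work-item gid handles every global_work_size-th element.
        for (std::size_t i = gid; i < n; i += global_work_size) {
            c[i] = a[i] + b[i];
        }
    }
    return c;
}
```

With a vector size of 100,000 and a global work size of 1,000, each emulated work-item performs 100 additions. Every element costs only one addition, so there is little work to offset the setup overhead mentioned above.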
The Dot Product program calculates the inner product of two vectors. This is also a BLAS Level 1 operation, but one with good potential for optimization.
Refer to this program to see more techniques used in OpenCL programming such as utilizing local memory, choosing local work size (in other words, the work group size) and work group reduction.
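The work group reduction mentioned above can be sketched on the host in plain C++. This is only an emulation under stated assumptions: the function name is hypothetical, the local work size is assumed to be a power of two that divides the global work size, and a std::vector stands in for OpenCL local memory. The actual kernel in this repository may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of a dot product with per-work-group tree reduction.
float dot_product_emulated(const std::vector<float>& a,
                           const std::vector<float>& b,
                           std::size_t global_work_size,
                           std::size_t local_work_size) {
    assert(a.size() == b.size());
    // Assumption: local work size is a power of two.
    assert(local_work_size > 0 &&
           (local_work_size & (local_work_size - 1)) == 0);
    assert(global_work_size > 0 && global_work_size % local_work_size == 0);

    const std::size_t n = a.size();
    const std::size_t num_groups = global_work_size / local_work_size;
    std::vector<float> group_sums(num_groups, 0.0f);

    for (std::size_t group = 0; group < num_groups; ++group) {
        // Stands in for the local memory shared by one work group.
        std::vector<float> scratch(local_work_size, 0.0f);

        // Step 1: each work-item accumulates a private partial sum.
        for (std::size_t lid = 0; lid < local_work_size; ++lid) {
            const std::size_t gid = group * local_work_size + lid;
            float partial = 0.0f;
            for (std::size_t i = gid; i < n; i += global_work_size) {
                partial += a[i] * b[i];
            }
            scratch[lid] = partial;
        }

        // Step 2: tree reduction. On a device, each pass of the outer loop
        // would be separated from the next by a work-group barrier.
        for (std::size_t stride = local_work_size / 2; stride > 0; stride /= 2) {
            for (std::size_t lid = 0; lid < stride; ++lid) {
                scratch[lid] += scratch[lid + stride];
            }
        }
        group_sums[group] = scratch[0];
    }

    // Step 3: combine the per-group results on the host.
    float result = 0.0f;
    for (float s : group_sums) {
        result += s;
    }
    return result;
}
```

The tree reduction halves the number of active work-items on each pass, so a group of size L finishes in log2(L) passes instead of L - 1 serial additions.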
You should be able to run OpenCL on most CPUs and/or GPUs.
A quick way to check if your machine is capable of running OpenCL is to use clinfo. Simply run:
$ clinfo
If at least one OpenCL platform is shown, you have one or more devices that support OpenCL.
To run OpenCL programs, you have to install one of the following OpenCL runtimes/SDKs:
- AMD APP SDK 3.0 - Mirror file for linux x86_64
- Intel SDK for OpenCL - Requires registration
- Intel Compute Runtime
The AMD APP SDK is recommended if you do not know which one to use.
$ wget http://debian.nullivex.com/amd/AMD-APP-SDKInstaller-v3.0.130.136-GA-linux64.tar.bz2
$ tar jxvf AMD-APP-SDKInstaller-v3.0.130.136-GA-linux64.tar.bz2
$ chmod +x ./AMD-APP-SDK-v3.0.130.136-GA-linux64.sh
$ ./AMD-APP-SDK-v3.0.130.136-GA-linux64.sh
After installation, the environment variables OCL_INC_DIR and OCL_LIB_DIR should be set to the OpenCL include directory and the OpenCL lib directory, respectively.
If you are using AMD APP SDK, run:
$ export OCL_INC_DIR=$AMDAPPSDKROOT/include
$ export OCL_LIB_DIR=$AMDAPPSDKROOT/lib/x86_64
Alternatively, you can modify the compiler flags in the Makefile directly.
Compile:
make
Clean:
make clean
Specify the vector size and the global work size when running the program.
For example, to run the vector add operation with a vector size of 100,000 and a global work size of 1,000, run:
./build/vector_add 100000 1000
The output shows whether the run was successful and how its speed compares with the sequential C++ version.
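As a rough illustration of the two command-line arguments and the speed comparison, here is a minimal C++ sketch. The names RunConfig, parse_args, elapsed_ms and relative_speed are hypothetical and are not taken from the repository's host code; the real programs may parse arguments and measure time differently.

```cpp
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical container for the two command-line arguments.
struct RunConfig {
    std::size_t vector_size;
    std::size_t global_work_size;
};

// Parses "<program> <vector size> <global work size>".
RunConfig parse_args(int argc, const char* const argv[]) {
    if (argc != 3) {
        throw std::invalid_argument(
            "usage: <program> <vector size> <global work size>");
    }
    RunConfig cfg{std::stoul(argv[1]), std::stoul(argv[2])};
    if (cfg.vector_size == 0 || cfg.global_work_size == 0) {
        throw std::invalid_argument("both sizes must be positive");
    }
    return cfg;
}

// Runs the callable once and returns the wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// A value above 1 means the kernel was faster than the sequential code.
double relative_speed(double sequential_ms, double kernel_ms) {
    return sequential_ms / kernel_ms;
}
```

For instance, if the sequential loop took 10 ms and the kernel run took 5 ms, relative_speed(10.0, 5.0) would report a factor of 2.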