OpenMPApps/comd-mp4/README.omp at master · AMDComputeLibraries/OpenMPApps · GitHub

To compile:
---------------
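*	Type make in the src-omp directory.
*	Paths to the omptargetlib must be set before compiling. The following
	environment variables set all the necessary paths. $CUDA must already
	point to the root of your CUDA installation; adjust the other paths to
	match your system.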

export OMPTARGET_LIBS=/usr/local/libomptarget/lib
export PATH=$CUDA/bin:$PATH
export LIBOMP_LIB=/usr/local/intel_openmp_rt/lib
export LIBRARY_PATH=$OMPTARGET_LIBS:$CUDA/nvvm/libdevice
export LD_LIBRARY_PATH=$LIBOMP_LIB:$OMPTARGET_LIBS:$CUDA/targets/x86_64-linux/lib
export CLANG=/usr/local/llvm_openmp/bin/clang

*	On success, the binary is created in the ../bin directory.

To run:
----------------
*	../bin/CoMD-openmp    -- runs the LJ (Lennard-Jones) potential. Use the
	-x NN -y NN -z NN options to specify the problem size in the x, y, and z
	directions.
*	The EAM kernel does not yet work correctly in OpenMP 4. There are issues
	with the reduction that affect correctness.

Notes:
----------------
*	Kernels or loops parallelized on the GPU: LJForce
*	The first step in OpenMP parallelization is to create aliases to the
	struct members, which makes it easier to copy data to the device. With a
	complex struct of arrays, the size of each array cannot be determined
	automatically when copying to the GPU, so the size of each array has to
	be passed manually for the data to be copied correctly.
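
The aliasing step described above can be sketched as follows. This is a
minimal illustration, not the CoMD source: AtomsSketch and scale_on_device
are hypothetical names, and the struct is a simplified stand-in for the real
data structures. The point it tries to show is that the map clauses take
plain pointer aliases with explicit array-section lengths (x[0:n]) instead of
the struct itself.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a CoMD-style struct of arrays. */
typedef struct {
    int nAtoms;   /* number of valid entries in x and f */
    double *x;    /* input array, e.g. one position component */
    double *f;    /* output array, e.g. one force component */
} AtomsSketch;

/* Computes f[i] = k * x[i] in a target region.
 * Without OpenMP offload support the pragma is ignored or falls back to
 * the host, and the loop simply runs on the CPU. */
void scale_on_device(AtomsSketch *atoms, double k)
{
    assert(atoms != 0 && atoms->nAtoms >= 0);

    /* Plain aliases: each map clause needs a base pointer and an explicit
     * length, because the array sizes are not recoverable from the struct. */
    int n = atoms->nAtoms;
    double *x = atoms->x;
    double *f = atoms->f;

    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(from: f[0:n])
    for (int i = 0; i < n; i++) {
        f[i] = k * x[i];
    }
}
```

The same pattern would apply to every array reached through the struct: one
alias and one explicitly sized map clause per array.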

*	Vector parallelism results in a 1:1 mapping between CoMD boxes and GPU
	wavefronts. This means that for boxes that hold fewer than 64 atoms, some
	threads are left idle. We should devise a strategy in which all GPU
	threads are put to use, thereby fully occupying the GPU.
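
To make the cost of the 1:1 box-to-wavefront mapping concrete, the idle
lanes can be counted from the per-box atom counts. The sketch below is only
an illustration: count_idle_lanes and lane_utilization are hypothetical
helpers, not part of CoMD, and it assumes a wavefront of 64 lanes and that no
box holds more than 64 atoms.

```c
#include <assert.h>

#define WAVEFRONT_SIZE 64  /* lanes per wavefront (assumed, per the note above) */

/* Counts lanes with no atom to work on when each box gets one wavefront. */
int count_idle_lanes(const int *atomsPerBox, int nBoxes)
{
    int idle = 0;
    for (int b = 0; b < nBoxes; b++) {
        /* Assumption: a box never holds more atoms than one wavefront has lanes. */
        assert(atomsPerBox[b] >= 0 && atomsPerBox[b] <= WAVEFRONT_SIZE);
        idle += WAVEFRONT_SIZE - atomsPerBox[b];
    }
    return idle;
}

/* Returns the fraction of lanes doing useful work, in [0, 1]. */
double lane_utilization(const int *atomsPerBox, int nBoxes)
{
    if (nBoxes <= 0) {
        return 0.0;
    }
    int total = nBoxes * WAVEFRONT_SIZE;
    return (double)(total - count_idle_lanes(atomsPerBox, nBoxes)) / total;
}
```

For example, four boxes holding 64, 32, 16 and 16 atoms leave 128 of 256
lanes idle, a utilization of 0.5.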