To compile:
---------------
* Type make in the src-omp directory.
* Paths to omptargetlib must be set before compiling. The following
  environment variables set all the necessary paths ($CUDA must already
  point to the CUDA installation):
export OMPTARGET_LIBS=/usr/local/libomptarget/lib
export PATH=$CUDA/bin:$PATH
export LIBOMP_LIB=/usr/local/intel_openmp_rt/lib
export LIBRARY_PATH=$OMPTARGET_LIBS:$CUDA/nvvm/libdevice
export LD_LIBRARY_PATH=$LIBOMP_LIB:$OMPTARGET_LIBS:$CUDA/targets/x86_64-linux/lib
export CLANG=/usr/local/llvm_openmp/bin/clang
* The CoMD-openmp binary should be created in the ../bin directory.
To run:
----------------
* Run ../bin/CoMD-openmp for the LJ potential. Use the -x NN -y NN -z NN
  options to set the problem size in the x, y, and z directions.
* The EAM kernel does not yet work correctly in OpenMP4. Issues with the
  reduction affect correctness.
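
For context, a reduction on an offloaded loop generally takes the shape
sketched below. This is not the EAM kernel and does not reproduce the issue
noted above; sumEnergy is a hypothetical helper used only to illustrate the
construct. If the file is compiled without OpenMP support, the pragma is
ignored and the loop runs serially on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of CoMD): sums n per-atom energies. */
double sumEnergy(const double *e, int n)
{
    assert(e != NULL && n >= 0);
    double total = 0.0;

    /* e[0:n] copies the input to the device; total is mapped tofrom so the
     * reduced value is copied back to the host after the target region. */
    #pragma omp target teams distribute parallel for \
        map(to: e[0:n]) map(tofrom: total) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += e[i];

    return total;
}
```

Comparing the device result against a serial host sum of the same array is
one simple way to check a reduction for correctness.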
Notes:
----------------
* Kernels or loops parallelized on the GPU: LJForce.
* The first step in the OpenMP parallelization is to create aliases to the
  structs so that data can be copied to the device easily. With a complex
  struct of arrays, the GPU cannot recognize the size of each array, so
  each size has to be passed manually for the data to be copied correctly.
* Vector parallelism results in a 1:1 mapping between CoMD boxes and GPU
  wavefronts. This means that for boxes with fewer than 64 atoms, some
  threads are left idle. We should devise a strategy that keeps all GPU
  threads busy, thereby fully occupying the GPU.
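
The last two notes can be illustrated with a minimal sketch. It assumes a
simplified struct of arrays; AtomsSketch, scaleOnDevice and laneUtilization
are hypothetical names, not CoMD code. The first function aliases the struct
members to plain pointers and passes each array size explicitly in the map
clauses. The second computes what fraction of wavefront lanes is busy when
each box is given one wavefront (64 lanes, per the note above).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CoMD-style struct of arrays. */
typedef struct {
    int     nAtoms;
    double *x;   /* input values, nAtoms entries  */
    double *f;   /* output values, nAtoms entries */
} AtomsSketch;

/* Writes f[i] = scale * x[i], offloaded if OpenMP target support is on. */
void scaleOnDevice(AtomsSketch *atoms, double scale)
{
    assert(atoms != NULL && atoms->nAtoms >= 0);

    /* Aliases: the map clauses below see plain pointers, not struct members. */
    const double *x = atoms->x;
    double       *f = atoms->f;
    int           n = atoms->nAtoms;

    /* The array sections x[0:n] and f[0:n] state each size explicitly. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(from: f[0:n])
    for (int i = 0; i < n; i++)
        f[i] = scale * x[i];
}

/* Fraction of lanes doing work when every box occupies one wavefront. */
double laneUtilization(const int *atomsPerBox, int nBoxes, int wavefrontSize)
{
    assert(atomsPerBox != NULL && nBoxes > 0 && wavefrontSize > 0);
    long busy = 0;
    for (int b = 0; b < nBoxes; b++) {
        assert(atomsPerBox[b] >= 0 && atomsPerBox[b] <= wavefrontSize);
        busy += atomsPerBox[b];
    }
    return (double)busy / ((double)nBoxes * wavefrontSize);
}
```

For example, four boxes holding 64, 32, 16 and 48 atoms give a utilization
of 0.625 with 64-lane wavefronts, so 96 of the 256 lanes sit idle.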