Initial integration with QDP-JIT
 - packers are fully threaded and use raw QDPJIT pointer access.
 - packers work only when QDP-JIT is run with -layout ocsri

Todo:
 - recheck integration has not broken non QDP-JIT build
 - bulletproof (with assertions) to ensure layout is ocsri, or adapt also to oscri
 - Use as 'native' dslash: only possible in ocsri mode since that is what is wired into QDPJIT
 - lots of testing.
        - test full chroma QDP-JIT build (e.g. Leapfrog)
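
A minimal sketch of what the "bulletproof with assertions" item could build on. It assumes the letters in "ocsri" name the index order from slowest to fastest (outer site, color, spin, reality, inner site); that reading, the constants, and the name `ocsri_offset` are all hypothetical and are not taken from QDP-JIT or QPhiX.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants for illustration: 3 colors, 4 spins, 2 reality parts (re, im).
constexpr std::size_t kColors = 3;
constexpr std::size_t kSpins = 4;
constexpr std::size_t kReIm = 2;

// Offset of one real number in a flat array, ASSUMING the order
// outer site (slowest), color, spin, reality, inner site (fastest).
inline std::size_t ocsri_offset(std::size_t site, std::size_t color,
                                std::size_t spin, std::size_t reim,
                                std::size_t inner_len)
{
    // Guard the assumptions the packers would rely on.
    assert(inner_len > 0);
    assert(color < kColors && spin < kSpins && reim < kReIm);

    const std::size_t outer = site / inner_len;  // which block of inner_len sites
    const std::size_t inner = site % inner_len;  // position inside that block
    return (((outer * kColors + color) * kSpins + spin) * kReIm + reim) * inner_len
           + inner;
}
```

With an inner length of 8, neighbouring sites in a block are adjacent in memory, the imaginary part of a component sits 8 elements after the real part, and the next block of sites starts 3 * 4 * 2 * 8 = 192 elements later. A packer that assumed a different order would read the wrong elements silently, which is why an explicit layout check is worth having.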
Balint Joo committed Jun 25, 2016
1 parent 3e14b69 commit 7394787
Showing 7 changed files with 362 additions and 757 deletions.
1 change: 1 addition & 0 deletions NEWS
@@ -1,3 +1,4 @@
6/24/2015 (Balint Joo/Thorsten Kurth) - Added lots of cleanup for KNL. Added packers for QDP-JIT (OCSRI layout for now)
1/12/2016 (Balint Joo/Aaron Walden) - Added AVX512 output from code generator. Tested on Intel SDE. Single and Double precision work, 16-bit precision is currently broken.

4/9/2015 (Balint Joo): Added Basic Multi-Shift solvers -- but can still add more optimizations. Need to test multi-node etc.
9 changes: 6 additions & 3 deletions TODO
@@ -1,10 +1,13 @@
To do with QDPJIT:
- Thorough testing for SOA combinations (initially done with 4)
- Bullet proofing assertions that data is in OCSRI layout or make the code generically figure out the right layout.
- Write native dslash using raw pointers: need OSCRI layout assertion and InnerLength assertion (easy)
- but issue w.r.t. currently only inner=8 from QDPJIT apparently working in my tests

TO Do:
- Add cache blocking in X -- hooks are already present in the kernels, but need to adapt loops.
- Add other solvers specially multi-shift
- Solve N-systems at once (ongoing PhD project at Old Dominion University Computer Science Department: ODU-JLab collaboration)
- Clean code for non ICC compilers (work ongoing by Diptorup Deb, Renaissance Computing Institute, University of North Carolina, Chapel Hill)
- Add other processor targets: SSE, AVX2, AVX512 - codegen supports these already but need to be added
- Add other fermions: e.g. Twisted Mass, HiSQ, Domain Wall?
- Bullet proof a little (A LOT!!!)
- Better tuning (implement 1/2/3/4 threads per core instead of just all 4)
- reduce verbosity and use master printf to better effect (log levels?)
2 changes: 1 addition & 1 deletion include/qphix/Makefile.am
@@ -39,7 +39,7 @@ qphix_include_HEADERS += \
#

if QPHIX_BUILD_WITH_QDP
-qphix_include_HEADERS += ./qdp_packer.h
+qphix_include_HEADERS += ./qdp_packer.h ./qdp_packer_parscalar.h qdp_packer_qdpjit.h
endif

