Patches for Mac (including ARM NEON 128 bits) #425

Merged
valassi merged 31 commits into madgraph5:master from valassi:mac
Apr 9, 2022

Member

valassi commented Apr 8, 2022

This is a WIP comprehensive patch for Mac specific issues (#375).

It also includes, modifies and supersedes Stefan's PR #421. For more detailed comments on why this was necessary, see #221.

Member Author

valassi commented Apr 8, 2022

This is still WIP. Amongst the things that are missing:

  • generate tput logs for mac
  • presently the above fails in the fortran test (does it crash? note, however, that the fortran/c++ code does build)
  • backport to codegen
  • regenerate all five processes auto and manual
  • rerun all tputs on mac for these five processes

valassi added 15 commits April 9, 2022 23:22
…sing MacOS SIP

(SIP would drop DYLD_LIBRARY_PATH inside the script if this is set outside the script)

Example ./tput/teeThroughputX.sh -ggtt -dlp $(cd $(dirname $(gfortran --print-file-name libgfortran.dylib)); pwd)
This now does proceed one step further, fcheck.exe does succeed!
…issue (python is not found, should use python3)

This is ./tput/teeThroughputX.sh -ggtt -dlp $(cd $(dirname $(gfortran --print-file-name libgfortran.dylib)); pwd)
… almost all ok

Still to do: improve handling of missing avx2, 512y, 512z builds
…ths (now rpath on linux and full paths on mac)
…2x/double and ~4x/float speedup from NEON SIMD
@valassi valassi self-assigned this Apr 9, 2022
@valassi valassi marked this pull request as ready for review April 9, 2022 22:53
Member Author

valassi commented Apr 9, 2022

This is now complete (no longer WIP). Amongst the things that were added:

  • generate tput logs for mac
  • fixed the fortran test (I was missing a path in DYLD_LIBRARY_PATH for my compiler, and I had to bypass MacOS SIP, System Integrity Protection, in my scripts)
  • backport to codegen
  • regenerate all five processes auto and manual
  • rerun all tputs on mac for these five processes
  • added a summary table
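
The SIP workaround mentioned in the list above can be sketched as follows. This is only an illustration of the idea, not the code in the tput scripts: the library directory is passed in explicitly and DYLD_LIBRARY_PATH is set only at the point where the test executable is launched, so it does not have to survive from the calling shell. The helper names `gfortran_libdir`, `env_with_dylib_path` and `run_with_dylib_path` are hypothetical; the `gfortran --print-file-name` call is the one used in the commit messages above.

```python
import os
import subprocess


def gfortran_libdir():
    """Return the directory that holds libgfortran.dylib, as reported by gfortran.

    Mirrors the shell expression used with -dlp in the commit messages.
    """
    path = subprocess.check_output(
        ["gfortran", "--print-file-name", "libgfortran.dylib"], text=True
    ).strip()
    return os.path.dirname(os.path.abspath(path))


def env_with_dylib_path(libdir, base_env=None):
    """Copy an environment and prepend libdir to DYLD_LIBRARY_PATH in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    old = env.get("DYLD_LIBRARY_PATH")
    env["DYLD_LIBRARY_PATH"] = libdir if not old else libdir + os.pathsep + old
    return env


def run_with_dylib_path(cmd, libdir):
    """Launch cmd with DYLD_LIBRARY_PATH set explicitly for that child only."""
    return subprocess.run(cmd, env=env_with_dylib_path(libdir), check=True)
```

A call such as `run_with_dylib_path(["./fcheck.exe"], gfortran_libdir())` would then play the role of the `-dlp` option; the executable path and its arguments here are placeholders.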

This is a summary table from the tests of all five processes, in double and float precision, on a Mac M1 (ARMv8), with and without NEON SIMD. One nicely sees speedup factors close to 2x for double and close to 4x for float.
https://github.com/madgraph5/madgraph4gpu/blob/b5ef53ca10a77bece14c0814e5d5ef0275dee53e/epochX/cudacpp/tput/summaryTable_macm1.txt

*** FPTYPE=d ******************************************************************

+++ cudacpp REVISION ea28661 +++
On mac-1T0-438.local [CPU: Apple M1] [GPU: none]:

[Apple clang 12.0.5] 
HELINL=0 HRDCOD=0
                eemumu          ggtt         ggttg        ggttgg       ggttggg
         [2048/256/12]  [2048/256/1]    [64/256/1]    [64/256/1]     [1/256/1]
CPP/none      3.51e+06      5.77e+05      7.02e+04      5.43e+03      2.24e+02
CPP/sse4      7.55e+06      9.79e+05      1.39e+05      1.11e+04      3.65e+02

*** FPTYPE=f ******************************************************************

+++ cudacpp REVISION ea28661 +++
On mac-1T0-438.local [CPU: Apple M1] [GPU: none]:

[Apple clang 12.0.5] 
HELINL=0 HRDCOD=0
                eemumu          ggtt         ggttg        ggttgg       ggttggg
         [2048/256/12]  [2048/256/1]    [64/256/1]    [64/256/1]     [1/256/1]
CPP/none      3.57e+06      5.76e+05      7.10e+04      5.59e+03      2.61e+02
CPP/sse4      1.50e+07      1.85e+06      2.48e+05      2.16e+04      7.21e+02
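
As a cross-check of the "close to 2x and 4x" statement, the short sketch below recomputes the speedups from the numbers in the table. It assumes that the CPP/sse4 row is the 128-bit NEON build on this machine and that CPP/none is the scalar build; the ideal factor is then simply the number of values that fit in a 128-bit register (2 doubles or 4 floats). The variable and function names are only for illustration.

```python
# Throughputs copied from the summary table above, in the table's column order.
PROCESSES = ["eemumu", "ggtt", "ggttg", "ggttgg", "ggttggg"]
THROUGHPUT = {
    "d": {
        "none": [3.51e6, 5.77e5, 7.02e4, 5.43e3, 2.24e2],
        "sse4": [7.55e6, 9.79e5, 1.39e5, 1.11e4, 3.65e2],
    },
    "f": {
        "none": [3.57e6, 5.76e5, 7.10e4, 5.59e3, 2.61e2],
        "sse4": [1.50e7, 1.85e6, 2.48e5, 2.16e4, 7.21e2],
    },
}
VECTOR_BITS = 128                  # width of a NEON vector register
VALUE_BITS = {"d": 64, "f": 32}    # double and float


def ideal_speedup(fptype):
    """Number of values processed per 128-bit vector instruction."""
    return VECTOR_BITS // VALUE_BITS[fptype]


def measured_speedups(fptype):
    """Ratio of SIMD to scalar throughput for each process."""
    rows = THROUGHPUT[fptype]
    return {p: simd / scalar
            for p, scalar, simd in zip(PROCESSES, rows["none"], rows["sse4"])}


for fptype in ("d", "f"):
    ratios = ", ".join(f"{p}={r:.2f}" for p, r in measured_speedups(fptype).items())
    print(f"FPTYPE={fptype}: ideal {ideal_speedup(fptype)}x, measured {ratios}")
```

With the table values, the measured ratios lie between roughly 1.6 and 2.2 for double and between roughly 2.8 and 4.2 for float.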

@valassi valassi changed the title from "WIP: Patches for Mac (including ARM NEON 128 bits)" to "Patches for Mac (including ARM NEON 128 bits)" Apr 9, 2022
Member Author

valassi commented Apr 9, 2022

All checks have passed. I am self-merging.

@roiser, I will need to merge this into your alphas patch, I hope this does not cause issues. (I actually started from there, I saw that the alphas patch includes the Mac NEON patch, so I wanted to check it and merge it standalone first).

@valassi valassi merged commit e2c4c0a into madgraph5:master Apr 9, 2022
@valassi valassi mentioned this pull request Apr 9, 2022
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Apr 21, 2022
… which I will now merge

Revert "enable ARM NEON (128 bit) vector registers via compiler defined macros"
This reverts commit a497e1a.
