First patch for automatic Fortran vs cuda/cpp comparisons #454
Merged
valassi merged 190 commits into madgraph5:master on May 20, 2022
At runtime, cmadevent_cudacpp was succeeding but gmadevent_cudacpp was crashing
[avalassi@itscrd70 gcc10.2/cvmfs] /data/avalassi/GPU2020/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx> ./gmadevent_cudacpp < ../../../tmad/input_app_32_NOMULTICHANNEL.txt
__CudaRuntime: calling cudaSetDevice(0)
terminate called after throwing an instance of 'std::runtime_error'
what(): Bridge constructor: nevt should be a multiple of 32
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:95
Aborted
Now both cmadevent and gmadevent are ok (columns in the output: event index, Fortran ME, CudaCpp ME, ratio CudaCpp/Fortran)
./cmadevent_cudacpp < ../../../tmad/input_app_32_NOMULTICHANNEL.txt
...
RESET CUMULATIVE VARIABLE
1 1.0585767962875954 1.0585746799742819 0.99999800079378187
2 0.34621899802570244 0.34621895155590371 0.99999986577917732
3 0.69641550793457307 0.69639622166046955 0.99997230636899415
4 1.1905598627825309 1.1905091128440706 0.99995737304771748
5 1.1553353972932170 1.1553050542823655 0.99997373662148448
6 1.0742962501588520 1.0742704701687749 0.99997600290415856
7 9.8420484883663377 9.8420305789332669 0.99999818031448517
8 0.32582685677422046 0.32582816664358427 1.0000040201393365
9 2.5149439789248635 2.5149325040463246 0.99999543732240759
10 5.0208307517816033 5.0205082151740275 0.99993576031068931
11 0.44788113881489167 0.44788033248057252 0.99999819966895398
12 0.48986309244196768 0.48985700370132096 0.99998757052584553
13 0.63328330290184909 0.63326795025360871 0.99997575706138153
14 6.1412376054978886 6.1408422471646906 0.99993562236822686
15 0.39555567583957080 0.39555551971280867 0.99999960529763154
16 0.52747130178068424 0.52746612471597976 0.99999018512535753
17 1.7437975984037744 1.7437125329156875 0.99995121825596922
18 1.4622130913237883 1.4621759933936889 0.99997462891672939
19 2.8196698917337062 2.8195133794161817 0.99994449268051433
20 0.49718730336075750 0.49718175998606901 0.99998885053047204
21 0.93730472272336696 0.93730460352298761 0.99999987282643898
22 0.98559242564719773 0.98555491534526574 0.99996194136546113
23 1.3917471502668604 1.3916827326186247 0.99995371454633608
24 0.39335740001747643 0.39335537900079015 0.99999486213635180
25 0.89921455803138484 0.89918280136248963 0.99996468399158844
26 1.7000555878730745 1.6999701821900650 0.99994976300562244
27 1.2501450481526981 1.2500906207696398 0.99995646314550568
28 0.40328952406822066 0.40328705864351660 0.99999388671275369
29 0.25396745231910817 0.25396745202905652 0.99999999885791802
30 0.59946736522418109 0.59945385928135198 0.99997747009493321
31 8.9853421261395425 8.9847441073480194 0.99993344507274984
32 0.71323994225410603 0.71322868924826455 0.99998422269256837
Iteration 1 Mean: 0.1982E+03 Abs mean: 0.1982E+03 Fluctuation: 0.751E+02 0.157E+04 100.0%
./gmadevent_cudacpp < ../../../tmad/input_app_32_NOMULTICHANNEL.txt
...
RESET CUMULATIVE VARIABLE
1 1.0585767962875954 1.0585746799742821 0.99999800079378209
2 0.34621899802570244 0.34621895155590371 0.99999986577917732
3 0.69641550793457307 0.69639622166047044 0.99997230636899537
4 1.1905598627825309 1.1905091128440715 0.99995737304771826
5 1.1553353972932170 1.1553050542823660 0.99997373662148492
6 1.0742962501588520 1.0742704701687740 0.99997600290415778
7 9.8420484883663377 9.8420305789332652 0.99999818031448495
8 0.32582685677422046 0.32582816664358416 1.0000040201393361
9 2.5149439789248635 2.5149325040463246 0.99999543732240759
10 5.0208307517816033 5.0205082151740275 0.99993576031068931
11 0.44788113881489167 0.44788033248057252 0.99999819966895398
12 0.48986309244196768 0.48985700370132101 0.99998757052584564
13 0.63328330290184909 0.63326795025360894 0.99997575706138186
14 6.1412376054978886 6.1408422471646871 0.99993562236822631
15 0.39555567583957080 0.39555551971280867 0.99999960529763154
16 0.52747130178068424 0.52746612471597976 0.99999018512535753
17 1.7437975984037744 1.7437125329156880 0.99995121825596944
18 1.4622130913237883 1.4621759933936902 0.99997462891673028
19 2.8196698917337062 2.8195133794161817 0.99994449268051433
20 0.49718730336075750 0.49718175998606901 0.99998885053047204
21 0.93730472272336696 0.93730460352298728 0.99999987282643865
22 0.98559242564719773 0.98555491534526585 0.99996194136546124
23 1.3917471502668604 1.3916827326186247 0.99995371454633608
24 0.39335740001747643 0.39335537900079010 0.99999486213635169
25 0.89921455803138484 0.89918280136248874 0.99996468399158744
26 1.7000555878730745 1.6999701821900661 0.99994976300562310
27 1.2501450481526981 1.2500906207696396 0.99995646314550546
28 0.40328952406822066 0.40328705864351633 0.99999388671275302
29 0.25396745231910817 0.25396745202905652 0.99999999885791802
30 0.59946736522418109 0.59945385928135186 0.99997747009493310
31 8.9853421261395425 8.9847441073480176 0.99993344507274962
32 0.71323994225410603 0.71322868924826466 0.99998422269256848
…hQuiet=-1, BothDebug=-2)
Note a peculiar tendency to go LOW rather than HIGH... systematic errors in the calculation?! C++ and CUDA agree with each other, but are often lower than Fortran.
./gmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
RESET CUMULATIVE VARIABLE
WARNING! Deviation more than 5E-5 10 5.0208307517816033 5.0205082151740275 0.99993576031068931
WARNING! Deviation more than 5E-5 14 6.1412376054978886 6.1408422471646871 0.99993562236822631
WARNING! Deviation more than 5E-5 19 2.8196698917337062 2.8195133794161817 0.99994449268051433
WARNING! Deviation more than 5E-5 26 1.7000555878730745 1.6999701821900661 0.99994976300562310
WARNING! Deviation more than 5E-5 31 8.9853421261395425 8.9847441073480176 0.99993344507274962
RESET CUMULATIVE VARIABLE
WARNING! Deviation more than 5E-5 1 2.9952828789093009 2.9951147060108139 0.99994385408481079
WARNING! Deviation more than 5E-5 5 3.2674597739718134 3.2672949541081930 0.99994955718661538
WARNING! Deviation more than 5E-5 11 23.906164531244425 23.904572425204815 0.99993340186220470
WARNING! Deviation more than 5E-5 12 7.2115125554333357 7.2111024304164699 0.99994312912669658
WARNING! Deviation more than 5E-5 18 5.0277582816984809 5.0274636035675559 0.99994138975773805
WARNING! Deviation more than 5E-5 21 5.1118977339682887 5.1115741556696488 0.99993670094444775
WARNING! Deviation more than 5E-5 28 1.8208877147178686 1.8207945747421526 0.99994884913827298
WARNING! Deviation more than 5E-5 9 7.5632375656327859 7.5628103733018515 0.99994351726661668
WARNING! Deviation more than 5E-5 15 4.1469688581967885 4.1467152375571983 0.99993884192327875
WARNING! Deviation more than 5E-5 26 4.8212544866832658 4.8209820679043265 0.99994349628718171
WARNING! Deviation more than 5E-5 27 20.810883398215640 20.809421655551752 0.99992976066244199
...
ME ratio CudaCpp/Fortran: MIN = 0.99992976066244199
ME ratio CudaCpp/Fortran: MAX = 1.0000040201393361
ME ratio CudaCpp/Fortran: 1-MIN = 7.0239337558009041E-005
ME ratio CudaCpp/Fortran: MAX-1 = 4.0201393360916882E-006
./cmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
ME ratio CudaCpp/Fortran: MIN = 0.99992976066244166
ME ratio CudaCpp/Fortran: MAX = 1.0000040201393365
ME ratio CudaCpp/Fortran: 1-MIN = 7.0239337558342108E-005
ME ratio CudaCpp/Fortran: MAX-1 = 4.0201393365357774E-006
./gmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
RESET CUMULATIVE VARIABLE
WARNING! Deviation more than 5E-5 10 5.0208307517816033 5.0204830169677734 0.99993074157823258
WARNING! Deviation more than 5E-5 14 6.1412376054978886 6.1409120559692383 0.99994698958914097
WARNING! Deviation more than 5E-5 19 2.8196698917337062 2.8195171356201172 0.99994582482366579
WARNING! Deviation more than 5E-5 23 1.3917471502668604 1.3916516304016113 0.99993136694030182
WARNING! Deviation more than 5E-5 30 0.59946736522418109 0.59986174106597900 1.0006578770833512
WARNING! Deviation more than 5E-5 31 8.9853421261395425 8.9848155975341797 0.99994140138483645
RESET CUMULATIVE VARIABLE
WARNING! Deviation more than 5E-5 1 2.9952828789093009 2.9951152801513672 0.99994404576639029
WARNING! Deviation more than 5E-5 12 7.2115125554333357 7.2111072540283203 0.99994379800327604
WARNING! Deviation more than 5E-5 18 5.0277582816984809 5.0274577140808105 0.99994021836356684
WARNING! Deviation more than 5E-5 21 5.1118977339682887 5.1116113662719727 0.99994398015938124
WARNING! Deviation more than 5E-5 28 1.8208877147178686 1.8207850456237793 0.99994361591148129
WARNING! Deviation more than 5E-5 9 7.5632375656327859 7.5628528594970703 0.99994913472803448
WARNING! Deviation more than 5E-5 12 0.34758494237401355 0.34689116477966309 0.99800400561194635
WARNING! Deviation more than 5E-5 13 0.35451575056455586 0.35453897714614868 1.0000655163601500
WARNING! Deviation more than 5E-5 15 4.1469688581967885 4.1467485427856445 0.99994687314550035
WARNING! Deviation more than 5E-5 26 4.8212544866832658 4.8209867477416992 0.99994446695516570
WARNING! Deviation more than 5E-5 27 20.810883398215640 20.813144683837891 1.0001086588002528
WARNING! Deviation more than 5E-5 28 1.5052106057124179 1.5051305294036865 0.99994680059492835
Iteration 1 Mean: 0.4017E+03 Abs mean: 0.4017E+03 Fluctuation: 0.119E+03 0.525E+04 100.0%
...
ME ratio CudaCpp/Fortran: MIN = 0.99800400561194635
ME ratio CudaCpp/Fortran: MAX = 1.0006578770833512
ME ratio CudaCpp/Fortran: 1-MIN = 1.9959943880536457E-003
ME ratio CudaCpp/Fortran: MAX-1 = 6.5787708335118822E-004
./cmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
ME ratio CudaCpp/Fortran: MIN = 0.99800323394186297
ME ratio CudaCpp/Fortran: MAX = 1.0006573799366487
ME ratio CudaCpp/Fortran: 1-MIN = 1.9967660581370339E-003
ME ratio CudaCpp/Fortran: MAX-1 = 6.5737993664871652E-004
…s instead (deal with FPTYPE=f)
Revert "[fvsc] increase the threshold for warnings to 1E-4 to reduce verbosity"
This reverts commit dbdd7b1.
…ase the number of events...
… my tput scripts for ggtt)
For cmadevent (default 512y) on ggtt:
ME ratio CudaCpp/Fortran: MIN = 0.99992609952020872
ME ratio CudaCpp/Fortran: MAX = 1.0000053090530967
ME ratio CudaCpp/Fortran: 1-MIN = 7.3900479791277895E-005
ME ratio CudaCpp/Fortran: MAX-1 = 5.3090530967025984E-006
PROGRAM : 32.5986s
SMATRIX1MULTI : 9.0047s for 524320 Fortran events => throughput is 5.82E+04 events/s
FBRIDGESEQUENCE : 1.3071s for 524320 CudaCpp events => throughput is 4.01E+05 events/s
Note that the tput tests give 6.12E5... something is probably lost in data copies, but this is still a factor 6-7 faster than Fortran!
git diff --no-ext-diff e5a2db8 gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f > CODEGEN/MG5aMC_patches/patch.driver.f
…ence to create patches!)
Revert "[fvsc] TEMPORARY! generate ggtt.mad without patchMad (use it as reference to create patches!)"
This reverts commit 63b32f9c09dff5659cd20947a499746c5d6ccfdb.
…orary ref)
git diff --no-ext-diff 63b32f9c gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f > CODEGEN/MG5aMC_patches/patch.auto_dsig1.f
Use this on eemumu: funnily, it generates 16385 events but goes through 512k MEs (which is what I wanted)
ME ratio CudaCpp/Fortran: MIN = 0.99999999999999911
ME ratio CudaCpp/Fortran: MAX = 1.0000000000000007
ME ratio CudaCpp/Fortran: 1-MIN = 8.8817841970012523E-016
ME ratio CudaCpp/Fortran: MAX-1 = 6.6613381477509392E-016
PROGRAM : 8.3820s
SMATRIX1MULTI : 3.6777s for 524320 Fortran events => throughput is 1.43E+05 events/s
FBRIDGESEQUENCE : 0.0983s for 524320 CudaCpp events => throughput is 5.33E+06 events/s
Notable points:
- Fortran and cudacpp MEs agree to E-16! Very different from ggtt
- MEs (SMATRIX1MULTI) are around 44% of the total time in Fortran (3.68s out of 8.38s)
- MEs in CPP are a factor ~37 (almost 40!) faster than in Fortran?!
Note the nice results for ggttggg.
CUDA (still only 32 events in flight!):
ME ratio CudaCpp/Fortran: MIN = 0.99996229761036137
ME ratio CudaCpp/Fortran: MAX = 1.0006315515457513
ME ratio CudaCpp/Fortran: 1-MIN = 3.7702389638627487E-005
ME ratio CudaCpp/Fortran: MAX-1 = 6.3155154575134098E-004
PROGRAM : 50.8005s
SMATRIX1MULTI : 39.5586s for 1056 Fortran events => throughput is 2.67E+01 events/s
FBRIDGESEQUENCE : 9.2071s for 1056 CudaCpp events => throughput is 1.15E+02 events/s
CPP:
ME ratio CudaCpp/Fortran: MIN = 0.99996229761036193
ME ratio CudaCpp/Fortran: MAX = 1.0006315515457525
ME ratio CudaCpp/Fortran: 1-MIN = 3.7702389638072376E-005
ME ratio CudaCpp/Fortran: MAX-1 = 6.3155154575245120E-004
PROGRAM : 44.8238s
SMATRIX1MULTI : 39.6807s for 1056 Fortran events => throughput is 2.66E+01 events/s
FBRIDGESEQUENCE : 3.6954s for 1056 CudaCpp events => throughput is 2.86E+02 events/s
…longer in the P1 patch
coupl_write.inc:8:32:
8 | WRITE(*,2) 'GC_3 = ', GC_3(1)
| 1
Error: PROCEDURE attribute conflicts with COMMON attribute in ‘gc_3’ at (1)
coupl_write.inc:9:34:
9 | WRITE(*,2) 'GC_50 = ', GC_50(1)
| 1
Error: PROCEDURE attribute conflicts with COMMON attribute in ‘gc_50’ at (1)
coupl_write.inc:10:34:
10 | WRITE(*,2) 'GC_59 = ', GC_59(1)
| 1
Error: PROCEDURE attribute conflicts with COMMON attribute in ‘gc_59’ at (1)
…builds ok! But it fails at runtime, see oliviermattelaer/mg5amc_test#13
*** EXECUTE MADEVENT (create results.dat) ***
--------------------
2048 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
0 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! Channel number for single-diagram enhancement multi-channel (IGNORED as suppress amplitude is 0?)
--------------------
Executing ' ./madevent < /tmp/avalassi/tmp.KAEy48Pdo5_fortran > /tmp/avalassi/tmp.Gkw3eKB3xm'
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG
[XSECTION] ERROR! No cross section in log file: /tmp/avalassi/tmp.Gkw3eKB3xm
...
xqcutij # 3> 0.0 0.0
Added good helicity 1 0.13852041978098753 in event 1 local: 1
Added good helicity 4 6.9350349637150188 in event 1 local: 1
Added good helicity 13 8.7879241967230062 in event 1 local: 1
Added good helicity 16 0.13852041978098747 in event 1 local: 1
RESET CUMULATIVE VARIABLE
RESET CUMULATIVE VARIABLE
1024 points passed the cut but all returned zero
therefore considering this contribution as zero
Fix conflicts: keep codegen logs from origin/fvsc
STARTED AT Thu May 19 20:20:35 CEST 2022
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttg -ggttgg -ggttggg
ENDED(1) AT Thu May 19 22:31:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttgg -inlonly
ENDED(2) AT Thu May 19 22:44:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -bridge
ENDED(3) AT Thu May 19 22:48:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu May 19 22:50:41 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu May 19 22:53:04 CEST 2022 [Status=0]
Member
Author
I am marking this as ready and will merge it soon. Very many things still need to be done, but many things are already done, including (non-exhaustive list):
Among the things that still need work (non-exhaustive list):
All tests passed, I am self-merging.
PS: example of a tmad log
This is a WIP PR for the issues described in #417, namely automatic comparisons of Fortran vs Cudacpp, both for physics results and for computing performance.