First patch for automatic Fortran vs cuda/cpp comparisons by valassi · Pull Request #454 · madgraph5/madgraph4gpu

valassi · 2022-05-11T16:12:34Z

This is a WIP PR for the issues described in #417, namely some automatic comparison of Fortran vs cudacpp, for both physics results and computing performance.

At runtime, cmadevent_cudacpp was succeeding but gmadevent_cudacpp was crashing
[avalassi@itscrd70 gcc10.2/cvmfs] /data/avalassi/GPU2020/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx> ./gmadevent_cudacpp < ../../../tmad/input_app_32_NOMULTICHANNEL.txt
__CudaRuntime: calling cudaSetDevice(0)
terminate called after throwing an instance of 'std::runtime_error'
  what(): Bridge constructor: nevt should be a multiple of 32
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:95
Aborted
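The abort above comes from a consistency check on the number of events handed to the bridge. The following is only a minimal Python sketch of that constraint. The helper names are hypothetical and are not part of madgraph4gpu, the value 32 is taken from the error message, and the way the crash was actually fixed in this PR is not shown here.

```python
# Hypothetical helpers (not madgraph4gpu code): illustrate the
# "nevt should be a multiple of 32" constraint from the error message.
GRANULARITY = 32  # value quoted in the Bridge constructor error


def is_valid_nevt(nevt: int, granularity: int = GRANULARITY) -> bool:
    """Return True if nevt is a positive multiple of the granularity."""
    return nevt > 0 and nevt % granularity == 0


def round_up_nevt(nevt: int, granularity: int = GRANULARITY) -> int:
    """Round nevt up to the next multiple of the granularity."""
    if nevt <= 0:
        raise ValueError("nevt must be positive")
    return -(-nevt // granularity) * granularity  # ceiling division


if __name__ == "__main__":
    for n in (16, 32, 33, 64):
        print(n, is_valid_nevt(n), round_up_nevt(n))
```

Padding to the next multiple is one possible policy; rejecting invalid values outright, as the constructor does, is the stricter alternative.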

Now both cmadevent and gmadevent are ok
./cmadevent_cudacpp < ../../../tmad/input_app_32_NOMULTICHANNEL.txt
...
RESET CUMULATIVE VARIABLE
1 1.0585767962875954 1.0585746799742819 0.99999800079378187
2 0.34621899802570244 0.34621895155590371 0.99999986577917732
3 0.69641550793457307 0.69639622166046955 0.99997230636899415
4 1.1905598627825309 1.1905091128440706 0.99995737304771748
5 1.1553353972932170 1.1553050542823655 0.99997373662148448
6 1.0742962501588520 1.0742704701687749 0.99997600290415856
7 9.8420484883663377 9.8420305789332669 0.99999818031448517
8 0.32582685677422046 0.32582816664358427 1.0000040201393365
9 2.5149439789248635 2.5149325040463246 0.99999543732240759
10 5.0208307517816033 5.0205082151740275 0.99993576031068931
11 0.44788113881489167 0.44788033248057252 0.99999819966895398
12 0.48986309244196768 0.48985700370132096 0.99998757052584553
13 0.63328330290184909 0.63326795025360871 0.99997575706138153
14 6.1412376054978886 6.1408422471646906 0.99993562236822686
15 0.39555567583957080 0.39555551971280867 0.99999960529763154
16 0.52747130178068424 0.52746612471597976 0.99999018512535753
17 1.7437975984037744 1.7437125329156875 0.99995121825596922
18 1.4622130913237883 1.4621759933936889 0.99997462891672939
19 2.8196698917337062 2.8195133794161817 0.99994449268051433
20 0.49718730336075750 0.49718175998606901 0.99998885053047204
21 0.93730472272336696 0.93730460352298761 0.99999987282643898
22 0.98559242564719773 0.98555491534526574 0.99996194136546113
23 1.3917471502668604 1.3916827326186247 0.99995371454633608
24 0.39335740001747643 0.39335537900079015 0.99999486213635180
25 0.89921455803138484 0.89918280136248963 0.99996468399158844
26 1.7000555878730745 1.6999701821900650 0.99994976300562244
27 1.2501450481526981 1.2500906207696398 0.99995646314550568
28 0.40328952406822066 0.40328705864351660 0.99999388671275369
29 0.25396745231910817 0.25396745202905652 0.99999999885791802
30 0.59946736522418109 0.59945385928135198 0.99997747009493321
31 8.9853421261395425 8.9847441073480194 0.99993344507274984
32 0.71323994225410603 0.71322868924826455 0.99998422269256837
Iteration 1 Mean: 0.1982E+03 Abs mean: 0.1982E+03 Fluctuation: 0.751E+02 0.157E+04 100.0%
./gmadevent_cudacpp < ../../../tmad/input_app_32_NOMULTICHANNEL.txt
...
RESET CUMULATIVE VARIABLE
1 1.0585767962875954 1.0585746799742821 0.99999800079378209
2 0.34621899802570244 0.34621895155590371 0.99999986577917732
3 0.69641550793457307 0.69639622166047044 0.99997230636899537
4 1.1905598627825309 1.1905091128440715 0.99995737304771826
5 1.1553353972932170 1.1553050542823660 0.99997373662148492
6 1.0742962501588520 1.0742704701687740 0.99997600290415778
7 9.8420484883663377 9.8420305789332652 0.99999818031448495
8 0.32582685677422046 0.32582816664358416 1.0000040201393361
9 2.5149439789248635 2.5149325040463246 0.99999543732240759
10 5.0208307517816033 5.0205082151740275 0.99993576031068931
11 0.44788113881489167 0.44788033248057252 0.99999819966895398
12 0.48986309244196768 0.48985700370132101 0.99998757052584564
13 0.63328330290184909 0.63326795025360894 0.99997575706138186
14 6.1412376054978886 6.1408422471646871 0.99993562236822631
15 0.39555567583957080 0.39555551971280867 0.99999960529763154
16 0.52747130178068424 0.52746612471597976 0.99999018512535753
17 1.7437975984037744 1.7437125329156880 0.99995121825596944
18 1.4622130913237883 1.4621759933936902 0.99997462891673028
19 2.8196698917337062 2.8195133794161817 0.99994449268051433
20 0.49718730336075750 0.49718175998606901 0.99998885053047204
21 0.93730472272336696 0.93730460352298728 0.99999987282643865
22 0.98559242564719773 0.98555491534526585 0.99996194136546124
23 1.3917471502668604 1.3916827326186247 0.99995371454633608
24 0.39335740001747643 0.39335537900079010 0.99999486213635169
25 0.89921455803138484 0.89918280136248874 0.99996468399158744
26 1.7000555878730745 1.6999701821900661 0.99994976300562310
27 1.2501450481526981 1.2500906207696396 0.99995646314550546
28 0.40328952406822066 0.40328705864351633 0.99999388671275302
29 0.25396745231910817 0.25396745202905652 0.99999999885791802
30 0.59946736522418109 0.59945385928135186 0.99997747009493310
31 8.9853421261395425 8.9847441073480176 0.99993344507274962
32 0.71323994225410603 0.71322868924826466 0.99998422269256848


Note a peculiar tendency to go LOW rather than go HIGH... systematic errors in the calculation?! C++ and CUDA agree with each other, but are often lower than Fortran
./gmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
RESET CUMULATIVE VARIABLE
WARNING! Deviation more than 5E-5 10 5.0208307517816033 5.0205082151740275 0.99993576031068931
WARNING! Deviation more than 5E-5 14 6.1412376054978886 6.1408422471646871 0.99993562236822631
WARNING! Deviation more than 5E-5 19 2.8196698917337062 2.8195133794161817 0.99994449268051433
WARNING! Deviation more than 5E-5 26 1.7000555878730745 1.6999701821900661 0.99994976300562310
WARNING! Deviation more than 5E-5 31 8.9853421261395425 8.9847441073480176 0.99993344507274962
RESET CUMULATIVE VARIABLE
WARNING! Deviation more than 5E-5 1 2.9952828789093009 2.9951147060108139 0.99994385408481079
WARNING! Deviation more than 5E-5 5 3.2674597739718134 3.2672949541081930 0.99994955718661538
WARNING! Deviation more than 5E-5 11 23.906164531244425 23.904572425204815 0.99993340186220470
WARNING! Deviation more than 5E-5 12 7.2115125554333357 7.2111024304164699 0.99994312912669658
WARNING! Deviation more than 5E-5 18 5.0277582816984809 5.0274636035675559 0.99994138975773805
WARNING! Deviation more than 5E-5 21 5.1118977339682887 5.1115741556696488 0.99993670094444775
WARNING! Deviation more than 5E-5 28 1.8208877147178686 1.8207945747421526 0.99994884913827298
WARNING! Deviation more than 5E-5 9 7.5632375656327859 7.5628103733018515 0.99994351726661668
WARNING! Deviation more than 5E-5 15 4.1469688581967885 4.1467152375571983 0.99993884192327875
WARNING! Deviation more than 5E-5 26 4.8212544866832658 4.8209820679043265 0.99994349628718171
WARNING! Deviation more than 5E-5 27 20.810883398215640 20.809421655551752 0.99992976066244199
...
ME ratio CudaCpp/Fortran: MIN = 0.99992976066244199
ME ratio CudaCpp/Fortran: MAX = 1.0000040201393361
ME ratio CudaCpp/Fortran: 1-MIN = 7.0239337558009041E-005
ME ratio CudaCpp/Fortran: MAX-1 = 4.0201393360916882E-006
./cmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
ME ratio CudaCpp/Fortran: MIN = 0.99992976066244166
ME ratio CudaCpp/Fortran: MAX = 1.0000040201393365
ME ratio CudaCpp/Fortran: 1-MIN = 7.0239337558342108E-005
ME ratio CudaCpp/Fortran: MAX-1 = 4.0201393365357774E-006
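The warnings and the MIN/MAX summary above can be reproduced offline from the per-event numbers. Below is a minimal Python sketch, assuming that the two values printed after the event index are the Fortran ME and the CudaCpp ME (which is consistent with the printed ratio) and that a warning is issued when the ratio differs from 1 by more than 5E-5. The function name and the dictionary keys are illustrative and do not come from the Fortran code.

```python
# Illustrative re-computation of the "ME ratio CudaCpp/Fortran" summary.
# Assumption: each pair is (Fortran ME, CudaCpp ME) as printed in the logs.
THRESHOLD = 5e-5  # warning threshold quoted in the log


def me_ratio_summary(pairs, threshold=THRESHOLD):
    """Return min/max of the CudaCpp/Fortran ratio and the number of warnings."""
    ratios = [cudacpp / fortran for fortran, cudacpp in pairs]
    rmin, rmax = min(ratios), max(ratios)
    return {
        "min": rmin,
        "max": rmax,
        "one_minus_min": 1.0 - rmin,
        "max_minus_one": rmax - 1.0,
        "n_warnings": sum(abs(r - 1.0) > threshold for r in ratios),
    }


# Three rows copied from the gmadevent output earlier in this thread.
pairs = [
    (5.0208307517816033, 5.0205082151740275),    # event 10
    (0.32582685677422046, 0.32582816664358416),  # event 8
    (0.25396745231910817, 0.25396745202905652),  # event 29
]
summary = me_ratio_summary(pairs)
print(summary)
```

With these three rows only event 10 exceeds the threshold, matching the first WARNING line in the 64-event log.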

./gmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
RESET CUMULATIVE VARIABLE
WARNING! Deviation more than 5E-5 10 5.0208307517816033 5.0204830169677734 0.99993074157823258
WARNING! Deviation more than 5E-5 14 6.1412376054978886 6.1409120559692383 0.99994698958914097
WARNING! Deviation more than 5E-5 19 2.8196698917337062 2.8195171356201172 0.99994582482366579
WARNING! Deviation more than 5E-5 23 1.3917471502668604 1.3916516304016113 0.99993136694030182
WARNING! Deviation more than 5E-5 30 0.59946736522418109 0.59986174106597900 1.0006578770833512
WARNING! Deviation more than 5E-5 31 8.9853421261395425 8.9848155975341797 0.99994140138483645
RESET CUMULATIVE VARIABLE
WARNING! Deviation more than 5E-5 1 2.9952828789093009 2.9951152801513672 0.99994404576639029
WARNING! Deviation more than 5E-5 12 7.2115125554333357 7.2111072540283203 0.99994379800327604
WARNING! Deviation more than 5E-5 18 5.0277582816984809 5.0274577140808105 0.99994021836356684
WARNING! Deviation more than 5E-5 21 5.1118977339682887 5.1116113662719727 0.99994398015938124
WARNING! Deviation more than 5E-5 28 1.8208877147178686 1.8207850456237793 0.99994361591148129
WARNING! Deviation more than 5E-5 9 7.5632375656327859 7.5628528594970703 0.99994913472803448
WARNING! Deviation more than 5E-5 12 0.34758494237401355 0.34689116477966309 0.99800400561194635
WARNING! Deviation more than 5E-5 13 0.35451575056455586 0.35453897714614868 1.0000655163601500
WARNING! Deviation more than 5E-5 15 4.1469688581967885 4.1467485427856445 0.99994687314550035
WARNING! Deviation more than 5E-5 26 4.8212544866832658 4.8209867477416992 0.99994446695516570
WARNING! Deviation more than 5E-5 27 20.810883398215640 20.813144683837891 1.0001086588002528
WARNING! Deviation more than 5E-5 28 1.5052106057124179 1.5051305294036865 0.99994680059492835
Iteration 1 Mean: 0.4017E+03 Abs mean: 0.4017E+03 Fluctuation: 0.119E+03 0.525E+04 100.0%
...
ME ratio CudaCpp/Fortran: MIN = 0.99800400561194635
ME ratio CudaCpp/Fortran: MAX = 1.0006578770833512
ME ratio CudaCpp/Fortran: 1-MIN = 1.9959943880536457E-003
ME ratio CudaCpp/Fortran: MAX-1 = 6.5787708335118822E-004
./cmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
ME ratio CudaCpp/Fortran: MIN = 0.99800323394186297
ME ratio CudaCpp/Fortran: MAX = 1.0006573799366487
ME ratio CudaCpp/Fortran: 1-MIN = 1.9967660581370339E-003
ME ratio CudaCpp/Fortran: MAX-1 = 6.5737993664871652E-004


… my tput scripts for ggtt)
For cmadevent (default 512y) on ggtt:
ME ratio CudaCpp/Fortran: MIN = 0.99992609952020872
ME ratio CudaCpp/Fortran: MAX = 1.0000053090530967
ME ratio CudaCpp/Fortran: 1-MIN = 7.3900479791277895E-005
ME ratio CudaCpp/Fortran: MAX-1 = 5.3090530967025984E-006
PROGRAM : 32.5986s
SMATRIX1MULTI : 9.0047s for 524320 Fortran events => throughput is 5.82E+04 events/s
FBRIDGESEQUENCE : 1.3071s for 524320 CudaCpp events => throughput is 4.01E+05 events/s
Note that tput tests give 6.12E5... probably losing something in data copies, but still a factor 6-7 faster than Fortran!
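The throughputs and the "factor 6-7" quoted above follow directly from the event count and the two timers. A short Python check using only the numbers printed in the log (the function name is illustrative):

```python
# Recompute the throughputs quoted in the log from events and seconds.
def throughput(n_events: int, seconds: float) -> float:
    """Events processed per second."""
    return n_events / seconds


n_events = 524320
fortran_seconds = 9.0047   # SMATRIX1MULTI timer
cudacpp_seconds = 1.3071   # FBRIDGESEQUENCE timer

fortran_tput = throughput(n_events, fortran_seconds)
cudacpp_tput = throughput(n_events, cudacpp_seconds)
speedup = cudacpp_tput / fortran_tput

print(f"Fortran : {fortran_tput:.2E} events/s")
print(f"CudaCpp : {cudacpp_tput:.2E} events/s")
print(f"Speedup : {speedup:.2f}")
```

The speedup is simply the ratio of the two timers, since both process the same 524320 events.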


Use this on eemumu, funnily it generates 16385 events but goes through 512k MEs (which is what I wanted)
ME ratio CudaCpp/Fortran: MIN = 0.99999999999999911
ME ratio CudaCpp/Fortran: MAX = 1.0000000000000007
ME ratio CudaCpp/Fortran: 1-MIN = 8.8817841970012523E-016
ME ratio CudaCpp/Fortran: MAX-1 = 6.6613381477509392E-016
PROGRAM : 8.3820s
SMATRIX1MULTI : 3.6777s for 524320 Fortran events => throughput is 1.43E+05 events/s
FBRIDGESEQUENCE : 0.0983s for 524320 CudaCpp events => throughput is 5.33E+06 events/s
Notable points:
- Fortran and Cudacpp MEs agree to E-16! Very different from ggtt
- MEs (SMATRIX1MULTI) are around 50% of the total time in Fortran
- MEs in CPP are a factor 40(!) faster than in Fortran?!

Note nice results for ggttggg
CUDA (still 32 events in flight only!)
ME ratio CudaCpp/Fortran: MIN = 0.99996229761036137
ME ratio CudaCpp/Fortran: MAX = 1.0006315515457513
ME ratio CudaCpp/Fortran: 1-MIN = 3.7702389638627487E-005
ME ratio CudaCpp/Fortran: MAX-1 = 6.3155154575134098E-004
PROGRAM : 50.8005s
SMATRIX1MULTI : 39.5586s for 1056 Fortran events => throughput is 2.67E+01 events/s
FBRIDGESEQUENCE : 9.2071s for 1056 CudaCpp events => throughput is 1.15E+02 events/s
CPP
ME ratio CudaCpp/Fortran: MIN = 0.99996229761036193
ME ratio CudaCpp/Fortran: MAX = 1.0006315515457525
ME ratio CudaCpp/Fortran: 1-MIN = 3.7702389638072376E-005
ME ratio CudaCpp/Fortran: MAX-1 = 6.3155154575245120E-004
PROGRAM : 44.8238s
SMATRIX1MULTI : 39.6807s for 1056 Fortran events => throughput is 2.66E+01 events/s
FBRIDGESEQUENCE : 3.6954s for 1056 CudaCpp events => throughput is 2.86E+02 events/s
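In these runs both the Fortran and the CudaCpp MEs are computed, so PROGRAM includes both timers. As a rough, back-of-the-envelope estimate (not a measurement), one can ask what the total time would be if only one of the two ME implementations were run. The Python sketch below assumes that PROGRAM = overhead + SMATRIX1MULTI + FBRIDGESEQUENCE and that the overhead would stay unchanged; the function name is illustrative.

```python
# Rough estimate of the whole-program gain if MEs were computed only in CudaCpp.
# Assumptions: PROGRAM = overhead + Fortran MEs + CudaCpp MEs, and the
# non-ME overhead is unchanged when one ME implementation is dropped.
def estimate_offload_speedup(program_s, fortran_me_s, cudacpp_me_s):
    """Return (overhead in seconds, estimated Fortran-only / CudaCpp-only time ratio)."""
    overhead_s = program_s - fortran_me_s - cudacpp_me_s
    fortran_only_s = overhead_s + fortran_me_s
    cudacpp_only_s = overhead_s + cudacpp_me_s
    return overhead_s, fortran_only_s / cudacpp_only_s


# Timers from the CPP block of the ggttggg log above.
cpp_overhead, cpp_speedup = estimate_offload_speedup(44.8238, 39.6807, 3.6954)
# Timers from the CUDA block (only 32 events in flight).
cuda_overhead, cuda_speedup = estimate_offload_speedup(50.8005, 39.5586, 9.2071)

print(f"CPP : overhead {cpp_overhead:.4f}s, estimated speedup {cpp_speedup:.1f}x")
print(f"CUDA: overhead {cuda_overhead:.4f}s, estimated speedup {cuda_speedup:.1f}x")
```

The estimate is bounded by the non-ME overhead, so it is more sensitive to that term than to the ME timers once the MEs become fast.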


coupl_write.inc:8:32:
8 | WRITE(*,2) 'GC_3 = ', GC_3(1)
| 1
Error: PROCEDURE attribute conflicts with COMMON attribute in ‘gc_3’ at (1)
coupl_write.inc:9:34:
9 | WRITE(*,2) 'GC_50 = ', GC_50(1)
| 1
Error: PROCEDURE attribute conflicts with COMMON attribute in ‘gc_50’ at (1)
coupl_write.inc:10:34:
10 | WRITE(*,2) 'GC_59 = ', GC_59(1)
| 1
Error: PROCEDURE attribute conflicts with COMMON attribute in ‘gc_59’ at (1)


…builds ok! But it fails at runtime, oliviermattelaer/mg5amc_test#13
*** EXECUTE MADEVENT (create results.dat) ***
--------------------
2048 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
0 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! Channel number for single-diagram enhancement multi-channel (IGNORED as suppress amplitude is 0?)
--------------------
Executing ' ./madevent < /tmp/avalassi/tmp.KAEy48Pdo5_fortran > /tmp/avalassi/tmp.Gkw3eKB3xm'
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG
[XSECTION] ERROR! No cross section in log file: /tmp/avalassi/tmp.Gkw3eKB3xm
...
xqcutij # 3> 0.0 0.0
Added good helicity 1 0.13852041978098753 in event 1 local: 1
Added good helicity 4 6.9350349637150188 in event 1 local: 1
Added good helicity 13 8.7879241967230062 in event 1 local: 1
Added good helicity 16 0.13852041978098747 in event 1 local: 1
RESET CUMULATIVE VARIABLE
RESET CUMULATIVE VARIABLE
1024 points passed the cut but all returned zero
therefore considering this contribution as zero
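The six-line input card shown between the dashed lines is what gets piped into the madevent executable on stdin. A minimal Python sketch that regenerates such a card is given below; the field order and the comments are copied from the log above, while the function name and its parameters are hypothetical and are not taken from the madX.sh script.

```python
# Hypothetical generator for the madevent stdin card shown in the log above.
# The field order and the trailing comments are copied from that log.
def madevent_input_card(nevt: int, channel: int = 1) -> str:
    """Build the six-line input card for a single-iteration madevent run."""
    lines = [
        f"{nevt} 1 1 ! Number of events and max and min iterations",
        "0.000001 ! Accuracy (ignored because max iterations = min iterations)",
        "0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)",
        "0 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)",
        "0 ! Helicity Sum/event 0=exact",
        f"{channel} ! Channel number for single-diagram enhancement multi-channel "
        "(IGNORED as suppress amplitude is 0?)",
    ]
    return "\n".join(lines) + "\n"


card = madevent_input_card(2048)
print(card, end="")
```

The resulting text could be written to a temporary file and redirected into the executable, as in the `./madevent < ...` line of the log.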


STARTED AT Thu May 19 20:20:35 CEST 2022
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttg -ggttgg -ggttggg
ENDED(1) AT Thu May 19 22:31:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttgg -inlonly
ENDED(2) AT Thu May 19 22:44:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -bridge
ENDED(3) AT Thu May 19 22:48:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu May 19 22:50:41 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu May 19 22:53:04 CEST 2022 [Status=0]


valassi · 2022-05-20T13:33:51Z

I mark this as ready and will soon merge it.

There are very many things that still need to be done, but many things are done, including (non-exhaustive list):

- I have madX.sh and teeMadX.sh scripts which generate some logs with interesting numbers, both for physics values and for computing performance (see an example below). They include some process-dependent tuning of how many events should be executed through madevent (with Fortran, CUDA and C++ MEs) and also of how many events per loop should be used.
- There is some initial progress at running many more GPU events via MadEvent (see #455, "In Fortran MadEvent, use larger arrays (beyond 16 events, up to 16k) to allow efficient usage of GPUs"). However, there are too many arrays in Fortran that need to be dimensioned this way, and overall I do not manage to go beyond 16k events per grid, else it segfaults (I guess it is out of memory).
- NB: the code is still doing the main ME calculation in Fortran, with cudacpp only as a cross check. We need multichannel and the full API with color/helicity choice to go beyond this step.

Amongst the things that still need work (non-exhaustive list):

- Go beyond 16k events per grid, see #460, "In Fortran MadEvent, use larger arrays (beyone 16k) to allow efficient usage of GPU".
- Understand the performance differences between cudacpp in standalone (SA) mode and in madevent mode, see #461, "Understand performance difference in cudacpp between SA and madevent".
- Fix the eemumu process, which is returning a zero cross section, see oliviermattelaer/mg5amc_test#13, "Simple ee->mumu returns 0 cross section in 311 branch".
- Go beyond the cross section calculation in madevent and into unweighted event generation, which is failing with "Error: failed to reduce to color indices" in gg to ttg (only for event generation, not for cross sections), see oliviermattelaer/mg5amc_test#14.
- Once multichannel is done (and later random choices of color/helicity), see #404, "MadEvent final bridge interface: add multichannel, alphas, color index", do the full calculation of xsec and unweighted event generation in cudacpp instead of Fortran, and compare physics results.
- In parallel (this can also be done now at the ME level), understand the E-3 level differences in MEs between Fortran and cudacpp, see #417, "Physics results are different in fortran and cudacpp MEs (by factors or orders of magnitude)".

valassi · 2022-05-20T13:34:21Z

All tests passed, I am self merging

valassi · 2022-05-20T13:35:23Z

PS Example of a tmad log

https://github.com/madgraph5/madgraph4gpu/blob/d1fd8b201803b366579f925017f09a85001ce9f8/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt

DATE: 2022-05-20_15:00:32

Working directory: /data/avalassi/GPU2020/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg

*** EXECUTE MADEVENT (create results.dat) ***
 [XSECTION] Cross section = 80.03
 [COUNTERS] PROGRAM TOTAL          :    4.2070s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2606s
 [COUNTERS] Fortran MEs      ( 1 ) :    3.9463s for       96 events => throughput is 2.43E+01 events/s

*** EXECUTE CMADEVENT_CUDACPP (create results.dat) ***
 [XSECTION] Cross section = 80.03
 [MERATIOS] ME ratio CudaCpp/Fortran: MIN = 0.99997121 = 1 - 2.9e-05
 [MERATIOS] ME ratio CudaCpp/Fortran: MAX = 1.00014921 = 1 + 0.00015
 [MERATIOS] ME ratio CudaCpp/Fortran: NENTRIES =     96
 [MERATIOS] ME ratio CudaCpp/Fortran - 1: AVG = 1.3e-05 +- 2.9e-06
 [COUNTERS] PROGRAM TOTAL          :    4.2596s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2727s
 [COUNTERS] Fortran MEs      ( 1 ) :    3.5988s for       96 events => throughput is 2.67E+01 events/s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.3881s for       96 events => throughput is 2.47E+02 events/s

*** EXECUTE CHECK -p 2 32 1 --bridge ***
Process                     = SIGMA_SM_GG_TTXGGG_CPP [gcc 10.2.0] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CPP:DBL+CXS:CURHST+RMBHST+BRDHST/512y+CXVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.888841e+02                 )  sec^-1

*** EXECUTE CHECK -p 2 32 1 ***
Process                     = SIGMA_SM_GG_TTXGGG_CPP [gcc 10.2.0] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CPP:DBL+CXS:CURHST+RMBHST+MESHST/512y+CXVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.892185e+02                 )  sec^-1

*** EXECUTE GMADEVENT_CUDACPP (create results.dat) ***
 [XSECTION] Cross section = 404.9
 [MERATIOS] ME ratio CudaCpp/Fortran: MIN = 0.99997121 = 1 - 2.9e-05
 [MERATIOS] ME ratio CudaCpp/Fortran: MAX = 1.00007834 = 1 + 7.8e-05
 [MERATIOS] ME ratio CudaCpp/Fortran: NENTRIES =     64
 [MERATIOS] ME ratio CudaCpp/Fortran - 1: AVG = 1.1e-05 +- 2.9e-06
 [COUNTERS] PROGRAM TOTAL          :    3.6001s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.5031s
 [COUNTERS] Fortran MEs      ( 1 ) :    2.4043s for       64 events => throughput is 2.66E+01 events/s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.6928s for       64 events => throughput is 9.24E+01 events/s

*** EXECUTE GCHECK -p 2 32 1 --bridge ***
Process                     = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 11.6.124 (gcc 10.2.0)] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CUD:DBL+THX:CURHST+RMBHST+BRDDEV/none+NAVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.032552e+02                 )  sec^-1

*** EXECUTE GCHECK -p 2 32 1 ***
Process                     = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 11.6.124 (gcc 10.2.0)] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CUD:DBL+THX:CURDEV+RMBDEV+MESDEV/none+NAVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.367983e+02                 )  sec^-1

TEST COMPLETED
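Since the tmad logs use a regular `[COUNTERS]` line format, the timers can be extracted and cross-checked with a few lines of Python. The sketch below is not part of the madX.sh or teeMadX.sh scripts; the regular expression and the function name are assumptions based only on the log lines shown above.

```python
import re

# Assumed line format, inferred from the log above:
#  [COUNTERS] <name> ( <id> ) : <seconds>s [for <n> events => throughput is ...]
COUNTER_RE = re.compile(
    r"\[COUNTERS\]\s+(?P<name>.+?)\s*(?:\(\s*\d+\s*\))?\s*:\s*(?P<seconds>[\d.]+)s"
    r"(?:\s+for\s+(?P<events>\d+)\s+events)?"
)


def parse_counters(text):
    """Return one dict per [COUNTERS] line: name, seconds, events, throughput."""
    rows = []
    for m in COUNTER_RE.finditer(text):
        seconds = float(m["seconds"])
        events = int(m["events"]) if m["events"] else None
        rows.append({
            "name": m["name"],
            "seconds": seconds,
            "events": events,
            "throughput": events / seconds if events else None,
        })
    return rows


# Lines copied from the CMADEVENT_CUDACPP section of the log above.
sample = """\
 [COUNTERS] PROGRAM TOTAL          :    4.2596s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2727s
 [COUNTERS] Fortran MEs      ( 1 ) :    3.5988s for       96 events => throughput is 2.67E+01 events/s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.3881s for       96 events => throughput is 2.47E+02 events/s
"""
rows = parse_counters(sample)
for row in rows:
    print(row)
```

In this sample the three component timers add up to PROGRAM TOTAL, which gives a quick sanity check on any parsed log.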

valassi added 2 commits May 11, 2022 18:06

valassi self-assigned this May 11, 2022

valassi marked this pull request as draft May 11, 2022 16:12

valassi added 26 commits May 11, 2022 18:53

[fvsc] in ggtt.mad add MEEXPORTER_MODE (CppOnly=1, FortranOnly=0, Bot…

e2f0a49

…hQuiet=-1, BothDebug=-2)

[fvsc] inggtt.mad remove old commented out code for printing momenta

8445067

[fvsc] add an input file for 64 events

532639c

[fvsc] dump counters results also to stdout

2464d0e

[fvsc] add counters for Fortran in auto_dsig1.f rather than only in m…

91a488c

…atrix1.f

[fvsc] move counters infrastructure to time smatrix1multi

5bc57fd

[fvsc] print SMATRIX1MULTI or FBRIDGESEQUENCE

a3e7679

[fvsc] remove COUNTERS_USETIMER to simplify the code

f3bb5de

[fvsc] increase the threshold for warnings to 1E-4 to reduce verbosity

dbdd7b1

[fvsc] Revert to 5E-5 threshold, will add a maximum number of warning…

8c8a8f5

…s instead (deal with FPTYPE=f) Revert "[fvsc] increase the threshold for warnings to 1E-4 to reduce verbosity" This reverts commit dbdd7b1.

[fvsc] limit warnings to 20 and improve formatting - prepare to incre…

2368b5f

…ase the number of events...

[fvsc] backport driver.f changes to codegen

d09bea6

git diff --no-ext-diff e5a2db8 gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f > CODEGEN/MG5aMC_patches/patch.driver.f

[fvsc] backport to codegen the changes in counters.cpp and fbridge.inc

e8b519c

[fvsc] TEMPORARY! generate ggtt.mad without patchMad (use it as refer…

8094658

…ence to create patches!)

[fvsc] revert to codegen with patchMad and to ggtt with desired changes

48bac81

Revert "[fvsc] TEMPORARY! generate ggtt.mad without patchMad (use it as reference to create patches!)" This reverts commit 63b32f9c09dff5659cd20947a499746c5d6ccfdb.

[fvsc] backport to codegen the changes in autodsig1.f (using the temp…

f17357d

…orary ref) git diff --no-ext-diff 63b32f9c gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f > CODEGEN/MG5aMC_patches/patch.auto_dsig1.f

[fvsc] regenerate ggtt.mad, check it is stable, all ok

f0bdaef

[fvsc] regenerate eemumu.mad

7ca72f3

[fvsc] (harmless) bug fix in counter position in ggtt.mad auto_dsig1.f

0afca72

[fvsc] backport auto_dsig1.f fix to codegen

98a032d

[fvsc] regenerate ggtt.mad, all ok no change

b319591

valassi added 6 commits May 19, 2022 20:02

[fvsc] in codegen remove older handling of auto_dsig1.f, which is no …

7c0572a

…longer in the P1 patch

[fvsc] regenerate all four ggtt*mad, stable, all ok

a2ca023

[fvsc] in codegen add different printouts for scalar/vector couplings m…

127b256

…adgraph5#456

[fvsc] regenerate all four ggtt* mad,all ok, stable

9ef9837

valassi force-pushed the fvsc branch from d977541 to 9ef9837 Compare May 19, 2022 18:03

valassi mentioned this pull request May 19, 2022

failed to convert GOTPCREL relocation #459

Closed

valassi linked an issue May 19, 2022 that may be closed by this pull request

failed to convert GOTPCREL relocation #459

Closed

valassi added 11 commits May 19, 2022 20:10

Merge branch 'quad' into fvsc

0ad1a8c

Fix conflicts: keep codegen logs from origin/fvsc

[fvsc] regenerate all five *mad including latest quad PR madgraph5#457

f3ed17e

[fvsc] regenerate all five processes (non mad) auto and resync manu

c545779

[fvsc] regenerate also heftggh auto and manu

f00a5b7

Merge remote-tracking branch 'upstream/master' into fvsc

ca3b0bf

[fvsc] in madX.sh split getnevt from getinputfile

af83afc

[fvsc] add check.exe/gcheck.exe in madX.sh script

85f3e55

[fvsc] add makecleanonly and makeonly to madX.sh

6262344

[fvsc] add a first version of teeMadX.sh

256a118

[fvsc] ** COMPLETE (FIRST PART OF) FVSC ** add first version of five …

d1fd8b2

…tmad logs

valassi changed the title ~~WIP - progress on automatic Fortran vs cuda/cpp comparisons~~ First patch for automatic Fortran vs cuda/cpp comparisons May 20, 2022

valassi marked this pull request as ready for review May 20, 2022 13:05

This was referenced May 20, 2022

In Fortran MadEvent, use larger arrays (beyone 16k) to allow efficient usage of GPU #460

Open

In Fortran MadEvent, use larger arrays (beyond 16 events, up to 16k) to allow efficient usage of GPUs #455

Open

valassi linked an issue May 20, 2022 that may be closed by this pull request

In Fortran MadEvent, use larger arrays (beyond 16 events, up to 16k) to allow efficient usage of GPUs #455

Open

valassi mentioned this pull request May 20, 2022

Understand performance difference in cudacpp between SA and madevent #461

Closed

valassi merged commit ca21d57 into madgraph5:master May 20, 2022

valassi merged 190 commits into madgraph5:master from valassi:fvsc
