
First patch for automatic Fortran vs cuda/cpp comparisons#454

Merged
valassi merged 190 commits into madgraph5:master from valassi:fvsc
May 20, 2022

Conversation

@valassi
Member

@valassi valassi commented May 11, 2022

This is a WIP PR for the issues described in #417, namely an automatic comparison of Fortran vs CudaCpp, for both physics results and computing performance.

valassi added 2 commits May 11, 2022 18:06
At runtime, cmadevent_cudacpp was succeeding but gmadevent_cudacpp was crashing

[avalassi@itscrd70 gcc10.2/cvmfs] /data/avalassi/GPU2020/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx> ./gmadevent_cudacpp < ../../../tmad/input_app_32_NOMULTICHANNEL.txt
__CudaRuntime: calling cudaSetDevice(0)
terminate called after throwing an instance of 'std::runtime_error'
  what():  Bridge constructor: nevt should be a multiple of 32

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
        at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
        at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
        at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
        at /build/dkonst/CONTRIB/build/contrib/gcc-10.2.0/src/gcc/10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:95
Aborted
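The abort above comes from a host-side sanity check: the cudacpp Bridge refuses event counts that are not a multiple of the GPU vector size. A minimal sketch of that check, in Python purely for illustration (`Bridge` and `nevt` are the names appearing in the log; everything else here is an assumption, not the actual cudacpp implementation):

```python
# Hypothetical sketch of the size check that makes gmadevent_cudacpp abort:
# the Bridge requires the number of events per call to be a multiple of the
# GPU warp/vector size (the "multiple of 32" in the error message above).

NEVT_ALIGNMENT = 32  # assumption: matches the "multiple of 32" in the log

class Bridge:
    def __init__(self, nevt: int):
        if nevt % NEVT_ALIGNMENT != 0:
            raise RuntimeError("Bridge constructor: nevt should be a multiple of 32")
        self.nevt = nevt
```

With a count like 64 the constructor succeeds; with 16 it raises, which (uncaught) terminates the program just as in the backtrace above.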
Now both cmadevent and gmadevent are ok

./cmadevent_cudacpp < ../../../tmad/input_app_32_NOMULTICHANNEL.txt
...
 RESET CUMULATIVE VARIABLE
           1   1.0585767962875954        1.0585746799742819       0.99999800079378187
           2  0.34621899802570244       0.34621895155590371       0.99999986577917732
           3  0.69641550793457307       0.69639622166046955       0.99997230636899415
           4   1.1905598627825309        1.1905091128440706       0.99995737304771748
           5   1.1553353972932170        1.1553050542823655       0.99997373662148448
           6   1.0742962501588520        1.0742704701687749       0.99997600290415856
           7   9.8420484883663377        9.8420305789332669       0.99999818031448517
           8  0.32582685677422046       0.32582816664358427        1.0000040201393365
           9   2.5149439789248635        2.5149325040463246       0.99999543732240759
          10   5.0208307517816033        5.0205082151740275       0.99993576031068931
          11  0.44788113881489167       0.44788033248057252       0.99999819966895398
          12  0.48986309244196768       0.48985700370132096       0.99998757052584553
          13  0.63328330290184909       0.63326795025360871       0.99997575706138153
          14   6.1412376054978886        6.1408422471646906       0.99993562236822686
          15  0.39555567583957080       0.39555551971280867       0.99999960529763154
          16  0.52747130178068424       0.52746612471597976       0.99999018512535753
          17   1.7437975984037744        1.7437125329156875       0.99995121825596922
          18   1.4622130913237883        1.4621759933936889       0.99997462891672939
          19   2.8196698917337062        2.8195133794161817       0.99994449268051433
          20  0.49718730336075750       0.49718175998606901       0.99998885053047204
          21  0.93730472272336696       0.93730460352298761       0.99999987282643898
          22  0.98559242564719773       0.98555491534526574       0.99996194136546113
          23   1.3917471502668604        1.3916827326186247       0.99995371454633608
          24  0.39335740001747643       0.39335537900079015       0.99999486213635180
          25  0.89921455803138484       0.89918280136248963       0.99996468399158844
          26   1.7000555878730745        1.6999701821900650       0.99994976300562244
          27   1.2501450481526981        1.2500906207696398       0.99995646314550568
          28  0.40328952406822066       0.40328705864351660       0.99999388671275369
          29  0.25396745231910817       0.25396745202905652       0.99999999885791802
          30  0.59946736522418109       0.59945385928135198       0.99997747009493321
          31   8.9853421261395425        8.9847441073480194       0.99993344507274984
          32  0.71323994225410603       0.71322868924826455       0.99998422269256837
 Iteration  1   Mean: 0.1982E+03 Abs mean: 0.1982E+03   Fluctuation:  0.751E+02   0.157E+04   100.0%

./gmadevent_cudacpp < ../../../tmad/input_app_32_NOMULTICHANNEL.txt
...
 RESET CUMULATIVE VARIABLE
           1   1.0585767962875954        1.0585746799742821       0.99999800079378209
           2  0.34621899802570244       0.34621895155590371       0.99999986577917732
           3  0.69641550793457307       0.69639622166047044       0.99997230636899537
           4   1.1905598627825309        1.1905091128440715       0.99995737304771826
           5   1.1553353972932170        1.1553050542823660       0.99997373662148492
           6   1.0742962501588520        1.0742704701687740       0.99997600290415778
           7   9.8420484883663377        9.8420305789332652       0.99999818031448495
           8  0.32582685677422046       0.32582816664358416        1.0000040201393361
           9   2.5149439789248635        2.5149325040463246       0.99999543732240759
          10   5.0208307517816033        5.0205082151740275       0.99993576031068931
          11  0.44788113881489167       0.44788033248057252       0.99999819966895398
          12  0.48986309244196768       0.48985700370132101       0.99998757052584564
          13  0.63328330290184909       0.63326795025360894       0.99997575706138186
          14   6.1412376054978886        6.1408422471646871       0.99993562236822631
          15  0.39555567583957080       0.39555551971280867       0.99999960529763154
          16  0.52747130178068424       0.52746612471597976       0.99999018512535753
          17   1.7437975984037744        1.7437125329156880       0.99995121825596944
          18   1.4622130913237883        1.4621759933936902       0.99997462891673028
          19   2.8196698917337062        2.8195133794161817       0.99994449268051433
          20  0.49718730336075750       0.49718175998606901       0.99998885053047204
          21  0.93730472272336696       0.93730460352298728       0.99999987282643865
          22  0.98559242564719773       0.98555491534526585       0.99996194136546124
          23   1.3917471502668604        1.3916827326186247       0.99995371454633608
          24  0.39335740001747643       0.39335537900079010       0.99999486213635169
          25  0.89921455803138484       0.89918280136248874       0.99996468399158744
          26   1.7000555878730745        1.6999701821900661       0.99994976300562310
          27   1.2501450481526981        1.2500906207696396       0.99995646314550546
          28  0.40328952406822066       0.40328705864351633       0.99999388671275302
          29  0.25396745231910817       0.25396745202905652       0.99999999885791802
          30  0.59946736522418109       0.59945385928135186       0.99997747009493310
          31   8.9853421261395425        8.9847441073480176       0.99993344507274962
          32  0.71323994225410603       0.71322868924826466       0.99998422269256848
@valassi valassi self-assigned this May 11, 2022
@valassi valassi marked this pull request as draft May 11, 2022 16:12
valassi added 26 commits May 11, 2022 18:53
Note a peculiar tendency to go LOW rather than HIGH... systematic errors in the calculation?!

C++ and CUDA agree with each other, but are often lower than Fortran

./gmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
 RESET CUMULATIVE VARIABLE
 WARNING! Deviation more than 5E-5          10   5.0208307517816033        5.0205082151740275       0.99993576031068931
 WARNING! Deviation more than 5E-5          14   6.1412376054978886        6.1408422471646871       0.99993562236822631
 WARNING! Deviation more than 5E-5          19   2.8196698917337062        2.8195133794161817       0.99994449268051433
 WARNING! Deviation more than 5E-5          26   1.7000555878730745        1.6999701821900661       0.99994976300562310
 WARNING! Deviation more than 5E-5          31   8.9853421261395425        8.9847441073480176       0.99993344507274962
 RESET CUMULATIVE VARIABLE
 WARNING! Deviation more than 5E-5           1   2.9952828789093009        2.9951147060108139       0.99994385408481079
 WARNING! Deviation more than 5E-5           5   3.2674597739718134        3.2672949541081930       0.99994955718661538
 WARNING! Deviation more than 5E-5          11   23.906164531244425        23.904572425204815       0.99993340186220470
 WARNING! Deviation more than 5E-5          12   7.2115125554333357        7.2111024304164699       0.99994312912669658
 WARNING! Deviation more than 5E-5          18   5.0277582816984809        5.0274636035675559       0.99994138975773805
 WARNING! Deviation more than 5E-5          21   5.1118977339682887        5.1115741556696488       0.99993670094444775
 WARNING! Deviation more than 5E-5          28   1.8208877147178686        1.8207945747421526       0.99994884913827298
 WARNING! Deviation more than 5E-5           9   7.5632375656327859        7.5628103733018515       0.99994351726661668
 WARNING! Deviation more than 5E-5          15   4.1469688581967885        4.1467152375571983       0.99993884192327875
 WARNING! Deviation more than 5E-5          26   4.8212544866832658        4.8209820679043265       0.99994349628718171
 WARNING! Deviation more than 5E-5          27   20.810883398215640        20.809421655551752       0.99992976066244199
...
 ME ratio CudaCpp/Fortran: MIN =   0.99992976066244199
 ME ratio CudaCpp/Fortran: MAX =    1.0000040201393361
 ME ratio CudaCpp/Fortran: 1-MIN =    7.0239337558009041E-005
 ME ratio CudaCpp/Fortran: MAX-1 =    4.0201393360916882E-006

./cmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
 ME ratio CudaCpp/Fortran: MIN =   0.99992976066244166
 ME ratio CudaCpp/Fortran: MAX =    1.0000040201393365
 ME ratio CudaCpp/Fortran: 1-MIN =    7.0239337558342108E-005
 ME ratio CudaCpp/Fortran: MAX-1 =    4.0201393365357774E-006
./gmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
 RESET CUMULATIVE VARIABLE
 WARNING! Deviation more than 5E-5          10   5.0208307517816033        5.0204830169677734       0.99993074157823258
 WARNING! Deviation more than 5E-5          14   6.1412376054978886        6.1409120559692383       0.99994698958914097
 WARNING! Deviation more than 5E-5          19   2.8196698917337062        2.8195171356201172       0.99994582482366579
 WARNING! Deviation more than 5E-5          23   1.3917471502668604        1.3916516304016113       0.99993136694030182
 WARNING! Deviation more than 5E-5          30  0.59946736522418109       0.59986174106597900        1.0006578770833512
 WARNING! Deviation more than 5E-5          31   8.9853421261395425        8.9848155975341797       0.99994140138483645
 RESET CUMULATIVE VARIABLE
 WARNING! Deviation more than 5E-5           1   2.9952828789093009        2.9951152801513672       0.99994404576639029
 WARNING! Deviation more than 5E-5          12   7.2115125554333357        7.2111072540283203       0.99994379800327604
 WARNING! Deviation more than 5E-5          18   5.0277582816984809        5.0274577140808105       0.99994021836356684
 WARNING! Deviation more than 5E-5          21   5.1118977339682887        5.1116113662719727       0.99994398015938124
 WARNING! Deviation more than 5E-5          28   1.8208877147178686        1.8207850456237793       0.99994361591148129
 WARNING! Deviation more than 5E-5           9   7.5632375656327859        7.5628528594970703       0.99994913472803448
 WARNING! Deviation more than 5E-5          12  0.34758494237401355       0.34689116477966309       0.99800400561194635
 WARNING! Deviation more than 5E-5          13  0.35451575056455586       0.35453897714614868        1.0000655163601500
 WARNING! Deviation more than 5E-5          15   4.1469688581967885        4.1467485427856445       0.99994687314550035
 WARNING! Deviation more than 5E-5          26   4.8212544866832658        4.8209867477416992       0.99994446695516570
 WARNING! Deviation more than 5E-5          27   20.810883398215640        20.813144683837891        1.0001086588002528
 WARNING! Deviation more than 5E-5          28   1.5052106057124179        1.5051305294036865       0.99994680059492835
 Iteration  1   Mean: 0.4017E+03 Abs mean: 0.4017E+03   Fluctuation:  0.119E+03   0.525E+04   100.0%
...
 ME ratio CudaCpp/Fortran: MIN =   0.99800400561194635
 ME ratio CudaCpp/Fortran: MAX =    1.0006578770833512
 ME ratio CudaCpp/Fortran: 1-MIN =    1.9959943880536457E-003
 ME ratio CudaCpp/Fortran: MAX-1 =    6.5787708335118822E-004

./cmadevent_cudacpp < ../../../tmad/input_app_64_NOMULTICHANNEL.txt
...
 ME ratio CudaCpp/Fortran: MIN =   0.99800323394186297
 ME ratio CudaCpp/Fortran: MAX =    1.0006573799366487
 ME ratio CudaCpp/Fortran: 1-MIN =    1.9967660581370339E-003
 ME ratio CudaCpp/Fortran: MAX-1 =    6.5737993664871652E-004
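The WARNING lines above come from a per-event comparison of the two ME values. A minimal sketch of that logic, in Python purely for illustration (the function and variable names are hypothetical; only the 5E-5 threshold and the MIN/MAX ratio summary come from the logs):

```python
# Hedged sketch of the per-event ME comparison behind the log lines above:
# flag any CudaCpp/Fortran ratio deviating from 1 by more than a threshold,
# and track the MIN/MAX ratio over all events for the final summary.

THRESHOLD = 5e-5  # the "5E-5" appearing in the WARNING lines

def compare_mes(fortran_mes, cudacpp_mes):
    rmin, rmax = float("inf"), float("-inf")
    warnings = []
    for ievt, (f, c) in enumerate(zip(fortran_mes, cudacpp_mes), start=1):
        r = c / f
        rmin, rmax = min(rmin, r), max(rmax, r)
        if abs(r - 1) > THRESHOLD:
            warnings.append((ievt, f, c, r))  # like "WARNING! Deviation more than 5E-5"
    return rmin, rmax, warnings
```

The summary lines then report `1-MIN` and `MAX-1` as the largest downward and upward deviations.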
…s instead (deal with FPTYPE=f)

Revert "[fvsc] increase the threshold for warnings to 1E-4 to reduce verbosity"
This reverts commit dbdd7b1.
… my tput scripts for ggtt)

For cmadevent (default 512y) on ggtt:
 ME ratio CudaCpp/Fortran: MIN =   0.99992609952020872
 ME ratio CudaCpp/Fortran: MAX =    1.0000053090530967
 ME ratio CudaCpp/Fortran: 1-MIN =    7.3900479791277895E-005
 ME ratio CudaCpp/Fortran: MAX-1 =    5.3090530967025984E-006
PROGRAM         :   32.5986s
SMATRIX1MULTI   :    9.0047s for   524320 Fortran events => throughput is 5.82E+04 events/s
FBRIDGESEQUENCE :    1.3071s for   524320 CudaCpp events => throughput is 4.01E+05 events/s

Note that tput tests give 6.12E5... probably losing something in data copies, but still a factor 6-7 faster than Fortran!
git diff --no-ext-diff e5a2db8 gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f > CODEGEN/MG5aMC_patches/patch.driver.f
Revert "[fvsc] TEMPORARY! generate ggtt.mad without patchMad (use it as reference to create patches!)"
This reverts commit 63b32f9c09dff5659cd20947a499746c5d6ccfdb.
…orary ref)

git diff --no-ext-diff 63b32f9c gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f > CODEGEN/MG5aMC_patches/patch.auto_dsig1.f
Used this on eemumu; funnily, it generates 16385 events but goes through 512k MEs (which is what I wanted)

 ME ratio CudaCpp/Fortran: MIN =   0.99999999999999911
 ME ratio CudaCpp/Fortran: MAX =    1.0000000000000007
 ME ratio CudaCpp/Fortran: 1-MIN =    8.8817841970012523E-016
 ME ratio CudaCpp/Fortran: MAX-1 =    6.6613381477509392E-016
PROGRAM         :    8.3820s
SMATRIX1MULTI   :    3.6777s for   524320 Fortran events => throughput is 1.43E+05 events/s
FBRIDGESEQUENCE :    0.0983s for   524320 CudaCpp events => throughput is 5.33E+06 events/s

Notable points:
- Fortran and CudaCpp MEs agree to 1E-16! Very different from ggtt
- MEs (SMATRIXMULTI) are around 50% of the total time in Fortran
- MEs in CPP are a factor 40(!) faster than in Fortran?!
Note nice results for ggttggg

CUDA (still 32 events in flight only!)
 ME ratio CudaCpp/Fortran: MIN =   0.99996229761036137
 ME ratio CudaCpp/Fortran: MAX =    1.0006315515457513
 ME ratio CudaCpp/Fortran: 1-MIN =    3.7702389638627487E-005
 ME ratio CudaCpp/Fortran: MAX-1 =    6.3155154575134098E-004
PROGRAM         :   50.8005s
SMATRIX1MULTI   :   39.5586s for     1056 Fortran events => throughput is 2.67E+01 events/s
FBRIDGESEQUENCE :    9.2071s for     1056 CudaCpp events => throughput is 1.15E+02 events/s

CPP
 ME ratio CudaCpp/Fortran: MIN =   0.99996229761036193
 ME ratio CudaCpp/Fortran: MAX =    1.0006315515457525
 ME ratio CudaCpp/Fortran: 1-MIN =    3.7702389638072376E-005
 ME ratio CudaCpp/Fortran: MAX-1 =    6.3155154575245120E-004
PROGRAM         :   44.8238s
SMATRIX1MULTI   :   39.6807s for     1056 Fortran events => throughput is 2.66E+01 events/s
FBRIDGESEQUENCE :    3.6954s for     1056 CudaCpp events => throughput is 2.86E+02 events/s
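The SMATRIX1MULTI / FBRIDGESEQUENCE lines above are per-section counters: accumulated wall time and event counts for a named code section, reported as events/s. A minimal sketch of such a counter, in Python purely for illustration (the `Counter` class and its methods are hypothetical, not the actual instrumentation):

```python
# Hypothetical sketch of the per-section timing counters in the logs:
# each section accumulates wall time and event counts, and throughput
# is reported as events per second.

class Counter:
    def __init__(self, name: str):
        self.name = name
        self.seconds = 0.0
        self.events = 0

    def add(self, seconds: float, events: int):
        """Accumulate one timed call covering `events` events."""
        self.seconds += seconds
        self.events += events

    def throughput(self) -> float:
        """Events per second over all accumulated calls."""
        return self.events / self.seconds
```

For example, feeding in the SMATRIX1MULTI numbers above (39.6807s for 1056 events) reproduces the reported ~2.7E+01 events/s.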
valassi added 6 commits May 19, 2022 20:02
coupl_write.inc:8:32:

    8 |       WRITE(*,2) 'GC_3 = ', GC_3(1)
      |                                1
Error: PROCEDURE attribute conflicts with COMMON attribute in ‘gc_3’ at (1)
coupl_write.inc:9:34:

    9 |       WRITE(*,2) 'GC_50 = ', GC_50(1)
      |                                  1
Error: PROCEDURE attribute conflicts with COMMON attribute in ‘gc_50’ at (1)
coupl_write.inc:10:34:

   10 |       WRITE(*,2) 'GC_59 = ', GC_59(1)
      |                                  1
Error: PROCEDURE attribute conflicts with COMMON attribute in ‘gc_59’ at (1)
…builds ok!

But it fails at runtime,
oliviermattelaer/mg5amc_test#13

*** EXECUTE MADEVENT (create results.dat) ***
--------------------
2048 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
0 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! Channel number for single-diagram enhancement multi-channel (IGNORED as suppress amplitude is 0?)
--------------------
Executing ' ./madevent < /tmp/avalassi/tmp.KAEy48Pdo5_fortran > /tmp/avalassi/tmp.Gkw3eKB3xm'
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG
 [XSECTION] ERROR! No cross section in log file:
   /tmp/avalassi/tmp.Gkw3eKB3xm
   ...
xqcutij # 3>     0.0     0.0
 Added good helicity            1  0.13852041978098753       in event            1 local:           1
 Added good helicity            4   6.9350349637150188       in event            1 local:           1
 Added good helicity           13   8.7879241967230062       in event            1 local:           1
 Added good helicity           16  0.13852041978098747       in event            1 local:           1
 RESET CUMULATIVE VARIABLE
 RESET CUMULATIVE VARIABLE
        1024  points passed the cut but all returned zero
 therefore considering this contribution as zero
@valassi valassi linked an issue May 19, 2022 that may be closed by this pull request
valassi added 11 commits May 19, 2022 20:10
Fix conflicts: keep codegen logs from origin/fvsc
STARTED  AT Thu May 19 20:20:35 CEST 2022
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttg -ggttgg -ggttggg
ENDED(1) AT Thu May 19 22:31:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -flt -hrd -makej -makeclean -eemumu -ggtt -ggttgg -inlonly
ENDED(2) AT Thu May 19 22:44:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -bridge
ENDED(3) AT Thu May 19 22:48:12 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu May 19 22:50:41 CEST 2022 [Status=0]
./tput/teeThroughputX.sh -mad -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu May 19 22:53:04 CEST 2022 [Status=0]
@valassi valassi changed the title WIP - progress on automatic Fortran vs cuda/cpp comparisons First patch for automatic Fortran vs cuda/cpp comparisons May 20, 2022
@valassi valassi marked this pull request as ready for review May 20, 2022 13:05
@valassi
Member Author

valassi commented May 20, 2022

I mark this as ready and will soon merge it.

Many things still need to be done, but many are already done, including (non-exhaustive list):

  • I have madX.sh and teeMadX.sh scripts that generate logs with interesting numbers, both for physics values and for computing performance (see an example below). They include some process-dependent tuning of how many events should be executed through madevent (with Fortran, CUDA, C++ MEs) and of how many events per loop should be used.
  • There is some initial progress at running many more GPU events via MadEvent (see #455: in Fortran MadEvent, use larger arrays, beyond 16 events and up to 16k, to allow efficient usage of GPUs). However, there are too many arrays in Fortran that need to be dimensioned this way, and overall I cannot go beyond 16k events per grid, else it segfaults (I guess it runs out of memory).
  • NB the code still does the main ME calculation in Fortran, with cudacpp only as a cross-check. We need multichannel and the full API with color/helicity choice to go beyond this step.

Amongst the things that still need work (non-exhaustive list)

@valassi
Member Author

valassi commented May 20, 2022

All tests passed; I am self-merging.

@valassi valassi merged commit ca21d57 into madgraph5:master May 20, 2022
@valassi
Member Author

valassi commented May 20, 2022

PS Example of a tmad log

https://github.com/madgraph5/madgraph4gpu/blob/d1fd8b201803b366579f925017f09a85001ce9f8/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt

DATE: 2022-05-20_15:00:32

Working directory: /data/avalassi/GPU2020/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg

*** EXECUTE MADEVENT (create results.dat) ***
 [XSECTION] Cross section = 80.03
 [COUNTERS] PROGRAM TOTAL          :    4.2070s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2606s
 [COUNTERS] Fortran MEs      ( 1 ) :    3.9463s for       96 events => throughput is 2.43E+01 events/s

*** EXECUTE CMADEVENT_CUDACPP (create results.dat) ***
 [XSECTION] Cross section = 80.03
 [MERATIOS] ME ratio CudaCpp/Fortran: MIN = 0.99997121 = 1 - 2.9e-05
 [MERATIOS] ME ratio CudaCpp/Fortran: MAX = 1.00014921 = 1 + 0.00015
 [MERATIOS] ME ratio CudaCpp/Fortran: NENTRIES =     96
 [MERATIOS] ME ratio CudaCpp/Fortran - 1: AVG = 1.3e-05 +- 2.9e-06
 [COUNTERS] PROGRAM TOTAL          :    4.2596s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2727s
 [COUNTERS] Fortran MEs      ( 1 ) :    3.5988s for       96 events => throughput is 2.67E+01 events/s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.3881s for       96 events => throughput is 2.47E+02 events/s

*** EXECUTE CHECK -p 2 32 1 --bridge ***
Process                     = SIGMA_SM_GG_TTXGGG_CPP [gcc 10.2.0] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CPP:DBL+CXS:CURHST+RMBHST+BRDHST/512y+CXVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.888841e+02                 )  sec^-1

*** EXECUTE CHECK -p 2 32 1 ***
Process                     = SIGMA_SM_GG_TTXGGG_CPP [gcc 10.2.0] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CPP:DBL+CXS:CURHST+RMBHST+MESHST/512y+CXVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.892185e+02                 )  sec^-1

*** EXECUTE GMADEVENT_CUDACPP (create results.dat) ***
 [XSECTION] Cross section = 404.9
 [MERATIOS] ME ratio CudaCpp/Fortran: MIN = 0.99997121 = 1 - 2.9e-05
 [MERATIOS] ME ratio CudaCpp/Fortran: MAX = 1.00007834 = 1 + 7.8e-05
 [MERATIOS] ME ratio CudaCpp/Fortran: NENTRIES =     64
 [MERATIOS] ME ratio CudaCpp/Fortran - 1: AVG = 1.1e-05 +- 2.9e-06
 [COUNTERS] PROGRAM TOTAL          :    3.6001s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.5031s
 [COUNTERS] Fortran MEs      ( 1 ) :    2.4043s for       64 events => throughput is 2.66E+01 events/s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.6928s for       64 events => throughput is 9.24E+01 events/s

*** EXECUTE GCHECK -p 2 32 1 --bridge ***
Process                     = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 11.6.124 (gcc 10.2.0)] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CUD:DBL+THX:CURHST+RMBHST+BRDDEV/none+NAVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.032552e+02                 )  sec^-1

*** EXECUTE GCHECK -p 2 32 1 ***
Process                     = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 11.6.124 (gcc 10.2.0)] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CUD:DBL+THX:CURDEV+RMBDEV+MESDEV/none+NAVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.367983e+02                 )  sec^-1

TEST COMPLETED
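The [MERATIOS] lines in this log summarize the CudaCpp/Fortran ratio distribution as MIN, MAX, and an average deviation from 1 with its standard error (the "AVG = 1.3e-05 +- 2.9e-06" line). A minimal sketch of that statistic, in Python purely for illustration (the function name is hypothetical):

```python
# Hypothetical sketch of the "[MERATIOS] ... - 1: AVG = x +- y" summary:
# mean of (ratio - 1) over all entries, with the standard error of the mean.
import math

def ratio_stats(ratios):
    devs = [r - 1 for r in ratios]
    n = len(devs)
    avg = sum(devs) / n                               # AVG of (ratio - 1)
    var = sum((d - avg) ** 2 for d in devs) / (n - 1) # sample variance
    return avg, math.sqrt(var / n)                    # (AVG, standard error)
```

This is just the usual mean-and-standard-error computation over NENTRIES ratios; the actual script may compute it differently.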
