
Understand performance difference in cudacpp between SA and madevent #461

@valassi

Description

This is a followup to PR #454, where I am doing some first tests (with logs) of madevent+cudacpp MEs, comparing physics results and performance against standalone madevent and standalone cudacpp.

I see that the cudacpp ME calculation is faster in standalone (bridge!) mode than in madevent. In principle I thought the standalone bridge mode was exactly equivalent?

See for instance
https://github.com/madgraph5/madgraph4gpu/blob/d1fd8b201803b366579f925017f09a85001ce9f8/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt

DATE: 2022-05-20_15:00:32

Working directory: /data/avalassi/GPU2020/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg

*** EXECUTE MADEVENT (create results.dat) ***
 [XSECTION] Cross section = 80.03
 [COUNTERS] PROGRAM TOTAL          :    4.2070s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2606s
 [COUNTERS] Fortran MEs      ( 1 ) :    3.9463s for       96 events => throughput is 2.43E+01 events/s

*** EXECUTE CMADEVENT_CUDACPP (create results.dat) ***
 [XSECTION] Cross section = 80.03
 [MERATIOS] ME ratio CudaCpp/Fortran: MIN = 0.99997121 = 1 - 2.9e-05
 [MERATIOS] ME ratio CudaCpp/Fortran: MAX = 1.00014921 = 1 + 0.00015
 [MERATIOS] ME ratio CudaCpp/Fortran: NENTRIES =     96
 [MERATIOS] ME ratio CudaCpp/Fortran - 1: AVG = 1.3e-05 +- 2.9e-06
 [COUNTERS] PROGRAM TOTAL          :    4.2596s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2727s
 [COUNTERS] Fortran MEs      ( 1 ) :    3.5988s for       96 events => throughput is 2.67E+01 events/s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.3881s for       96 events => throughput is 2.47E+02 events/s

*** EXECUTE CHECK -p 2 32 1 --bridge ***
Process                     = SIGMA_SM_GG_TTXGGG_CPP [gcc 10.2.0] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CPP:DBL+CXS:CURHST+RMBHST+BRDHST/512y+CXVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.888841e+02                 )  sec^-1

*** EXECUTE CHECK -p 2 32 1 ***
Process                     = SIGMA_SM_GG_TTXGGG_CPP [gcc 10.2.0] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CPP:DBL+CXS:CURHST+RMBHST+MESHST/512y+CXVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.892185e+02                 )  sec^-1

*** EXECUTE GMADEVENT_CUDACPP (create results.dat) ***
 [XSECTION] Cross section = 404.9
 [MERATIOS] ME ratio CudaCpp/Fortran: MIN = 0.99997121 = 1 - 2.9e-05
 [MERATIOS] ME ratio CudaCpp/Fortran: MAX = 1.00007834 = 1 + 7.8e-05
 [MERATIOS] ME ratio CudaCpp/Fortran: NENTRIES =     64
 [MERATIOS] ME ratio CudaCpp/Fortran - 1: AVG = 1.1e-05 +- 2.9e-06
 [COUNTERS] PROGRAM TOTAL          :    3.6001s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.5031s
 [COUNTERS] Fortran MEs      ( 1 ) :    2.4043s for       64 events => throughput is 2.66E+01 events/s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.6928s for       64 events => throughput is 9.24E+01 events/s

*** EXECUTE GCHECK -p 2 32 1 --bridge ***
Process                     = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 11.6.124 (gcc 10.2.0)] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CUD:DBL+THX:CURHST+RMBHST+BRDDEV/none+NAVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.032552e+02                 )  sec^-1

*** EXECUTE GCHECK -p 2 32 1 ***
Process                     = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 11.6.124 (gcc 10.2.0)] [inlineHel=0] [hardcodePARAM=0]
Workflow summary            = CUD:DBL+THX:CURDEV+RMBDEV+MESDEV/none+NAVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.367983e+02                 )  sec^-1

TEST COMPLETED

This is ggttggg. In madevent+cuda I get 9.2E1 events per second, while in standalone cuda (bridge mode) I get 2.0E2 events per second. Is the factor of two just helicity filtering?
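To make the "factor two" explicit, here is a small sketch that recomputes the ME-only throughputs and their ratio from the numbers quoted in the log excerpts above. Everything is copied from the logs; the `throughput` helper is hypothetical and nothing is measured here.

```python
# Recompute ME-only throughputs from the logged figures above (ggttggg, CUDA).
# All numbers are taken verbatim from the log excerpts; this only does the arithmetic.

def throughput(events: int, seconds: float) -> float:
    """Events per second, as reported on a [COUNTERS] line."""
    return events / seconds

# GMADEVENT_CUDACPP: "CudaCpp MEs ( 2 ) : 0.6928s for 64 events"
madevent_cuda = throughput(64, 0.6928)   # roughly 92 events/s, matching 9.24E+01

# GCHECK -p 2 32 1 --bridge: quoted EvtsPerSec[MECalcOnly]
standalone_cuda_bridge = 2.032552e+02    # roughly 203 events/s

ratio = standalone_cuda_bridge / madevent_cuda
print(f"madevent+cuda: {madevent_cuda:.1f} ev/s, "
      f"standalone bridge: {standalone_cuda_bridge:.1f} ev/s, "
      f"ratio: {ratio:.2f}")
```

The ratio comes out close to 2.2, which is what motivates asking whether helicity filtering alone accounts for it.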
