This is a follow-up to PR #454, where I am running some first tests (with logs) of madevent+cudacpp MEs, comparing physics and performance to plain madevent and to standalone cudacpp.
I see that the cudacpp ME calculation is faster in standalone (bridge!) mode than in madevent. In principle I thought the standalone bridge mode was exactly equivalent?
- There is most likely an issue with helicity filtering, which should be excluded from the measurements
- But there may be more than just that, to be understood
- (Note that in principle both use the same CUDA grids, so any slowdown from using too small CUDA grids should not apply here; see #460, "In Fortran MadEvent, use larger arrays (beyond 16k) to allow efficient usage of GPU".)
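For reference, the throughput figures in the [COUNTERS] lines below are simply the event count divided by the accumulated time of that counter section. A minimal sketch of that arithmetic (the function name is mine, not from the repo; the sample values are the Fortran MEs counter from the first log section):

```python
# Throughput as reported by the [COUNTERS] lines: events / accumulated time.
# throughput() is a hypothetical helper, not code from madgraph4gpu.
def throughput(n_events: int, seconds: float) -> float:
    """Events per second for one counter section."""
    return n_events / seconds

# 96 events in 3.9463s of Fortran MEs time, from the log below
print(f"{throughput(96, 3.9463):.2E}")  # 2.43E+01 events/s, matching the log
```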
DATE: 2022-05-20_15:00:32
Working directory: /data/avalassi/GPU2020/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg
*** EXECUTE MADEVENT (create results.dat) ***
[XSECTION] Cross section = 80.03
[COUNTERS] PROGRAM TOTAL : 4.2070s
[COUNTERS] Fortran Overhead ( 0 ) : 0.2606s
[COUNTERS] Fortran MEs ( 1 ) : 3.9463s for 96 events => throughput is 2.43E+01 events/s
*** EXECUTE CMADEVENT_CUDACPP (create results.dat) ***
[XSECTION] Cross section = 80.03
[MERATIOS] ME ratio CudaCpp/Fortran: MIN = 0.99997121 = 1 - 2.9e-05
[MERATIOS] ME ratio CudaCpp/Fortran: MAX = 1.00014921 = 1 + 0.00015
[MERATIOS] ME ratio CudaCpp/Fortran: NENTRIES = 96
[MERATIOS] ME ratio CudaCpp/Fortran - 1: AVG = 1.3e-05 +- 2.9e-06
[COUNTERS] PROGRAM TOTAL : 4.2596s
[COUNTERS] Fortran Overhead ( 0 ) : 0.2727s
[COUNTERS] Fortran MEs ( 1 ) : 3.5988s for 96 events => throughput is 2.67E+01 events/s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.3881s for 96 events => throughput is 2.47E+02 events/s
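The [MERATIOS] lines above summarize the per-event CudaCpp/Fortran ME ratios (min, max, and the average of ratio-1 with its standard error). A hedged sketch of how such statistics could be computed; the helper name is hypothetical and the sample list is illustrative, not the actual 96 MEs:

```python
import math

def me_ratio_stats(ratios):
    """Summarize per-event ME ratios (CudaCpp/Fortran), in the spirit of
    the [MERATIOS] lines: returns (min, max, avg of ratio-1, std error)."""
    n = len(ratios)
    devs = [r - 1.0 for r in ratios]
    avg = sum(devs) / n
    var = sum((d - avg) ** 2 for d in devs) / (n - 1)  # sample variance
    sem = math.sqrt(var / n)                           # standard error of the mean
    return min(ratios), max(ratios), avg, sem

# Tiny illustrative sample (not the real data); min/max chosen to match the log
lo, hi, avg, sem = me_ratio_stats([0.99997121, 1.00014921, 1.00001, 1.00002])
print(lo, hi, avg, sem)
```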
*** EXECUTE CHECK -p 2 32 1 --bridge ***
Process = SIGMA_SM_GG_TTXGGG_CPP [gcc 10.2.0] [inlineHel=0] [hardcodePARAM=0]
Workflow summary = CPP:DBL+CXS:CURHST+RMBHST+BRDHST/512y+CXVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.888841e+02 ) sec^-1
*** EXECUTE CHECK -p 2 32 1 ***
Process = SIGMA_SM_GG_TTXGGG_CPP [gcc 10.2.0] [inlineHel=0] [hardcodePARAM=0]
Workflow summary = CPP:DBL+CXS:CURHST+RMBHST+MESHST/512y+CXVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.892185e+02 ) sec^-1
*** EXECUTE GMADEVENT_CUDACPP (create results.dat) ***
[XSECTION] Cross section = 404.9
[MERATIOS] ME ratio CudaCpp/Fortran: MIN = 0.99997121 = 1 - 2.9e-05
[MERATIOS] ME ratio CudaCpp/Fortran: MAX = 1.00007834 = 1 + 7.8e-05
[MERATIOS] ME ratio CudaCpp/Fortran: NENTRIES = 64
[MERATIOS] ME ratio CudaCpp/Fortran - 1: AVG = 1.1e-05 +- 2.9e-06
[COUNTERS] PROGRAM TOTAL : 3.6001s
[COUNTERS] Fortran Overhead ( 0 ) : 0.5031s
[COUNTERS] Fortran MEs ( 1 ) : 2.4043s for 64 events => throughput is 2.66E+01 events/s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.6928s for 64 events => throughput is 9.24E+01 events/s
*** EXECUTE GCHECK -p 2 32 1 --bridge ***
Process = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 11.6.124 (gcc 10.2.0)] [inlineHel=0] [hardcodePARAM=0]
Workflow summary = CUD:DBL+THX:CURHST+RMBHST+BRDDEV/none+NAVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.032552e+02 ) sec^-1
*** EXECUTE GCHECK -p 2 32 1 ***
Process = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 11.6.124 (gcc 10.2.0)] [inlineHel=0] [hardcodePARAM=0]
Workflow summary = CUD:DBL+THX:CURDEV+RMBDEV+MESDEV/none+NAVBRK
EvtsPerSec[MECalcOnly] (3a) = ( 2.367983e+02 ) sec^-1
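My reading of the `-p 2 32 1` arguments to check/gcheck is blocks, threads per block, and iterations, so the number of events per run is their product; this is an assumption based on the log (it lines up with NENTRIES = 64 in the GMADEVENT section above):

```python
# Assumption: "-p blocks threads iterations" => events = blocks * threads * iterations.
# Inferred from the log ("-p 2 32 1" vs NENTRIES = 64), not from the repo docs.
blocks, threads, iterations = 2, 32, 1
events = blocks * threads * iterations
print(events)  # 64
```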
TEST COMPLETED
This is ggttggg. In madevent+cuda I get 9.2E1 events per second. In standalone cuda (bridge mode) I get 2.0E2 events per second. Is the factor two probably just helicity filtering?
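The "factor two" above is just the ratio of the two throughputs quoted from the log; spelling out the arithmetic (variable names are mine):

```python
# Ratio behind the "factor two" observation, values from the log above.
madevent_cuda = 9.24e1        # events/s, [COUNTERS] CudaCpp MEs in GMADEVENT_CUDACPP
standalone_bridge = 2.032552e2  # events/s, GCHECK --bridge EvtsPerSec[MECalcOnly]
print(standalone_bridge / madevent_cuda)  # ~2.2
```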