
tmad test crashes in rotxxx (SIGFPE erroneous arithmetic operation) #855

@valassi


"tmad test crashes for some iconfig (channel/iconfig mapping issues and SIGFPE erroneous arithmetic operation)"

Hi @oliviermattelaer, this is a follow-up to the discussions in #826 and PR #853.

I prefer to open this as a clean issue and to investigate it independently of SUSY, and in any case independently of the zero cross section issue #826.

In the discussions about your patch #853, I realised that we risk having a MAJOR problem not only for BSM but also for SM: all of my 'tmad' tests only test iconfig=1. These have been OK so far (maybe in some cases by luck), but they may not be OK for a different iconfig (i.e. if we put a number other than 1 in the input_app.txt that is piped to madevent).

Indeed, I found a crash in the first test I executed: ggttgg with iconfig=104.

 ./tmad/madX.sh -ggttgg -iconfig 104
...
On itscrd90.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: 1x Tesla V100S-PCIE-32GB]:
Working directory (run): /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg

*** (1) EXECUTE MADEVENT_FORTRAN (create results.dat) ***
 [OPENMPTH] omp_get_max_threads/nproc = 1/4
 [NGOODHEL] ngoodhel/ncomb = 64/64
 [XSECTION] VECSIZE_USED = 8192
 [XSECTION] MultiChannel = TRUE
 [XSECTION] Configuration = 104
 [XSECTION] ChannelId = 112
 [XSECTION] Cross section = 0.4632 [0.46320556621222242] fbridge_mode=0
 [UNWEIGHT] Wrote 11 events (found 187 events)
 [COUNTERS] PROGRAM TOTAL          :    4.4430s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2478s
 [COUNTERS] Fortran MEs      ( 1 ) :    4.1953s for     8192 events => throughput is 1.95E+03 events/s

*** (1) EXECUTE MADEVENT_FORTRAN x1 (create events.lhe) ***
 [OPENMPTH] omp_get_max_threads/nproc = 1/4
 [NGOODHEL] ngoodhel/ncomb = 64/64
 [XSECTION] VECSIZE_USED = 8192
 [XSECTION] MultiChannel = TRUE
 [XSECTION] Configuration = 104
 [XSECTION] ChannelId = 112
 [XSECTION] Cross section = 0.4632 [0.46320556621222242] fbridge_mode=0
 [UNWEIGHT] Wrote 11 events (found 168 events)
 [COUNTERS] PROGRAM TOTAL          :    4.4488s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2487s
 [COUNTERS] Fortran MEs      ( 1 ) :    4.2002s for     8192 events => throughput is 1.95E+03 events/s

*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7effbd423860 in ???
#1  0x7effbd422a05 in ???
#2  0x7effbd054def in ???
#3  0x44b5ff in ???
#4  0x4087df in ???
#5  0x409848 in ???
#6  0x40bb83 in ???
#7  0x40d1a9 in ???
#8  0x45c804 in ???
#9  0x434269 in ???
#10  0x40371e in ???
#11  0x7effbd03feaf in ???
#12  0x7effbd03ff5f in ???
#13  0x403844 in ???
#14  0xffffffffffffffff in ???
./tmad/madX.sh: line 387: 780951 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttgg_x1_cudacpp > /tmp/avalassi/output_ggttgg_x1_cudacpp' failed

This uses a slightly modified script; I will put it in a PR.
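Until broader tests exist, the same kind of check could be repeated over several iconfig values with a small wrapper. The sketch below makes the following assumptions:

- It assumes the `-iconfig N` option of the modified `./tmad/madX.sh` used above.
- It assumes the script exits with a non-zero status when one of its steps fails, as the `ERROR!` line in the log suggests.
- `scan_iconfigs` is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a tmad-style command once per iconfig and report failures.
# ASSUMPTION: the command accepts "-iconfig N" (as the modified ./tmad/madX.sh does)
# and returns a non-zero exit status when any of its steps fails.
# Usage: scan_iconfigs "<space-separated iconfig list>" <command> [args...]
scan_iconfigs() {
  local iconfigs="$1"; shift     # e.g. "1 2 104"; the remaining arguments form the command
  local failed=()
  local iconfig
  for iconfig in $iconfigs; do
    # Output is discarded here; redirect to a per-iconfig log file to keep it instead
    if "$@" -iconfig "$iconfig" > /dev/null 2>&1; then
      echo "OK     iconfig=$iconfig"
    else
      echo "FAILED iconfig=$iconfig (exit status $?)"
      failed+=("$iconfig")
    fi
  done
  echo "failed iconfigs: ${failed[*]:-none}"
  [ ${#failed[@]} -eq 0 ]        # overall status is non-zero if any iconfig failed
}
```

For example, `scan_iconfigs "1 2 104" ./tmad/madX.sh -ggttgg` would be expected to list iconfig=104 as failed, given the crash above. The iconfig list is passed explicitly because the helper does not assume which iconfig values are valid for a given subprocess.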

I guess that the solution involves what you proposed in #852 and the additional modifications that you and I discussed there.

(Note: the 'tlau' tests that I proposed in July last year, just before my absence, were supposed to test exactly this (see #711), i.e. to test all possible iconfig values at the same time in a user-like environment, for all processes, but in a short, manageable time. I still think that allowing shorter generate_events tests is necessary for better testing. There was disagreement about this last year; I hope we can come back to it and agree.)
