Closed
With the latest master, in DVM mode, I repeatedly hit the following crash in orted after running a couple of thousand tasks:
Core was generated by `orted -mca orte_debug "1" -mca orte_debug_daemons "1" --hnp-topo-sig 0N:1S:1L3:'.
Program terminated with signal 7, Bus error.
(gdb) bt
#0 0x00002aaaadcaa85b in __memset_sse2 () from /lib64/libc.so.6
#1 0x00002aaaac2b2718 in _create_new_segment (type=NS_META_SEGMENT, ns_map=0x2aaaae3a76d0, id=0)
at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c:1456
#2 0x00002aaaac2b2bcb in _update_ns_elem (ns_elem=0x2aaab3b3bde0, info=0x2aaaae3a76d0)
at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c:1550
#3 0x00002aaaac2afce1 in _esh_store (nspace=0xe2fb2c8 "528289639", rank=4294967294, kv=0xdf52210)
at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c:918
#4 0x00002aaaac2ad2cb in pmix_dstore_store (nspace=0xe2fb2c8 "528289639", rank=4294967294, kv=0xdf52210)
at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_dstore.c:66
#5 0x00002aaaac28fa9d in _rank_key_dstore_store (cbdata=0xe01e8c0) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c:96
#6 0x00002aaaac2917a7 in _job_data_store (nspace=0xd1aca68 "528289639", cbdata=0xe01e8c0)
at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c:386
#7 0x00002aaaac28fc2c in pmix_job_data_dstore_store (nspace=0xd1aca68 "528289639", bptr=0xe01cf80)
at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c:118
#8 0x00002aaaac2658ff in _register_nspace (sd=-1, args=4, cbdata=0xd1ac9b0) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/server/pmix_server.c:459
#9 0x00002aaaab1e4151 in event_process_active_single_queue (activeq=0x7079c0, base=0x707730)
at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/event/libevent2022/libevent/event.c:1370
#10 event_process_active (base=<optimized out>) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/event/libevent2022/libevent/event.c:1440
#11 opal_libevent2022_event_base_loop (base=0x707730, flags=1) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/event/libevent2022/libevent/event.c:1644
#12 0x00002aaaac2b7efa in progress_engine (obj=0x7072d8) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/runtime/pmix_progress_threads.c:149
#13 0x00002aaaada0e806 in start_thread () from /lib64/libpthread.so.0
#14 0x00002aaaadd029bd in clone () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
I will dig further, but I am posting this now to get more eyes on it.
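For anyone reproducing this: a bus error inside memset on a freshly created segment is, in general, what you see when a file-backed mapping is touched and the filesystem cannot supply the backing blocks (for example a full tmpfs). Whether that is what happens in _create_new_segment is not established by the trace above; it is only a hypothesis worth ruling out. The sketch below is not PMIx code. `map_zeroed_segment` is a hypothetical helper that shows how reserving the blocks with posix_fallocate before mmap turns that failure mode into an ordinary error return instead of a signal.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not the dstore code):
 * create a file-backed shared segment of `size` bytes and zero it.
 * Returns the mapped address, or NULL on failure.
 */
void *map_zeroed_segment(const char *path, size_t size)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        perror("open");
        return NULL;
    }

    /* ftruncate only sets the file length; the file may still be sparse. */
    if (ftruncate(fd, (off_t)size) != 0) {
        perror("ftruncate");
        close(fd);
        return NULL;
    }

    /*
     * Reserve the backing blocks now. If the filesystem is out of space,
     * this returns an error number (it does not set errno), so the problem
     * is reported here rather than as SIGBUS on the first write.
     */
    int rc = posix_fallocate(fd, 0, (off_t)size);
    if (rc != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(rc));
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    /* With the blocks reserved, this write should not fault for lack of space. */
    memset(addr, 0, size);
    return addr;
}
```

A quick check while the tasks are running is to watch free space on the filesystem that holds the session directory. If it shrinks steadily as tasks complete, that would point toward segments not being released.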