pmix dstore esh bus error in orted on cray in dvm mode #2737

Closed
marksantcroos opened this issue Jan 16, 2017 · 40 comments

@marksantcroos (Contributor)

With the latest master, in DVM mode, I repeatedly run into the following after running a couple of thousand tasks:

Core was generated by `orted -mca orte_debug "1" -mca orte_debug_daemons "1" --hnp-topo-sig 0N:1S:1L3:'.
Program terminated with signal 7, Bus error.
(gdb) bt
#0  0x00002aaaadcaa85b in __memset_sse2 () from /lib64/libc.so.6
#1  0x00002aaaac2b2718 in _create_new_segment (type=NS_META_SEGMENT, ns_map=0x2aaaae3a76d0, id=0)
    at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c:1456
#2  0x00002aaaac2b2bcb in _update_ns_elem (ns_elem=0x2aaab3b3bde0, info=0x2aaaae3a76d0)
    at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c:1550
#3  0x00002aaaac2afce1 in _esh_store (nspace=0xe2fb2c8 "528289639", rank=4294967294, kv=0xdf52210)
    at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c:918
#4  0x00002aaaac2ad2cb in pmix_dstore_store (nspace=0xe2fb2c8 "528289639", rank=4294967294, kv=0xdf52210)
    at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_dstore.c:66
#5  0x00002aaaac28fa9d in _rank_key_dstore_store (cbdata=0xe01e8c0) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c:96
#6  0x00002aaaac2917a7 in _job_data_store (nspace=0xd1aca68 "528289639", cbdata=0xe01e8c0)
    at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c:386
#7  0x00002aaaac28fc2c in pmix_job_data_dstore_store (nspace=0xd1aca68 "528289639", bptr=0xe01cf80)
    at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c:118
#8  0x00002aaaac2658ff in _register_nspace (sd=-1, args=4, cbdata=0xd1ac9b0) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/server/pmix_server.c:459
#9  0x00002aaaab1e4151 in event_process_active_single_queue (activeq=0x7079c0, base=0x707730)
    at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/event/libevent2022/libevent/event.c:1370
#10 event_process_active (base=<optimized out>) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/event/libevent2022/libevent/event.c:1440
#11 opal_libevent2022_event_base_loop (base=0x707730, flags=1) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/event/libevent2022/libevent/event.c:1644
#12 0x00002aaaac2b7efa in progress_engine (obj=0x7072d8) at /ccs/home/marksant1/openmpi/src/ompi/opal/mca/pmix/pmix2x/pmix/src/runtime/pmix_progress_threads.c:149
#13 0x00002aaaada0e806 in start_thread () from /lib64/libpthread.so.0
#14 0x00002aaaadd029bd in clone () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()

Will dig further, but increasing the set of eyes looking at it.

@rhc54 (Contributor) commented Jan 16, 2017

@marksantcroos Try configuring with --disable-pmix-dstore. My best guess is that the pmix shared memory system isn't cleaning up properly after each job and so you are eventually running out of shared memory space.

@marksantcroos (Contributor, Author)

Thanks, will try. That actually sounds promising, as I get the same error immediately if I restart the DVM!

@rhc54 (Contributor) commented Jan 16, 2017

Check your /tmp area - you may have to rm -rf some session dir(s) to clean up. Look for anything that starts with ompi or pmix.

@marksantcroos (Contributor, Author)

Yes, I noticed that as well. But I already run out of them within a single run.

@marksantcroos (Contributor, Author)

With --disable-pmix-dstore the issue is indeed gone. It is also much, much faster! What do I lose by not using the pmix dstore?

@rhc54 (Contributor) commented Jan 16, 2017

Your memory footprint grows a little, but that's all. However, dstore should have been faster, not slower - we'll have to investigate.

@rhc54 (Contributor) commented Jan 17, 2017

@marksantcroos Can you tell if it slows down as the number of jobs grows? I'm wondering if it is faster at first, but then slows down - this could potentially be a consequence of the cleanup problem.

@ggouaillardet (Contributor)

@marksantcroos @rhc54 I previously noticed the same odd behavior.

The root cause was that /dev/shm was full; for some reason both ftruncate() and mmap() succeed (at least on Linux), but a crash occurs when the mmap'ed memory is first accessed.

The attached program can be used to demonstrate this behavior:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

#ifndef ROOT
#define ROOT "/mnt"
#endif

#define SZE 4096
char buffer[SZE];

int main(int argc, char *argv[]) {
    char * filename = ROOT "/test";
    int fd = open(filename, O_RDWR|O_CREAT|O_TRUNC, 0600);
    void *p;
    if (0 > fd) {
        perror("open ");
        exit(1);
    }
    if (0 > ftruncate(fd, SZE)) {
        perror("ftruncate ");
        exit(1);
    }
#if 0
    if (0 > write(fd, buffer, SZE)) {
        perror("write ");
        exit(1);
    }
#endif
    if (MAP_FAILED == (p = mmap(NULL, SZE,
                                PROT_READ | PROT_WRITE, MAP_SHARED,
                                fd, 0))) {
        perror("mmap ");
        exit(1);
    }
    printf("mmap'ed %d bytes at %p\n", SZE, p);
    memset(p, 0, SZE);
    printf("memset'ed %d mmap'ed bytes at %p\n", SZE, p);
    munmap(p, SZE);
    close(fd);
    unlink(filename);
    return 0;
}

then

# mount -t tmpfs -o size=1M none /mnt
# dd if=/dev/zero of=/mnt/big
# ./mmap

This causes a crash in memset().

One way to detect (or work around) this is to write() the full file after ftruncate() and before mmap().
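
For reference, here is a minimal sketch of that write-through check (ensure_backing is a hypothetical helper name, not part of PMIx or the test program above): after ftruncate(), force the filesystem to actually allocate backing for the requested range, so a full /dev/shm shows up as a write() error instead of a SIGBUS on first touch.

#include <string.h>
#include <unistd.h>

/* Sketch only: make sure the file has real backing store before mmap().
 * ftruncate() only extends the file size; on a full tmpfs the pages may
 * not be allocatable, and the first touch then raises SIGBUS. */
static int ensure_backing(int fd, size_t size)
{
    char zeros[4096];
    size_t written = 0;

    memset(zeros, 0, sizeof(zeros));
    while (written < size) {
        size_t chunk = size - written;
        if (chunk > sizeof(zeros)) {
            chunk = sizeof(zeros);
        }
        ssize_t rc = pwrite(fd, zeros, chunk, (off_t)written);
        if (0 > rc) {
            return -1;          /* e.g. ENOSPC when /dev/shm is full */
        }
        written += (size_t)rc;
    }
    return 0;                   /* safe to mmap() and touch the pages */
}

posix_fallocate(fd, 0, size) should report ENOSPC in the same situation without an explicit write loop, though behavior can vary across filesystems.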

@rhc54 (Contributor) commented Jan 17, 2017

@artpol84 @karasevb Can you guys take a look at this problem?

@marksantcroos (Contributor, Author)

> @marksantcroos Can you tell if it slows down as the number of jobs grows? I'm wondering if it is faster at first, but then slows down - this could potentially be a consequence of the cleanup problem.

Now that it works again I will go back to data gathering :-) Will report back.

@artpol84 (Contributor)

OK, we will check this out. I think I was observing this on my virtual cluster built on Linux containers, but @karasevb was unable to reproduce it. We will take a closer look now.
@ggouaillardet thanks for the analysis!

@artpol84 (Contributor)

@marksantcroos, am I understanding you correctly that the repeated failures were caused by the pile of dstore-related leftovers in the /tmp directory?

@marksantcroos (Contributor, Author)

@artpol84 Not exactly, let me explain my use case. I use a persistent DVM (which I start with orte-dvm) and then launch many (often small and short) tasks against it. So they were not "leftovers" from previous runs, but files from the same DVM instance, left behind by completed tasks. Does that make sense?

@artpol84 (Contributor)

@marksantcroos, thank you - it does make sense. We haven't checked this use case, but we will do that now.
@rhc54 @marksantcroos can you provide short guidance on how to use DVM mode, to speed up the debugging process? Maybe an anonymized script snippet?

@ggouaillardet (Contributor)

@marksantcroos when you are running your test, can you monitor /dev/shm and see how the free space evolves?

With plain mpirun, I run into this issue with a tiny /dev/shm when invoking MPI_Comm_spawn multiple times within the same mpirun command.
The use of a persistent DVM involves creating multiple PMIx jobs, and I am afraid the shared dstore grows forever and is only cleaned up when the mpirun or persistent DVM completes.

FWIW, I already plugged numerous memory leaks in https://github.com/ggouaillardet/ompi/tree/topic/finalize_leaks
I still need to clean this up, issue a separate PR for the pmix2x-specific leaks, and merge it into master.

@marksantcroos (Contributor, Author)

@artpol84 Sure, it's rather straightforward; the minimal use case is as follows:

orte-dvm --report-uri dvm_uri &
[ wait until "DVM ready" message ...]
for i in `seq 42`; do orterun --hnp file:dvm_uri -np 1 /bin/date; done

In general I'm happy to make people aware of our usage mode so that it gets considered earlier ;-)

@artpol84 (Contributor)

@ggouaillardet dstore was redesigned to support cleanup at job termination, so this case is supposed to be handled correctly. However, there might be an implementation issue. We are checking.

@marksantcroos (Contributor, Author)

@ggouaillardet When I monitor /tmp/scratch/ompi*/jf.*/pmix_dstor*/ on the compute node, I see the initial-pmix_shared-segment* entries increasing over time; they only get removed once I close the DVM.

total 8332
drwx------ 2 12063 22864     800 Jan 17 03:08 .
drwx------ 3 12063 22864      60 Jan 17 03:06 ..
-rw------- 1 12063 22864       0 Jan 17 03:06 dstore_sm.lock
-rw------- 1 12063 22864    4096 Jan 17 03:06 initial-pmix_shared-segment-0
-rw------- 1 12063 22864    4096 Jan 17 03:06 initial-pmix_shared-segment-1
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-10
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-11
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-12
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-13
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-14
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-15
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-16
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-17
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-18
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-19
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-2
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-20
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-21
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-22
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-23
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-24
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-25
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-26
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-27
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-28
-rw------- 1 12063 22864    4096 Jan 17 03:08 initial-pmix_shared-segment-29
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-3
-rw------- 1 12063 22864    4096 Jan 17 03:08 initial-pmix_shared-segment-30
-rw------- 1 12063 22864    4096 Jan 17 03:08 initial-pmix_shared-segment-31
-rw------- 1 12063 22864    4096 Jan 17 03:08 initial-pmix_shared-segment-32
-rw------- 1 12063 22864    4096 Jan 17 03:08 initial-pmix_shared-segment-33
-rw------- 1 12063 22864    4096 Jan 17 03:08 initial-pmix_shared-segment-34
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-4
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-5
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-6
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-7
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-8
-rw------- 1 12063 22864    4096 Jan 17 03:07 initial-pmix_shared-segment-9
-rw------- 1 12063 22864 4194304 Jan 17 03:08 smdataseg-633668069-0
-rw------- 1 12063 22864 4194304 Jan 17 03:08 smseg-633668069-0

@artpol84 (Contributor)

@marksantcroos this is good input, thank you. @karasevb please reproduce and work to resolve it.

@marksantcroos can you also tell us how you measured that it is "much much" faster without dstore? Is this a visual observation, or do you measure the performance somehow? I guess (as @rhc54 already mentioned) that if memory gets filled up you may see a slowdown because of that. But I'd still like to know how you evaluate performance so we can reproduce it. Once it is fixed, will it be possible to re-evaluate in your environment?

@artpol84 (Contributor)

@marksantcroos also what is your codebase? master or v2.x?

@marksantcroos (Contributor, Author)

@artpol84 This is with the latest master. The performance observation was perception-based; I will actually measure to see whether it was correct.

@artpol84 (Contributor)

@marksantcroos thank you.
If you confirm the performance degradation with dstore, could you please also check, as @rhc54 suggested, the case with a small number of tasks versus the case with a lot of them, to see whether the problem is caused by memory filling up or not.

@marksantcroos (Contributor, Author)

[attached plot: execution time of each 100-task batch per consecutive run, DSTORE vs NO-DSTORE]

@marksantcroos (Contributor, Author)

Both the DSTORE and NO-DSTORE runs started with a fresh DVM. Both increase gradually. The offset confirms my perception from yesterday.

@rhc54 (Contributor) commented Jan 17, 2017

@marksantcroos Can you please confirm your environment for us? Was this done on Titan?

@marksantcroos (Contributor, Author)

@rhc54 Yes, this was on Titan.

@artpol84 (Contributor)

@marksantcroos can you explain a bit more about your plot?
What does "consecutive run" mean?

@marksantcroos (Contributor, Author)

@artpol84 I started the DVM once for each configuration and then repeatedly ran my program, which fires 100 tasks at the DVM (repetitions on the X-axis). I measure how long it takes to execute these 100 /bin/date tasks, and that's what you see on the Y-axis.

@rhc54 (Contributor) commented Jan 17, 2017

@artpol84 We discussed this a bit on the call this morning, so let me capture those thoughts here. The workload in this use case is very different from the ones we normally encounter. Instead of one large job with lots of procs per node, this workload has many small jobs consisting of only a few (often just one) processes.

In this use case, there really isn't going to be any benefit from dstore because there aren't multiple procs per node sharing the information. So we see the overhead of creating all these shared memory segments, but get no benefit from them.

I would therefore not worry about the performance difference here. We just need to document that dstore should be disabled for this type of workload, and make it possible to do so via an MCA param at runtime instead of during configure (just to make life easier for users). We'll work on that over in the PMIx side of the house.

I think the one thing that does, however, need addressing here is the cleanup problem as that can/will impact long-running RMs.

@artpol84 (Contributor)

@marksantcroos I see, thank you. Out of curiosity - what happens if you vary the number of /bin/true runs, e.g. from 2 to 128?

@rhc54 thanks, this makes sense.
We currently have a partially introduced "session" notion in the dstore. Right now all tasks from the same userid on the persistent orted are counted as belonging to the same session - this is why we have this leftover.
However, sessions were originally intended for cases such as MPI_Comm_spawn or MPI_Comm_connect/accept, where multiple jobs are related and need to communicate. So, with a little help from the RM, we can force the dstore to create independent sessions for each independent task, as in the case @marksantcroos is exercising. The dstore would then clean up the segments once all the namespaces in a session have been deregistered.
What we need from the RM is a session number passed through the info keys at PMIx_Server_register_nspace; the dstore will use it to determine whether any two namespaces are related. If I recall correctly, in OMPI this could be the job family number. I'd prefer this session ID to be an integer, but if there are concerns it can be a string as well.
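
To make that concrete, here is a minimal sketch (not OMPI's actual implementation) of what the RM side could look like, assuming such a key is eventually defined. The key name "pmix.dstore.sessid" and the register_job/regcbfunc helpers are hypothetical and only for illustration; PMIx_server_register_nspace and PMIX_INFO_LOAD are the existing PMIx server API.

#include <stddef.h>
#include <stdint.h>
#include <pmix_server.h>

/* keep the info valid until the callback fires - registration may complete asynchronously */
static pmix_info_t session_info[1];

static void regcbfunc(pmix_status_t status, void *cbdata)
{
    (void)status;
    (void)cbdata;   /* a real server would release a lock / notify the caller here */
}

static pmix_status_t register_job(const char *nspace, int nlocalprocs,
                                  uint32_t session_id)
{
    /* hypothetical key: tells the dstore which session this nspace belongs to,
     * so its segments can be reclaimed once that session's namespaces are gone */
    PMIX_INFO_LOAD(&session_info[0], "pmix.dstore.sessid",
                   &session_id, PMIX_UINT32);

    return PMIx_server_register_nspace(nspace, nlocalprocs,
                                       session_info, 1, regcbfunc, NULL);
}

If the key is absent, the dstore would fall back to the reserved default session discussed below.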

@rhc54 (Contributor) commented Jan 17, 2017

Sounds reasonable to me - I see no problem providing an integer or string, so we can pick whatever you like. We just need to add it to the list of RM-required data, and then protect ourselves by defaulting to a single session if it isn't provided.

@artpol84 (Contributor)

Well, I think it's OK to default to a single session, as some RMs will support only that - SLURM, for example. But if a particular RM knows that there will be multiple independent jobs, it is the RM's responsibility to manage the session IDs and provide them to PMIx_Server_register_nspace().

@artpol84 (Contributor)

Those session IDs, by the way, will have only node-local (even server-local) meaning, so I think it is safe to use integers for them.

@artpol84 (Contributor) commented Jan 17, 2017

So all we need to do from the PMIx API perspective is to provide a legacy info key name for this session ID value,
finish support for those sessions in the dstore, which should be very simple as all the infrastructure is already there,
and reserve a default session ID that the dstore can use in case nothing is provided explicitly.

@rhc54 (Contributor) commented Jan 17, 2017

Agreed! I can implement this in OMPI master for you so we can try it out - will try to have it for you in the next day.

@artpol84 (Contributor)

Great, thank you!

@karasevb (Member)

The cause of the growing number of dstore segment files is incorrect use of sessions within PMIx. orte-dvm creates a PMIx namespace for its own needs and does not release it while it is running. Each orterun initializes its own nspace and should clean up its own segment files, but that does not happen, because the PMIx implementation currently uses only one PMIx session: the orte-dvm session is shared with every nspace subsequently initialized by orterun. So the data from each orterun accumulates in one shared session, and that session (and all its segment files) is only released when orte-dvm stops.

@rhc54 (Contributor) commented May 29, 2017

@marksantcroos Can you confirm whether this is still happening? If so, I'll try to address it.

@rhc54 (Contributor) commented May 30, 2017

I checked the current OMPI master and found that the dstore space is getting cleaned up after each job - I am not finding any leftover entries in /dev/shm or in the /tmp/ompi* session directory tree. I am therefore closing this issue for now - we can reopen if/when this problem is seen again.

rhc54 closed this as completed May 30, 2017
@marksantcroos (Contributor, Author)

Hi Ralph, it's on my list to verify. I have been running with dstore disabled for a while. It's taking a bit longer as I also wanted to compare the performance.
