Closed
Description
A customer is investigating a memory leak when using MPI RMA with both Open MPI and MPICH. I've been looking into the problem on the Open MPI side, and one thing valgrind finds is that the acoll component leaks about 728 bytes per MPI_Win_create/MPI_Win_free pair of calls:
==56433== 7,280 bytes in 10 blocks are definitely lost in loss record 334 of 347
==56433== at 0x48388E4: malloc (vg_replace_malloc.c:446)
==56433== by 0x4A53B1D: opal_obj_new (opal_object.h:495)
==56433== by 0x4A53A22: opal_obj_new_debug (opal_object.h:256)
==56433== by 0x4A53BCC: mca_coll_acoll_comm_query (coll_acoll_module.c:63)
==56433== by 0x49D1219: query_3_0_0 (coll_base_comm_select.c:588)
==56433== by 0x49D11DD: query (coll_base_comm_select.c:571)
==56433== by 0x49D10D0: check_one_component (coll_base_comm_select.c:536)
==56433== by 0x49D0C85: check_components (coll_base_comm_select.c:456)
==56433== by 0x49CFB04: mca_coll_base_comm_select (coll_base_comm_select.c:233)
==56433== by 0x48CD150: ompi_comm_activate_complete (comm_cid.c:923)
==56433== by 0x48CE02C: ompi_comm_activate_nb_complete (comm_cid.c:1119)
==56433== by 0x48D1903: ompi_comm_request_progress (comm_request.c:154)
==56433==
The acoll module destructor is getting invoked, but it is fairly complex, and most likely something allocated for the module is not being freed.
I was using the share/openmpi/openmpi-valgrind.supp suppression file.