
MPI_Intercomm_create leaks memory even with symmetric MPI_Comm_free calls #12019

Open
@oj-lappi

Description

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

4.1.6 (also verified on 4.0.3)

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Debian's libopenmpi-dev package (also verified with Ubuntu's package)

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: Debian in a docker container, debian:unstable-20211011 (also verified on Ubuntu host)
  • Computer hardware: x86_64, Intel i5-8500 CPU, Intel 8th Gen Core Processor Host Bridge
  • Network type: MPI ranks all running on one machine

Details of the problem

MPI_Intercomm_create appears to leak memory: calling MPI_Comm_free on an intercommunicator does not free all of the memory that was allocated for it.

Here is a minimal example you can build to reproduce the issue:

#include <cstdlib>
#include <mpi.h>

int
main(int  /*argc*/, char ** /*argv*/)
{
    MPI_Init(nullptr, nullptr);
    int rank = -1;
    int world_size = -1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    if (world_size % 2 != 0){
        // An odd number of ranks cannot be paired; abort cleanly rather
        // than calling exit() with MPI still initialized.
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    int partner = (world_size + ((rank % 2 == 0)? rank + 1 : rank - 1)) % world_size;

    MPI_Comm comm = MPI_COMM_NULL;

    constexpr int iterations = 100000;
    for (int i = 0; i < iterations; i++){
        MPI_Intercomm_create(MPI_COMM_SELF, 0, MPI_COMM_WORLD, partner, i, &comm);
        MPI_Comm_free(&comm);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
}

If you watch the resident set size of two processes executing this, you will see it grow linearly over time, with one process allocating roughly twice as much as the other.

Compiling this with clang++-16 and -fsanitize=address, then running mpirun -n 2 a.out, gives (among other minor leak reports) the following output for iterations = 100000:

... one rank
Direct leak of 6400416 byte(s) in 200013 object(s) allocated from:
    #0 0x55dde7e7afb2 in malloc (/project/build/test/mpi/mpi_intercomm_memleak+0xb9fb2) (BuildId: 46971416e65faae54e6870e4db559c394b3d131d)
    #1 0x7f6af13cbc12  (<unknown module>)

Direct leak of 1300013 byte(s) in 100001 object(s) allocated from:
    #0 0x55dde7e7afb2 in malloc (/project/build/test/mpi/mpi_intercomm_memleak+0xb9fb2) (BuildId: 46971416e65faae54e6870e4db559c394b3d131d)
    #1 0x7f6af4d24017 in __vasprintf_internal libio/vasprintf.c:116:16
    #2 0x9abaa5f78902caff  (<unknown module>)
...

SUMMARY: AddressSanitizer: 8142121 byte(s) leaked in 400523 allocation(s).


... other rank
Direct leak of 3200384 byte(s) in 100012 object(s) allocated from:
    #0 0x5632a7758fb2 in malloc (/project/build/test/mpi/mpi_intercomm_memleak+0xb9fb2) (BuildId: 46971416e65faae54e6870e4db559c394b3d131d)
    #1 0x7f01091cbc12  (<unknown module>)


SUMMARY: AddressSanitizer: 3242072 byte(s) leaked in 100520 allocation(s).

I find the inclusion of vasprintf most surprising in the stack traces.
