Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
4.1.6 (also verified on 4.0.3)
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Debian's libopenmpi-dev package (also verified on Ubuntu)
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status
.
Please describe the system on which you are running
- Operating system/version: Debian in a Docker container (debian:unstable-20211011); also verified on an Ubuntu host
- Computer hardware: x86_64, Intel i5-8500 CPU, Intel 8th Gen Core Processor Host Bridge
- Network type: MPI ranks all running on one machine
Details of the problem
MPI_Intercomm_create appears to leak memory: calling MPI_Comm_free on an intercommunicator does not free all of the memory that was allocated for it.
Here is a minimal example that you can build to reproduce the issue:
#include <cstdlib>

#include <mpi.h>

int
main(int /*argc*/, char ** /*argv*/)
{
  MPI_Init(nullptr, nullptr);

  int rank = -1;
  int world_size = -1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Ranks are paired up, so an even number of ranks is required.
  if (world_size % 2 != 0) {
    exit(1);
  }

  // Each even rank is paired with the next odd rank, and vice versa.
  int partner = (world_size + ((rank % 2 == 0) ? rank + 1 : rank - 1)) % world_size;

  MPI_Comm comm = MPI_COMM_NULL;
  constexpr int iterations = 100000;
  for (int i = 0; i < iterations; i++) {
    // Create an intercommunicator with the partner rank and free it
    // immediately; this should not accumulate any memory.
    MPI_Intercomm_create(MPI_COMM_SELF, 0, MPI_COMM_WORLD, partner, i, &comm);
    MPI_Comm_free(&comm);
  }

  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
}
If you simply watch the resident set size of the two processes executing this, you will see it grow roughly linearly over time, with one process allocating roughly twice as much as the other.
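For completeness, one way to watch the resident set size from inside the loop is to read /proc/self/statm on Linux. The helper below is only a sketch (the name rss_bytes and the sampling interval are illustrative and not part of the reproducer above):

#include <cstdio>
#include <unistd.h>

// Illustrative helper only: returns the current resident set size in bytes
// by reading /proc/self/statm (Linux-specific).
static long
rss_bytes()
{
  long size_pages = 0;
  long resident_pages = 0;
  if (FILE *f = std::fopen("/proc/self/statm", "r")) {
    if (std::fscanf(f, "%ld %ld", &size_pages, &resident_pages) != 2) {
      resident_pages = 0;
    }
    std::fclose(f);
  }
  return resident_pages * sysconf(_SC_PAGESIZE);
}

Called from inside the loop, e.g. every 10000 iterations:

  if (i % 10000 == 0) {
    std::printf("rank %d, iteration %d, RSS %ld bytes\n", rank, i, rss_bytes());
  }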
Compiling this with clang++-16 and -fsanitize=address and running mpirun -n 2 a.out
gives (among other minor leak reports) the following output for iterations = 100000:
... one rank
Direct leak of 6400416 byte(s) in 200013 object(s) allocated from:
#0 0x55dde7e7afb2 in malloc (/project/build/test/mpi/mpi_intercomm_memleak+0xb9fb2) (BuildId: 46971416e65faae54e6870e4db559c394b3d131d)
#1 0x7f6af13cbc12 (<unknown module>)
Direct leak of 1300013 byte(s) in 100001 object(s) allocated from:
#0 0x55dde7e7afb2 in malloc (/project/build/test/mpi/mpi_intercomm_memleak+0xb9fb2) (BuildId: 46971416e65faae54e6870e4db559c394b3d131d)
#1 0x7f6af4d24017 in __vasprintf_internal libio/vasprintf.c:116:16
#2 0x9abaa5f78902caff (<unknown module>)
...
SUMMARY: AddressSanitizer: 8142121 byte(s) leaked in 400523 allocation(s).
... other rank
Direct leak of 3200384 byte(s) in 100012 object(s) allocated from:
#0 0x5632a7758fb2 in malloc (/project/build/test/mpi/mpi_intercomm_memleak+0xb9fb2) (BuildId: 46971416e65faae54e6870e4db559c394b3d131d)
#1 0x7f01091cbc12 (<unknown module>)
SUMMARY: AddressSanitizer: 3242072 byte(s) leaked in 100520 allocation(s).
I find the appearance of vasprintf in the stack traces particularly surprising.
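As a cross-check (not something I have run as part of this report), the same pattern with a plain intracommunicator duplication instead of MPI_Intercomm_create could be used to see whether the growth is specific to intercommunicators:

#include <mpi.h>

int
main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  constexpr int iterations = 100000;
  for (int i = 0; i < iterations; i++) {
    // Duplicate and immediately free an intracommunicator; only the
    // communicator construction differs from the reproducer above, so a
    // difference in RSS growth would point at MPI_Intercomm_create itself
    // rather than at communicator creation/destruction in general.
    MPI_Comm dup = MPI_COMM_NULL;
    MPI_Comm_dup(MPI_COMM_WORLD, &dup);
    MPI_Comm_free(&dup);
  }

  MPI_Finalize();
}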