btl_tcp_endpoint open-mpi error in ufs-weather-model regression test #1313
Describe the bug
A GNU debug build of a ufs-weather-model regression test under development for GEFS fails during initialization with the following error:
[../../../../../opal/mca/btl/tcp/btl_tcp_endpoint.c:730:mca_btl_tcp_endpoint_start_connect] bind on local address (removed) failed: Address already in use (98).
This looks similar to an existing Open MPI issue (open-mpi/ompi#7246) and may be related to all available ports being in use (error 98 is EADDRINUSE on Linux).
It would be good to confirm whether this is the actual cause and, if so, to resolve it (perhaps by changing the number of tasks or the port settings).
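One rough way to check the port-exhaustion hypothesis is to compare how many TCP sockets a compute node holds with the size of its ephemeral (local) port range. This is a hypothetical diagnostic sketch, not a confirmed procedure: it assumes a Linux node that exposes `/proc/sys/net/ipv4/ip_local_port_range` and `/proc/net/tcp`, and it would need to run on the compute nodes while the test is active to be meaningful.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (assumes Linux /proc): compare the number of open
# TCP sockets with the size of the ephemeral port range. Rough indicator only.
set -euo pipefail

# Ephemeral port range the kernel uses when a socket binds to port 0.
read -r port_lo port_hi < /proc/sys/net/ipv4/ip_local_port_range
range_size=$(( port_hi - port_lo + 1 ))

# Count IPv4 TCP sockets (each table has one header line to skip).
tcp4=$(( $(wc -l < /proc/net/tcp) - 1 ))

# Count IPv6 TCP sockets too, if the table exists on this node.
tcp6=0
if [ -r /proc/net/tcp6 ]; then
  tcp6=$(( $(wc -l < /proc/net/tcp6) - 1 ))
fi

echo "ephemeral_port_range=${port_lo}-${port_hi} (${range_size} ports)"
echo "tcp_sockets=$(( tcp4 + tcp6 ))"
```

If the socket count approaches the range size during the run, port exhaustion becomes more plausible. To see which TCP port-related MCA parameters the installed Open MPI exposes, `ompi_info --param btl tcp --level 9` lists them.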
To Reproduce
Run the GNU cpld_debug_gefs regression test on Hera:
git clone https://github.com/NickSzapiro-NOAA/ufs-weather-model.git
cd ufs-weather-model
git checkout RT_bmark_gefs
git submodule update --init --recursive
cd tests
./rt.sh -a {ACCT} -n "cpld_debug_gefs gnu"
Expected behavior
The regression test should run to completion.
System:
Hera
Additional context
Because this appears to involve Open MPI, the NOAA RDHPCS help desk suggested opening an issue here.