btl_tcp_endpoint open-mpi error in ufs-weather-model regression test #1313

Open
@NickSzapiro-NOAA

Description

Describe the bug
A GNU debug build of a ufs-weather-model regression test under development for GEFS fails during initialization with the following error:

[../../../../../opal/mca/btl/tcp/btl_tcp_endpoint.c:730:mca_btl_tcp_endpoint_start_connect] bind on local address (removed) failed: Address already in use (98). 

This looks similar to an existing Open MPI issue (open-mpi/ompi#7246) and may be related to all available ports being in use.

It would be good to confirm that this is indeed the cause and to resolve it if possible (perhaps by changing the number of tasks or ports?).
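
One way to check the port-exhaustion hypothesis might be a sketch like the one below, run on a compute node while the job is initializing. It assumes a Linux node with `/proc` and the `ss` utility from iproute2 available. The function names `port_span` and `report_port_usage` are hypothetical helpers for illustration and are not part of ufs-weather-model or Open MPI.

```shell
#!/usr/bin/env bash
# Sketch: compare the kernel's ephemeral port range with the number of
# TCP sockets currently open on this node.
# Assumptions: Linux, readable /proc, and `ss` (iproute2) on the PATH.

# Number of ports in the inclusive range [low, high].
port_span() {
    echo $(( $2 - $1 + 1 ))
}

# Print the ephemeral port range and a count of TCP sockets in any state.
report_port_usage() {
    local low high used
    # The file holds two integers: the lowest and highest ephemeral port.
    read -r low high < /proc/sys/net/ipv4/ip_local_port_range
    # -t TCP only, -a all states, -n numeric; tail drops the header line.
    used=$(ss -tan | tail -n +2 | wc -l)
    echo "ephemeral ports: ${low}-${high} ($(port_span "$low" "$high") total)"
    echo "TCP sockets on this node: ${used}"
}
```

Calling `report_port_usage` prints both numbers. The socket count includes listening sockets and sockets on non-ephemeral ports, so it is only a rough upper bound, but a count close to the size of the range would be consistent with the "Address already in use" failure.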

To Reproduce
Try to run the GNU cpld_debug_gefs regression test on Hera:

git clone https://github.com/NickSzapiro-NOAA/ufs-weather-model
cd ufs-weather-model
git checkout RT_bmark_gefs
git submodule update --init --recursive
cd tests
./rt.sh -a {ACCT} -n "cpld_debug_gefs gnu"

Expected behavior
The regression test should run to completion.

System:
Hera

Additional context
Because this appears to be an Open MPI issue, the NOAA RDHPCS help desk suggested opening an issue here.

Metadata

Labels

OAR-EPIC: NOAA Oceanic and Atmospheric Research and Earth Prediction Innovation Center
bug: Something is not working
