
mpi4py: Regression in GitHub Actions #12857

Closed
openpmix/prrte#2043

Description

@dalcinl

mpi4py's own nightly CI testing on GHA has been failing with ompi@main for the last three days.
However, I understand that OMPI's own CI has not failed, otherwise you would not have merged the regression.

These are the full logs from the last failed run:
https://github.com/mpi4py/mpi4py-testing/actions/runs/11310080041/job/31454714552
This is the specific error; note the message prte-rmaps-base:all-available-resources-used in the output:

test_async_error_callback (test_util_pool.TestProcessPool.test_async_error_callback) ... 1 more process has sent help message help-prte-rmaps-base.txt / prte-rmaps-base:all-available-resources-used
Exception in thread Thread-7 (_manager_spawn):
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/home/runner/work/mpi4py-testing/mpi4py-testing/build/lib.linux-x86_64-cpython-312/mpi4py/futures/_core.py", line 350, in _manager_spawn
    comm = serialized(client_spawn)(pyexe, pyargs, nprocs, info)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/mpi4py-testing/mpi4py-testing/build/lib.linux-x86_64-cpython-312/mpi4py/futures/_core.py", line 1058, in client_spawn
    comm = MPI.COMM_SELF.Spawn(python_exe, args, max_workers, info)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/mpi4py/MPI.src/Comm.pyx", line 2544, in mpi4py.MPI.Intracomm.Spawn
    with nogil: CHKERR( MPI_Comm_spawn(
mpi4py.MPI.Exception: MPI_ERR_UNKNOWN: unknown error
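
For reference, here is a minimal sketch of the failing call pattern. It is not taken from the mpi4py test suite; the worker count is an illustrative value assumed to exceed the slots on a 2-core GHA runner, which is exactly the situation oversubscription is meant to cover:

```python
# Hypothetical minimal reproducer (not from the CI logs): spawn more workers
# than the runner has slots, so the launch only succeeds when oversubscription
# is enabled.
import sys
from mpi4py import MPI

nworkers = 5  # assumed to exceed the slots available on the GHA runner

# Each child just connects back to the parent and disconnects.
child_code = "from mpi4py import MPI; MPI.Comm.Get_parent().Disconnect()"

child = MPI.COMM_SELF.Spawn(
    sys.executable, args=["-c", child_code], maxprocs=nworkers)
child.Disconnect()
```

Without oversubscription allowed, the MPI_Comm_spawn call above would be expected to fail with the same all-available-resources-used help message.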

On closer inspection, I noticed a difference between the mpi4py and Open MPI configurations:

mpi4py configures oversubscription via $HOME/.openmpi/mca-params.conf
https://github.com/mpi4py/mpi4py-testing/blob/master/.github/workflows/openmpi.yml#L101

Open MPI configures oversubscription in both $HOME/.openmpi/mca-params.conf and $HOME/.prte/mca-params.conf
https://github.com/open-mpi/ompi/blob/main/.github/workflows/ompi_mpi4py.yaml#L80
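
For comparison, here is a minimal sketch (in Python purely for illustration; the workflows do this in shell) of writing the setting to both locations. The parameter name is an assumption based on ompi@main/PRRTE defaults; the exact keys the workflows set are in the links above:

```python
# Hypothetical helper mirroring what the linked workflows do: append an
# oversubscribe mapping policy to both the Open MPI and PRRTE MCA param files.
# rmaps_default_mapping_policy is assumed here; check the linked workflow
# files for the exact keys they write.
from pathlib import Path

SETTING = "rmaps_default_mapping_policy = :oversubscribe\n"

for base in ("~/.openmpi", "~/.prte"):
    conf_dir = Path(base).expanduser()
    conf_dir.mkdir(parents=True, exist_ok=True)
    with (conf_dir / "mca-params.conf").open("a") as f:
        f.write(SETTING)
```

The question below is whether writing only the first of these two files is still supposed to be sufficient.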

It looks like something changed recently, and the oversubscription settings in $HOME/.openmpi/mca-params.conf are being ignored. Was this change intentional, or is it a regression?
