pytest-plugin: support openmpi #17

Closed · wants to merge 1 commit

Conversation

Contributor

@qbisi qbisi commented Apr 16, 2025

This commit adds compatibility for running pytest with the Open MPI implementation.

Collaborator

@connorjward connorjward left a comment

Thanks for this! I'm away from my computer until next Wednesday so can't give a proper review until then. I have some initial comments:

  • I don't think there needs to be a new module and global variable for the distribution type. Just having a free function that we call in the first appropriate pytest hook in plugin.py would be fine.
  • I prefer enums to strings for this sort of thing.

I'm also a little surprised that this works. Do you have a real case where the "outer" process does not call MPI_Init?

Contributor Author

qbisi commented Apr 18, 2025

I am a newbie at writing Python modules. This PR is more of a request for Open MPI compatibility than a polished implementation.
I can try to follow your advice.

I use this patched mpi-pytest plugin to run the pytest check of the Firedrake project (compiled with Open MPI).

The check command is

python -m pytest -n auto tests

Contributor Author

qbisi commented Apr 18, 2025

  • I don't think there needs to be a new module and global variable for the distribution type

How can I save the mpiexec implementation type as a static variable? Running mpiexec --version before each test is a waste of time.
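
One standard way to do this is to memoise the detection helper so that mpiexec --version runs at most once per session, e.g. with functools.cache (a sketch only; the helper name and return strings mirror the PR diff quoted later in this thread, not final code):

import functools
import subprocess

@functools.cache
def detect_mpiexec_implementation() -> str:
    # functools.cache runs the body on the first call only and returns
    # the memoised result on every subsequent call.
    output = subprocess.run(
        ["mpiexec", "--version"], capture_output=True, text=True
    ).stdout.lower()
    if "open mpi" in output or "open-rte" in output:
        return "Open MPI"
    elif "mpich" in output:
        return "MPICH"
    return "Unknown MPI implementation"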

Collaborator

connorjward commented Apr 18, 2025

I use this patched mpi-pytest plugin to run the pytest check of the Firedrake project (compiled with Open MPI).

Ah. In that case I don't think this will actually work. To run with Open MPI you need to run with pytest "on the inside". For example:

$ mpiexec -n 3 pytest tests -m parallel[3]

This is detailed in the README.

Have you tested that your PR works? I would be very surprised (but happy) if it actually runs.

Contributor Author

qbisi commented Apr 18, 2025

A short terminal capture:

qbisi@x79 ~/firedrake (remotes/origin/JHopeCollins/more-apt-petsc-packages~3^2) $ python -m pytest tests/firedrake/regression/test_adv_diff*  -m "parallel[3]"
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.12.9, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/qbisi/firedrake
configfile: setup.cfg
plugins: nbval-0.11.0, mpi-pytest-2025.4.0, xdist-3.6.1, timeout-2.3.1
collected 8 items / 4 deselected / 4 selected                                                                                                                                             

tests/firedrake/regression/test_adv_diff.py ..                                                                                                                                      [ 50%]
tests/firedrake/regression/test_adv_diff_nonsplit.py ..                                                                                                                             [100%]

============================================================================ 4 passed, 4 deselected in 49.09s =============================================================================

@connorjward
Collaborator

Interesting... I'll have to try this myself next week. I assume if you run with -sv you get sensible output?

Contributor Author

qbisi commented Apr 18, 2025

Adding this code snippet to the pytest unit test tests/firedrake/regression/test_adv_diff.py:

@pytest.mark.parallel
def test_adv_diff_parallel():
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    size = comm.Get_size()  # Total number of processes
    rank = comm.Get_rank()  # Rank of this process

    if rank == 0:
        print(f"Number of processes (nproc): {size}")
    # run_adv_diff is assumed to be the existing serial test body defined
    # elsewhere in this test file.
    run_adv_diff()

I can confirm that the tests do run in parallel; the output looks like this:

% python -m pytest -s tests/firedrake/regression/test_adv_diff.py  -m "parallel[3]"
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.12.9, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/qbisi/firedrake
configfile: setup.cfg
plugins: nbval-0.11.0, mpi-pytest-2025.4.0, xdist-3.6.1, timeout-2.3.1
collected 4 items / 2 deselected / 2 selected                                                                                                                                   

tests/firedrake/regression/test_adv_diff.py Number of processes (nproc): 3
...
1 passed in 7.49s

Contributor Author

qbisi commented Apr 19, 2025

I have a CI server that runs the full pytestCheck of the Nix-built Firedrake (compiled with Open MPI). The log is available here:
https://hydra.csrc.eu.org/build/960/nixlog/1

There are, however, two failures:
FAILED tests/firedrake/multigrid/test_embedded_transfer.py::test_riesz_parallel[Exact-2-CG]
FAILED tests/firedrake/regression/test_interpolate_cross_mesh.py::test_interpolate_cross_mesh_parallel[spheresphere]

Can you reproduce these failures on GitHub CI?

@connorjward
Collaborator

We currently run our test suite using Open MPI and aren't seeing those errors. I think I have seen the interpolation failure before but didn't have the time to fix it.

Please note that Firedrake does not use mpi-pytest with mpiexec "on the inside". We instead call it on the outside like so. Whilst it is cool that Open MPI seems to work on the inside, I wouldn't necessarily recommend that you do it that way. Our test suite is enormous and known to be quite fragile to very subtle global state/caching issues, so I would recommend trying to emulate what we do as much as possible.

Collaborator

@connorjward connorjward left a comment

I have tested this locally and it runs, which is pretty astonishing! I would be very happy to merge this code subject to some changes.

@@ -1 +1,4 @@
 from .parallel_assert import parallel_assert # noqa: F401
+from .detect_mpiexec_implementation import detect_mpiexec_implementation
+
+impl = detect_mpiexec_implementation()
Collaborator

I think we should avoid doing this at import time. If someone doesn't have MPI installed but they do have this plugin (unlikely I know) then this could break their system. Also some systems don't call mpiexec and use srun instead. I think a lazy global variable inside plugin.py would be better:

_mpi_implementation = None

def _set_parallel_callback(...):
    global _mpi_implementation

    if _mpi_implementation is None:
        _mpi_implementation = sniff_mpi_implementation()

    ...

if "open mpi" in output or "open-rte" in output:
return "Open MPI"
elif "mpich" in output:
return "MPICH"
Collaborator

Please use an enum instead:

import enum

class MPIImplementation(enum.Enum):
    OPENMPI = enum.auto()
    MPICH = enum.auto()
    UNKNOWN = enum.auto()
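
For illustration, the string checks quoted above might then map onto the enum like this (a sketch assuming the MPIImplementation class just defined; detect_mpi_implementation is a hypothetical name, not code from the PR):

def detect_mpi_implementation(output: str) -> MPIImplementation:
    # "output" is the lowercased text of "mpiexec --version".
    if "open mpi" in output or "open-rte" in output:
        return MPIImplementation.OPENMPI
    elif "mpich" in output:
        return MPIImplementation.MPICH
    else:
        return MPIImplementation.UNKNOWN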

Member

@JHopeCollins JHopeCollins left a comment

Thanks for contributing!

It's interesting that forking mode now seems to work with Open MPI. Do we know what is different vs what we were doing before?

    else:
        return "Unknown MPI implementation"
except FileNotFoundError:
    raise FileNotFoundError("mpiexec not found in PATH")
Member

Suggested change:
- raise FileNotFoundError("mpiexec not found in PATH")
+ raise FileNotFoundError("mpiexec not found in PATH. Please run in non-forking mode to specify a different MPI executable.")

The message should tell the user how to correct the error.

I'm hesitant to just error in this case because it will cause incorrect failures when someone has an MPI distribution without mpiexec. I'm thinking specifically of HPC systems, where they are likely to have srun or aprun, or something else depending on the scheduling manager. There's also mpirun, but I think that's only MPICH and Open MPI and is deprecated anyway.

The proper fix should probably go in another PR, but I wanted to draw attention to it here.
I think the right thing to do is to allow the user to pass the MPI executable, as in the third point in #16, and only error if no executable is given and we cannot find mpiexec (see the sketch below). It's probably unlikely that someone would be running "on the inside" on an HPC because you have way less control, but users are users.
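
For illustration, letting the user name the launcher could look something like this (a hypothetical sketch using pytest's standard pytest_addoption hook; the --mpiexec option name is an assumption, not part of this PR or of #16):

# plugin.py -- hypothetical sketch only
def pytest_addoption(parser):
    # Let the user name their MPI launcher (mpiexec, srun, aprun, ...).
    parser.addoption(
        "--mpiexec",
        action="store",
        default="mpiexec",
        help="Name of the MPI launcher executable used in forking mode.",
    )

A user on an srun-only system could then run, e.g., pytest --mpiexec=srun tests.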

Collaborator

I agree that we should allow the user to specify a custom mpiexec-equivalent command, but I don't think that needs to be thought about in this PR. Bear in mind that HPC systems would be expected to run with MPI on the outside, at least initially, and so they won't hit this.

Contributor Author

qbisi commented Apr 23, 2025

I am happy to follow the suggested changes, but I don't have time until next week.

@connorjward
Collaborator

It's interesting that forking mode now seems to work with Open MPI. Do we know what is different vs what we were doing before?

Absolutely no idea. I definitely observed the no-forking issue myself at some point in the past.

This should make things more pleasant for devs who want to be able to run everything with plain pytest, but I think we want to keep our CI as is, because having MPI on the outside has less overhead and also exposes more global-state bugs.

@connorjward
Collaborator

@qbisi are you able to address these comments? If you no longer have the time to push this through I would be happy to take over the small amount of cleanup.

Contributor Author

qbisi commented May 22, 2025

Sorry, I don't have the time or capability to address these comments. Glad you can take over.

@connorjward
Collaborator

OK. Thanks for your contribution regardless! This would never have been done otherwise, because we all thought it was impossible.
