pytest-plugin: support openmpi #17

Closed · wants to merge 1 commit

Conversation

Contributor

@qbisi qbisi commented Apr 16, 2025

This commit adds compatibility for running pytest with the Open MPI implementation.

Collaborator

@connorjward connorjward left a comment

Thanks for this! I'm away from my computer until next Wednesday so can't give a proper review until then. I have some initial comments:

  • I don't think there needs to be a new module and global variable for the distribution type. Just having a free function that we call in the first appropriate pytest hook in plugin.py would be fine.
  • I prefer enums to strings for this sort of thing.

I'm also a little surprised that this works. Do you have a real case where the "outer" process does not call MPI_Init?

Contributor Author

qbisi commented Apr 18, 2025

I am a newbie at writing Python modules. This PR is more of a request for Open MPI compatibility than a polished implementation.
I can try to follow your advice.

I use this patched mpi-pytest plugin to run the pytest check of the Firedrake project (compiled with Open MPI).

The check command is

python -m pytest -n auto tests

Contributor Author

qbisi commented Apr 18, 2025

  • I don't think there needs to be a new module and global variable for the distribution type

How can I save the mpiexec implementation type as a static variable? Running mpiexec --version before each test is a waste of time.
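
One standard way to do this is to memoise the detection helper so that mpiexec --version runs at most once per session, e.g. with functools.cache (a sketch only; the helper name and return strings mirror the PR diff quoted later in this thread, not final code):

import functools
import subprocess

@functools.cache
def detect_mpiexec_implementation() -> str:
    # functools.cache runs the body on the first call only and returns
    # the memoised result on every subsequent call.
    output = subprocess.run(
        ["mpiexec", "--version"], capture_output=True, text=True
    ).stdout.lower()
    if "open mpi" in output or "open-rte" in output:
        return "Open MPI"
    elif "mpich" in output:
        return "MPICH"
    return "Unknown MPI implementation"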

Collaborator

connorjward commented Apr 18, 2025

I use this patched mpi-pytest plugin to run the pytest check of the Firedrake project (compiled with Open MPI).

Ah. In that case I don't think this will actually work. To run with Open MPI you need to run with pytest "on the inside". For example:

$ mpiexec -n 3 pytest tests -m parallel[3]

This is detailed in the README.

Have you tested that your PR works? I would be very surprised (but happy) if it actually runs.

Contributor Author

qbisi commented Apr 18, 2025

A short terminal capture:

qbisi@x79 ~/firedrake (remotes/origin/JHopeCollins/more-apt-petsc-packages~3^2) $ python -m pytest tests/firedrake/regression/test_adv_diff*  -m "parallel[3]"
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.12.9, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/qbisi/firedrake
configfile: setup.cfg
plugins: nbval-0.11.0, mpi-pytest-2025.4.0, xdist-3.6.1, timeout-2.3.1
collected 8 items / 4 deselected / 4 selected                                                                                                                                             

tests/firedrake/regression/test_adv_diff.py ..                                                                                                                                      [ 50%]
tests/firedrake/regression/test_adv_diff_nonsplit.py ..                                                                                                                             [100%]

============================================================================ 4 passed, 4 deselected in 49.09s =============================================================================

@connorjward
Collaborator

Interesting... I'll have to try this myself next week. I assume if you run with -sv you get sensible output?

Contributor Author

qbisi commented Apr 18, 2025

Adding this code snippet to the pytest unit test tests/firedrake/regression/test_adv_diff.py:

@pytest.mark.parallel
def test_adv_diff_parallel():
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    size = comm.Get_size()  # Total number of processes
    rank = comm.Get_rank()  # Rank of this process

    if rank == 0:
        print(f"Number of processes (nproc): {size}")
    # run_adv_diff is assumed to be the existing serial test body defined
    # elsewhere in this test file.
    run_adv_diff()

I can confirm that the tests do run in parallel; the output looks like this:

% python -m pytest -s tests/firedrake/regression/test_adv_diff.py  -m "parallel[3]"
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.12.9, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/qbisi/firedrake
configfile: setup.cfg
plugins: nbval-0.11.0, mpi-pytest-2025.4.0, xdist-3.6.1, timeout-2.3.1
collected 4 items / 2 deselected / 2 selected                                                                                                                                   

tests/firedrake/regression/test_adv_diff.py Number of processes (nproc): 3
...
1 passed in 7.49s

Contributor Author

qbisi commented Apr 19, 2025

I have a CI server that runs the full pytestCheck of the Nix-built Firedrake (compiled with Open MPI). The log is available here:
https://hydra.csrc.eu.org/build/960/nixlog/1

There are, however, two failures:
FAILED tests/firedrake/multigrid/test_embedded_transfer.py::test_riesz_parallel[Exact-2-CG]
FAILED tests/firedrake/regression/test_interpolate_cross_mesh.py::test_interpolate_cross_mesh_parallel[spheresphere]

Can you reproduce these failures on GitHub CI?

@connorjward
Collaborator

We currently run our test suite using Open MPI and aren't seeing those errors. I think I have seen the interpolation failure before but didn't have the time to fix it.

Please note that Firedrake does not use mpi-pytest with mpiexec "on the inside". We instead call it on the outside like so. Whilst it is cool that Open MPI seems to work on the inside, I wouldn't necessarily recommend that you do it that way. Our test suite is enormous and known to be quite fragile to very subtle global state/caching issues, so I would recommend trying to emulate what we do as much as possible.

Collaborator

@connorjward connorjward left a comment

I have tested this locally and it runs, which is pretty astonishing! I would be very happy to merge this code subject to some changes.

@@ -1 +1,4 @@
 from .parallel_assert import parallel_assert # noqa: F401
+from .detect_mpiexec_implementation import detect_mpiexec_implementation
+
+impl = detect_mpiexec_implementation()
Collaborator

I think we should avoid doing this at import time. If someone doesn't have MPI installed but they do have this plugin (unlikely I know) then this could break their system. Also some systems don't call mpiexec and use srun instead. I think a lazy global variable inside plugin.py would be better:

_mpi_implementation = None

def _set_parallel_callback(...):
    global _mpi_implementation

    if _mpi_implementation is None:
        _mpi_implementation = sniff_mpi_implementation()

    ...

if "open mpi" in output or "open-rte" in output:
return "Open MPI"
elif "mpich" in output:
return "MPICH"
Collaborator

Please use an enum instead:

import enum

class MPIImplementation(enum.Enum):
    OPENMPI = enum.auto()
    MPICH = enum.auto()
    UNKNOWN = enum.auto()
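
For illustration, the string checks quoted above might then map onto the enum like this (a sketch assuming the MPIImplementation class just defined; detect_mpi_implementation is a hypothetical name, not code from the PR):

def detect_mpi_implementation(output: str) -> MPIImplementation:
    # "output" is the lowercased text of "mpiexec --version".
    if "open mpi" in output or "open-rte" in output:
        return MPIImplementation.OPENMPI
    elif "mpich" in output:
        return MPIImplementation.MPICH
    else:
        return MPIImplementation.UNKNOWN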

Member

@JHopeCollins JHopeCollins left a comment

Thanks for contributing!

It's interesting that forking mode now seems to work with Open MPI. Do we know what is different vs what we were doing before?

    else:
        return "Unknown MPI implementation"
except FileNotFoundError:
    raise FileNotFoundError("mpiexec not found in PATH")
Member

Suggested change:
- raise FileNotFoundError("mpiexec not found in PATH")
+ raise FileNotFoundError("mpiexec not found in PATH. Please run in non-forking mode to specify a different MPI executable.")

The message should tell the user how to correct the error.

I'm hesitant to just error in this case because it will cause incorrect failures when someone has an MPI distribution without mpiexec. I'm thinking specifically of HPC systems, where they are likely to have srun or aprun, or something else depending on the scheduling manager. There's also mpirun, but I think that's only MPICH and Open MPI and is deprecated anyway.

The proper fix should probably go in another PR, but I wanted to draw attention to it here.
I think the right thing to do is to allow the user to pass the MPI executable, as in the third point in #16, and only error if no executable is given and we cannot find mpiexec (see the sketch below). It's probably unlikely that someone would be running "on the inside" on an HPC because you have way less control, but users are users.
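
For illustration, letting the user name the launcher could look something like this (a hypothetical sketch using pytest's standard pytest_addoption hook; the --mpiexec option name is an assumption, not part of this PR or of #16):

# plugin.py -- hypothetical sketch only
def pytest_addoption(parser):
    # Let the user name their MPI launcher (mpiexec, srun, aprun, ...).
    parser.addoption(
        "--mpiexec",
        action="store",
        default="mpiexec",
        help="Name of the MPI launcher executable used in forking mode.",
    )

A user on an srun-only system could then run, e.g., pytest --mpiexec=srun tests.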

Collaborator

I agree that we should allow the user to specify a custom mpiexec-equivalent command, but I don't think that needs to be thought about in this PR. Bear in mind that HPC systems would be expected to run with MPI on the outside, at least initially, and so they won't hit this.

Contributor Author

qbisi commented Apr 23, 2025

I am happy to follow the suggested changes, but I don't have time until next week.

@connorjward
Collaborator

It's interesting that forking mode now seems to work with Open MPI. Do we know what is different vs what we were doing before?

Absolutely no idea. I definitely observed the no-forking issue myself at some point in the past.

This should make things more pleasant for devs who want to be able to run everything with plain pytest, but I think we want to keep our CI as is, because having MPI on the outside has less overhead and also exposes more global-state bugs.

@connorjward
Collaborator

@qbisi are you able to address these comments? If you no longer have the time to push this through I would be happy to take over the small amount of cleanup.

Contributor Author

qbisi commented May 22, 2025

Sorry, I don't have the time or capability to address these comments. Glad you can take over.

@connorjward
Collaborator

OK. Thanks for your contribution regardless! This would never have been done otherwise, because we all thought it was impossible.
