Support openmpi 5 #1528

Open
ClaudiaComito opened this issue Jun 11, 2024 · 7 comments
Labels
dependencies Pull requests that update a dependency file enhancement New feature or request interoperability
Milestone

Comments

@ClaudiaComito
Contributor

ClaudiaComito commented Jun 11, 2024

Related
conda-forge/heat-feedstock#15

Feature functionality
Verify that everything runs as expected with openmpi 5. So far, we have only been testing on 4.1.x.

@ClaudiaComito ClaudiaComito added the enhancement New feature or request label Jun 11, 2024
@ClaudiaComito ClaudiaComito added this to the 1.5.0 milestone Jun 11, 2024
@ClaudiaComito ClaudiaComito changed the title Run tests with openmpi 5 Support openmpi 5 Jun 11, 2024
@mrfh92
Collaborator

mrfh92 commented Jun 21, 2024

@ClaudiaComito I have run our GPU tests on HAICORE on a single node with 4 MPI processes, and they ran fine.
Modules:

1) dot   2) compiler/intel/2023.1.0   3) numlib/mkl/2022.0.2   4) devel/cuda/12.2 (E)   5) jupyter/minimal/2024-05-14   6) mpi/openmpi/5.0

More than one node cannot be requested in this configuration; that's why I have only run the single-node test.

[Hint: instead of srun python ... you need to use srun --mpi=pmix python ... to make it run, at least with the current Slurm/OpenMPI configuration on the system.]
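For anyone scripting these runs, here is a minimal sketch of building the launch command from the hint above. The helper name `build_srun_command`, its parameters, and the `heat/` test path are assumptions for illustration only; the `--mpi=pmix` flag comes from the hint, and `--ntasks` is the standard srun option for the number of tasks.

```python
import shlex


def build_srun_command(script_args, ntasks=4, mpi_plugin="pmix"):
    """Return an srun argument list that selects the given MPI plugin.

    Hypothetical helper: the defaults mirror the single-node run
    described above (4 MPI processes, PMIx plugin).
    """
    return [
        "srun",
        f"--mpi={mpi_plugin}",
        f"--ntasks={ntasks}",
        "python",
        *script_args,
    ]


if __name__ == "__main__":
    # "heat/" is an assumed test path, not taken from the thread.
    cmd = build_srun_command(["-m", "pytest", "heat/"])
    print(shlex.join(cmd))
```

Whether `--mpi=pmix` is needed depends on how Slurm and OpenMPI were built on the cluster, so the plugin name is kept as a parameter rather than hard-coded.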

@mrfh92
Collaborator

mrfh92 commented Jun 21, 2024

Maybe @JuanPedroGHM can run them on two nodes?

@JuanPedroGHM
Member

Will give it a try, but it might take some time. I will try to squeeze it in during next week's business trip.

@JuanPedroGHM
Member

JuanPedroGHM commented Jun 28, 2024

Test results:

Horeka Partition: dev_accelerated
Python 3.11
OpenMPI 5.0
mpi4py 3.1.6
CUDA 12.2

| Nodes | GPUs per node | Ranks | Compiler | Device | Result |
|-------|---------------|-------|----------|--------|--------|
| 2 | 4 | 8 | LLVM 17 | CPU | Stuck at test_solver.py |
| 2 | 4 | 8 | LLVM 17 | GPU | Stuck at test_solver.py |
| 2 | 4 | 8 | Intel 2023 | CPU | Stuck at test_solver.py |
| 2 | 4 | 8 | Intel 2023 | GPU | Stuck at test_solver.py |
| 2 | 1 | 2 | LLVM 17 | CPU | Stuck at test_io.py |
| 2 | 1 | 2 | LLVM 17 | GPU | Stuck at test_io.py |
| 2 | 1 | 2 | Intel 2023 | CPU | Stuck at test_io.py |
| 2 | 1 | 2 | Intel 2023 | GPU | Stuck at test_io.py |
| 1 | 4 | 4 | LLVM 17 | CPU | 481 Passed, 4 Skipped, 15 Warnings |
| 1 | 4 | 4 | LLVM 17 | GPU | 480 Passed, 5 Skipped, 12 Warnings |
| 1 | 4 | 4 | Intel 2023 | CPU | 481 Passed, 4 Skipped, 15 Warnings |
| 1 | 4 | 4 | Intel 2023 | GPU | 480 Passed, 5 Skipped, 12 Warnings |

Multi-node is not working. On multiple nodes the run hung for an unusually long time, so I had to interrupt it.
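To avoid interrupting hung runs by hand, one option is to wrap each launch in a timeout. The following is a minimal sketch using only the Python standard library; the function name `run_with_timeout` and the status strings are assumptions for illustration, and the demo commands are placeholders rather than the actual test invocation.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome as 'passed', 'failed', or 'timeout'.

    subprocess.run kills the child process when the timeout expires
    and then raises TimeoutExpired, which is reported as 'timeout'.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "passed" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Placeholder commands; a real run would pass the srun/mpirun
    # command line for a single test file instead.
    quick = [sys.executable, "-c", "pass"]
    hanging = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_timeout(quick, timeout_s=30))
    print(run_with_timeout(hanging, timeout_s=1))
```

Running one test file per invocation would make it easier to see which file hangs (test_solver.py or test_io.py in the table above). Note that the timeout only kills the launcher process started here; whether remote ranks are cleaned up promptly depends on the launcher and the scheduler.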

@mrfh92 mrfh92 added the dependencies Pull requests that update a dependency file label Jul 18, 2024
@ClaudiaComito
Contributor Author

@JuanPedroGHM does mpi4py 4 (#1618) change anything?

@JuanPedroGHM
Member

Have not had the time to test it with OpenMPI 5. I could give it a try tomorrow.

Contributor

This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Oct 21, 2024
@mtar mtar modified the milestones: 1.5.0, 1.5.1 Oct 30, 2024
@github-actions github-actions bot removed the stale label Nov 4, 2024

4 participants