FAQ: MPI init failure when running wrapped MPI process #60

Open
SteVwonder opened this issue Sep 4, 2020 · 0 comments

If you launch an MPI process through a wrapper, MPI initialization may fail. For example, if you run flux mini run my_script.py mpi_app.exe, where my_script.py starts mpi_app.exe with Popen, MPI init fails even for a single-rank MPI application. The same happens with flux mini run totalview mpi_app.exe. The cause is that Python's Popen (by default) and TotalView close inherited file descriptors before launching the child process. MPI bootstraps by communicating with Flux over the file descriptor named in the PMI_FD environment variable, so once that descriptor is closed the application can no longer reach Flux.
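
The descriptor-closing behavior can be seen without Flux or MPI. The following is a minimal sketch, assuming a POSIX system: an ordinary pipe stands in for the PMI descriptor, and the helper name child_sees_fd is made up for this illustration. The child checks whether the descriptor number it was given still refers to the parent's pipe.

```python
import os
import subprocess
import sys

# Child-side probe: succeeds only if the given descriptor number is still
# open and refers to the same pipe (same inode) that the parent created.
PROBE = (
    "import os, sys\n"
    "st = os.fstat(int(sys.argv[1]))\n"
    "sys.exit(0 if st.st_ino == int(sys.argv[2]) else 1)\n"
)


def child_sees_fd(fd, close_fds):
    """Return True if a child started by Popen still has fd open."""
    inode = os.fstat(fd).st_ino
    proc = subprocess.Popen(
        [sys.executable, "-c", PROBE, str(fd), str(inode)],
        close_fds=close_fds,
        stderr=subprocess.DEVNULL,  # hide the child's traceback on failure
    )
    return proc.wait() == 0


# A pipe stands in for the descriptor that MPI would use to talk to Flux.
read_fd, write_fd = os.pipe()
# Descriptors created by Python are non-inheritable by default, so mark
# this one inheritable, like a descriptor handed down by a parent process.
os.set_inheritable(read_fd, True)

default_result = child_sees_fd(read_fd, close_fds=True)   # Popen's default
kept_result = child_sees_fd(read_fd, close_fds=False)

os.close(read_fd)
os.close(write_fd)

print("child sees fd with close_fds=True: ", default_result)
print("child sees fd with close_fds=False:", kept_result)
```

With the default close_fds=True the child no longer has the descriptor, which is the situation a wrapped MPI application is in when it tries to bootstrap.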

If you are debugging your MPI application with TotalView, follow these instructions: https://flux-framework.readthedocs.io/en/latest/debugging.html#parallel-debugging-using-totalview

If you are wrapping your MPI application in a Python script that uses Popen, pass close_fds=False to Popen so that the child process inherits the descriptor: https://docs.python.org/3/library/subprocess.html#subprocess.Popen
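
A wrapper along these lines might look like the sketch below. It is only an illustration: the function name launch and the script layout are assumptions, not part of Flux, and the script simply runs whatever command line it is given.

```python
import subprocess
import sys


def launch(argv):
    """Run argv as a child process and return its exit code."""
    # close_fds=False leaves inheritable descriptors open in the child,
    # including the one MPI uses to communicate with Flux.
    proc = subprocess.Popen(argv, close_fds=False)
    return proc.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    sys.exit(launch(sys.argv[1:]))
```

Saved as my_script.py and made executable, this would be used as in the example above: flux mini run my_script.py mpi_app.exe.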
