Singleton init mode, MPI_Comm_spawn, and OPAL_PREFIX #12349

Open
@dalcinl

I'm working on producing a binary Open MPI Python wheel to allow for pip install openmpi. I had to add a few hacks here and there to overcome issues strictly related to how wheel files are stored, how they are installed in Python virtual environments, the lack of post-install hooks, and the expectation that things work without activating the environment. All this of course requires relocating the Open MPI installation. At that point I found a minor issue that I don't know how to overcome.

First, a clarification: I'm using the internal PMIX and PRRTE. Supposedly, the OPAL_PREFIX env var is all that is needed for things to work out of the box when relocating an Open MPI install. However, I think I came across a corner case. When using singleton init mode, I believe OPAL_PREFIX is simply ignored, and if the tools cannot be located via $PATH, things do not work.

Looking at the spawn code, I see that the function start_dvm in ompi/dpm/dpm.c has the following code:

    /* find the prte binary using the install_dirs support - this also
     * checks to ensure that we can see this executable and it *is* executable by us
     */
    cmd = opal_find_absolute_path("prte");

However, I believe opal_find_absolute_path() does not consult the OPAL_PREFIX env var at all; it ultimately searches only PATH. The comment "find the prte binary using the install_dirs support" is therefore simply not true.
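
To make the expected behavior concrete, here is a minimal standalone sketch of a prefix-aware lookup. It is not Open MPI code: find_tool_prefix_first and try_candidate are hypothetical names, and it assumes the relocated tools live in $OPAL_PREFIX/bin (the default layout, with no separate bindir override). It checks that location first and only then falls back to walking PATH.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return a malloc'd "dir/name" if it is a regular
 * file that we are allowed to execute, otherwise NULL. */
static char *try_candidate(const char *dir, const char *name)
{
    char path[PATH_MAX];
    struct stat st;
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof(path)) {
        return NULL;                       /* path would be truncated */
    }
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode)) {
        return NULL;                       /* missing or not a regular file */
    }
    if (access(path, X_OK) != 0) {
        return NULL;                       /* not executable by us */
    }
    return strdup(path);
}

/* Hypothetical lookup: prefer $OPAL_PREFIX/bin, then fall back to PATH.
 * The caller owns (and must free) the returned string. */
char *find_tool_prefix_first(const char *name)
{
    assert(name != NULL);

    /* Step 1: assumed relocated layout, $OPAL_PREFIX/bin/<name>. */
    const char *prefix = getenv("OPAL_PREFIX");
    if (prefix != NULL && *prefix != '\0') {
        char bindir[PATH_MAX];
        int n = snprintf(bindir, sizeof(bindir), "%s/bin", prefix);
        if (n > 0 && (size_t)n < sizeof(bindir)) {
            char *hit = try_candidate(bindir, name);
            if (hit != NULL) {
                return hit;
            }
        }
    }

    /* Step 2: PATH-only search, as a fallback. */
    const char *path_env = getenv("PATH");
    if (path_env == NULL) {
        return NULL;
    }
    char *copy = strdup(path_env);         /* strtok_r modifies its input */
    if (copy == NULL) {
        return NULL;
    }
    char *result = NULL;
    char *save = NULL;
    for (char *dir = strtok_r(copy, ":", &save); dir != NULL;
         dir = strtok_r(NULL, ":", &save)) {
        result = try_candidate(dir, name);
        if (result != NULL) {
            break;
        }
    }
    free(copy);
    return result;
}
```

With a lookup along these lines, calling find_tool_prefix_first("prte") from a singleton process would find the relocated binary even when the install's bin directory is not on PATH. Whether the real fix belongs in start_dvm or in the install_dirs machinery is a question for the maintainers.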

@hppritcha Your input is much appreciated here. I do have a reproducer, but so far it is based on Python and intermediate binary assets I'm generating locally. I hope the description above is enough for you to realize the issue.
