mpirun v4.1.7 prioritizing rank placement based on launch node #13199

Open
@aw-lauria

Background information

Hi all - we're seeing unexpected rank placement when mpirun is launched from a host other than the first one in our hostfile. ORTE seems to prioritize the launch node when assigning ranks. For example, launching from computeA:

mpirun --hostfile ./hosts -N 4 ./echo.sh  | grep computeA
computeA: 0
computeA: 1
computeA: 2
computeA: 3

where the hostfile looks like this:

computeB
computeA

and echo.sh is just:

#!/usr/bin/bash
echo $(hostname): $OMPI_COMM_WORLD_RANK

In other words, the launch node (computeA here) is assigned the lowest ranks. Based on the hostfile ordering, we would expect computeB to get ranks 0 through 3 and computeA to get ranks 4 through 7.
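To make the expectation concrete, here is a minimal sketch that prints the rank range each host would get if ranks were assigned strictly in hostfile order. It describes what we expected, not what mpirun actually does. The function name `expected_ranks` is hypothetical, and the sketch assumes one host per line, with optional trailing fields such as `slots=` and `#` comment lines ignored.

```shell
#!/usr/bin/bash
# expected_ranks HOSTFILE PER_NODE
# Hypothetical helper: prints the rank range each host would receive if
# ranks were assigned in hostfile order with PER_NODE ranks per host.
# Assumes the hostname is the first field on each line.
expected_ranks() {
    local hostfile=$1 per_node=$2
    awk -v n="$per_node" '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
        { printf "%s: %d-%d\n", $1, i * n, (i + 1) * n - 1; i++ }
    ' "$hostfile"
}
```

Running `expected_ranks ./hosts 4` against the hostfile shown above would list computeB first and computeA second, which is the opposite of what we observe when launching from computeA.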

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v4.1.4 and v4.1.7

Is this expected behavior? What is the rationale? We run into this occasionally, and it can have a performance impact on certain workloads. We can of course work around it by always launching from the first node in the hostfile, but in our testing we sometimes launch from the wrong node by mistake.
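As a rough sketch of the workaround, a wrapper could check that the current host is the first entry in the hostfile before calling mpirun. The function names `first_host` and `on_first_host` are hypothetical, and the sketch assumes the hostfile entries match the output of `hostname` exactly (no short-name versus FQDN mismatch).

```shell
#!/usr/bin/bash
# first_host HOSTFILE
# Hypothetical helper: prints the first hostname in the hostfile,
# skipping blank lines and '#' comments.
first_host() {
    awk 'NF == 0 || $1 ~ /^#/ { next } { print $1; exit }' "$1"
}

# on_first_host HOSTFILE [HOSTNAME]
# Succeeds if HOSTNAME (default: this machine's hostname) is the first
# hostfile entry. Assumes hostfile names match `hostname` output.
on_first_host() {
    local hostfile=$1
    local here=${2:-$(hostname)}
    [ "$(first_host "$hostfile")" = "$here" ]
}
```

For example, `on_first_host ./hosts || echo "warning: not launching from the first host in ./hosts" >&2` placed ahead of the mpirun line would flag the cases where we launch from the wrong node.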

Thanks!
