Description
Background information
Hi all - we're seeing unexpected rank placement when mpirun is not launched from the first host in our hostfile. ORTE seems to prioritize the launch node when assigning ranks. For example, launching from computeA:
mpirun --hostfile ./hosts -N 4 ./echo.sh | grep computeA
computeA: 0
computeA: 1
computeA: 2
computeA: 3
where the hostfile looks like this:
computeB
computeA
and echo.sh is just:
#!/usr/bin/bash
echo $(hostname): $OMPI_COMM_WORLD_RANK
Basically, the launch node gets priority in rank assignment. Based on the hostfile ordering, we would expect computeB to get ranks 0 through 3 and computeA to get ranks 4 through 7.
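To make the expectation concrete, here is a minimal sketch of the layout we had in mind: consecutive blocks of ranks handed out in hostfile order. This is only an illustration of our expectation, not how Open MPI actually maps ranks; `expected_ranks` is a hypothetical helper, and it assumes one host per line with the hostname as the first field.

```shell
#!/usr/bin/bash
# Hypothetical helper (not part of Open MPI): print the rank layout we
# expected, i.e. blocks of consecutive ranks assigned in hostfile order.
# Usage: expected_ranks <hostfile> <procs-per-node>
expected_ranks() {
    local hostfile=$1 per_node=$2 rank=0 host i
    # Read the first field of each line; "|| [[ -n $host ]]" also handles
    # a final line that has no trailing newline.
    while read -r host _ || [[ -n $host ]]; do
        # Skip blank lines and comment lines.
        [[ -z $host || $host == \#* ]] && continue
        for ((i = 0; i < per_node; i++)); do
            echo "$host: $rank"
            rank=$((rank + 1))
        done
    done < "$hostfile"
}
```

With the hostfile above, `expected_ranks ./hosts 4 | grep computeA` prints `computeA: 4` through `computeA: 7`, which is what we expected mpirun to produce.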
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v4.1.4 and v4.1.7
Is this expected behavior? What is the rationale? We run into this occasionally, and it can have a performance impact on certain workloads. We can of course work around it by always launching from the first node in the hostfile, but in our testing we sometimes launch from a different node by mistake.
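For anyone else hitting this, a guard along the following lines could catch an accidental launch from the wrong node. It is a rough sketch, not something Open MPI provides: `first_host` and `launched_from_first_host` are hypothetical helpers, and the check assumes the hostfile uses the same short names that `hostname -s` reports.

```shell
#!/usr/bin/bash
# Hypothetical helper: print the first host listed in a hostfile,
# ignoring blank lines and comment lines.
first_host() {
    local host
    while read -r host _ || [[ -n $host ]]; do
        [[ -z $host || $host == \#* ]] && continue
        echo "$host"
        return 0
    done < "$1"
    return 1  # no host entries found
}

# Hypothetical guard: succeed only if the given node (default: this node's
# short hostname) is the first host in the hostfile.
# Usage: launched_from_first_host <hostfile> [node-name]
launched_from_first_host() {
    [[ "$(first_host "$1")" == "${2:-$(hostname -s)}" ]]
}

# Example use before calling mpirun (commented out to avoid side effects):
# launched_from_first_host ./hosts ||
#     echo "warning: not launching from the first host in ./hosts" >&2
```

This only detects the situation; it does not change how ranks are assigned.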
Thanks!