Move --host ordering fix to v3.0.x, v3.1.x, v4.0.x? #6501

Closed
@jsquyres


Per #6298, we had an accidental change in the behavior of mpirun --host aaa,bbb between the v2.1.x and v3.0.x release series. A fix just went into master in #6493.

Here's what happened:

The question is: should we put this fix on any of v3.0.x, v3.1.x, and/or v4.0.x?

Summary of behavior change

Behavior X

The ordering of hosts in the --host list matters:

$ mpirun --host aaa,bbb rank_test
aaa: MCW rank 0
bbb: MCW rank 1
$ mpirun --host bbb,aaa rank_test
aaa: MCW rank 1
bbb: MCW rank 0

Behavior Y

The ordering of hosts in the --host list does not matter (note: this behavior was unintentional. It was always intended that we honor the ordering of hosts in the --host list):

$ mpirun --host aaa,bbb rank_test
aaa: MCW rank 0
bbb: MCW rank 1
$ mpirun --host bbb,aaa rank_test
aaa: MCW rank 0
bbb: MCW rank 1
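
To make the difference between the two behaviors easy to check mechanically, here is a minimal sketch that parses rank_test-style output and reports whether the --host ordering was honored. It assumes one process per host and output lines of exactly the form shown above; the function names (parse_rank_output, honors_host_order) are hypothetical helpers invented for this illustration, not part of Open MPI.

```python
import re


def parse_rank_output(text):
    """Map host name -> MCW rank from lines like 'aaa: MCW rank 0'."""
    ranks = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\S+): MCW rank (\d+)\s*$", line)
        if m:
            ranks[m.group(1)] = int(m.group(2))
    return ranks


def honors_host_order(host_arg, output):
    """Return True if the i-th host in the --host list got MCW rank i.

    Assumption: exactly one process is launched per listed host.
    """
    hosts = host_arg.split(",")
    ranks = parse_rank_output(output)
    return all(ranks.get(h) == i for i, h in enumerate(hosts))


# Transcripts copied from the two behaviors above for `--host bbb,aaa`.
behavior_x = "aaa: MCW rank 1\nbbb: MCW rank 0\n"
behavior_y = "aaa: MCW rank 0\nbbb: MCW rank 1\n"

print(honors_host_order("bbb,aaa", behavior_x))  # True: ordering honored
print(honors_host_order("bbb,aaa", behavior_y))  # False: ordering ignored
```

Note that the aaa,bbb ordering alone cannot distinguish the two behaviors, since both produce the same output for it; only the reversed list (bbb,aaa) exposes the difference.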

Discussion points

We need to discuss this and decide what to do. Points (in no particular order):

  1. This is a fairly minor change in behavior.
  2. Apparently no one noticed this change in behavior between v2.1.x and v3.0.x. It was only discovered recently by @bturrubiates, a Cisco employee (while using Open MPI for other / unrelated testing).
  3. The fix is probably not worth putting into v3.0.x or v3.1.x.
  4. But it might be worthwhile to put it into v4.0.x...?
  5. That being said, even putting it in v4.0.x is at least sorta breaking backwards compatibility. You could squint at this and call it a bug and therefore allow it in. Or you could say that it was effectively the behavior of all the v3.x/v4.x releases, and they're backwards compatible with each other, so we should maintain that behavior in v4.0.x.
