
Document binding behavior (especially w.r.t. threads) #4845

Open
@jsquyres


Per discussion on the 2018-02-20 webex, and per #4799:

The general issue appears to be that, because Open MPI binds each process to a socket by default (for np > 2), a process's progress threads may not run on the same core as its "main" thread(s). #4799 discusses this in the context of PMIx, but the issue actually applies to all progress threads in the MPI process.
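A quick way to see why this happens: on Linux, a thread created with `pthread_create()` inherits a copy of its creator's CPU affinity mask. A progress thread therefore starts out with its process's whole-socket binding, and the OS scheduler is free to place it on any core in that socket. The sketch below assumes Linux/glibc and uses a plain pthread as a stand-in for a library progress thread. The helper names are hypothetical, and the sketch does not show whether Open MPI's own progress threads later adjust their affinity.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for a progress thread: it only records the
 * CPU affinity mask it was started with. */
static void *record_mask(void *arg)
{
    cpu_set_t *out = arg;
    CPU_ZERO(out);
    pthread_getaffinity_np(pthread_self(), sizeof(*out), out);
    return NULL;
}

/* Hypothetical helper: returns 1 if a newly created thread inherits
 * exactly the calling thread's CPU mask, 0 if not, -1 on error. */
int child_inherits_mask(void)
{
    cpu_set_t parent, child;
    pthread_t tid;

    CPU_ZERO(&parent);
    if (pthread_getaffinity_np(pthread_self(), sizeof(parent), &parent) != 0)
        return -1;
    if (pthread_create(&tid, NULL, record_mask, &child) != 0)
        return -1;
    pthread_join(tid, NULL);
    return CPU_EQUAL(&parent, &child) ? 1 : 0;
}

/* Hypothetical helper: number of hardware threads (PUs) the calling
 * thread may run on. Under a socket binding this is every PU on the
 * socket; under a core binding it is only the PUs of that one core. */
int allowed_cpu_count(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}
```

Called from an MPI process launched with the default binding for np > 2, `allowed_cpu_count()` should report all of the socket's hardware threads, and `child_inherits_mask()` should return 1: any new thread may be scheduled on any of them, not necessarily next to the main thread.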

The short version: we agreed that the best way forward is to document the current behavior and explain how to get different behavior (e.g., binding to core) for those who want it. This probably entails:

  • Adding something to README
  • Adding one or more questions to the FAQ (which tends to be more Google-able than the README)
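One thing the README/FAQ text will likely need is a way for users to confirm where their processes actually ended up. A minimal sketch, assuming Linux/glibc: a hypothetical helper that an application could call (e.g., right after `MPI_Init`) to print the CPUs its calling thread may run on, as a compact range list.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper (illustration only): writes the CPUs the calling
 * thread may run on into buf as ranges, e.g. "0-7" or "0-3,8".
 * Returns the number of allowed CPUs, or -1 on error or if buf is too small. */
int describe_binding(char *buf, size_t len)
{
    cpu_set_t mask;
    size_t used = 0;
    int count = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &mask))
            continue;
        int start = cpu;                       /* extend a run of set bits */
        while (cpu + 1 < CPU_SETSIZE && CPU_ISSET(cpu + 1, &mask))
            cpu++;
        count += cpu - start + 1;

        const char *sep = used ? "," : "";     /* no comma before the first range */
        int n = (start == cpu)
            ? snprintf(buf + used, len - used, "%s%d", sep, start)
            : snprintf(buf + used, len - used, "%s%d-%d", sep, start, cpu);
        if (n < 0 || (size_t)n >= len - used)
            return -1;                         /* output would be truncated */
        used += (size_t)n;
    }
    return count;
}
```

Running the same program under `mpirun --bind-to core` and under `mpirun --bind-to socket` should show the reported set shrink from a whole socket to a single core's hardware threads; `mpirun --report-bindings` reports the same information from the launcher's side.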

Points made during the discussion:

  • Forever ago, we used to bind-to-core by default. We changed to bind-to-socket for a few reasons, one of which was that we wanted to embrace an MPI_THREAD_MULTIPLE world. Threads inherit their process's binding, so if we bind-to-core by default and an app launches a bunch of threads, they all end up sharing that one core, and life will... hurt. If we bind-to-socket by default (at least for np > 2), the threads can spread across the socket's cores, so apps that launch a bunch of threads will likely hurt less.
  • This is a "no right answer" kind of scenario: if we change the binding defaults, we're going to anger some users while appeasing others. As such, the only winning move may be not to play, i.e., document the current behavior and explain how to change it for those who want to.
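The threading concern in the first point follows from the same inheritance rule: if a process is bound to one core, every thread it spawns is confined to that core too. The sketch below reproduces this on Linux/glibc by pinning the calling thread to the single CPU it is currently on (a simplification: a real core binding can include several hardware threads when SMT is enabled), starting some worker threads, and checking where they ran. `threads_share_pinned_cpu()` is a hypothetical name used only for illustration.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define MAX_THREADS 64

/* Each worker records the CPU it is running on. */
static void *record_cpu(void *arg)
{
    *(int *)arg = sched_getcpu();
    return NULL;
}

/* Hypothetical helper: pin the calling thread to its current CPU (a
 * stand-in for a one-PU core binding), start nthreads workers, and
 * report whether they all ran on that CPU. The original mask is
 * restored before returning.
 * Returns 1 if all workers shared the pinned CPU, 0 if not, -1 on error. */
int threads_share_pinned_cpu(int nthreads)
{
    cpu_set_t saved, one;
    pthread_t tids[MAX_THREADS];
    int cpus[MAX_THREADS];
    int result = 1, started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
        return -1;

    int cpu = sched_getcpu();
    if (cpu < 0)
        return -1;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    if (sched_setaffinity(0, sizeof(one), &one) != 0)
        return -1;

    /* New threads inherit the one-CPU mask, just as threads spawned by
     * an MPI process inherit that process's binding. */
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, record_cpu, &cpus[started]) != 0) {
            result = -1;
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (result == 1 && cpus[i] != cpu)
            result = 0;
    }

    sched_setaffinity(0, sizeof(saved), &saved);  /* undo the pinning */
    return result;
}
```

With the whole-socket mask left in place instead, the same workers would be free to spread across the socket's cores, which is the trade-off described in the first point.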
