Locality Aware Broadcast #185
src/scheduler/MpiWorld.cpp
int MpiWorld::getMasterForHost(const std::string& host)
{
    std::vector<int> ranks = getRanksForHost(host);
    assert(!ranks.empty());
We only call this method internally, with values from the host list that was broadcast to us, so each host must necessarily map to at least one rank (otherwise it wouldn't have been sent in the first place). That is why we assert here instead of throwing an exception.
This could be a useful comment in the code rather than in the PR.
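A minimal sketch of how that rationale could sit next to the assertion as a code comment. This is not the PR's actual code: the free-function form and the `HostRanks` map are hypothetical stand-ins for the world's internal state, and picking the lowest rank follows the PR description's statement that local masters are currently the lowest rank on each host.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the world's bookkeeping: host -> ranks on that host.
using HostRanks = std::map<std::string, std::vector<int>>;

// Sketch of a local-master lookup, written as a free function for illustration.
int getMasterForHost(const HostRanks& hostRanks, const std::string& host)
{
    auto it = hostRanks.find(host);

    // This lookup is only used internally, with hosts taken from the host
    // list that was broadcast to us. Every such host must have at least one
    // rank, so an empty result is a programming error rather than a
    // recoverable condition: assert instead of throwing.
    assert(it != hostRanks.end() && !it->second.empty());

    // Assumption from the PR description: the local master is the lowest
    // rank living on the host.
    return *std::min_element(it->second.begin(), it->second.end());
}
```

Note that `assert` is compiled out when `NDEBUG` is defined, so in such builds the function dereferences an invalid iterator if the invariant is ever broken.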
In this PR I change the broadcast algorithm in MPI to reduce the number of cross-host messages sent. With the new implementation, the number of cross-host messages grows linearly with the number of hosts, and not with the number of ranks.
In the process I also fix some minor issues with the broadcast implementation:
- [...] reduce but only the root rank to call broadcast (all other ranks would call recv), see here for the implementation in faasm. Now all ranks call reduce, and the behaviour depends on the caller's rank.
- [...] MPIMessage::BROADCAST in the protobuf object, so MPI_Bcast messages were labeled as NORMAL messages.
- [...] rootRank for the rank which originates the broadcast.
Regarding the algorithm itself, I introduce the notion of localMasters in the MPI world. A local master is a selected leader for all MPI ranks living on a particular host. Then: [...]
For the moment, local masters are the lowest rank on each host. Even though the load on local masters will now slightly increase, we are drastically reducing the load on rank 0 (which was always elected as the master, and therefore was doing a bunch of cross-host messaging).
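To make the message-count claim concrete, here is a rough sketch of one plausible two-level scheme built on local masters. It is an assumption, not the PR's implementation: `BroadcastPlan` and `planBroadcast` are hypothetical names, and the exact forwarding rules (the root sends one message to the local master of every other host, each local master forwards to the remaining ranks on its host, and the root serves its own host directly) are guesses at a reasonable design.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plan type: each entry is a (sender rank, receiver rank) pair.
struct BroadcastPlan
{
    std::vector<std::pair<int, int>> crossHost; // messages between hosts
    std::vector<std::pair<int, int>> local;     // messages within one host
};

// Sketch of a locality-aware broadcast plan (assumed scheme, see lead-in).
BroadcastPlan planBroadcast(
  const std::map<std::string, std::vector<int>>& hostRanks,
  int rootRank)
{
    // Find the host the root rank lives on.
    std::string rootHost;
    bool found = false;
    for (const auto& entry : hostRanks) {
        const std::vector<int>& ranks = entry.second;
        if (std::find(ranks.begin(), ranks.end(), rootRank) != ranks.end()) {
            rootHost = entry.first;
            found = true;
            break;
        }
    }
    assert(found);

    BroadcastPlan plan;
    for (const auto& entry : hostRanks) {
        const std::vector<int>& ranks = entry.second;
        if (ranks.empty()) {
            continue;
        }

        if (entry.first == rootHost) {
            // Assumption: the root serves the ranks on its own host directly.
            for (int r : ranks) {
                if (r != rootRank) {
                    plan.local.emplace_back(rootRank, r);
                }
            }
        } else {
            // Local master: lowest rank on the host (as stated in the PR).
            int master = *std::min_element(ranks.begin(), ranks.end());
            // Exactly one cross-host message per remote host.
            plan.crossHost.emplace_back(rootRank, master);
            // The local master forwards to the other ranks on its host.
            for (int r : ranks) {
                if (r != master) {
                    plan.local.emplace_back(master, r);
                }
            }
        }
    }
    return plan;
}
```

Under these assumptions, a world with 8 ranks spread over 3 hosts needs 2 cross-host messages (number of hosts minus one), whereas a root that sends directly to every rank would send one cross-host message per rank living on another host.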
To ease testing, I also enhance the mocking tools in MpiWorld.
Also note that this PR is rebased on top of #187 for faster GHA tests, so I'd recommend reviewing #187 first.