This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-366]Extend MXNet Distributed Training by AllReduce #10696

Closed
wants to merge 43 commits into from
Commits
e081f33
Extend MXNet Distributed Training by MPI AllReduce
zhouhaiy Apr 23, 2018
de7439b
Code modification according to code review comment.
zhouhaiy Apr 26, 2018
3935439
temporarily change ps-lite git url
zhouhaiy Apr 26, 2018
18420f8
Fix programming style issue reported from lint
zhouhaiy Apr 26, 2018
ff3c924
Change typo in mpi_message.proto
zhouhaiy Apr 26, 2018
732237b
Fix typo in mpi_message.proto
zhouhaiy Apr 28, 2018
48cf83e
re-trigger the Jenkins.
zhouhaiy May 2, 2018
1c57d2f
Trigger jenkins
zhouhaiy May 2, 2018
5536cbf
Code modification according to Rahul Huilgol's comment.
zhouhaiy May 7, 2018
bd74e7f
Fix cpplint check error and simplify default mpi build logic in
zhouhaiy May 8, 2018
48141fb
Code modification according to Haibin and szha.
zhouhaiy May 12, 2018
e9f6dda
Add allreduce multi-node support for gluon
zhouhaiy May 15, 2018
35de580
1) Change name to make solution more general:
zhouhaiy May 15, 2018
7094a52
Fix error reported from pylint
zhouhaiy May 15, 2018
248678b
Retrigger the jenkins
zhouhaiy May 16, 2018
62b5998
Enhance test for dist_allreduce_sync_kvstore
zhouhaiy May 16, 2018
7ed53e9
Trigger jenkins
zhouhaiy May 16, 2018
17b59fc
Trigger jenkins
zhouhaiy May 16, 2018
4b0c9ac
1) Code modification according to Rahul's comment.
zhouhaiy Jun 19, 2018
86b3919
Add allreduce kvstore test into CI.
zhouhaiy Jun 19, 2018
bf541ed
Re-trigger jenkins
zhouhaiy Jun 20, 2018
f25e41e
Add cmake for allreduce kvstore
zhouhaiy Jun 20, 2018
2a0cacd
Revert back to dmlc ps-lite since our PR of change protobuf
zhouhaiy Jun 23, 2018
302f415
Add GPU support for allreduce kvstore
zhouhaiy Jun 27, 2018
f0b75e5
Fix cpplint check error
zhouhaiy Jun 27, 2018
1444a84
Re-trigger the jenkins
zhouhaiy Jun 28, 2018
31f3e35
Retrigger jenkins
zhouhaiy Jun 29, 2018
ffb752a
Add full generated files in dependency
zhouhaiy Jun 29, 2018
64090aa
Fix build error which will be exposed in corner case
zhouhaiy Jul 1, 2018
e321258
Merge branch 'master' into master
threeleafzerg Jul 1, 2018
601e84b
Add license for proto file
zhouhaiy Jul 1, 2018
8a0d88c
Merge branch 'master' of https://github.com/threeleafzerg/incubator-m…
zhouhaiy Jul 1, 2018
05aa2fa
Modify license for proto file
zhouhaiy Jul 1, 2018
716e873
Re-trigger the jenkins
zhouhaiy Jul 2, 2018
bf20cb4
Code modification according to haibin's review comment
zhouhaiy Jul 2, 2018
2956565
Add separated build for allreduce kvstore in jenkins
zhouhaiy Jul 3, 2018
62ee79c
Change protobuf library dependency from dynamic link to static link
zhouhaiy Jul 3, 2018
9fa70f4
Code modification according to macro and xcgoner's review comments.
zhouhaiy Jul 4, 2018
177bd02
Retrigger jenkins
zhouhaiy Jul 4, 2018
17d440a
Merge branch 'master' into master
threeleafzerg Jul 10, 2018
470cbbe
1. Adapt for the change of pull interface
zhouhaiy Jul 10, 2018
fb3aa9a
Fix cpplint check error
zhouhaiy Jul 10, 2018
127b085
Retrigger jenkins
zhouhaiy Jul 11, 2018
Fix cpplint check error
zhouhaiy committed Jul 10, 2018
commit fb3aa9aedefd12e630e7adf552556dc6f7c48150
4 changes: 2 additions & 2 deletions src/kvstore/collectives/src/collectives.cc
@@ -601,8 +601,8 @@ int InitializeMPIOnce(Comm *comm) {
   coll_global.device = -1;
   coll_global.local_comm = comm;
   coll_global.pinned_ctx = coll_global.local_comm->pinned_ctx();
-  coll_global.sync_var1 = mxnet::NDArray(mxnet::TShape({1,1}), coll_global.pinned_ctx, true);
-  coll_global.sync_var2 = mxnet::NDArray(mxnet::TShape({1,1}), coll_global.pinned_ctx, true);
+  coll_global.sync_var1 = mxnet::NDArray(mxnet::TShape({1, 1}), coll_global.pinned_ctx, true);
+  coll_global.sync_var2 = mxnet::NDArray(mxnet::TShape({1, 1}), coll_global.pinned_ctx, true);
   coll_global.sync_key = 0xfeedbeaf;

   coll_global.background_thread = std::thread(BackgroundThreadLoop);
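
The only change in this commit is the space added after the comma in `TShape({1, 1})`, which is presumably what the lint check complained about. As a rough illustration of that kind of check, here is a minimal sketch in C++. The helper name `HasCommaWithoutSpace` is hypothetical; it is not part of cpplint or MXNet, and cpplint's real rule is more involved than this (it would, for example, have to deal with commas inside string literals and comments).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption for illustration, not cpplint's code):
// returns true when some comma in `line` is immediately followed by a
// non-whitespace character. A comma that ends the line is not reported.
bool HasCommaWithoutSpace(const std::string& line) {
  for (std::size_t i = 0; i + 1 < line.size(); ++i) {
    const unsigned char next = static_cast<unsigned char>(line[i + 1]);
    if (line[i] == ',' && !std::isspace(next)) {
      return true;
    }
  }
  return false;
}
```

Under this simplified rule, the old line containing `mxnet::TShape({1,1})` would be flagged, while the new line with `mxnet::TShape({1, 1})` would pass.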