@Thunderbrook Thunderbrook commented Feb 22, 2021

PR types

New features

PR changes

Others

Describe

Support multiple nodes in heterps mode.

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

paddle-bot-old bot commented Feb 22, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@Thunderbrook Thunderbrook changed the title from "Multi node" to "support multi node in heterps" Feb 22, 2021
block.append_op(type='c_comm_init_all', attrs={'ring_id': 0})


class MultiThread(GradAllReduce):

MultiThread still needs to be wired into minimize so that it is actually used.

std::vector<std::vector<Path>> path_;
std::vector<LocalStorage> storage_;
int feanum_{1800 * 2048};
int multi_node_{1};

Make this configurable instead of hard-coding it.

FeaturePushValue* d_grads, size_t len) {
comm_->push_sparse(num, d_keys, d_grads, len, opt_);
// comm_->push_sparse(num, d_keys, d_grads, len, opt_);
comm_->push_sparse_multi_node(num, d_keys, d_grads, len, opt_);

This needs a single-node vs. multi-node check, so that the call goes to either push_sparse or push_sparse_multi_node.
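
One possible shape for that check, as a minimal sketch rather than the code that was merged. `FakeComm`, `SparsePusher`, and the `node_count` constructor argument are hypothetical names invented for illustration; only `push_sparse` and `push_sparse_multi_node` come from the diff above, and their real signatures take more arguments (`d_keys`, `d_grads`, `opt_`) than this stub does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the communicator behind comm_.
// It only records which push path was taken and with what arguments.
struct FakeComm {
  struct Call {
    std::string path;
    int num;
    std::size_t len;
  };
  std::vector<Call> calls;

  void push_sparse(int num, std::size_t len) {
    calls.push_back({"push_sparse", num, len});
  }
  void push_sparse_multi_node(int num, std::size_t len) {
    calls.push_back({"push_sparse_multi_node", num, len});
  }
};

// Hypothetical wrapper: the node count is passed in (i.e. configurable)
// instead of being a hard-coded member default.
class SparsePusher {
 public:
  SparsePusher(FakeComm* comm, int node_count)
      : comm_(comm), multi_node_(node_count > 1) {
    assert(comm != nullptr);
    assert(node_count >= 1);
  }

  // Dispatch on the single-node / multi-node setting.
  void push_sparse(int num, std::size_t len) {
    if (multi_node_) {
      comm_->push_sparse_multi_node(num, len);
    } else {
      comm_->push_sparse(num, len);
    }
  }

  bool is_multi_node() const { return multi_node_; }

 private:
  FakeComm* comm_;
  bool multi_node_;
};
```

With this layout, constructing `SparsePusher` with a node count of 1 keeps the existing single-node path, and any larger count routes through `push_sparse_multi_node`. Where the node count would actually come from (a flag, the trainer config, or the environment) is left open here.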

@Thunderbrook Thunderbrook merged commit c4f279f into PaddlePaddle:develop Feb 24, 2021
Thunderbrook added a commit to Thunderbrook/Paddle that referenced this pull request Mar 1, 2021
* push multi node

* multi node

* MultiThread

* remove log

* solve bug in 30829
fuyinno4 pushed a commit that referenced this pull request Mar 1, 2021
* solve build gpu task core (#30626)

* build gpu task core

* format

* dump to cpu (#30750)

* dump to cpu

* format

* format

* format

* support multi node in heterps (#31102)

* push multi node

* multi node

* MultiThread

* remove log

* solve bug in 30829

* optimizer