
@FeixLiu (Contributor) commented Jan 19, 2022

PR types

Others

PR changes

Others

Describe

Initialize the fleet executor and prepare feed & fetch.

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.


bool DistModel::PrepareFleetExe() {
  task_node_.reset(new TaskNode(program_.get(), config_.local_rank));
  if (config_.local_rank - config_.mp_degree >= 0) {
Contributor:

This check for whether a rank is the first/last pipeline-parallel (pp) stage could be extracted into a helper function, so the intent is clear at a glance.

Contributor Author:

Sounds good; the check is used in several places anyway 😂
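The helper the reviewer suggests could look roughly like this. This is only a sketch: the struct and the names `IsFirstPPStage`/`IsLastPPStage` are illustrative, not actual Paddle code. Following the logic of the surrounding snippet, ranks belonging to adjacent pipeline stages are `mp_degree` apart, so a rank is in the first stage when it has no upstream rank and in the last stage when it has no downstream rank.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the config fields the snippet reads; in the real
// code these live on DistModelConfig.
struct RankInfo {
  int64_t local_rank;
  int64_t mp_degree;
  int64_t nranks;
};

// First pipeline stage: no rank mp_degree below us, i.e. no upstream task.
bool IsFirstPPStage(const RankInfo& c) {
  return c.local_rank - c.mp_degree < 0;
}

// Last pipeline stage: no rank mp_degree above us, i.e. no downstream task.
bool IsLastPPStage(const RankInfo& c) {
  return c.local_rank + c.mp_degree >= c.nranks;
}
```

With these helpers, the two `if` conditions in `PrepareFleetExe` read as `!IsFirstPPStage(...)` / `!IsLastPPStage(...)`, which states the intent directly.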

  if (config_.local_rank + config_.mp_degree < config_.nranks) {
    task_node_->AddDownstreamTask(config_.local_rank + config_.mp_degree);
  }
  task_node_->SetType("Compute");
Contributor:

The buffer size should probably also be set later on, to 2.
We should discuss the inference pipeline workflow in detail afterwards; it differs somewhat from our current training workflow. We need to account for throughput on one hand and for latency on the other.
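As an illustration of the throughput/latency trade-off the reviewer raises (this is generic pipeline-parallelism reasoning, not Paddle code; all names here are invented): a bounded buffer between two pipeline stages lets the upstream stage run ahead by at most `capacity` micro-batches. A larger capacity keeps stages busier (throughput), but each queued micro-batch sits longer before being consumed (latency).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Illustrative bounded queue between two pipeline stages. The capacity
// plays the role of the "buffer size" discussed above.
class StageBuffer {
 public:
  explicit StageBuffer(std::size_t capacity) : capacity_(capacity) {}

  // Producer stage: returns false when the buffer is full, i.e. the
  // upstream stage must stall instead of running further ahead.
  bool TryPush(int micro_batch_id) {
    if (queue_.size() >= capacity_) return false;
    queue_.push_back(micro_batch_id);
    return true;
  }

  // Consumer stage: pops the oldest in-flight micro-batch, if any.
  bool TryPop(int* micro_batch_id) {
    if (queue_.empty()) return false;
    *micro_batch_id = queue_.front();
    queue_.pop_front();
    return true;
  }

 private:
  std::size_t capacity_;
  std::deque<int> queue_;
};
```

With capacity 2 (the value suggested above), the producer can be at most two micro-batches ahead of the consumer, bounding per-request latency while still overlapping the stages.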

}

TaskNode::TaskNode(paddle::framework::ProgramDesc* program, int64_t rank)
    : program_(program), rank_(rank), task_id_(rank) {
Contributor:

Later on we may need to think about how to hook up the inference passes.

Contributor Author:

The passes aren't a problem: all of this happens inside DistModel's init function, so after every pass has run, we simply initialize the TaskNode with the final program.
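The ordering described in this reply can be sketched as follows. Every type and function name here is a placeholder for illustration, not the real Paddle interface: the point is only that the passes mutate the program first, and the TaskNode is constructed from the final, fully-transformed program afterwards.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Stand-in for a real ProgramDesc; records which passes ran over it.
struct Program {
  std::vector<std::string> applied_passes;
};

// Stand-in for the real TaskNode: it only ever sees the final program.
struct TaskNode {
  explicit TaskNode(const Program* program) : program_(program) {}
  const Program* program_;
};

// Sketch of the init ordering: 1) run every pass, 2) only then wrap the
// resulting program in a TaskNode.
std::unique_ptr<TaskNode> InitDistModel(
    Program* program,
    const std::vector<std::function<void(Program*)>>& passes) {
  for (const auto& pass : passes) pass(program);
  return std::make_unique<TaskNode>(program);
}
```

Because the TaskNode is built last, it is indifferent to which passes ran; it always sees their combined result.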

@wangxicoding wangxicoding merged commit e43b6f6 into PaddlePaddle:develop Jan 19, 2022
@FeixLiu FeixLiu deleted the init_fleet_exe branch January 19, 2022 06:43