[fleet executor] Init fleet exe and prepare feed&fetch #39032
Thanks for your contribution!
bool DistModel::PrepareFleetExe() {
  task_node_.reset(new TaskNode(program_.get(), config_.local_rank));
  if (config_.local_rank - config_.mp_degree >= 0) {
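The check for whether this rank is the first or last pipeline-parallel (pp) stage could be pulled out into a function, so that it is clear at a glance.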
I think that works; the check is used in quite a few places 😂
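A minimal sketch of what such a helper could look like, assuming the two conditions in the diff mean "there is a stage before this rank" and "there is a stage after this rank". The `RankLayout` struct and both function names are hypothetical and not part of this PR; the fields mirror `config_.local_rank`, `config_.mp_degree`, and `config_.nranks`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the config fields used in PrepareFleetExe().
struct RankLayout {
  int64_t local_rank;  // rank of this process
  int64_t mp_degree;   // model-parallel degree; stages are mp_degree ranks apart
  int64_t nranks;      // total number of ranks
};

// True when no rank sits mp_degree positions before this one,
// i.e. the negation of `local_rank - mp_degree >= 0` from the diff.
inline bool IsFirstPipelineStage(const RankLayout& c) {
  return c.local_rank - c.mp_degree < 0;
}

// True when no rank sits mp_degree positions after this one,
// i.e. the negation of `local_rank + mp_degree < nranks` from the diff.
inline bool IsLastPipelineStage(const RankLayout& c) {
  return c.local_rank + c.mp_degree >= c.nranks;
}
```

With helpers like these, the downstream branch in the diff would read `if (!IsLastPipelineStage(layout))`, and the same check could be reused at the other call sites mentioned above.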
  if (config_.local_rank + config_.mp_degree < config_.nranks) {
    task_node_->AddDownstreamTask(config_.local_rank + config_.mp_degree);
  }
  task_node_->SetType("Compute");
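The buffer size should probably also be set later on, to 2.
We should discuss the inference pipeline flow carefully later; it differs somewhat from our current training flow.
On the one hand we need to consider throughput, and on the other we also need to consider latency.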
}
TaskNode::TaskNode(paddle::framework::ProgramDesc* program, int64_t rank)
    : program_(program), rank_(rank), task_id_(rank) {
Later on we may need to consider how to hook this up with the inference passes.
Which passes run doesn't matter. All of this goes in the dist model's init function: once all the passes have run, just initialize the TaskNode with the final program.
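The ordering described in this reply can be sketched as follows. This is only an illustration under assumed types: `Program`, `Pass`, `Node`, and `InitAfterPasses` are hypothetical stand-ins, not the Paddle `ProgramDesc`, pass, or `TaskNode` APIs.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a program: just a list of op names.
struct Program {
  std::vector<std::string> ops;
};

// Hypothetical pass: any function that rewrites the program in place.
using Pass = std::function<void(Program*)>;

// Hypothetical stand-in for a task node that captures the program it is given.
struct Node {
  explicit Node(const Program* program) : op_count(program->ops.size()) {}
  std::size_t op_count;  // number of ops seen at construction time
};

// Run every pass first, then build the node from the final program,
// so the node never observes a partially transformed program.
inline std::unique_ptr<Node> InitAfterPasses(Program* program,
                                             const std::vector<Pass>& passes) {
  for (const auto& pass : passes) {
    pass(program);
  }
  return std::make_unique<Node>(program);
}
```

Because the node is constructed only after the loop finishes, the set of passes can change without touching the node's constructor, which is the point made in the reply.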
PR types
Others
PR changes
Others
Describe
Init fleet exe and prepare feed&fetch