-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PASS] Improve graph fusion #286
Conversation
@@ -218,6 +221,7 @@ nnvm::Graph GraphFuse(nnvm::Graph g) { | |||
nnvm::Op::GetAttr<FTVMCompute>("FTVMCompute"); | |||
static auto& fschedule = | |||
nnvm::Op::GetAttr<FTVMSchedule>("FTVMSchedule"); | |||
std::unordered_map<uint32_t, std::vector<Operation>> group_ops; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space between >>
for vs compilers
@@ -68,6 +68,7 @@ nnvm::Graph GraphPartition(nnvm::Graph g) { | |||
std::vector<int> master_vec(idx.num_nodes(), -1); | |||
// Operator pattern | |||
static auto& op_pattern = nnvm::Op::GetAttr<TOpPattern>("TOpPattern"); | |||
auto same_shape = [&] (uint32_t leid, uint32_t reid) { return shape_vec[leid] == shape_vec[reid]; }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is too short that maybe we don't need this lambda
case kBroadcast: | ||
if (master_vec[e.node_id] == -1) { | ||
fuse_vec[e.node_id] = FuseRule::kFuseToMaster; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will result in a bug when we first have input that is ewise, then another that is complex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, ignore my previous comment, this seems to be fine, as the input can also be fused, please add comment here to indicate this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always put break outside if block
chosen_master = master_vec[e.node_id]; | ||
fuse_vec[e.node_id] = FuseRule::kFuseToMaster; | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relies on pass through behavior, which is not quite intuitive. Always put break in outside, and add an else case
case kComplex: | ||
if (chosen_master == -1 && same_shape(idx.entry_id(nid, 0), idx.entry_id(e))) { | ||
chosen_master = master_vec[e.node_id]; | ||
fuse_vec[e.node_id] = FuseRule::kFuseToMaster; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the pattern is complex, the final output pattern need to be kComplex
case kElemWise: | ||
case kBroadcast: | ||
if (master_vec[e.node_id] == -1) { | ||
fuse_vec[e.node_id] = FuseRule::kFuseToMaster; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to rely on master flag here>
ref_count
should consider graph outputs. If an intermediate tensor is an input of another node also the graph output, it should not be fused.