Update local-execution to newest task-spec#1625
Conversation
… equivalent dataflow, expand.
Refactor computation graph instance as DTG. Cut out more stuff that doesn't compile yet.
…taflow_graph_from_cg.
elliottslaughter
left a comment
There was a problem hiding this comment.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
elliottslaughter
left a comment
There was a problem hiding this comment.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
elliottslaughter
left a comment
There was a problem hiding this comment.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
@elliottslaughter resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 20 unresolved discussions (waiting on @lockshaw).
lockshaw
left a comment
There was a problem hiding this comment.
@lockshaw reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: 68 of 81 files reviewed, 20 unresolved discussions (waiting on @elliottslaughter).
lib/local-execution/include/local-execution/computation_graph_instance/computation_graph_instance.h line 30 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Generally we make all constructors
explicitunless there's a clear reason not to
Testing publishing a response
elliottslaughter
left a comment
There was a problem hiding this comment.
@elliottslaughter made 10 comments.
Reviewable status: 68 of 81 files reviewed, 20 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
@elliottslaughter made 1 comment.
Reviewable status: 68 of 81 files reviewed, 20 unresolved discussions (waiting on @lockshaw).
lib/local-execution/include/local-execution/computation_graph_instance/computation_graph_instance.h line 30 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Testing publishing a response
Test reply
elliottslaughter
left a comment
There was a problem hiding this comment.
@elliottslaughter made 1 comment.
Reviewable status: 68 of 81 files reviewed, 20 unresolved discussions (waiting on @lockshaw).
elliottslaughter
left a comment
There was a problem hiding this comment.
@elliottslaughter made 6 comments and resolved 1 discussion.
Reviewable status: 68 of 81 files reviewed, 19 unresolved discussions (waiting on @lockshaw).
lib/local-execution/include/local-execution/task_execution.h line 17 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Any reason for deleting all of the argument names?
I went back and forth on this one because the rest of the code seems to be inconsistent. What do you prefer?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 6 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
I'm not sure this is the most consistent, particularly considering the parallel case of tensors.
I feel like the two consistent options are:
layer/parallel_layer/loss, orcg_layer/pcg_layer/loss
We used option 1 for tensors so that feels the most consistent to me, but let me know if you disagree.
lockshaw
left a comment
There was a problem hiding this comment.
@lockshaw reviewed 13 files, made 7 comments, and resolved 11 discussions.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @elliottslaughter).
lib/local-execution/include/local-execution/task_execution.h line 17 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
I went back and forth on this one because the rest of the code seems to be inconsistent. What do you prefer?
I have moved over to including the argument names because it enables us to use clang-tidy to check the argument name comments (i.e., the /*my_argument=*/... syntax), and because it enables adding doxygen docstrings (I don't think you can document arguments without names in doxygen, at least not easily)
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
You can also just use a prose name, like get_tensor for read-only forward tensor. I just don't want the subcase names to explicitly contradict the signature
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 6 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
I'm not sure this is the most consistent, particularly considering the parallel case of tensors.
I feel like the two consistent options are:
layer/parallel_layer/loss, orcg_layer/pcg_layer/lossWe used option 1 for tensors so that feels the most consistent to me, but let me know if you disagree.
Yeah, either is fine, I just don't like layer by itself because it's ambiguous (unfortunately we don't have a good name for a non-parallel layer that's shorter than "non_parallel_layer", hence "cg_layer"). Either parallel_layer or pcg_layer is fine to me
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
See if the new version in the updated diff makes more sense.
Yup, makes much more sense. Thanks!
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
Makes sense. I'd say either (1) include loss attrs in the graph, or (2) remove the loss_attrs argument. Either is fine with me, but having a loss_attrs argument that just isn't used feels weird.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
Honestly, I don't know. As far as I'm aware everything downstream uses the
ParallelTensorShape, but if you want this represented I can put it in.
Nope, if it's working for now that's fine, just wanted to raise the issue to make sure it got considered/kept in mind as we go forward. We can always change it if we discover a case where it is necessary.
lib/local-execution/src/local-execution/device_state_initialization.cc line 85 at r5 (raw file):
}); // FIXME: this assert fails because not all kinds of nodes require // initialization ASSERT(all_nodes_are_initialized(dg));
Got it, in that case it seems like having an all_nodes_are_initialized function doesn't make a whole lot of sense as it doesn't tell us anything, so maybe the answer is just to remove that function. Another option would be to keep it, but to have it incorporate what types of nodes don't need to be initialized. Either is fine, just trying to understand (and if convenient, add a check in the code) the postcondition of this function
elliottslaughter
left a comment
There was a problem hiding this comment.
The remaining items should be resolved now. Let me know if there is anything else.
@elliottslaughter made 7 comments and resolved 3 discussions.
Reviewable status: 69 of 90 files reviewed, 6 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/device_state_initialization.cc line 85 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Got it, in that case it seems like having an
all_nodes_are_initializedfunction doesn't make a whole lot of sense as it doesn't tell us anything, so maybe the answer is just to remove that function. Another option would be to keep it, but to have it incorporate what types of nodes don't need to be initialized. Either is fine, just trying to understand (and if convenient, add a check in the code) the postcondition of this function
I tried a couple different approaches, but I ultimately discovered that even if a node has an initializer task impl defined, it may nevertheless not result in any device state. Therefore, I think it's unreasonable to attempt to define this function, and I have removed it entirely.
lib/local-execution/src/local-execution/task_execution.cc line 53 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I think this can be converted to a
binary_merge_disjoint_mapsof the result of a composition of amap_keysandmap_values(it might actually be a good idea to make amap_keys_and_valuesthat is simply a composition of the two, as iirc composingmap_keysandmap_valuesis a reasonable common operation in the codebase.
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
You can also just use a prose name, like
get_tensor for read-only forward tensor. I just don't want the subcase names to explicitly contradict the signature
Done.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Makes sense. I'd say either (1) include loss attrs in the graph, or (2) remove the
loss_attrsargument. Either is fine with me, but having aloss_attrsargument that just isn't used feels weird.
I decided to put it into the graph. Rationale being: for Realm, I'm going to need to transport these anyway. Might as well put as much as I can into one place.
See if you like the new code and be sure to check out dynamic graph too.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 66 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Especially when you're passing back tuples with multiple values of the same type, it's usually preferred to make a dtgen struct so that the fields have names, which makes code that uses this function safer to write and easier to read
Done.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Nope, if it's working for now that's fine, just wanted to raise the issue to make sure it got considered/kept in mind as we go forward. We can always change it if we discover a case where it is necessary.
Done.
lockshaw
left a comment
There was a problem hiding this comment.
@lockshaw reviewed 21 files and all commit messages, made 2 comments, and resolved 6 discussions.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @elliottslaughter).
lib/task-spec/include/task-spec/dynamic_graph/dynamic_node_attrs.dtg.toml line 43 at r7 (raw file):
[[fields]] name = "loss_attrs" type = "std::optional<::FlexFlow::LossAttrs>"
It seems like only one of op_attrs or loss_attrs will ever be defined at any given time. Is this correct and, if so, is there any reason to combine the two fields into a variant of some kind to enforce this property?
lib/utils/include/utils/containers/map_keys_and_values.h line 1 at r7 (raw file):
#ifndef _FLEXFLOW_LIB_UTILS_INCLUDE_UTILS_CONTAINERS_MAP_KEYS_AND_VALUES_H
Every .h file should have a corresponding .cc file. In this case, the contents would be explicit specializations to emulate typechecking generics, e.g., https://github.com/flexflow/flexflow-train/blob/f71c384ce5decd5a30859e39f75f1418cb76d91c/lib/utils/src/utils/containers/binary_merge_maps_with.cc
|
Previously, lockshaw (Colin Unger) wrote…
What would you call it? It's not obvious to me what the union of op and loss attrs should be called. |
|
Previously, elliottslaughter (Elliott Slaughter) wrote…
You could go with |
elliottslaughter
left a comment
There was a problem hiding this comment.
@elliottslaughter made 2 comments.
Reviewable status: 75 of 92 files reviewed, 2 unresolved discussions (waiting on @lockshaw).
lib/task-spec/include/task-spec/dynamic_graph/dynamic_node_attrs.dtg.toml line 43 at r7 (raw file):
Previously, lockshaw (Colin Unger) wrote…
You could go with
TrainingNodeAttrs? In the past I've used theTrainingprefix to denote the presence of loss, as the loss function only makes sense in the context of training the model
Done.
lib/utils/include/utils/containers/map_keys_and_values.h line 1 at r7 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Every
.hfile should have a corresponding.ccfile. In this case, the contents would be explicit specializations to emulate typechecking generics, e.g., https://github.com/flexflow/flexflow-train/blob/f71c384ce5decd5a30859e39f75f1418cb76d91c/lib/utils/src/utils/containers/binary_merge_maps_with.cc
Done.
lockshaw
left a comment
There was a problem hiding this comment.
@lockshaw reviewed 17 files and all commit messages, and resolved 2 discussions.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @elliottslaughter).
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1625 +/- ##
==============================
==============================
🚀 New features to boost your workflow:
|
Runs end-to-end tests, including loss function. Do not include cost estimator, that will be a separate PR.
Changes to
local-execution:ComputationGraphInstancenow captures all the necessary state to run a computationInitializedComputationGraphInstancehas been removedComputationGraphInstanceruns all passes, allocates and initializes the graphlocal_task_registry.his now static: no dynamic map is retainedChanges to
task-spec:DynamicValueAttrsnow stores aDynamicTensorAccessor(variant ofGenericTensorAccessorRandGenericTensorAccessorW)labelled_open_kwarg_dataflow_graph_from_dynamic_open_dataflow_graphmake_dynamic_open_dataflow_graph_from_cgperform_update_insertionFollow-up PRs:
This change is