Related is the flat net construction logic (#992). However, I think the current implementation of it is too difficult and too messy, using exceptions to fill the queue of layers to construct.
Rather, I think in the layer transform_config_dict function, get_layer would always return a template layer (maybe the existing _TemplateLayer class), and at the same time make an entry in the layer construction queue. Then, in the outermost loop, we would get the next layer from the queue, and repeat until the queue is empty. At that point, we have built up the complete layer graph.
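Just to illustrate what I mean, a rough sketch (TemplateLayer and construct_templates are made-up names for the sketch, not the actual code; the real implementation would go through transform_config_dict with a custom get_layer instead of just reading "from"):

```python
from collections import deque


class TemplateLayer:
    """Stand-in for a layer: just the name, the layer dict and the resolved
    dependencies. No shapes, no TF ops at this point."""

    def __init__(self, name, layer_dict):
        self.name = name
        self.layer_dict = layer_dict
        self.deps = []       # list of TemplateLayer this layer depends on
        self.output = None   # Data template, filled in a later pass


def construct_templates(net_dict, roots=("output",)):
    """First pass: build the complete template layer graph via a queue."""
    templates = {}  # name -> TemplateLayer
    queue = deque()

    def get_layer(name):
        # Always return a template; enqueue it only the first time we see it.
        # "prev:..." references are mapped onto the same template here,
        # which is what keeps circular dependencies harmless.
        base = name[len("prev:"):] if name.startswith("prev:") else name
        if base in templates:
            return templates[base]
        layer = TemplateLayer(base, dict(net_dict[base]))
        templates[base] = layer
        queue.append(layer)
        return layer

    for root in roots:
        get_layer(root)

    while queue:
        layer = queue.popleft()
        # The real code would call
        # layer_class.transform_config_dict(layer.layer_dict, ..., get_layer=get_layer)
        # here; this sketch only resolves the "from" sources.
        srcs = layer.layer_dict.get("from", [])
        if isinstance(srcs, str):
            srcs = [srcs]
        layer.deps = [get_layer(src) for src in srcs if not src.startswith("data")]

    return templates
```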
Remember that transform_config_dict is mostly there to resolve the dependencies, and does only a little preparation of the layer config dict.
Also remember that this also goes into subnetworks (SubnetworkLayer, RecLayer, CondLayer), which makes it more difficult.
This builds up the layer graph. However, the actual shapes etc. are all irrelevant at this point. But now that we have the layer graph, we also know in what order we need to do the actual construction.
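To make that concrete: with the template graph from the sketch above, the construction order is just a topological order over the dependencies (construction_order is again a made-up name for the sketch):

```python
def construction_order(templates):
    """Return the templates such that every layer comes after all of its
    dependencies (a topological order over the layer graph). Back edges
    from circular dependencies ("prev:...") are skipped, since they refer
    to the previous time frame and are no real construction dependency."""
    order, visiting, done = [], set(), set()

    def visit(layer):
        if layer.name in done:
            return
        if layer.name in visiting:
            return  # back edge, e.g. via "prev:...": skip
        visiting.add(layer.name)
        for dep in layer.deps:
            visit(dep)
        visiting.discard(layer.name)
        done.add(layer.name)
        order.append(layer)

    for layer in templates.values():
        visit(layer)
    return order
```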
Note that circular dependencies via transform_config_dict, e.g. due to "prev:..." in the rec layer, are not really a problem at this point. If we see that some layer is already in the construction queue, or the template has been constructed before, we would simply skip over it.
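E.g. with the construct_templates sketch from above, a toy rec-style net dict with such a cycle just terminates, because the second lookup of "output" returns the already existing template and nothing is enqueued again:

```python
net_dict = {
    "embed": {"class": "linear", "from": "data"},
    "output": {"class": "linear", "from": ["embed", "prev:output"]},
}
templates = construct_templates(net_dict)
assert set(templates) == {"embed", "output"}
assert [d.name for d in templates["output"].deps] == ["embed", "output"]
```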
We should try to avoid calling transform_config_dict multiple times, as this might have created bigger substructures such as SubnetworkRecCell or Subnetwork. But when actually getting to the point of constructing the layer, we would replace the template layers by real instances.
It's a bit unclear when to handle get_out_data_from_opts. We could do a second pass, still purely based on the templates, to resolve that. Here we get to the problem of circular dependencies in the rec layer, and we need heuristics similar to what we have currently. Simplifying these heuristics for the rec layer subnet template construction is a topic on its own, and maybe the redesign here does not really influence this much (I'm not sure; separate issue: #1129). For the rec layer, we also need to have called all get_out_data_from_opts of its subnet so that we know the shapes of the rec state. So this second pass through the network (now forward instead of backward) would call all get_out_data_from_opts and fill in the Data of the template layers.
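Sketch of that second pass, building on the sketches above (infer_out_data stands in for calling the per-layer-class get_out_data_from_opts, which is not spelled out here):

```python
def fill_output_templates(templates, infer_out_data):
    """Second (forward) pass over the template graph: compute the output
    Data template of every layer from the Data of its dependencies."""
    for layer in construction_order(templates):
        # Outputs of back-edge dependencies ("prev:...") might still be
        # unknown here; those are exactly the cases which need the
        # heuristics mentioned above, and stay None in this sketch.
        dep_outputs = [dep.output for dep in layer.deps]
        layer.output = infer_out_data(layer.layer_dict, dep_outputs)
```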
So far, this is all template logic, and no actual TF operation has been created or touched. Actually, this logic is completely independent of the backend (Theano, TF, PyTorch). So when we do this implementation, maybe we can do it in a backend-independent way right away (thus also relevant for PyTorch, #1120).
Then, a third pass would actually construct the layers.
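Again only as a sketch, building on the above (make_layer stands in for instantiating the real layer class from the already transformed layer dict, its real source layers and the precomputed output Data):

```python
def construct_real_layers(templates, make_layer):
    """Third pass: create the real layers (actual TF ops etc.) in
    dependency order, replacing the templates by real instances."""
    real = {}  # name -> real layer instance
    for tmpl in construction_order(templates):
        # Back-edge deps ("prev:...") are not constructed yet; the rec layer
        # handles those via its loop state, so they are skipped here.
        sources = [real[dep.name] for dep in tmpl.deps if dep.name in real]
        real[tmpl.name] = make_layer(
            name=tmpl.name,
            layer_dict=tmpl.layer_dict,  # no second transform_config_dict call
            sources=sources,
            output=tmpl.output,
        )
    return real
```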
The recent problem with too slow net construction (#1127) led to this issue, although maybe #1127 would be solved in a different way, as the redesign proposed here would probably be a larger undertaking.