Redesign of network construction logic #1128

Open
albertz opened this issue Sep 22, 2022 · 0 comments
Related is the flat net construction logic (#992). However, I think its current implementation is too convoluted and too messy: it uses exceptions to fill the queue of layers to construct.

Rather, I think that in the layer transform_config_dict function, get_layer would always return a template layer (maybe the existing _TemplateLayer class), and at the same time make an entry in the layer construction queue. Then, in the outermost loop, we would get the next layer from the queue, and repeat until the queue is empty. At that point we have built up the complete layer graph.
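A minimal sketch of this queue-based template construction. All names here (TemplateLayer, construct_templates, the simplified handling of the net dict and its "from" entries) are hypothetical stand-ins for the real RETURNN internals such as _TemplateLayer and transform_config_dict:

```python
from collections import deque


class TemplateLayer:
    """Hypothetical placeholder for RETURNN's _TemplateLayer."""

    def __init__(self, name, layer_dict):
        self.name = name
        self.layer_dict = layer_dict
        self.deps = []  # dependencies, resolved via get_layer


def construct_templates(net_dict, output_name="output"):
    """Build the complete template layer graph, starting from the output."""
    templates = {}
    queue = deque()

    def get_layer(name):
        # Always return a template layer; enqueue it on first request.
        # Returning the existing template for an already-seen name also
        # handles circular dependencies (e.g. "prev:..." in a rec layer):
        # the layer is simply not enqueued again.
        if name not in templates:
            templates[name] = TemplateLayer(name, net_dict[name])
            queue.append(templates[name])
        return templates[name]

    get_layer(output_name)
    while queue:
        tmpl = queue.popleft()
        # Stand-in for transform_config_dict: resolve "from" entries into
        # dependencies, which enqueues new layers as a side effect.
        for src_name in tmpl.layer_dict.get("from", []):
            tmpl.deps.append(get_layer(src_name))
    return templates
```

Note that no shapes or TF ops are involved here; the loop only discovers the graph.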

Remember that transform_config_dict is mostly there to resolve the dependencies and to do some light preparation of the layer config dict.

Also remember that this recurses into subnetworks (SubnetworkLayer, RecLayer, CondLayer), which makes it more difficult.

This builds up the layer graph. The actual shapes etc. are all irrelevant at this point. But now that we have the layer graph, we also know in which order we need to start the actual construction.

Note that circular dependencies via transform_config_dict, e.g. due to "prev:..." in the rec layer, are not really a problem at this point. If we see that some layer is already in the construction queue, or that its template has been constructed before, we simply skip over it.

We should try to avoid calling transform_config_dict multiple times, as it might create bigger sub-structures such as SubnetworkRecCell or Subnetwork. When we actually get to the point of constructing a layer, we would replace the template layer by a real instance.

It's a bit unclear when to handle get_out_data_from_opts. We could do a second pass, still purely based on the templates, to resolve that. Here we run into the problem of circular dependencies in the rec layer, and we need heuristics similar to what we have currently. Simplifying these heuristics for the rec layer subnet template construction is a topic of its own, and maybe the redesign here does not really influence it much (I'm not sure; separate issue: #1129). For the rec layer, we also need to have called get_out_data_from_opts on all layers of its subnet such that we know the shapes of the rec state. So this second pass through the network (now forward instead of backward) would call all get_out_data_from_opts and fill in the Data on the template layers.
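As a rough sketch of this second pass, assuming template objects with name and deps attributes from a first pass, and treating get_out_data_from_opts as a plain callback: a depth-first forward traversal that fills in the output Data, marking each template as visited before recursing so that "prev:"-style cycles do not loop forever. All names are hypothetical simplifications:

```python
def fill_out_data(templates, get_out_data_from_opts):
    """Second pass: compute output Data for every template layer.

    templates: dict name -> template (with .name and .deps attributes).
    get_out_data_from_opts: callback computing the output for one template.
    """
    visited = set()

    def visit(tmpl):
        if tmpl.name in visited:
            return
        # Mark before recursing: a back edge (circular dependency, e.g.
        # via "prev:" in a rec layer) is then skipped, which is exactly
        # where the current-style heuristics would have to kick in,
        # since that dependency's Data may not be filled in yet.
        visited.add(tmpl.name)
        for dep in tmpl.deps:
            visit(dep)
        tmpl.out_data = get_out_data_from_opts(tmpl)

    for tmpl in templates.values():
        visit(tmpl)
```

For an acyclic graph this yields a plain post-order, i.e. every layer's dependencies have their Data before the layer itself is resolved.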

So far, this is all template logic, and no actual TF operation has been created or touched. Actually, this logic is completely independent of the backend (Theano, TF, PyTorch). So when we do this implementation, maybe we can make it backend-independent directly (thus also relevant for PyTorch, #1120).

Then, a third pass would actually construct the layers.
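A sketch of what that third pass could look like, assuming the templates already carry their resolved deps, and a hypothetical make_real_layer factory standing in for the backend-specific layer construction. Since the graph is known from the template passes, each layer's sources can be constructed before the layer itself (cycle handling via "prev:" is omitted here for brevity):

```python
def construct_real_layers(templates, make_real_layer):
    """Third pass: replace template layers by real layer instances.

    make_real_layer: factory taking (template, real_sources) and
    returning the actual (backend-specific) layer object.
    """
    real = {}

    def construct(tmpl):
        if tmpl.name not in real:
            # Construct all source layers first, then the layer itself.
            sources = [construct(dep) for dep in tmpl.deps]
            real[tmpl.name] = make_real_layer(tmpl, sources)
        return real[tmpl.name]

    for tmpl in templates.values():
        construct(tmpl)
    return real
```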

The recent problem with too slow net construction (#1127) led to this issue, although #1127 will maybe be solved in a different way, since a redesign as proposed here would probably be a larger undertaking.
