Extremely slow network construction (or maybe infinite loop, unclear) #1127
Comments
To get an impression of the (maybe endless, or very long) loop, add a debug statement like print(" ", self, "-" * len(ConstructCtx.layers), name). You get some output like:
Repeating a lot...
Maybe there is actually some bug in the config, and some exception occurs; however, due to the template construction logic, it keeps trying.
Maybe the flat net construction (#992) would also solve this.
The masked computation layer is not involved. I simplified it further (updated config is in the main post) and removed that part. Now the debug output:
I was thinking about a complete redesign of the net construction (#1128). But the main problem here is actually in the rec layer subnet template construction heuristic (for resolving circular dependencies). This is somewhat independent. Separate issue on redesigning the rec layer subnet template construction heuristic: #1129
My assumption is that some (actually correct) exception occurs in some layer (e.g. [...]). We should try to detect this somehow. But how?
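One conceivable way to detect this, sketched with a toy retry loop rather than RETURNN's actual template construction: record the exceptions of each round, and if a full round resolves nothing new, stop and re-raise instead of trying again. All names here (construct_all, ConstructionStalled, the layers dicts) are hypothetical and only illustrate the idea.

```python
class ConstructionStalled(RuntimeError):
    """A full retry round made no progress, so further retries cannot help."""


def construct_all(layers):
    """Toy retry loop. `layers` maps name -> callable(resolved_dict) -> value.

    A callable may raise because a dependency is not resolved yet (expected,
    it is retried in the next round) or because of a real bug (never resolves).
    """
    resolved = {}
    while len(resolved) < len(layers):
        errors = {}  # exceptions swallowed in this round
        progress = False
        for name, make in layers.items():
            if name in resolved:
                continue
            try:
                resolved[name] = make(resolved)
                progress = True
            except Exception as exc:
                errors[name] = exc
        if not progress:
            # Nothing changed, so the next round would fail in the same way.
            name, exc = next(iter(errors.items()))
            raise ConstructionStalled(
                f"no progress; layer {name!r} keeps failing with {exc!r}"
            ) from exc
    return resolved


# "output" depends on "encoder": fails in round 1, succeeds in round 2.
ok_layers = {
    "output": lambda r: r["encoder"] + 1,
    "encoder": lambda r: 1,
}

# "output" has a real bug: it fails in every round.
buggy_layers = {
    "encoder": lambda r: 1,
    "output": lambda r: r["encoder"] / 0,
}
```

With ok_layers the loop finishes after two rounds. With buggy_layers it raises ConstructionStalled in the second round, with the original ZeroDivisionError attached as the cause, instead of retrying forever.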
There is a bug in the original code:
    batch_dims = data.batch_dims_ordered(data_spatial_dim)
Should have been:
    batch_dims = data.batch_dims_ordered((data_spatial_dim, data.feature_dim))
But it is maybe also a problem for returnn-common that there was no real error at that point.
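To illustrate why the two calls differ, here is a minimal stand-in, assuming batch_dims_ordered returns all dims of the tensor in order except the ones passed to it. The function below is a hypothetical re-implementation on plain strings, not the returnn-common code.

```python
def batch_dims_ordered(dims, remove=()):
    """Hypothetical stand-in: all dims in order, except those in `remove`."""
    if isinstance(remove, str):
        remove = (remove,)  # allow a single dim as well as a sequence
    return [d for d in dims if d not in remove]


dims = ["batch", "time", "feature"]

# Buggy call: only the spatial dim is removed, the feature dim stays in.
wrong = batch_dims_ordered(dims, "time")
print(wrong)  # ['batch', 'feature']

# Fixed call: spatial and feature dim are both removed.
right = batch_dims_ordered(dims, ("time", "feature"))
print(right)  # ['batch']
```

In the buggy variant the feature dim is silently treated as a batch dim, which does not fail at this point and only causes trouble later.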
This caused the slow net construction problem in the first place: rwth-i6/returnn#1127
At the same time, memory increases all the time (slowly), so after some minutes, it would crash due to out-of-memory.
This happens with a transducer search config generated via returnn-common (this config), but the problem also occurs when I dump it as a pure RETURNN net dict (as in this config). Simplified further: this config.
In the log you see this as some of the last messages:
When looking at the stack trace in a debugger, rec template construction is involved.
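Such a stack trace can also be obtained without a debugger, using Python's standard faulthandler module to dump all thread stacks periodically while the process appears to hang. This is a minimal sketch: run_with_stack_dumps and slow_construction are hypothetical names, and slow_construction is only a stand-in for the slow net construction.

```python
import faulthandler
import tempfile
import time


def run_with_stack_dumps(func, interval_sec=0.05):
    """Run func() and dump all thread stacks every interval_sec while it runs.

    Returns (result of func, collected stack dumps as text).
    """
    with tempfile.TemporaryFile(mode="w+") as log:
        # A watchdog thread writes the tracebacks directly to the file.
        faulthandler.dump_traceback_later(interval_sec, repeat=True, file=log)
        try:
            result = func()
        finally:
            faulthandler.cancel_dump_traceback_later()
        log.seek(0)
        return result, log.read()


def slow_construction():
    """Stand-in for a long-running construction loop."""
    deadline = time.monotonic() + 0.3
    while time.monotonic() < deadline:
        time.sleep(0.01)
    return "done"


result, dumps = run_with_stack_dumps(slow_construction)
```

The collected dumps list the frames that are active at each interval, so a function that shows up in every dump is a good candidate for where the time is spent.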