Unify rec ...Step variants with on-seq variants? #81
I'm continuing to think about this (see #31 for some earlier discussion), at least for modules, not necessarily functions.

The reasoning why we have two separate variants for rec modules, like `LSTM` and `LSTMStep`, or `SelfAttention` and `SelfAttentionStep`, is to make it very explicit whether a module operates on a single step or on a sequence. This is again the argument for explicitness.

One issue though is when the user wants to use the same module twice, once on a single step, and once on the whole sequence. From a conceptual point of view, nothing speaks against this. However, with this strict separation, this is not possible.
Or maybe the user wants to prepare a model which has multiple modes, and in one mode it operates per step, while in another mode it operates on a whole sequence. It would be somewhat annoying to have a branch like `self.lstm = (LSTMStep if per_step else LSTM)(...)` and similar for every module. Also, the intention would be that the params match in both cases. But there would not really be much error checking, because nothing really guarantees that the params from `LSTM` are compatible with the params of `LSTMStep`. We just would hope that they match.

So, I'm questioning whether it is the best design to have two separate modules for each case.
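To make the problem concrete, here is a minimal, purely hypothetical Python sketch of that branching pattern. `LSTM` and `LSTMStep` are stand-in stubs here, not the actual returnn-common modules, and the `out_dim` parameter is made up for illustration.

```python
# Purely illustrative stubs, not the actual returnn-common LSTM / LSTMStep modules.
class LSTM:
    """Hypothetical on-sequence variant."""
    def __init__(self, out_dim: int):
        self.out_dim = out_dim


class LSTMStep:
    """Hypothetical per-step variant."""
    def __init__(self, out_dim: int):
        self.out_dim = out_dim


class MyModel:
    """A model that should support both a per-step and an on-sequence mode."""
    def __init__(self, per_step: bool):
        # The annoying branch: it has to be repeated for every rec module,
        # and nothing checks that the params of LSTM and LSTMStep are
        # actually compatible -- we just hope that they match.
        self.lstm = (LSTMStep if per_step else LSTM)(out_dim=512)
```

The point is only that this compatibility check happens nowhere; with a single unified module, it could not go wrong in the first place.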
Some alternatives, all using a single module:

- An option `per_step: bool` or so, which defines the mode (per step vs on sequence).
- `single_step_dim` (via Axis option for per-step execution, returnn#847).
- A `module.step(...)` function would operate per step. This basically would work already, except that it would not get wrapped through the `__call__` logic inside an own subnetwork. But we could probably do that. Maybe we can also introduce a generic function decorator which does that, like `@module_call def step(...)`. (Such a decorator could also be useful in other contexts. And maybe the user would then not even define `forward` but just `@module_call def __call__(...)` instead...)
- `iterate(module, inputs..., axis...)`, which logically wraps a module in a `Loop()`. Some modules could choose to have their own specific fast implementation for this (here on the returnn-common side, although RETURNN itself should probably handle most cases already). (This would not work for all cases though, where the on-seq variant maybe has other modes which are not possible per step, like `SelfAttention` or BLSTMs.)
- No `__call__` (`forward`) at all, but just two separate functions `step` and `on_seq` or so, both maybe via such a `@module_call` decorator. There could be a default `on_seq` implementation, maybe via some mixin `Rec` class or so. (See the sketch after this list.)

Maybe there are also further options. Suggestions are welcome.
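To illustrate the last two alternatives (an `iterate`-style default that logically loops over `step`, and a module exposing `step` / `on_seq` via some `Rec` mixin), here is a rough, self-contained Python sketch. None of these names or signatures are the actual returnn-common API; a plain Python loop stands in for wrapping the module in a `Loop()`, and state handling is heavily simplified.

```python
from typing import Any, List, Tuple


class Rec:
    """Hypothetical mixin: a rec module only has to define step() (and an
    initial state); on_seq() gets a default implementation that iterates
    step() over the sequence (standing in for iterate(module, ...) / Loop())."""

    def initial_state(self) -> Any:
        raise NotImplementedError

    def step(self, x: Any, *, state: Any) -> Tuple[Any, Any]:
        """Operate on a single step; returns (output, new state)."""
        raise NotImplementedError

    def on_seq(self, xs: List[Any], *, state: Any = None) -> Tuple[List[Any], Any]:
        """Default on-sequence implementation. Modules like SelfAttention or a
        BLSTM could override this with a specific whole-sequence implementation."""
        state = self.initial_state() if state is None else state
        ys = []
        for x in xs:  # a plain loop here, conceptually a Loop() in returnn-common
            y, state = self.step(x, state=state)
            ys.append(y)
        return ys, state


class Accumulator(Rec):
    """Toy rec module (running sum), just to show both modes on one instance."""

    def initial_state(self) -> float:
        return 0.0

    def step(self, x: float, *, state: float) -> Tuple[float, float]:
        y = state + x
        return y, y


acc = Accumulator()
ys, state = acc.on_seq([1.0, 2.0, 3.0])  # whole sequence: ys == [1.0, 3.0, 6.0]
y, state = acc.step(4.0, state=state)    # single step with the same module
```

The same single instance (and thus the same params) is used in both modes, which is exactly what the strict two-module separation does not allow.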