Description
I'm continuing to think about this (see #31 for some earlier discussion), at least for modules, not necessarily functions. The reasoning for having two separate variants for rec modules, like `LSTM` and `LSTMStep`, or `SelfAttention` and `SelfAttentionStep`, is to make it very explicit whether a module operates on a single step or on a whole sequence. This is again the argument for explicitness.
One issue though arises when the user wants to use the same module twice, once on a single step, and once on the whole sequence. Conceptually, nothing speaks against this. However, with this strict separation, it is not possible.
Or maybe the user wants to prepare a model which has multiple modes: in one mode it operates per step, while in another mode it operates on a whole sequence. It would be somewhat annoying to write a branch like `self.lstm = (LSTMStep if per_step else LSTM)(...)` and similar for every module. Also, the intention would be that the params match in both cases. But there would not really be much error checking, because nothing guarantees that the params of `LSTM` are compatible with the params of `LSTMStep`. We would just hope that they match.
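
To illustrate the problem, here is a minimal sketch of such a branching model (`LSTM`/`LSTMStep` as currently in returnn-common; the `per_step` argument and the `n_out` value are just for illustration):

```python
from returnn_common import nn  # illustrative import; exact path may differ

class MyModel(nn.Module):
    def __init__(self, *, per_step: bool):
        super().__init__()
        # Awkward branch, repeated for every rec module in the model.
        # Both variants are intended to have matching params, but nothing
        # enforces this; a mismatch would only surface at checkpoint-load
        # time or at runtime.
        self.lstm = (nn.LSTMStep if per_step else nn.LSTM)(n_out=512)
```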
So, I'm questioning whether it is the best design to have two separate modules for each case.
Some alternatives, all using a single module:
- Have an option on the module, like `per_step: bool` or so, which defines the mode (per step vs. on sequence).
- Use the special `single_step_dim` (via Axis option for per-step execution, returnn#847). (Both of these first two options are illustrated in the first sketch after this list.)
- The module call itself would do one variant (maybe on sequence?) and a separate `module.step(...)` function would operate per step. This would basically work already, except that it would not get wrapped through the `__call__` logic inside its own subnetwork. But we could probably do that. Maybe we can also introduce a generic function decorator which does that, like `@module_call def step(...)` (see the decorator sketch below). (Such a decorator could also be useful in other contexts. And maybe the user would then not even define `forward` but just `@module_call def __call__(...)` instead...)
- Maybe we would have the generic convention that we define all rec modules always per step, and then have a generic functional wrapper like `iterate(module, inputs..., axis...)` which logically wraps the module in a `Loop()`. Some modules could choose to have their own specific fast implementation for this (here on the returnn-common side, although RETURNN itself should probably handle most cases already). (This would not work for all cases though, where the on-sequence variant maybe has other modes which are not possible per step, like `SelfAttention` or BLSTMs.)
- Extending from the last two options, maybe no `__call__` (`forward`) at all, but just two separate functions `step` and `on_seq` or so, both maybe via such a `@module_call` decorator. There could be a default `on_seq` implementation, maybe via some mixin `Rec` class or so (see the `iterate`/`Rec` sketch below).
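
For the first two options, usage could look roughly like this (a hedged sketch; the `per_step` flag is hypothetical, and the `single_step_dim` semantics follow the proposal in returnn#847):

```python
# Option 1: one class, mode fixed at construction via a per_step flag.
lstm_seq = LSTM(n_out=512, per_step=False)   # operates on a whole sequence
lstm_step = LSTM(n_out=512, per_step=True)   # operates on a single step

# Option 2: one instance, mode selected per call via the axis argument;
# single_step_dim would be a special sentinel dim (returnn#847).
# x_seq, x_t, time_dim and state are assumed to exist in the context.
lstm = LSTM(n_out=512)
y_seq, _ = lstm(x_seq, axis=time_dim)                      # on the sequence
y_t, state = lstm(x_t, axis=single_step_dim, state=state)  # one step
```

Note that with option 2, the same instance (and thus the same params) is used in both modes, which directly resolves the param-compatibility concern above.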
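For the third option, the `@module_call` decorator does not exist yet; the idea would be that it wraps a method with the same logic that `__call__` currently gets, i.e. the call ends up in its own subnetwork. A sketch:

```python
class LSTM(nn.Module):
    def __call__(self, source, *, axis, state=None):
        # Default variant: operate on the whole sequence along `axis`.
        ...

    @module_call  # hypothetical decorator: wraps step() like __call__,
    # i.e. the call gets its own subnetwork
    def step(self, source, *, state):
        # Per-step variant; shares the params of __call__ by construction,
        # since both live on the same module instance.
        ...
```

The same instance could then be used both as `lstm(x, axis=time_dim)` on a sequence and as `lstm.step(x_t, state=state)` inside a loop.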
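For the last two options, here is a rough sketch of a generic `iterate()` wrapper and a `Rec` mixin with a default `on_seq` implementation. All names are hypothetical, and the `Loop()` usage follows the returnn-common style but glosses over details like state initialization:

```python
def iterate(module, source, *, axis, state):
    """Logically wrap a per-step module in a Loop() over the given axis."""
    loop = nn.Loop(axis=axis)
    loop.state.rec = state  # initial recurrent state
    with loop:
        x_t = loop.unstack(source)
        y_t, loop.state.rec = module.step(x_t, state=loop.state.rec)
        y_seq = loop.stack(y_t)
    return y_seq


class Rec:
    """Mixin: rec modules define step(); on_seq() has a generic default."""

    @module_call  # hypothetical decorator, as sketched above
    def step(self, source, *, state):
        raise NotImplementedError  # per-step logic, defined by each rec module

    @module_call
    def on_seq(self, source, *, axis, state):
        # Default: iterate step() over the axis. Modules like LSTM could
        # override this with a faster native on-sequence implementation.
        return iterate(self, source, axis=axis, state=state)
```

Modules where the on-sequence variant has modes not possible per step (like `SelfAttention` or a BLSTM) would override `on_seq` accordingly.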
Maybe there are also further options. Suggestions are welcome.