
Unify rec ...Step variants with on-seq variants? #81

Closed
@albertz


I'm continuing to think about this (see #31 for some earlier discussion), at least for modules, not necessarily functions. The reason we have two separate variants for rec modules, like LSTM and LSTMStep, or SelfAttention and SelfAttentionStep, is to make it very explicit whether a module operates on a single step or on a sequence. This is again the argument for explicitness.

One issue, though, arises when the user wants to use the same module twice, once on a single step and once on the whole sequence. Conceptually, nothing speaks against this. However, with this strict separation, it is not possible.

Or maybe the user wants to prepare a model which has multiple modes, where in one mode it operates per step and in another mode on a whole seq. It would be somewhat annoying to have a branch like self.lstm = (LSTMStep if per_step else LSTM)(...), and similar for every module. Also, the intention would be that the params match in both cases. But there would not really be much error checking, because nothing guarantees that the params from LSTM are compatible with the params of LSTMStep. We would just hope that they match.
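To make the concern concrete, here is a minimal plain-Python sketch of that branch. `ToyRec`, `ToyRecStep` and `Model` are hypothetical toy stand-ins, not the returnn-common modules; the point is only that two separately written classes can silently disagree on their params.

```python
# Toy stand-ins only: ToyRec / ToyRecStep are hypothetical, not returnn-common classes.
class ToyRec:
    """On-sequence variant: scaled running sum over the whole sequence."""

    def __init__(self, scale: float):
        self.params = {"scale": scale}

    def __call__(self, xs):
        state, out = 0.0, []
        for x in xs:
            state = state + self.params["scale"] * x
            out.append(state)
        return out


class ToyRecStep:
    """Per-step variant, written separately; its param happens to be named differently."""

    def __init__(self, weight: float):
        self.params = {"weight": weight}

    def __call__(self, x, state):
        new_state = state + self.params["weight"] * x
        return new_state, new_state


class Model:
    def __init__(self, per_step: bool):
        # The branch described above, which would be repeated for every rec module.
        self.rec = (ToyRecStep if per_step else ToyRec)(2.0)


seq_model = Model(per_step=False)
step_model = Model(per_step=True)

# Nothing ties the two parameter sets together; in this toy the names already differ.
params_match = seq_model.rec.params.keys() == step_model.rec.params.keys()
```

In this toy, loading the params of one variant into the other would fail only at load time, if at all, which is the missing error checking mentioned above.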

So, I'm questioning whether it is the best design to have two separate modules for each case.

Some alternatives, all using a single module:

  • Have an option to the module, like per_step: bool or so, which defines the mode (per step vs on sequence).
  • Use the special single_step_dim (via Axis option for per-step execution returnn#847).
  • The module call itself would do one variant (maybe on sequence?) and a separate module.step(...) function would operate per step. This basically would work already, except that it would not get wrapped through the __call__ logic inside its own subnetwork. But we could probably do that. Maybe we can also introduce a generic function decorator which does that, like @module_call def step(...). (Such a decorator could also be useful in other contexts. And maybe the user would not even define forward but just @module_call def __call__(...) instead...)
  • Maybe we would have the generic convention that we always define all rec modules per step, and then have a generic functional wrapper like iterate(module, inputs..., axis...) which logically wraps a module in a Loop(). Some modules could choose to have their own specific fast implementation for this (here on the returnn-common side, although RETURNN itself should probably handle most cases already).
    (This would not work for all cases though, since the on-seq variant may have other modes which are not possible per step, as with SelfAttention or BLSTMs.)
  • Extending from the last two options, maybe no __call__ (forward) at all but just two separate functions step and on_seq or so, both maybe via such a @module_call decorator. There could be a default on_seq implementation. Maybe via some mixin Rec class or so.
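As a rough illustration of the last idea, the following plain-Python sketch has one module expose both step and on_seq, with a default on_seq coming from a mixin. Everything here is hypothetical: this `module_call` only records which entry point was used (a stand-in for wrapping the call in its own subnetwork), and `Rec` and `ToyAccumulator` are toy names, not existing returnn-common code.

```python
import functools


def module_call(func):
    """Hypothetical decorator. A real one would route the call through the
    module __call__ logic; this toy version only records the entry point."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        self.calls.append(func.__name__)
        return func(self, *args, **kwargs)

    return wrapper


class Rec:
    """Hypothetical mixin: subclasses define step, and on_seq defaults to a loop.
    Subclasses are expected to set self.calls (used by module_call)."""

    def initial_state(self):
        raise NotImplementedError

    def step(self, x, state):
        raise NotImplementedError

    @module_call
    def on_seq(self, xs):
        # Default on-sequence implementation: iterate step over the inputs.
        state, outputs = self.initial_state(), []
        for x in xs:
            y, state = self.step(x, state)
            outputs.append(y)
        return outputs, state


class ToyAccumulator(Rec):
    """Toy rec module: scaled running sum, with a single set of params."""

    def __init__(self, scale):
        self.scale = scale
        self.calls = []

    def initial_state(self):
        return 0.0

    @module_call
    def step(self, x, state):
        new_state = state + self.scale * x
        return new_state, new_state  # (output, new state)


mod = ToyAccumulator(scale=2.0)
seq_out, seq_state = mod.on_seq([1.0, 2.0, 3.0])

# The same module instance, with the same params, used per step.
state = mod.initial_state()
step_out = []
for x in [1.0, 2.0, 3.0]:
    y, state = mod.step(x, state)
    step_out.append(y)
```

Because both entry points live on one instance, the params are shared by construction. A module with a faster or richer on-seq mode could simply override on_seq.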

Maybe there are also further options. Suggestions are welcome.
