The encoder interface is quite trivial: basically any [LayerRef] -> LayerRef function, although the interface should also imply the tensor format, e.g. {B,T,D}.
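For illustration, a minimal sketch of that signature (Python; `LayerRef` is just a stand-in for the real returnn-common type here, and the `IEncoder`/`Protocol` formulation is an assumption, not the actual draft):

```python
from typing import Protocol


class LayerRef:
    """Stand-in for the actual returnn-common LayerRef type (placeholder)."""


class IEncoder(Protocol):
    """Encoder interface sketch: any LayerRef -> LayerRef callable.

    Input and output are assumed to be sequences in {B,T,D} format
    (batch, time, feature); the output time axis may be downsampled."""

    def __call__(self, source: LayerRef) -> LayerRef:
        ...
```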
The idea was to have a generic interface for the decoder which allows defining both a transducer (in its most generic form, including RNN-T, RNA, etc.), either time-sync or alignment-sync, and a standard attention-based label-sync decoder.
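To make this more concrete, one possible shape of such an interface (purely hypothetical; `IDecoder`, `SyncType` and `step` are invented names for illustration, not the actual draft):

```python
from enum import Enum
from typing import Optional, Protocol, Tuple


class LayerRef:
    """Stand-in for the actual layer reference type, as above."""


class SyncType(Enum):
    """How decoder steps are synchronized (illustrative)."""
    TIME_SYNC = "time"    # one step per encoder frame (time-sync transducer)
    ALIGN_SYNC = "align"  # one step per alignment position (alignment-sync)
    LABEL_SYNC = "label"  # one step per output label (attention decoder)


class IDecoder(Protocol):
    """Decoder interface sketch: one decoding step at a time.

    Each step maps the previous label and recurrent state to
    log-probabilities over the output vocabulary, so one search loop can
    drive RNN-T, RNA and attention-based decoders alike."""

    sync_type: SyncType

    def step(self, encoder_output: LayerRef, prev_label: LayerRef,
             state: Optional[dict]) -> Tuple[LayerRef, dict]:
        """Returns (log_probs, new_state)."""
        ...
```

A single step function would keep the search loop independent of the sync type; the loop only differs in when it advances the encoder frame vs. the label position.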
The interface should allow for easy integration of an external LM, as well as for internal language model (ILM) estimation and subtraction.
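In search, external LM fusion and ILM subtraction typically reduce to a log-linear score combination per label; a minimal sketch (the scale values are placeholders, normally tuned on a dev set):

```python
def combined_log_prob(am_log_prob: float, lm_log_prob: float,
                      ilm_log_prob: float,
                      lm_scale: float = 0.5, ilm_scale: float = 0.4) -> float:
    """Log-linear score combination with ILM subtraction, per label:

        log p = log p_AM + lm_scale * log p_LM - ilm_scale * log p_ILM

    The scale values here are placeholders."""
    return am_log_prob + lm_scale * lm_log_prob - ilm_scale * ilm_log_prob


# Toy check: the external LM flips the decision from "a" to "b".
am = {"a": -1.0, "b": -1.2}
lm = {"a": -3.0, "b": -0.5}
ilm = {"a": -1.5, "b": -1.5}
best = max(am, key=lambda w: combined_log_prob(am[w], lm[w], ilm[w]))
print(best)  # -> "b"
```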
A current draft is here.
As examples, we should implement an attention-based encoder-decoder model and a transducer, each using an external LM with ILM estimation and subtraction.
Transformer should then also be refactored to make use of this interface.