
Higher-level encoder decoder interfaces for transducer, attention, LM, ILM, etc #49

@albertz


The encoder interface is quite trivial, basically just any `[LayerRef] -> LayerRef` function, although the interface should also imply the tensor format {B,T,D} or so.
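
For illustration, a minimal sketch of what such an encoder interface could look like. `LayerRef` is only a stand-in placeholder here, and the class/method names are hypothetical, not the actual draft API:

```python
from abc import ABC, abstractmethod


class LayerRef:
    """Placeholder stand-in for a reference to a layer output."""


class IEncoder(ABC):
    """Encoder interface: maps an input sequence to an encoded sequence.

    Both input and output are expected in {B,T,D} format (batch, time, feature).
    """

    @abstractmethod
    def __call__(self, source: LayerRef) -> LayerRef:
        raise NotImplementedError
```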

The idea was to have a generic interface for the decoder which allows defining both a transducer (in its most generic form, covering RNN-T, RNA, etc.), either time-sync or alignment-sync, and a standard attention-based label-sync decoder.
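
A rough sketch of how a generic decoder interface could cover these cases. Again, all names (`IDecoder`, `DecoderType`, ...) are hypothetical and only illustrate the idea, reusing the `LayerRef` placeholder from the sketch above:

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Optional


class LayerRef:  # placeholder stand-in, as in the encoder sketch above
    pass


class DecoderType(Enum):
    LABEL_SYNC = "label-sync"          # standard attention-based decoder
    TIME_SYNC = "time-sync"            # transducer, one step per encoder frame
    ALIGNMENT_SYNC = "alignment-sync"  # transducer, one step per alignment position


class IDecoder(ABC):
    """Decoder interface: given the encoder output, defines the decoding loop."""

    decoder_type: DecoderType

    @abstractmethod
    def __call__(self, encoder_output: LayerRef, *,
                 target: Optional[LayerRef] = None) -> LayerRef:
        """Return per-step output log-probabilities.

        In training, `target` (ground-truth labels or alignment) can be fed;
        in recognition, the loop runs free.
        """
        raise NotImplementedError
```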

The interface should allow for easy integration of an external LM, as well as for internal LM (ILM) estimation and subtraction.
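
To make the intended LM integration concrete, here is a small sketch of the usual log-linear score combination (shallow fusion with ILM subtraction) that such an interface should make easy to express; `lm_scale` and `ilm_scale` are hypothetical hyperparameters:

```python
def combine_log_scores(log_p_decoder: float, log_p_ext_lm: float, log_p_ilm: float,
                       lm_scale: float = 0.5, ilm_scale: float = 0.4) -> float:
    """Decoder score plus scaled external LM score, minus scaled estimated ILM score."""
    return log_p_decoder + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm
```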

A current draft is here.


As examples, we should implement an attention-based encoder-decoder model and a transducer, both using an external LM with ILM estimation and subtraction.

Transformer should then also be refactored to make use of this interface.
