
SelfAttention misses Linear after attention, wrong for Conformer, Transformer #221

Closed
@albertz

Description


In standard multi-head attention, there is a linear projection after the attention itself, applied to the concatenated head outputs.

ESPNet MultiHeadedAttention has it.
PyTorch torch.nn.MultiheadAttention does not have it.
Keras tf.keras.layers.MultiHeadAttention has it.
torchaudio.models.wav2vec2.components.SelfAttention has it.
Fairseq MultiheadAttention has it.
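
For concreteness, here is a minimal sketch of where that projection sits, written in plain PyTorch just for this issue (the class and attribute names are made up and this is not the code of any of the implementations above): after the per-head outputs are concatenated, a single Linear maps them to the output dimension.

```python
import torch


class SelfAttentionSketch(torch.nn.Module):
    """Made-up minimal example, not the code of any of the libraries listed above."""

    def __init__(self, in_dim: int, num_heads: int, out_dim: int):
        super().__init__()
        assert in_dim % num_heads == 0
        self.num_heads = num_heads
        self.qkv = torch.nn.Linear(in_dim, 3 * in_dim)  # joint Q/K/V projection
        self.out_proj = torch.nn.Linear(in_dim, out_dim)  # <-- the projection in question

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [batch, time, in_dim]
        batch, time, in_dim = x.shape
        head_dim = in_dim // self.num_heads
        q, k, v = [
            t.reshape(batch, time, self.num_heads, head_dim).transpose(1, 2)  # [B, H, T, head_dim]
            for t in self.qkv(x).chunk(3, dim=-1)
        ]
        att = torch.softmax((q @ k.transpose(-2, -1)) / head_dim ** 0.5, dim=-1)  # [B, H, T, T]
        out = (att @ v).transpose(1, 2).reshape(batch, time, in_dim)  # concatenate the heads
        return self.out_proj(out)  # without this line, heads are only concatenated, never mixed
```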

Our nn.GenericSelfAttention (and thus nn.SelfAttention) does not have it.
The RETURNN SelfAttentionLayer also does not have it.

But then we also do not add it in ConformerEncoderLayer, so it is clearly missing there, and the same holds for our Transformer.

So, should we change nn.GenericSelfAttention? Always include it? Or include it optionally? We could make it a required argument so that there is no confusion about it, e.g. out_dim: Optional[nn.Dim] (without a default): if the user passes None, there is no linear transformation at the end; otherwise there is.
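
To make that option concrete, here is a hypothetical sketch, again in plain PyTorch-style code following the same structure as above (the class name and signature are illustrative only and not the actual nn.GenericSelfAttention API): out_dim is required and keyword-only, and passing None explicitly disables the final projection.

```python
from typing import Optional

import torch


class GenericSelfAttentionSketch(torch.nn.Module):
    """Hypothetical illustration of the proposed option, not real returnn-common code."""

    def __init__(self, in_dim: int, num_heads: int, *, out_dim: Optional[int]):
        # out_dim has no default: the caller must explicitly pass None
        # to get "no final linear projection after the attention".
        super().__init__()
        assert in_dim % num_heads == 0
        self.num_heads = num_heads
        self.qkv = torch.nn.Linear(in_dim, 3 * in_dim)
        self.out_proj = None if out_dim is None else torch.nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [batch, time, in_dim]
        batch, time, in_dim = x.shape
        q, k, v = [
            t.reshape(batch, time, self.num_heads, -1).transpose(1, 2)
            for t in self.qkv(x).chunk(3, dim=-1)
        ]
        out = torch.nn.functional.scaled_dot_product_attention(q, k, v)  # needs PyTorch >= 2.0
        out = out.transpose(1, 2).reshape(batch, time, in_dim)
        return out if self.out_proj is None else self.out_proj(out)


att_with_proj = GenericSelfAttentionSketch(512, num_heads=8, out_dim=512)
att_without_proj = GenericSelfAttentionSketch(512, num_heads=8, out_dim=None)  # explicit opt-out
```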

If we don't change nn.GenericSelfAttention, we must fix the Transformer and Conformer.
