Conformer misses relative pos encoding #132

Closed
albertz opened this issue Apr 22, 2022 · 6 comments

albertz commented Apr 22, 2022

There is self-attention without any positional encoding.

Related: Transformer rel pos #74.

There should probably be an nn.RelPosSelfAttention, similar to nn.SelfAttention but with relative positional encoding. It should use the Transformer-XL rel pos encoding. (We can test other rel pos encodings separately later.) See the sketch below.
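As a rough illustration, here is a minimal PyTorch sketch of Transformer-XL style relative positional self-attention, in the spirit of ESPnet's RelPositionMultiHeadedAttention. The class and parameter names are hypothetical and this is not the returnn_common nn API; it only shows the content/position score terms (the u/v biases) and the rel-shift trick.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelPosSelfAttentionSketch(nn.Module):
    """Transformer-XL style relative positional multi-head self-attention (sketch)."""

    def __init__(self, model_dim: int, num_heads: int):
        super().__init__()
        assert model_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = model_dim // num_heads
        self.qkv = nn.Linear(model_dim, 3 * model_dim)
        self.pos_proj = nn.Linear(model_dim, model_dim, bias=False)  # W_R in Transformer-XL
        self.out = nn.Linear(model_dim, model_dim)
        # Global content/position biases (u and v in the Transformer-XL paper).
        self.pos_bias_u = nn.Parameter(torch.zeros(num_heads, self.head_dim))
        self.pos_bias_v = nn.Parameter(torch.zeros(num_heads, self.head_dim))

    @staticmethod
    def _rel_shift(x: torch.Tensor) -> torch.Tensor:
        # x: (batch, heads, time, 2*time-1) -> (batch, heads, time, time).
        # Aligns the 2*T-1 relative-distance scores with the T x T attention matrix.
        b, h, t, _ = x.shape
        x = F.pad(x, (1, 0))                           # (b, h, t, 2t)
        x = x.reshape(b, h, 2 * t, t)[:, :, 1:]        # drop the padded slice
        return x.reshape(b, h, t, 2 * t - 1)[..., :t]

    def forward(self, x: torch.Tensor, pos_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, model_dim).
        # pos_emb: (2*time-1, model_dim), sinusoidal encodings for relative
        # distances time-1 .. -(time-1), as produced by the rel pos encoding module.
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        p = self.pos_proj(pos_emb).view(-1, self.num_heads, self.head_dim)
        p = p.permute(1, 0, 2)  # (heads, 2*time-1, head_dim)
        # Content-content term (a+c) and content-position term (b+d).
        ac = torch.einsum("bhqd,bhkd->bhqk", q + self.pos_bias_u[None, :, None, :], k)
        bd = torch.einsum("bhqd,hkd->bhqk", q + self.pos_bias_v[None, :, None, :], p)
        bd = self._rel_shift(bd)
        attn = torch.softmax((ac + bd) / math.sqrt(self.head_dim), dim=-1)
        out = torch.einsum("bhqk,bhkd->bhqd", attn, v)
        return self.out(out.transpose(1, 2).reshape(b, t, -1))
```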

albertz commented Oct 12, 2022

See here for an example: https://github.com/espnet/espnet/blob/4138010fb66ad27a43e8bee48a4932829a0847ae/espnet2/asr/encoder/conformer_encoder.py#L132
(Up to date as of 2022-10-17.)

- RelPositionMultiHeadedAttention
- RelPositionalEncoding

The RelPositionalEncoding is then also part of the frontend (e.g. Conv2dSubsampling6) (also see #219).
Specifically, at the end of the frontend, there is another Linear projection to the model dim, followed by the pos enc (here).

RelPositionalEncoding must be used together with RelPositionMultiHeadedAttention.
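For illustration, a minimal sketch of how such a sinusoidal relative positional encoding and the frontend projection could fit together. This is plain PyTorch with hypothetical names, not the actual ESPnet or returnn_common code; it only shows the 2*T-1 relative encodings (distances T-1 .. -(T-1)) and the Linear-to-model-dim step right before them.

```python
import math
import torch
import torch.nn as nn


def rel_positional_encoding(time: int, model_dim: int) -> torch.Tensor:
    # Returns (2*time - 1, model_dim): row 0 corresponds to relative distance
    # time-1, the middle row to distance 0, the last row to -(time-1).
    positions = torch.arange(time - 1, -time, -1, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, model_dim, 2, dtype=torch.float32)
        * -(math.log(10000.0) / model_dim)
    )
    pe = torch.zeros(2 * time - 1, model_dim)
    pe[:, 0::2] = torch.sin(positions * div_term)
    pe[:, 1::2] = torch.cos(positions * div_term)
    return pe


class FrontendWithRelPos(nn.Module):
    # After the conv subsampling, project to the model dim and compute the
    # relative positional encoding that the attention layers will consume.
    def __init__(self, in_dim: int, model_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, model_dim)

    def forward(self, x: torch.Tensor):
        # x: (batch, time, in_dim), e.g. the output of Conv2dSubsampling6.
        x = self.proj(x)
        pos_emb = rel_positional_encoding(x.size(1), x.size(-1)).to(x.device)
        return x, pos_emb  # both are then passed to each attention layer
```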

albertz commented Oct 17, 2022

Note that all our existing Conformer recipes in RETURNN currently use the RETURNN RelativePositionalEncodingLayer (example). I.e. that is not the Transformer-XL variant, which is the standard now. We should probably implement the Transformer-XL variant directly.

But for direct comparisons to old setups, we might also need to implement the old variant. See the sketch below.
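For contrast, a rough sketch of the older scheme, assuming it follows Shaw et al. (2018), i.e. relative position embeddings added to the keys inside the attention score, without the separate Transformer-XL projection and u/v biases. Names are hypothetical and this is not the RETURNN layer's actual API:

```python
import torch


def shaw_rel_attention_scores(q: torch.Tensor, k: torch.Tensor,
                              rel_emb: torch.Tensor) -> torch.Tensor:
    # q, k: (batch, heads, time, head_dim); rel_emb: (time, time, head_dim),
    # where rel_emb[i, j] encodes the (clipped) relative distance j - i.
    content = torch.einsum("bhqd,bhkd->bhqk", q, k)
    position = torch.einsum("bhqd,qkd->bhqk", q, rel_emb)
    return (content + position) / q.size(-1) ** 0.5
```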

albertz commented Oct 17, 2022

Interestingly, the Fairseq RelPositionalEncoding looks almost identical to the ESPnet implementation. It looks like the code was copied, but there is no mention of this at all. Fairseq is MIT-licensed, ESPnet is Apache-licensed. (@sravyapopuri388 @sw005320 @pengchengguo maybe you know the history?)

albertz commented Oct 17, 2022

@pengchengguo

Hi @albertz,

I think the Fairseq RelPositionalEncoding is based on ESPnet's, since they distinguish between the Fairseq and ESPnet RPE/attention types here: https://github.com/facebookresearch/fairseq/blob/b7b7928065ec90bef8f9a489b0128a4dce560d57/fairseq/modules/conformer_layer.py#L187.

@sravyapopuri388

@albertz The fairseq implementation is based on the Espnet one for relative positional encoding. Seems like we are indeed missing some citations. Will make sure to add them soon. Thanks!
