Open
Description
Paper
Link: https://arxiv.org/pdf/1904.11660.pdf
Year: 2019
Summary
- replaces the sinusoidal positional embeddings in the Transformer with input representations learned by convolutional layers
- trains with a fixed learning rate of 1.0 and no warmup steps
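The fixed-schedule point above can be sketched as a short training-loop fragment: a constant learning rate of 1.0 with no warmup simply means attaching no LR scheduler. The optimizer choice (Adadelta, whose conventional default lr is 1.0) and the placeholder model/data are assumptions beyond what this note states.

```python
import torch

model = torch.nn.Linear(80, 32)  # placeholder model, not the paper's architecture
# Fixed lr of 1.0, no warmup: no scheduler is ever attached to the optimizer.
opt = torch.optim.Adadelta(model.parameters(), lr=1.0)  # assumption: Adadelta

for step in range(3):
    loss = model(torch.randn(4, 80)).pow(2).mean()  # dummy loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    # the learning rate stays constant across steps
    assert opt.param_groups[0]["lr"] == 1.0
```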
Methods
- two parts:
- convolutional layers learn local relationships within a small context
- transformer layers learn the global sequential structure of the input
- as the encoder goes deeper, the conv layers learn an acoustic language model over the bag of discovered acoustic units
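The two-part encoder above can be sketched in PyTorch: a 2-D convolutional front-end that captures local time/frequency context (standing in for positional embeddings) feeding a Transformer encoder for global structure. Layer counts, channel sizes, and the 80-dim log-mel input are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConvContextEncoder(nn.Module):
    """Conv front-end (local context) followed by a Transformer encoder
    (global sequential structure)."""

    def __init__(self, n_mels=80, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Conv layers learn local relationships over small time/freq windows;
        # each stride-2 layer also downsamples the time axis.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Linear(32 * (n_mels // 4), d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, feats):  # feats: (batch, time, n_mels)
        x = self.conv(feats.unsqueeze(1))       # (B, C, T/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        return self.transformer(self.proj(x))   # (B, T/4, d_model)

enc = ConvContextEncoder()
out = enc(torch.randn(2, 100, 80))  # 100 frames of 80-dim filterbank features
print(out.shape)                    # time axis is downsampled 4x by the convs
```

No sinusoidal positional embedding is added anywhere: the conv stack's local receptive fields are what give the Transformer ordering information.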
code: github.com/pytorch/fairseq/tree/master/examples/speech_recognition