wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations #32

Open
@jinglescode

Paper

Link: https://arxiv.org/pdf/2006.11477v2.pdf
Year: 2020

Summary

wav2vec 2.0 is a framework for self-supervised learning of representations from raw audio data. It masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations.
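
To make the contrastive task concrete, here is a minimal sketch in plain Python. It assumes cosine similarity between a context vector and each candidate, scaled by a temperature. The function names, the temperature value of 0.1, and the toy vectors are illustrative assumptions, not details stated in this summary.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(context, true_latent, distractors, temperature=0.1):
    """Negative log-probability of the true latent among all candidates."""
    candidates = [true_latent] + list(distractors)
    logits = [cosine_similarity(context, q) / temperature for q in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_denominator - logits[0]


# Toy 2-D example (hypothetical values).
context = [1.0, 0.0]
true_latent = [0.9, 0.1]
distractors = [[0.0, 1.0], [-1.0, 0.0]]

# Context close to the true latent: small loss.
loss_good = contrastive_loss(context, true_latent, distractors)
# Context close to a distractor instead: large loss.
loss_bad = contrastive_loss(context, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

The loss is small when the context vector is closer to the true latent than to any distractor, and grows when a distractor is the closer match.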

Methods

  1. Feature encoder. A multi-layer convolutional network takes raw audio as input and outputs latent speech representations.
  • consists of several blocks, each containing a temporal convolution followed by layer normalization and a GELU activation function
  • spans of the resulting latent speech representations are masked before they are passed to the Transformer
  2. Contextualized representations with Transformers. The latent representations are fed to a Transformer network, which builds contextualized representations that capture information from the entire sequence.
  • the model is trained via a contrastive task in which the true latent must be distinguished from distractors
  • instead of fixed positional embeddings, which encode absolute positional information, a convolutional layer with kernel size 128 and 16 groups acts as a relative positional embedding
  • the output of this convolution, followed by a GELU, is added to the inputs, and layer normalization is then applied
  3. Quantization module. Discrete linguistic units are learned via a Gumbel softmax and represent the latent representations in the contrastive task.
  • the Gumbel softmax enables choosing discrete codebook entries in a fully differentiable way
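
The codebook selection in the quantization module can be sketched as follows. This is a plain-Python illustration of one Gumbel-softmax draw, not the paper's implementation: the function name, the temperature `tau`, the logits, and the tiny codebook are all hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return (soft probabilities, hard index) for one Gumbel-softmax draw."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(u)) with u ~ Uniform(0, 1);
        # u is clamped away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    probs = [e / total for e in exps]
    hard_index = max(range(len(probs)), key=probs.__getitem__)
    return probs, hard_index


# Hypothetical codebook with three 2-D entries and logits for one time step.
codebook = [[0.1, 0.2], [0.5, -0.3], [-0.7, 0.4]]
logits = [2.0, 0.5, -1.0]

rng = random.Random(0)
probs, hard_index = gumbel_softmax(logits, tau=2.0, rng=rng)
quantized = codebook[hard_index]
```

In a framework with automatic differentiation, the hard index would typically be used in the forward pass while gradients flow through the soft probabilities (a straight-through estimator), which is what makes the discrete choice trainable. Plain Python cannot show that part.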

Results

  • ultra-low-resource speech recognition: with only 10 minutes of labeled data, the approach achieves a word error rate (WER) of 5.2/8.6 on the clean/noisy test sets of Librispeech
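
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a generic illustration with a hypothetical function name and made-up sentences, not the evaluation script used for the reported numbers.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
wer_percent = 100.0 * wer
```

WER is usually reported as a percentage, so a value such as 5.2 corresponds to roughly five word errors per hundred reference words.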

Code
