Description
Paper
Link: https://arxiv.org/pdf/2005.09629v1.pdf
Year: 2020
Summary
- adapt and improve noisy student training for automatic speech recognition (ASR); noisy student training is an iterative self-training method that leverages augmentation to improve network performance
Methods
- employ (adaptive) SpecAugment, an augmentation method for ASR that directly acts on the spectrogram of the input audio, for noisy student training
- use shallow fusion with a language model on the teacher network to generate better transcripts for the student network to train on
- propose a normalized filtering score for teacher-generated transcripts, computed as a function of the fusion score and the number of tokens in the transcript
- use a variant of sub-modular sampling to weight the teacher-generated utterance-transcript pairs so that the token statistics of the dataset passed on to the student are balanced
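The fusion and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. The function names, the LM weight `lm_weight` and the `cutoff` value are hypothetical. The exact form of the normalization is also an assumption: the notes only say the score depends on the fusion score and the token count. Here the fusion score is fitted linearly against token count, and the residual is scaled by the square root of the length and by its standard deviation.

```python
import numpy as np


def fusion_score(am_logprob, lm_logprob, lm_weight=0.3):
    """Shallow fusion: acoustic-model log-prob plus a weighted LM log-prob.

    lm_weight is a hypothetical value, normally tuned on a dev set.
    """
    return am_logprob + lm_weight * lm_logprob


def normalized_scores(scores, lengths):
    """Length-normalize fusion scores (assumed form, see lead-in).

    Fits scores ~ mu * length + beta, then scales the residual by
    sqrt(length) and by the standard deviation of the scaled residuals.
    """
    scores = np.asarray(scores, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    # np.polyfit with deg=1 returns [slope, intercept].
    mu, beta = np.polyfit(lengths, scores, deg=1)
    residual = (scores - mu * lengths - beta) / np.sqrt(lengths)
    sigma = residual.std()
    if sigma == 0.0:
        # Perfect linear fit: nothing distinguishes the transcripts.
        return np.zeros_like(residual)
    return residual / sigma


def filter_transcripts(transcripts, scores, lengths, cutoff=0.0):
    """Keep transcripts whose normalized score exceeds the cutoff."""
    norm = normalized_scores(scores, lengths)
    return [t for t, s in zip(transcripts, norm) if s > cutoff]
```

Without some form of length normalization, a raw log-probability threshold would favor short transcripts, since log-probabilities become more negative as the number of tokens grows.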