oleges1/quartznet-pytorch

quartznet-pytorch

Automatic Speech Recognition (ASR) in PyTorch: a re-implementation of NVIDIA's QuartzNet.

Features:

  • YouTokenToMe tokenization with BPE dropout
  • Augmentations: custom and audiomentations
  • Support for 3 datasets: CommonVoice, LibriSpeech and LJSpeech
  • Weights & Biases logging
  • CTC beam search integration
  • GPU-based MelSpectrogram
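
For context on the CTC decoding feature: the simplest baseline that beam search is usually compared against is greedy decoding, which takes the best token per frame, merges consecutive repeats, and drops the blank symbol. The following is a minimal sketch of that idea in plain Python, not the repository's code. The function name `ctc_greedy_decode` and the blank index `0` are assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank: int = 0
) -> List[int]:
    """Greedy CTC decoding (hypothetical helper, blank index assumed to be 0).

    frame_scores holds one row of per-token scores for each time step.
    """
    # Pick the highest-scoring token index at every time step.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Merge runs of identical tokens, then remove the blank symbol.
    return [token for token, _ in groupby(best_path) if token != blank]


if __name__ == "__main__":
    scores = [
        [0.1, 0.8, 0.1],    # token 1
        [0.1, 0.8, 0.1],    # token 1 again, merged with the previous frame
        [0.9, 0.05, 0.05],  # blank, separates the two occurrences of token 1
        [0.2, 0.7, 0.1],    # token 1
        [0.1, 0.2, 0.7],    # token 2
    ]
    print(ctc_greedy_decode(scores))
```

Beam search differs in that it keeps several candidate prefixes per time step instead of committing to the single best token, which is why it can recover from locally wrong choices.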

Trained models:

| Dataset | WER (dummy decoder) | WER (CTC beam search) | WER, finetuned (dummy decoder) | WER, finetuned (CTC beam search) |
|---|---|---|---|---|
| LJSpeech | 36.66 | 34.45 | 28.41 | 27.19 |
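
As a reminder of what the WER columns measure, word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below is a plain Python illustration under the assumption that the figures above are percentages. The helper `word_error_rate` is hypothetical and is not taken from the repository, which may compute the metric differently (for example, with different text normalization).

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent (hypothetical helper, whitespace tokenization)."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because the denominator is the reference length, WER can exceed 100 when the hypothesis contains many inserted words.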

W&B Logs: