Automatic Speech Recognition (ASR) in PyTorch: a re-implementation of NVIDIA's QuartzNet.

- YouTokenToMe tokenization with BPE dropout
- Augmentations: custom ones and ones from audiomentations
- Support for 3 datasets: Common Voice, LibriSpeech and LJSpeech
- Weights & Biases logging
- CTC beam search integration
- GPU-based MelSpectrogram
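
BPE dropout regularizes subword tokenization: while encoding, each applicable merge is skipped with some probability, so the same transcript can be split into different subword sequences from one epoch to the next. The sketch below illustrates the idea in plain Python under stated assumptions: `bpe_encode` and the toy `MERGES` table are hypothetical and are not YouTokenToMe's implementation (in YouTokenToMe the same effect is requested through the `dropout_prob` argument of `BPE.encode`). With `dropout=0.0` it reduces to ordinary deterministic BPE, and with `dropout=1.0` every merge is dropped and the word stays split into characters.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Toy merge table, highest priority first (hypothetical; a real one is learned from training text).
MERGES: List[Tuple[str, str]] = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]


def bpe_encode(word: str, merges: Sequence[Tuple[str, str]],
               dropout: float = 0.0, rng: Optional[random.Random] = None) -> List[str]:
    """Segment `word` with BPE, skipping each candidate merge with probability `dropout`."""
    rng = rng or random.Random()
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Adjacent pairs that have a merge rule; each one survives with probability 1 - dropout.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in ranks and rng.random() >= dropout
        ]
        if not candidates:  # no rule applies, or every candidate was dropped
            break
        _, i = min(candidates)  # apply the highest-priority surviving merge (leftmost on ties)
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


print(bpe_encode("lower", MERGES))               # ['lower']
print(bpe_encode("lower", MERGES, dropout=1.0))  # ['l', 'o', 'w', 'e', 'r']
rng = random.Random(0)
for _ in range(3):
    # Segmentations vary between calls, but they always spell the same word.
    print(bpe_encode("lower", MERGES, dropout=0.3, rng=rng))
```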

| Dataset | WER, dummy decoder (%) | WER, CTC beam search (%) | WER, fine-tuned, dummy decoder (%) | WER, fine-tuned, CTC beam search (%) |
|---|---|---|---|---|
| LJSpeech | 36.66 | 34.45 | 28.41 | 27.19 |
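
The table compares two ways of turning the model's per-frame CTC outputs into text. As a hedged illustration, the sketch below assumes the "dummy decoder" means greedy (best-path) CTC decoding: take the most likely token per frame, collapse consecutive repeats, then drop blanks. It also shows WER as the word-level edit distance between hypothesis and reference divided by the number of reference words (the table values appear to be this ratio times 100). The helper names, the blank index `0`, and the toy `VOCAB` are hypothetical and not taken from this repository.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: collapse consecutive repeats, then remove blanks."""
    out: List[int] = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank:  # a blank between two equal tokens keeps both
            out.append(t)
        prev = t
    return out


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


VOCAB = ["_", " ", "a", "c", "t"]  # index 0 is the CTC blank (hypothetical vocabulary)
frames = [3, 3, 0, 2, 4, 4, 0, 1, 3, 2, 2, 4]  # argmax token id per audio frame
text = "".join(VOCAB[i] for i in ctc_greedy_decode(frames))
print(text)                              # cat cat
print(word_error_rate("cat sat", text))  # 0.5
```

CTC beam search differs by keeping several candidate prefixes per frame (optionally rescored with a language model) instead of committing to the argmax at every step, which is consistent with its lower WER in the table.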