Our implementation is based on the DeepSpeech2 PyTorch implementation, forked at this commit.
The installation instructions below are adapted from the original repo.
Install PyTorch if you haven't already.
Install this fork for Warp-CTC bindings:
git clone https://github.com/SeanNaren/warp-ctc.git
cd warp-ctc; mkdir build; cd build; cmake ..; make
export CUDA_HOME="/usr/local/cuda"
cd ../pytorch_binding && python setup.py install
Install NVIDIA apex:
git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .
If you want decoding to support beam search with an optional language model, install ctcdecode:
git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .
Finally, clone this repo and run the following from the repo root:
pip install -r requirements.txt
pip install -e . # Dev install
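After the steps above, a quick check that the expected packages are importable can save a failed training run later. The sketch below is not part of this repo. It assumes the import names `torch`, `warpctc_pytorch`, `apex`, and `ctcdecode` for the packages installed above, and the helper names `missing_modules` and `report` are made up for illustration.

```python
import importlib.util

# Import names assumed for the packages installed above; adjust them if your
# installed versions expose different module names.
REQUIRED = ["torch", "warpctc_pytorch", "apex"]
OPTIONAL = ["ctcdecode"]  # only needed for beam-search decoding with an LM


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Print what is missing and return True if all required modules are present."""
    missing_required = missing_modules(REQUIRED)
    missing_optional = missing_modules(OPTIONAL)
    for name in missing_required:
        print(f"missing (required): {name}")
    for name in missing_optional:
        print(f"missing (optional): {name}")
    return not missing_required
```

Calling `report()` in the environment used for training prints one line per missing package. It only checks that the modules can be located and does not verify that the CUDA extensions were built correctly.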
Pre-process the LibriSpeech dataset:
cd data/
python librispeech.py
Run the pruning and training script:
python prune_train.py +experiment=librispeech_prune
Distributed training has not been tested.
The SpecAug and noise injection functions are the same as in the base repo.
Edit the config file experiment/librispeech_prune_eval_clean.yaml or experiment/librispeech_prune_eval_other.yaml.
Then run:
python prune_test.py +experiment=librispeech_prune_eval_clean
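To evaluate on both test sets in one go, the command above can be wrapped in a small driver. This is a minimal sketch and not part of the repo. It assumes that `+experiment=librispeech_prune_eval_other` selects the second config in the same way as the clean one (the `+experiment=` form looks like a Hydra-style override), and the function names `eval_command` and `run_evaluations` are hypothetical.

```python
import subprocess
import sys

# Experiment names taken from the config file names mentioned above; that the
# "other" config is selected the same way as the "clean" one is an assumption.
EVAL_EXPERIMENTS = ["librispeech_prune_eval_clean", "librispeech_prune_eval_other"]


def eval_command(experiment, script="prune_test.py"):
    """Build the argv list for one evaluation run."""
    return [sys.executable, script, f"+experiment={experiment}"]


def run_evaluations(experiments=EVAL_EXPERIMENTS, dry_run=True):
    """Run (or, with dry_run=True, only print) one evaluation per experiment."""
    commands = [eval_command(name) for name in experiments]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands
```

By default `run_evaluations()` only prints the commands. Call `run_evaluations(dry_run=False)` from the repo root to actually launch both evaluations in sequence.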
| Model | Remaining weight (%) | test-clean / test-other n-gram WER (%) |
|---|---|---|
| Dense | 100.0 | 7.9 / 21.0 |
| Extreme | 21.0 | 7.9 / 20.5 |
| Best | 51.8 | 7.1 / 19.2 |
3-gram grapheme LM: here
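The table can also be read in terms of pruned weights and relative WER change against the dense model. The short sketch below only does arithmetic on the numbers reported above. The dictionary layout and the `summarize` helper are illustrative, and the reduction factor counts weights only, so it says nothing about measured speed or file size.

```python
# (remaining weight %, test-clean WER %, test-other WER %), copied from the table.
RESULTS = {
    "Dense": (100.0, 7.9, 21.0),
    "Extreme": (21.0, 7.9, 20.5),
    "Best": (51.8, 7.1, 19.2),
}


def summarize(name, baseline="Dense"):
    """Derive pruned fraction and relative WER reduction versus the baseline."""
    remaining, clean, other = RESULTS[name]
    _, base_clean, base_other = RESULTS[baseline]
    return {
        "pruned_pct": round(100.0 - remaining, 1),
        "weight_reduction_x": round(100.0 / remaining, 2),
        "rel_wer_reduction_clean_pct": round(100.0 * (base_clean - clean) / base_clean, 1),
        "rel_wer_reduction_other_pct": round(100.0 * (base_other - other) / base_other, 1),
    }


for model in RESULTS:
    print(model, summarize(model))
```

Under these numbers, the Extreme model removes 79.0% of the weights (about 4.76x fewer) while matching the dense test-clean WER, and the Best model removes 48.2% of the weights with roughly 10.1% and 8.6% relative WER reductions on test-clean and test-other.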
@inproceedings{ding2021audio,
title={Audio lottery: Speech recognition made ultra-lightweight, noise-robust, and transferable},
author={Ding, Shaojin and Chen, Tianlong and Wang, Zhangyang},
booktitle={International Conference on Learning Representations},
year={2021}
}
@inproceedings{amodei2016deep,
title={Deep speech 2: End-to-end speech recognition in english and mandarin},
author={Amodei, Dario and Ananthanarayanan, Sundaram and Anubhai, Rishita and Bai, Jingliang and Battenberg, Eric and Case, Carl and Casper, Jared and Catanzaro, Bryan and Cheng, Qiang and Chen, Guoliang and others},
booktitle={International conference on machine learning},
pages={173--182},
year={2016},
organization={PMLR}
}