Audio lottery on CNN+LSTM

Our implementation is based on the DeepSpeech2 PyTorch implementation, forked at this commit.

Installation

The instructions below are taken from the original repo.

Install PyTorch if you haven't already.

Install this fork for Warp-CTC bindings:

git clone https://github.com/SeanNaren/warp-ctc.git
cd warp-ctc; mkdir build; cd build; cmake ..; make
export CUDA_HOME="/usr/local/cuda"
cd ../pytorch_binding && python setup.py install

Install NVIDIA apex:

git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .

If you want decoding to support beam search with an optional language model, install ctcdecode:

git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .

Finally, clone this repo and run the following within it:

pip install -r requirements.txt
pip install -e . # Dev install
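
As a quick sanity check after the steps above, the following sketch reports which dependencies the current Python interpreter can find. The import names (`torch`, `warpctc_pytorch`, `apex`, `ctcdecode`) are assumptions based on how these projects are commonly imported, not something this README states, so adjust them if your builds differ.

```python
import importlib.util

# Label -> import name. The import names are assumptions, not taken from this README.
PACKAGES = {
    "PyTorch": "torch",
    "Warp-CTC bindings": "warpctc_pytorch",
    "NVIDIA apex": "apex",
    "ctcdecode (optional)": "ctcdecode",
}


def check_packages(packages):
    """Return {label: bool} telling whether each top-level module can be located."""
    status = {}
    for label, module_name in packages.items():
        # find_spec returns None when a top-level module is not installed.
        status[label] = importlib.util.find_spec(module_name) is not None
    return status


for label, found in check_packages(PACKAGES).items():
    print(f"{label:<22} {'found' if found else 'MISSING'}")
```

A module being found only shows that it is installed; it does not prove that the CUDA extensions were built correctly.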

Training

Datasets

Pre-process the LibriSpeech dataset:

cd data/
python librispeech.py

Training a Model

Run

python prune_train.py +experiment=librispeech_prune

Distributed training has not been tested.

The SpecAug and noise injection functions are the same as in the base repo.

Testing/Inference

Inference with 3-gram LM

Edit the config files experiment/librispeech_prune_eval_clean.yaml or experiment/librispeech_prune_eval_other.yaml.

Then, run

python prune_test.py +experiment=librispeech_prune_eval_clean
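
To evaluate on both test sets in one go, a small helper can build the two commands. This is a minimal sketch: it assumes the test-other config is selected in the same way as the test-clean one (`+experiment=librispeech_prune_eval_other`), and the helper name `build_eval_command` is hypothetical.

```python
import shlex
import sys

# Experiment names derived from the two config files mentioned above.
EVAL_EXPERIMENTS = ["librispeech_prune_eval_clean", "librispeech_prune_eval_other"]


def build_eval_command(experiment, python=sys.executable):
    """Build the argument list for one prune_test.py evaluation run."""
    return [python, "prune_test.py", f"+experiment={experiment}"]


# Print the commands instead of running them, so the sketch has no side effects.
for name in EVAL_EXPERIMENTS:
    print(shlex.join(build_eval_command(name, python="python")))
```

To actually launch the runs, pass each argument list to `subprocess.run(..., check=True)` from the directory that contains `prune_test.py`.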

LibriSpeech Performance
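
The results in this section are reported as word error rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is illustrative only and is not taken from this repo's evaluation code; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent for a single utterance."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Corpus-level WER is usually computed by summing the edit distances and the reference lengths over all utterances before dividing, rather than by averaging per-utterance values.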

| Model | Remaining weights (%) | WER (%) with n-gram LM, test-clean / test-other |
| --- | --- | --- |
| Dense | 100.0 | 7.9 / 21.0 |
| Extreme | 21.0 | 7.9 / 20.5 |
| Best | 51.8 | 7.1 / 19.2 |
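
To put the numbers above in perspective, the following sketch computes, for each model, how many times fewer weights it keeps than the dense model and its relative WER reduction against the dense baseline. The figures are copied from the table; the function names are hypothetical.

```python
# (remaining weights %, test-clean WER %, test-other WER %), copied from the table above.
RESULTS = {
    "Dense": (100.0, 7.9, 21.0),
    "Extreme": (21.0, 7.9, 20.5),
    "Best": (51.8, 7.1, 19.2),
}


def relative_wer_reduction(baseline, pruned):
    """Relative WER reduction in percent (positive means the pruned model is better)."""
    return 100.0 * (baseline - pruned) / baseline


def summarize(results, baseline="Dense"):
    """Compare every model against the baseline row."""
    _, base_clean, base_other = results[baseline]
    rows = {}
    for name, (remaining, clean, other) in results.items():
        rows[name] = {
            "weight_reduction_x": round(100.0 / remaining, 2),
            "clean_rel_reduction": round(relative_wer_reduction(base_clean, clean), 1),
            "other_rel_reduction": round(relative_wer_reduction(base_other, other), 1),
        }
    return rows


for name, row in summarize(RESULTS).items():
    print(name, row)
```

With these inputs, the "Extreme" model keeps about 4.76x fewer weights while matching the dense test-clean WER, and the "Best" model reduces WER by about 10.1% (test-clean) and 8.6% (test-other) relative to the dense model.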

3-gram grapheme LM: here

Reference

@inproceedings{ding2021audio,
  title={Audio lottery: Speech recognition made ultra-lightweight, noise-robust, and transferable},
  author={Ding, Shaojin and Chen, Tianlong and Wang, Zhangyang},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
@inproceedings{amodei2016deep,
  title={Deep speech 2: End-to-end speech recognition in english and mandarin},
  author={Amodei, Dario and Ananthanarayanan, Sundaram and Anubhai, Rishita and Bai, Jingliang and Battenberg, Eric and Case, Carl and Casper, Jared and Catanzaro, Bryan and Cheng, Qiang and Chen, Guoliang and others},
  booktitle={International conference on machine learning},
  pages={173--182},
  year={2016},
  organization={PMLR}
}
