Implementation of QuartzNet ASR model in PyTorch
To run training and inference in an nvidia-docker container, follow these instructions:

- Install nvidia-docker
- Run `./docker-build.sh`
To launch training, follow these instructions:

- Set preferred configurations in `config/config.yaml`. In particular, you might want to set `dataset`: it can be either `numbers` or `librispeech`
- In `docker-run.sh`, change `memory`, `memory-swap`, `shm-size`, `cpuset-cpus`, `gpus`, and the data `volume` to desired values
- Set the `WANDB_API_KEY` environment variable to your wandb key
- Run `./docker-train.sh`
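As a sketch, the dataset switch in `config/config.yaml` might look like the fragment below. Only the `dataset` key and its two values come from the instructions above; the comment and layout are assumptions, and the real config will contain other keys not shown here.

```yaml
# config/config.yaml (fragment, illustrative only)
# dataset selects the training data: "numbers" or "librispeech"
dataset: librispeech
```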
All outputs, including models, will be saved to the `outputs` dir.
To launch inference, run the following command:

`./docker-inference.sh model_path device bpe_path input_path`

where:

- `model_path` is a path to the `.pth` model file
- `device` is the device to run inference on: either `cpu`, `cuda`, or a cuda device number
- `bpe_path` is a path to the yttm BPE `.model` file
- `input_path` is a path to the input audio file to parse text from
The predicted output will be printed to stdout and saved to a file in the `inferenced` folder.
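For a CTC-trained model like QuartzNet, the last step of inference is typically greedy CTC decoding: take the argmax token id per output frame, collapse consecutive repeats, and drop blanks, then map the remaining ids back to text with the yttm BPE model. Below is a minimal standalone sketch of that decoding step — it is not this repo's actual code, and the blank id and the example frame ids are assumptions.

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated ids and remove CTC blanks from per-frame argmax ids.

    frame_ids: sequence of int token ids, one per model output frame.
    blank_id: id of the CTC blank token (assumed 0 here).
    """
    decoded = []
    prev = None
    for t in frame_ids:
        # Keep a token only when it differs from the previous frame
        # and is not the blank symbol.
        if t != prev and t != blank_id:
            decoded.append(t)
        prev = t
    return decoded

# Hypothetical per-frame argmax ids over the model's logits.
frames = [0, 3, 3, 0, 5, 5, 5, 0, 3]
print(ctc_greedy_decode(frames))  # -> [3, 5, 3]
```

The resulting ids would then be converted to text with the BPE model (e.g. via YouTokenToMe's `BPE.decode`), which is what lets the script print a transcript rather than raw ids.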
My current best model, trained on librispeech, and its config can be downloaded here.
Note that its quality is limited: I only trained it to ~59 WER on librispeech.