Code for the paper *Learning Efficient Representations for Keyword Spotting with Triplet Loss*
by Roman Vygon (roman.vygon@gmail.com) and Nikolay Mikhaylovskiy (nickm@ntr.ai).
To train a triplet encoder, run:

    python TripletEncoder.py --name=test_encoder --manifest=MANIFEST --model=MODEL
To train a no-triplet model, or to train a classifier on top of the triplet encoder, run:

    python TripletClassifier.py --name=test_classifier --manifest=MANIFEST --model=MODEL
You can use `--help` to view the description of the arguments.
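The `--manifest` argument points to a dataset manifest. Assuming a NeMo-style JSONL layout (one JSON object per line; the exact field names `audio_filepath`, `duration`, and `command` used below are assumptions — check the provided manifests for the real schema), a minimal manifest could be written like this:

```python
import json

# Hypothetical entries; file paths and field names are placeholders,
# not taken from the repository's actual manifests.
entries = [
    {"audio_filepath": "data/yes/0a7c2a8d_nohash_0.wav", "duration": 1.0, "command": "yes"},
    {"audio_filepath": "data/no/0a9f9af7_nohash_0.wav", "duration": 1.0, "command": "no"},
]

with open("train_manifest.json", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```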
Training was performed on a single Tesla K80 12GB.
| Model | Batch Size | VRAM |
|-------|------------|------|
| Res15 | 35*4       | 11GB |
| Res8  | 35*10      | 4GB  |
To test a triplet encoder, run:

    python infer_train.py --name=test_encoder --manifest=MANIFEST --model=MODEL --enc_step=ENCODER_TRAINING_STEP
To test a classifier-head model, run:

    python infer_notl.py --name=test_encoder --cl_name=test_classifier --manifest=MANIFEST --model=MODEL --enc_step=ENCODER_TRAINING_STEP --cl_step=CLASSIFIER_TRAINING_STEP
You can use `--help` to view the description of the arguments.
This project is licensed under the MIT License - see the LICENSE.md file for details.
You can download the test-clean-360 subset here: http://www.openslr.org/12. If the site doesn't load, see this code for direct links to the files.
Use this notebook to download and prepare the Google Speech Commands dataset.
Data manifests, LibriSpeech alignments and distance measures can be found here. You'll need to update the manifests.json file with the dataset path. You can convert LibriWords manifests with convert_path_prefix.ipynb.
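If the notebook is unavailable, the prefix conversion it performs can be sketched in plain Python. This is a hedged sketch, not the notebook's actual code: the `audio_filepath` field name and the prefix arguments are assumptions based on NeMo-style JSONL manifests.

```python
import json

def convert_path_prefix(in_path, out_path, old_prefix, new_prefix):
    """Rewrite the audio path prefix in every entry of a JSONL manifest."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            entry = json.loads(line)
            # 'audio_filepath' is an assumed field name; adjust to the
            # actual schema of the downloaded manifests.
            path = entry["audio_filepath"]
            if path.startswith(old_prefix):
                entry["audio_filepath"] = new_prefix + path[len(old_prefix):]
            dst.write(json.dumps(entry) + "\n")
```

For example, `convert_path_prefix("manifest.json", "manifest_local.json", "/old/root", "/my/datasets")` would point all entries at your local dataset location.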
The files sadly went missing; I'll try to recover them. If anyone had a chance to download them, please contact me.