This is a recipe for extracting acoustic word embeddings for a subset of the Switchboard corpus. The models are described in detail in Kamper et al., 2015:
- H. Kamper, W. Wang, and K. Livescu, "Deep convolutional acoustic word embeddings using word-pair side information," in Proc. ICASSP, 2016.
Please cite this paper if you use this code. All the neural networks are implemented in the package couscous.
- Install all dependencies (below).
- Clone couscous into the appropriate directory:

      mkdir ../src
      git clone https://github.com/kamperh/couscous.git ../src/couscous

- Run the steps in kaldi_features/run.sh.
- Run the steps in cnn_wordembeds/readme.md.
- If the steps above ran correctly, executing the following:

      cd cnn_wordembeds/
      ./apply_layers.py models/siamese_triplets_cnn.1/ test
      ./eval_samediff.py \
          models/siamese_triplets_cnn.1/swbd.test.layer_-1.npz

  should produce this output:

      Average precision: 0.537404372048
      Precision-recall breakeven: 0.542724052097
The average precision (AP) of 0.537 is used for the number reported in Table 1, row 9 of Kamper et al., 2015.
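
For readers unfamiliar with the metric, the following is a minimal sketch of how an average precision score can be computed for a same-different word discrimination task. It is not the code in eval_samediff.py. The function names (`cosine_distance`, `samediff_pairs`, `average_precision`), the choice of cosine distance, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def samediff_pairs(embeddings, labels):
    """Return the distance and same-word flag for every unordered pair."""
    distances, same_word = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            distances.append(cosine_distance(embeddings[i], embeddings[j]))
            same_word.append(labels[i] == labels[j])
    return distances, same_word


def average_precision(distances, same_word):
    """Mean of the precision values at each same-word pair, with pairs
    ranked from smallest to largest distance."""
    ranked = sorted(zip(distances, same_word), key=lambda pair: pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_same) in enumerate(ranked, start=1):
        if is_same:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy data (hypothetical): two tokens each of two word types.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = ["a", "a", "b", "b"]

toy_distances, toy_same = samediff_pairs(toy_embeddings, toy_labels)
print(average_precision(toy_distances, toy_same))  # prints 1.0
```

In the toy data both same-word pairs are closer than every different-word pair, so the ranking is perfect and the score is 1.0. An AP of 0.537 on real data means same-word and different-word pairs are only partially separated by distance.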