STC-ProtoNet

A PyTorch Implementation of Prototypical Networks for Few-shot Spoken Term Classification with Varying Classes and Examples

This repository presents an extended-ProtoNet approach to address the user-defined spoken term classification task.
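
As background, the classification step of a standard prototypical network can be sketched in a few lines. This is a minimal illustration of the baseline idea only, not the extended approach implemented in this repository: the function names (`class_prototypes`, `classify`) and the toy 2-D embeddings are assumptions made for the example, and plain Python lists stand in for the tensors a real model would produce.

```python
from collections import defaultdict


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vec, label in zip(support_embeddings, support_labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Toy 2-D embeddings (hypothetical values): two support examples per class.
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = ["zero", "zero", "one", "one"]
prototypes = class_prototypes(support, labels)
print(classify([0.9, 0.9], prototypes))
```

Because each prototype is just a per-class mean, the same code works unchanged when the number of classes or the number of examples per class differs between episodes.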

Our implementation builds on CloserLookFewShot, a PyTorch testbed for few-shot classification (https://github.com/wyharveychen/CloserLookFewShot).

Prerequisites

Python: 3.x
PyTorch: 1.0+
librosa: 0.8

Dataset - Google Speech Commands dataset v2

We use the raw data from the dataset, which contains 35 keywords: 'yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'cat', 'tree', 'house', 'bird', 'visual', 'backward', 'follow', 'forward', 'learn', 'sheila', 'bed', 'dog', 'happy', 'marvin', 'wow'.
We choose 20 keywords as the normal source data: 'yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'cat', 'tree', 'house', 'bird', 'visual', 'backward', 'follow', 'forward', 'learn', 'sheila'; 10 keywords as the target data: 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'; and 5 keywords as the unknown class: 'bed', 'dog', 'happy', 'marvin', 'wow'.
We split the dataset into the following parts to match our experimental setting:

data/
├──speech_commands/
    ├──yes
        ├──00f0204f_nohash_0.wav
        ├──d962e5ac_nohash_1.wav
        ...
    ...
    ├──unknown
        ├──happy
           ├──299c14b1_nohash_2.wav
           ...
filelists/
├──base.json
├──base_unk.json
├──base_sil.json
├──val.json
├──val_unk.json
├──val_sil.json
├──novel.json
├──novel_unk.json
├──novel_sil.json
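
The keyword grouping and directory layout described above can be written down as a small lookup. This is a hedged sketch rather than code from the repository: the helper names (`split_of`, `keyword_dir`) are hypothetical, and it assumes that all five unknown-class keywords sit under `unknown/` in the same way the tree shows for `happy`.

```python
from pathlib import PurePosixPath

SOURCE_KEYWORDS = [
    "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go",
    "cat", "tree", "house", "bird", "visual", "backward", "follow",
    "forward", "learn", "sheila",
]
TARGET_KEYWORDS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]
UNKNOWN_KEYWORDS = ["bed", "dog", "happy", "marvin", "wow"]

ALL_KEYWORDS = SOURCE_KEYWORDS + TARGET_KEYWORDS + UNKNOWN_KEYWORDS
DATA_ROOT = PurePosixPath("data/speech_commands")


def split_of(keyword):
    """Return which of the three groups a keyword belongs to."""
    if keyword in SOURCE_KEYWORDS:
        return "source"
    if keyword in TARGET_KEYWORDS:
        return "target"
    if keyword in UNKNOWN_KEYWORDS:
        return "unknown"
    raise KeyError(f"not a Speech Commands v2 keyword: {keyword}")


def keyword_dir(keyword):
    """Directory holding a keyword's .wav files (assumed layout, see lead-in)."""
    if split_of(keyword) == "unknown":
        return DATA_ROOT / "unknown" / keyword
    return DATA_ROOT / keyword


# The three groups are disjoint and together cover all 35 keywords.
assert len(set(ALL_KEYWORDS)) == len(ALL_KEYWORDS) == 35
print(keyword_dir("yes"), keyword_dir("happy"))
```

How these groups map onto the base, val, and novel filelists is not spelled out here, so the sketch deliberately stops at the keyword level.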

Train and test

Run python craft_MMCenters.py to generate the hard points.
Run python train.py, passing arguments such as:

--dataset
--model
--train_n_way
--test_n_way
--train_n_shot
--test_n_shot
--fixed_way
--train_max_way
--train_min_way
--test_max_way
--test_min_way
--max_shot
--min_shot
...
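
The way and shot range arguments suggest that episodes can vary in both the number of classes and the number of examples. A minimal sketch of such sampling follows, under stated assumptions: the function names, the per-class choice of shot, the fixed query count, and the toy file names are all hypothetical, and the actual logic in train.py may differ.

```python
import random


def sample_episode_shape(min_way, max_way, min_shot, max_shot, rng):
    """Draw the number of classes and a support-set size for each class."""
    n_way = rng.randint(min_way, max_way)  # inclusive on both ends
    shots = [rng.randint(min_shot, max_shot) for _ in range(n_way)]
    return n_way, shots


def sample_episode(files_by_class, n_way, shots, n_query, rng):
    """Pick n_way classes, then disjoint support and query files per class."""
    classes = rng.sample(sorted(files_by_class), n_way)
    episode = {}
    for cls, n_shot in zip(classes, shots):
        picked = rng.sample(files_by_class[cls], n_shot + n_query)
        episode[cls] = {"support": picked[:n_shot], "query": picked[n_shot:]}
    return episode


rng = random.Random(0)  # seeded so the example is repeatable
# Hypothetical file lists: 10 made-up recordings for each of five keywords.
toy_files = {
    kw: [f"{kw}_{i}.wav" for i in range(10)]
    for kw in ["zero", "one", "two", "three", "four"]
}
n_way, shots = sample_episode_shape(2, 5, 1, 5, rng)
episode = sample_episode(toy_files, n_way, shots, n_query=3, rng=rng)
for cls, parts in episode.items():
    print(cls, len(parts["support"]), len(parts["query"]))
```

Drawing the shot count separately for each class is one possible reading of "varying examples"; drawing a single shot count per episode would be an equally simple alternative.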

Run python save_features.py with the appropriate arguments to generate embeddings of the test examples for a trained model.
Run python test.py to evaluate.

Cite our paper

If the code and the work are useful to you, please cite our paper:

@inproceedings{chen21u_interspeech,
  author={Yangbin Chen and Tom Ko and Jianping Wang},
  title={{A Meta-Learning Approach for User-Defined Spoken Term Classification with Varying Classes and Examples}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4224--4228},
  doi={10.21437/Interspeech.2021-147}
}
