AI research into spoken keyword spotting: a collection of PyTorch implementations of spoken keyword spotting approaches presented in research papers. The model architectures will not always mirror the ones proposed in the papers; I have chosen to focus on covering the core ideas rather than matching every layer configuration.
The Speech Commands dataset is a set of one-second .wav audio files, each containing a single spoken English word. The words are drawn from a small set of commands and are spoken by a variety of speakers. The audio files are organized into folders by the word they contain, and the dataset is designed to help train simple machine learning models.
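For a quick look at the raw data, independent of the processing script below, torchaudio ships a built-in loader for this dataset. This is a minimal sketch only; the download path and subset choice here are arbitrary assumptions, not something this repo requires:

# Minimal sketch: inspecting Speech Commands with torchaudio's built-in loader.
# Assumptions: torchaudio is installed and "./data" is an arbitrary download path.
import torchaudio

dataset = torchaudio.datasets.SPEECHCOMMANDS(
    root="./data",      # where the archive is downloaded and extracted
    download=True,      # fetches speech_commands_v0.02 by default
    subset="training",  # "training", "validation", or "testing"
)

# Each item is (waveform, sample_rate, label, speaker_id, utterance_number).
waveform, sample_rate, label, speaker_id, _ = dataset[0]
print(waveform.shape, sample_rate, label)  # e.g. torch.Size([1, 16000]) 16000 'backward'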
We use the Google Speech Commands Dataset (GSC) as the training data. Run the following script to download and preprocess it:
cd <ROOT>/dataset
python process_speech_commands_data.py \
--data_root=<absolute path to where the data should be stored> \
--data_version=<either 1 or 2, indicating the version of the dataset> \
--class_split=<either "all" or "sub", indicating whether all 30/35 classes are used or the 10+2 split> \
--rebalance \
--log
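The models listed below typically consume log-mel or MFCC features rather than raw waveforms. As a hedged illustration only (the feature settings here are assumptions and may differ from what the script above produces), a one-second 16 kHz clip can be converted to log-mel features like this:

# Minimal sketch: log-mel features for a one-second, 16 kHz keyword clip.
# The settings (n_mels, window, hop) are illustrative assumptions, not
# necessarily what process_speech_commands_data.py produces.
import torch
import torchaudio

waveform = torch.randn(1, 16000)  # stand-in for a loaded one-second .wav

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,
    n_fft=480,       # 30 ms analysis window
    hop_length=160,  # 10 ms hop
    n_mels=40,
)
features = torch.log(mel(waveform) + 1e-6)  # log compression for numerical stability
print(features.shape)  # torch.Size([1, 40, 101]): channels x mel bins x frames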
Temporal Convolution for Real-time Keyword Spotting on Mobile Devices [Paper] [Code]
Broadcasted Residual Learning for Efficient Keyword Spotting [Paper] [Code]
MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition [Paper] [Code] (the core building block is sketched after this list)
ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting [Paper] [Code]
Keyword Transformer: A Self-Attention Model for Keyword Spotting [Paper] [Code]
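Several of these papers rely on the efficiency trick named in the MatchboxNet title: factoring a 1D convolution into a depthwise convolution over time followed by a pointwise convolution that mixes channels. The block below is a hedged sketch of that idea only; the layer sizes, normalization, and activation are illustrative assumptions, not the repo's (or any paper's) exact configuration.

# Minimal sketch of a 1D time-channel separable convolution block, the
# core idea named in the MatchboxNet title. Sizes and norm/activation
# choices are illustrative assumptions.
import torch
import torch.nn as nn

class TimeChannelSeparableConv1d(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int):
        super().__init__()
        # Depthwise: each channel is convolved over time independently.
        self.depthwise = nn.Conv1d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels,
        )
        # Pointwise: a 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1)
        self.norm = nn.BatchNorm1d(out_channels)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time), e.g. mel features of shape (B, 40, 101)
        return self.act(self.norm(self.pointwise(self.depthwise(x))))

block = TimeChannelSeparableConv1d(in_channels=40, out_channels=64, kernel_size=13)
print(block(torch.randn(8, 40, 101)).shape)  # torch.Size([8, 64, 101])

Compared with a standard Conv1d of the same kernel size, this factorization cuts the parameter count roughly by a factor of the kernel size, which is why it recurs across small-footprint keyword spotting models.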