A repository for the USTH Digital Signal Processing 2020 Group 3 project. It's quite obvious from the title.

What is digital signal processing?

This project harnesses the `mfcc` function from `python_speech_features` and the Gaussian Mixture Model from `sklearn`. Read more about Mel-frequency cepstral coefficients (MFCC) and Gaussian Mixture Models (GMM).
These are the datasets. Remember to read `AudioInfo.txt` in the Sunday datasets before processing.

Each person's 135 `.wav` files correspond to the 135 lines in `transcripts/random_sentences.txt`.
Note that the Friday datasets are just an archive of the Sunday datasets. Please use the Sunday datasets.
For each of `Sunday_datasets/mix`, `Sunday_datasets/low`, and `Sunday_datasets/high`, I take 100 of each person's 135 `.wav` files and fit them into a model that represents that person's unique voice features. The remaining 35 `.wav` files per person are used to test the system of models. The 100 training files are shuffled to show that the order of files is not important.
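The per-speaker split described above can be sketched as follows. This is a hedged illustration, not the project's `train_models.py`; the file names and the fixed seed are assumptions made for the example.

```python
import random


def split_files(wav_files, n_train=100, seed=0):
    """Shuffle one speaker's file list, then split into train/test subsets.

    Shuffling before the split demonstrates that file order is irrelevant.
    """
    files = list(wav_files)
    random.Random(seed).shuffle(files)
    return files[:n_train], files[n_train:]


# Hypothetical file names standing in for one speaker's 135 recordings.
train_files, test_files = split_files([f"sentence_{i:03d}.wav" for i in range(135)])
```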
Plan:

- Train models with the `Sunday_datasets/mix` folder.
- Train models with the `Sunday_datasets/low` folder.
- Train models with the `Sunday_datasets/high` folder.
- Then test each system of models on the `Sunday_datasets/mix`, `Sunday_datasets/low`, and `Sunday_datasets/high` folders.
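The testing step in the plan above amounts to speaker identification: score a test file's features under every speaker's GMM and pick the best match. A minimal sketch, assuming a `models` dict mapping speaker names to fitted `GaussianMixture` objects (the dict and the feature matrices below are illustrative, not the project's actual data):

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def identify_speaker(features, models):
    """Return the speaker whose GMM gives the features the highest
    average per-frame log-likelihood (GaussianMixture.score)."""
    return max(models, key=lambda name: models[name].score(features))
```

Counting how often `identify_speaker` returns the true speaker over the 35 held-out files per person gives the accuracy of each system of models.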
Read our report for more details.

For a clear view of the folders and files:
+--venv/
|
+--transcripts/
| +--usth.txt
| +--random_sentences.txt
|
|--datasets/
| +--mix/
| | +AudioInfo.txt
| |
| +--low/
| | +AudioInfo.txt
| |
| +--high/
| +AudioInfo.txt
|
|--source_code/
| +--Friday_script_models/ # Ignorable
| +--models/ # Where models are saved as binary files
| +--mfcc_gmm_func.py # Script of functions to call mfcc and gmm
| +--requirements.txt # pip install -r requirements.txt
| +--train_models.py
| +--try_models.py
|
+--LICENSE
+--README.md
+--.gitignore