Voice-based gender recognition using:
- The Free ST American English Corpus dataset (SLR45)
- Mel-frequency cepstrum coefficients (MFCC)
- Gaussian mixture models (GMM)
The Free ST American English Corpus dataset (SLR45) can be found on SLR45. It is a free American English corpus by Surfingtech, containing utterances from 10 speakers (5 female and 5 male). Each speaker has about 350 utterances.
Mel-frequency cepstrum coefficients (MFCCs) are used here because they are among the most effective features for speaker verification. MFCCs are commonly derived as follows:
- Take the Fourier transform of (a windowed excerpt of) a signal.
- Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
- Take the logs of the powers at each of the mel frequencies.
- Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
- The MFCCs are the amplitudes of the resulting spectrum.
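The five steps above can be sketched for a single frame in plain Python. This is an illustrative sketch only, not the code in FeaturesExtractor.py: the function names, the Hamming window, the 2595/700 mel formula, and the choice of 20 filters and 13 coefficients are assumptions made for the example.

```python
import cmath
import math


def hz_to_mel(hz):
    # A commonly used mel-scale formula (assumed here, not taken from the project).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Step 1: window the frame and take the power of its Fourier transform."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]  # Hamming window
    powers = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        x = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                for i, w in enumerate(windowed))
        powers.append(abs(x) ** 2 / n)
    return powers


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Step 2: triangular overlapping windows, evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge points, converted to (fractional) FFT bin positions
    edges = [mel_to_hz(top * i / (num_filters + 1)) * n_fft / sample_rate
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left < k <= center:
                row.append((k - left) / (center - left))    # rising slope
            elif center < k < right:
                row.append((right - k) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, num_filters=20, num_ceps=13):
    powers = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Step 3: log of the power in each mel band (floored to avoid log(0))
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, powers)), 1e-10))
                    for row in bank]
    # Steps 4 and 5: DCT-II of the log energies; keep the first num_ceps amplitudes
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_ceps)]


# Usage: one 256-sample frame of a synthetic 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
frame = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
```

The direct Fourier sum keeps the example dependency-free but is slow; a real pipeline would compute the same quantities frame by frame over the whole recording with an FFT from a numerical library.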
According to D. Reynolds in Gaussian_Mixture_Models: A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements or features in a biometric system, such as vocal-tract related spectral features in a speaker recognition system. GMM parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a well-trained prior model.
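Written out, the weighted sum described above takes the following standard form. The symbols are chosen for this illustration: x is a D-dimensional feature vector (here, an MFCC vector), M is the number of components, and the model λ collects the weights w_i, means μ_i and covariance matrices Σ_i.

$$
p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, g(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad \sum_{i=1}^{M} w_i = 1
$$

$$
g(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) =
\frac{1}{(2\pi)^{D/2} \, |\boldsymbol{\Sigma}_i|^{1/2}}
\exp\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\top} \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right)
$$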
For a more detailed explanation, please refer to this blog that I have written.
The required modules/libraries are listed in requirements.txt and can be installed as follows:
pip install -r requirements.txt
- Run.py: This is the main script; it runs the whole cycle (data management > model training > gender identification).
- DataManager.py: This script is responsible for extracting and structuring the data.
- ModelsTrainer.py: This script is responsible for training the Gaussian Mixture Models (GMM).
- GenderIdentifier.py: This script is responsible for testing the system by identifying the genders of the speakers in the testing set.
- FeaturesExtractor.py: This script is responsible for extracting the MFCC features from the .wav files.
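One common way to turn trained GMMs into a gender decision is to score the feature frames of an utterance under one model per gender and pick the model with the higher average log-likelihood. The sketch below illustrates that rule under stated assumptions; it is not taken from GenderIdentifier.py. The function names, the diagonal covariances, and the tiny hand-written 2-D models are all hypothetical, standing in for GMMs that would normally be trained on MFCC vectors.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance, evaluated at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood under a weighted sum of Gaussians.

    gmm is a list of (weight, mean, variance) tuples, one per component.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


def identify_gender(frames, models):
    """Return the label whose model scores the frames highest, plus all scores."""
    scores = {label: gmm_log_likelihood(frames, gmm)
              for label, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy models: two components each, 2-D features instead of MFCCs.
models = {
    "female": [(0.5, [2.0, 1.0], [1.0, 1.0]), (0.5, [3.0, 2.0], [1.0, 1.0])],
    "male": [(0.5, [-2.0, -1.0], [1.0, 1.0]), (0.5, [-3.0, -2.0], [1.0, 1.0])],
}
frames = [[2.1, 0.9], [2.8, 1.7], [2.5, 1.4]]  # made-up feature frames
label, scores = identify_gender(frames, models)
```

Averaging over frames keeps the scores comparable between utterances of different lengths; it does not change which model wins for a given utterance.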
- The system achieves 95% accuracy in gender detection.
- The code can be further optimized using multi-threading, multi-processing and acceleration libraries.
- The accuracy can be further improved using GMM normalization, also known as a UBM-GMM (universal background model) system.