This project uses voice as a metric to authorise login based on a Gausian Mixture Model (GMM) model trained on coefficients obtained from Mel Frequency Cepstral coefficients (MFCC)
The voice sample is first cleaned to get rid of unnecessary noise. MFCC is calculated for each sample and the data is utilized by a GMM to cluster the voice samples based on the MFCC values.
The project runs on Django. The web interface prompts for the user to speak. The recorded voice is then matched with the trained GMM model to find the best cluster it fits into. If the match is above a certain threshold value (say 90%) then the user is authorised.
- Noise reduction must be taken care
- Distinction of an actual voice from a recording
- A better understanding of GMM and the clusters made ( visualization of the clusters)
- Enhanced training of the model (with more dataset)