
Multi-Speaker-Identification

Define your product mission

Speaker recognition means identifying a person based on his/her voice characteristics. This is useful in authentication applications that identify authorized users, for example, enabling access control using an individual's voice. However, in many scenarios multiple speakers speak simultaneously, and single-speaker identification systems fail to handle such audio signals. It is therefore important for speaker recognition systems to handle multi-speaker audio files and classify the speakers.

The user stories

  • As a user, I want to identify a person based on his/her voice characteristics.

  • As a user, I want to identify multiple speakers and classify them.

  • As a user, I want to identify a particular speaker within a multi-speaker environment.

  • As a user, I don't want strangers (whether intentionally or unintentionally) activating and giving voice commands to my personal smart device.

  • As a user who lives in a multi-person household, I want my smart device to play music from my personal playlist and not my roommates' playlists, and vice versa.

Define MVP

  • The TensorFlow platform.

  • A Python development environment.

  • A device that can record voices.

  • Software that can process recorded audio and identify when someone is speaking.

  • Software that can recognize who is speaking based on voices it has encountered previously.

Technologies to evaluate

  • TensorFlow

TensorFlow is an open-source machine learning platform for developing and training machine learning models. We chose it as a potential option because many of the approaches we saw in the literature rely on some form of machine learning, and we want to explore these techniques as well.

Some things to consider:

Most existing voice recognition is implemented on the server, which introduces two problems:

  1. When the network connection is poor, responses are delayed, resulting in a poor user experience.

  2. When traffic is heavy, a large amount of server resources is consumed.

To solve these two problems, we choose to implement voice recognition on the client. One way to do this is to use machine learning to identify human voices. The framework we use is Google's TensorFlow Lite, which is as compact as its name suggests: while maintaining accuracy, the framework itself is only about 300 KB, and a compressed model is about one quarter the size of the corresponding TensorFlow model.
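As an illustrative sketch (not this repo's exact script; the file names are placeholders), converting a trained Keras model to a quantized TensorFlow Lite model looks roughly like this:

```python
import tensorflow as tf

# Load a previously trained speaker model (hypothetical path).
speaker_model = tf.keras.models.load_model("speaker_model.h5")

# Convert to TensorFlow Lite with default post-training optimizations,
# which shrink the model roughly as described above.
converter = tf.lite.TFLiteConverter.from_keras_model(speaker_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the flat buffer that the mobile app will bundle.
with open("speaker_model.tflite", "wb") as f:
    f.write(tflite_model)
```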

  • The usage scenarios of voice recognition include:

  1. Audio and video quality analysis: detect phenomena such as human voices, silent calls, howling, and background noise.

  2. Identify a specific voice: determine whether audio comes from a specific person, used for voice unlocking, remote identity authentication, etc.

  3. Identify emotions: judge the speaker's emotions and state. Combining voiceprint content with emotional information can effectively prevent voiceprint counterfeiting and physical coercion.

  4. Gender recognition: identify whether a voice is male or female.

  • The algorithm

In order for a computer to recognize audio data, we must first transform the audio from the time domain to the frequency domain and then extract features. Mel-frequency cepstral coefficients (MFCCs) are widely used for this step.
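As a minimal illustration, MFCC features can be computed in a few lines of Python. librosa is one common library for this and is an assumption here, not necessarily what this project uses:

```python
import librosa

# Load a placeholder recording at 16 kHz (time-domain samples).
signal, sr = librosa.load("speech.wav", sr=16000)

# Compute 13 MFCCs per analysis frame (frequency-domain features).
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

print(mfcc.shape)  # (13, num_frames)
```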

(Figure: MFCC feature-extraction pipeline)

Project Structure

The goal for the final product will be a mobile application that utilizes a trained model to predict the person who is speaking based on their recorded speech. This will be split up into multiple subtasks listed below:

  • Develop a mobile application that can record the user's voice

  • Build and train a model

  • Integrate the model into the mobile development environment (see the sketch after this list)

  • Debug and combine application with model

  • Evaluate and test
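The following is a minimal sketch of the prediction step using the TensorFlow Lite Python interpreter; the model path, input features, and speaker labels are hypothetical placeholders:

```python
import numpy as np
import tensorflow as tf

# Load the converted model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="speaker_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Placeholder features; in the app these come from the recorded audio.
features = np.zeros(input_details[0]["shape"], dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], features)
interpreter.invoke()

# Pick the speaker with the highest score (hypothetical label order).
scores = interpreter.get_tensor(output_details[0]["index"])[0]
speakers = ["speaker_0", "speaker_1"]
print("Predicted:", speakers[int(np.argmax(scores))])
```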

Application Structure

(Figure: application block diagram)

Interface

Changelog

Sprint 1

  • Created GitHub project and initial directories for progress tracking

  • Defined MVP and user stories

  • Started exploring SincNet and TensorFlow for model creation, training, and evaluation

Sprint 2

  • Implemented examples from SincNet and TensorFlow in Python

  • Defined the architecture with TensorFlow as the platform of choice

Sprint 3

  • Decided to implement project as a mobile application using Android Studio

  • Started building voice recorder sub-application for capturing user input

  • Trained a TensorFlow model in Python and converted it to a TensorFlow Lite model

  • Created test sub-application to verify and evaluate model

Sprint 4

  • Completed working voice recorder with playback

  • Started module for audio processing

  • Started integration of sub-applications

Sprint 5

  • Trained new model using user voice samples

  • Completed module to perform an FFT on raw audio and format the result for model prediction

  • Initial integration of all sub-applications into one application

Reference
