Voice classifier
- Voice classifier is an Artificial Neural Network based classifier whose goal is to classify different voices or sounds against a provided labeled training dataset, which consists of preprocessed RAW audio files.
- For training, all RAW audio files must be the same size, preferably at least 30 seconds in duration.
- RAW audio training data must not contain any silence.
- This model is trained and tested on audio in 8-bit unsigned PCM RAW format.
- The RAW audio is sampled at 44100 Hz, which means the amplitude of the sound wave is recorded 44100 times every second.
- Each amplitude is quantized into 256 levels (in 8-bit PCM format) and stored in the RAW audio file.
- Every voice (say, a human voice) has a distinct spectrum of harmonic frequencies (Hz) and loudness (dB).
- It is observed that a window of around 1024 samples is optimal for distinguishing several voices.
- However, because the sampling rate is 44100 Hz, the window size n should divide 44100 evenly. Multiplying the sample rate by the duration t (in seconds) gives the total number of samples, and dividing that total by n gives the training set size, which must be an integer:
(44100 * t) / n = total_training_set
The value used is 882, which gives 44100 / 882 = 50 windows per second.
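
The windowing arithmetic above can be checked with a small Python sketch. This is an illustration only, not the project's actual preprocessing code: the function names `windows_per_clip` and `split_raw_pcm` are hypothetical, the clip is synthetic rather than a real recording, and the audio is assumed to be single-channel so that each byte is exactly one 8-bit unsigned sample.

```python
SAMPLE_RATE = 44100  # samples per second, as stated in the text
WINDOW_SIZE = 882    # samples per training example, as stated in the text


def windows_per_clip(duration_s, sample_rate=SAMPLE_RATE, window=WINDOW_SIZE):
    """Return (sample_rate * t) / n, raising if the result is not an integer."""
    total = sample_rate * duration_s
    if total % window != 0:
        raise ValueError(f"{window} does not divide {total} samples evenly")
    return total // window


def split_raw_pcm(data, window=WINDOW_SIZE):
    """Split 8-bit unsigned PCM bytes into equal windows of values 0..255.

    Assumes mono audio (hypothetical helper, one byte per sample).
    """
    if len(data) % window != 0:
        raise ValueError("clip length is not a multiple of the window size")
    return [list(data[i:i + window]) for i in range(0, len(data), window)]


# Synthetic 1-second clip standing in for a real RAW file (assumption).
clip = bytes((i * 7) % 256 for i in range(SAMPLE_RATE))
examples = split_raw_pcm(clip)

print(windows_per_clip(30))             # 1500
print(len(examples), len(examples[0]))  # 50 882
```

For a real recording, the bytes could be read with `open(path, "rb").read()` in place of the synthetic clip. A window size that does not divide the clip length, such as 1024 for a 1-second clip, makes both helpers raise an error, which is the constraint described in the last bullet.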