Speaker Identification

Project for Speech Processing 2023. Credit to Dr. Marie Roch for model generation code and project architecture.

Background

The King corpus consists of recordings of 51 male speakers made by the International Telephone and Telegraph Corporation (ITT) in San Diego, CA and Nutley, NJ.

Most speakers have 10 recording sessions. Speakers with fewer than 10 sessions are considered to have too few samples to be identified and are removed from the experiment.
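The session filter above can be sketched as a one-line dictionary comprehension. This assumes session files have already been grouped by speaker into a dict; the function name `filter_speakers` and the speaker/file names in the demo are hypothetical, not taken from the repository.

```python
from typing import Dict, List

# Minimum number of recording sessions a speaker needs to stay in the experiment.
MIN_SESSIONS = 10


def filter_speakers(sessions: Dict[str, List[str]],
                    min_sessions: int = MIN_SESSIONS) -> Dict[str, List[str]]:
    """Drop speakers with fewer than `min_sessions` recording sessions.

    `sessions` maps a speaker ID to the list of that speaker's session files
    (hypothetical layout; the project's own data loading may differ).
    """
    return {spk: files for spk, files in sessions.items()
            if len(files) >= min_sessions}


if __name__ == "__main__":
    # Hypothetical speaker IDs and file names, for illustration only.
    demo = {
        "spk01": [f"spk01_s{i:02d}.wav" for i in range(1, 11)],  # 10 sessions
        "spk02": [f"spk02_s{i:02d}.wav" for i in range(1, 8)],   # 7 sessions
    }
    print(sorted(filter_speakers(demo)))  # ['spk01']
```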

There are two versions of this corpus: a high-fidelity version sampled at 16 kHz, referred to as the wide-band corpus, and a version transmitted through a public telephone network and sampled at 8 kHz, referred to as the narrow-band corpus.

For this experiment we use only the wide-band recordings of the 25 Nutley, NJ speakers.

Training

Spectrograms are derived from the first 5 recordings of each of the 25 speakers, using 20 ms frames with a 10 ms advance.
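At the wide-band rate of 16 kHz, a 20 ms frame is 320 samples and a 10 ms advance is 160 samples. The sketch below frames a signal with those settings and takes the FFT magnitude of each frame. The Hamming window and the dB scaling are assumptions, and `spectrogram` is a hypothetical name; the project's `feature_extraction.py` may do this differently.

```python
import numpy as np


def spectrogram(signal: np.ndarray, fs: int = 16000,
                frame_ms: float = 20.0, advance_ms: float = 10.0) -> np.ndarray:
    """Magnitude spectrogram in dB with frames of `frame_ms` advanced by `advance_ms`.

    Returns an array of shape (n_frames, n_bins), one row per frame.
    """
    frame_len = int(round(fs * frame_ms / 1000))   # 320 samples at 16 kHz
    hop = int(round(fs * advance_ms / 1000))       # 160 samples at 16 kHz
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)                 # assumed window choice
    # Index matrix selecting overlapping frames, one frame per row.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # One-sided FFT magnitude, converted to dB (small offset avoids log(0)).
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mag + 1e-10)
```

For one second of 16 kHz audio, this yields 99 frames of 161 frequency bins.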

A Gaussian mixture model (GMM) is used to classify each frame as speech or noise. The mean noise level in each frequency bin is computed from the noise frames and subtracted from all frames. The noise frames are then discarded, leaving a speech-only spectrogram.
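A minimal sketch of the speech/noise step, assuming a two-component GMM fit to each frame's mean dB energy, with the lower-mean component taken as noise. The GMM here is a small hand-rolled EM in NumPy, not the project's implementation, and `fit_gmm_1d` and `speech_only` are hypothetical names; the original may use different features or a library GMM.

```python
import numpy as np


def _responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component for each sample."""
    log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
             - 0.5 * (x[:, None] - means) ** 2 / variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)


def fit_gmm_1d(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture to `x` with EM."""
    # Start the components near the low and high ends of the data.
    means = np.percentile(x, [10, 90]).astype(float)
    variances = np.full(2, np.var(x) + 1e-6)
    weights = np.full(2, 0.5)
    for _ in range(n_iter):
        resp = _responsibilities(x, weights, means, variances)  # E-step
        nk = resp.sum(axis=0) + 1e-10                           # M-step
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances


def speech_only(spec, n_iter=50):
    """Subtract the per-bin mean noise and keep only speech frames.

    `spec` has shape (n_frames, n_bins) and is assumed to be in dB.
    """
    energy = spec.mean(axis=1)                   # one feature per frame
    w, mu, var = fit_gmm_1d(energy, n_iter)
    post = _responsibilities(energy, w, mu, var)
    noise_comp = int(np.argmin(mu))              # lower-energy component = noise
    is_noise = post[:, noise_comp] > 0.5
    if is_noise.all() or not is_noise.any():
        raise ValueError("GMM did not separate speech from noise")
    noise_mean = spec[is_noise].mean(axis=0)     # mean noise per frequency bin
    return (spec - noise_mean)[~is_noise]        # subtract, then drop noise frames
```

Using the posterior threshold of 0.5 assigns each frame to whichever component is more likely; a stricter threshold would discard more borderline frames.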

Neural networks are trained with a 90/10 training/validation split and are kept shallow because of the limited amount of data.
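The 90/10 split can be sketched as a random permutation of the training examples. This assumes the split is made over individual feature frames with a fixed seed; the README does not say whether frames or whole recordings are split, and `train_val_split` is a hypothetical helper. The network architecture itself is not shown here.

```python
import numpy as np


def train_val_split(features, labels, val_fraction=0.1, seed=0):
    """Shuffle examples and split them into training and validation sets.

    Returns (train_features, train_labels, val_features, val_labels).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_val = int(round(len(features) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (features[train_idx], labels[train_idx],
            features[val_idx], labels[val_idx])
```

Because labels are indexed with the same permutation as the features, each example keeps its speaker label after shuffling.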

Experiments

Model	Error Rate
Base Model	25%
Narrowing Nodes Model	22%
Wide Model	19%
Intermittent Dropout Model	21%

Best Performing Model

Wide Model Confusion Matrix
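A confusion matrix and the error rate in the table above are two views of the same decisions: the error rate is the fraction of counts that fall off the diagonal. The sketch below assumes integer speaker labels (0 to 24 for the 25 speakers) and one decision per test item; `confusion_matrix` and `error_rate` are hypothetical helpers, not the repository's code.

```python
import numpy as np


def confusion_matrix(true_labels, predicted, n_classes):
    """Count decisions; rows index the true speaker, columns the predicted one."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(true_labels), np.asarray(predicted)), 1)
    return cm


def error_rate(cm):
    """Fraction of decisions that fall off the diagonal."""
    return 1.0 - np.trace(cm) / cm.sum()
```

For example, `error_rate(confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], 3))` counts one misidentification out of five decisions.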

References

Higgins, A. and Vermilyea, D. (1995). King Speaker Verification (ed. International Telephone and Telegraph Corporation). Linguistic Data Consortium, 3600 Market Street, Suite 810, Philadelphia, PA 19104-2653.

Repository: SkittyWitty/speech_recognition