A Jupyter notebook implementation of speaker change detection using LSTM-based deep learning models on the IEMOCAP dataset.
This project implements a speaker change detection system with LSTM networks, packaged as a Jupyter notebook. The system processes audio features (MFCC and F0) to identify the points in a conversation where speaker transitions occur.
- Python 3.8+
- Jupyter Notebook/Lab
- TensorFlow 2.x
- librosa
- parselmouth
- numpy
- pandas
- matplotlib
- scikit-learn
- seaborn
- Clone the repository:
  ```bash
  git clone https://github.com/danishayman/Speaker-Change-Detection.git
  cd Speaker-Change-Detection
  ```
- Install required packages:
  ```bash
  pip install -r requirements.txt
  ```
- Download the IEMOCAP dataset:
  - The dataset can be obtained from Kaggle
  - Place the downloaded dataset in your working directory
The project is contained in a single Jupyter notebook with the following sections:
- Import Libraries: Setting up necessary Python packages
- Feature Extraction (see the sketch after this list):
  - Loading audio files
  - Extracting MFCC and F0 features
  - Defining sliding window parameters
- Data Preprocessing (RTTM parsing and labeling appear in the same sketch):
  - RTTM parsing
  - Label generation
  - Dataset splitting
- Model Development:
  - Building LSTM model
  - Training with different window sizes
  - Performance evaluation
- Results and Analysis:
  - Visualization of results
  - Confusion matrix analysis
  - Comprehensive performance metrics
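For illustration, here is a hedged sketch of the Feature Extraction and Data Preprocessing steps: MFCCs are extracted with librosa and F0 with parselmouth on a shared frame grid, and frames that fall near a speaker boundary parsed from an RTTM file are marked as change frames. The hop length, number of MFCCs, tolerance, and helper names are assumptions made for this sketch, not values taken from the notebook.

```python
# Hedged sketch of frame-level feature extraction and change-label generation.
# SR, HOP, n_mfcc and the tolerance are illustrative assumptions.
import numpy as np
import librosa
import parselmouth

SR = 16000
HOP = 256            # hop length in samples (assumed)
HOP_S = HOP / SR     # hop length in seconds

def extract_features(wav_path):
    """Return an (n_frames, 14) matrix of 13 MFCCs + F0 per frame."""
    y, sr = librosa.load(wav_path, sr=SR)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=HOP).T  # (frames, 13)
    pitch = parselmouth.Sound(wav_path).to_pitch(time_step=HOP_S)
    f0 = pitch.selected_array['frequency']          # 0.0 where unvoiced
    n = min(len(mfcc), len(f0))                     # crude alignment of the two frame grids
    return np.hstack([mfcc[:n], f0[:n, None]])

def parse_rttm(rttm_path):
    """Return (onset, duration, speaker) tuples from an RTTM file, sorted by onset."""
    segments = []
    with open(rttm_path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == 'SPEAKER':
                segments.append((float(fields[3]), float(fields[4]), fields[7]))
    return sorted(segments)

def change_labels(segments, n_frames, tolerance=0.1):
    """Label a frame 1 if a speaker transition occurs within `tolerance` seconds of it."""
    changes = [onset for (onset, _, spk), (_, _, prev) in zip(segments[1:], segments[:-1])
               if spk != prev]
    times = np.arange(n_frames) * HOP_S
    labels = np.zeros(n_frames, dtype=np.int32)
    for t in changes:
        labels[np.abs(times - t) <= tolerance] = 1
    return labels
```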
- 🎵 Audio feature extraction (MFCC and F0)
- 🪟 Sliding window analysis with various sizes (3, 5, 7, 9 frames; see the windowing sketch after this list)
- 🤖 LSTM-based architecture with batch normalization
- 📊 Comprehensive evaluation metrics and visualizations
- 📈 Experiment analysis with different window sizes
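The sliding-window step can be pictured with a minimal sketch that stacks overlapping windows of frames and labels each window by its center frame; the centering rule and the `make_windows` helper are illustrative assumptions rather than the notebook's exact code.

```python
import numpy as np

def make_windows(features, labels, window_size=7):
    """Stack overlapping windows of frames and label each window by its center frame.
    features: (n_frames, n_features), labels: (n_frames,) binary change labels."""
    half = window_size // 2
    X, y = [], []
    for center in range(half, len(features) - half):
        X.append(features[center - half:center + half + 1])
        y.append(labels[center])
    return np.asarray(X), np.asarray(y)

# Example: build one dataset per window size tried in the notebook (3, 5, 7, 9 frames).
# windowed = {w: make_windows(feats, labs, window_size=w) for w in (3, 5, 7, 9)}
```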
- Open the Jupyter notebook:
  ```bash
  jupyter notebook speaker_change_detection.ipynb
  ```
- Ensure your IEMOCAP dataset path is correctly set in the notebook:
  ```python
  base_path = "path/to/your/IEMOCAP/dataset"
  ```
- Run all cells sequentially to:
  - Extract features
  - Process data
  - Train models
  - Visualize results
Best results across the tested window sizes:
- Best Window Size: 7 frames
- Peak Accuracy: 66.94%
- Precision: 0.0047
- Recall: 0.6593
- F1-Score: 0.0093
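Metrics of this kind are typically computed with scikit-learn from thresholded sigmoid outputs; a minimal sketch, where the 0.5 threshold and the placeholder arrays are assumptions:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Placeholder arrays; in the notebook these would come from the test split and model.predict().
y_true = np.array([0, 0, 1, 0, 1, 0])
y_prob = np.array([0.2, 0.7, 0.6, 0.1, 0.4, 0.3])

y_pred = (y_prob >= 0.5).astype(int)   # threshold the sigmoid outputs (0.5 assumed)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, zero_division=0))
print("F1-Score :", f1_score(y_true, y_pred, zero_division=0))
print(confusion_matrix(y_true, y_pred))
```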
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, BatchNormalization, Dropout, Dense

# input_shape = (window_size, n_features), i.e. a window of frames of MFCC + F0 features
model = Sequential([
    Input(shape=input_shape),
    LSTM(128, return_sequences=True),   # first LSTM keeps the per-frame sequence
    BatchNormalization(),
    Dropout(0.3),
    LSTM(64),                           # second LSTM summarizes the whole window
    BatchNormalization(),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid')      # probability that the window contains a speaker change
])
```
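The notebook's exact training configuration is not reproduced here; as a rough sketch, the model could be compiled and trained as follows, where the optimizer, loss, epochs, and batch size are assumptions and the arrays come from the dataset-splitting and windowing steps above.

```python
# Hedged training sketch; hyperparameters are assumptions, not the notebook's values.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# X_train/y_train and X_val/y_val are the windowed train/validation splits.
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=20,
    batch_size=64,
)
```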
- Implement data augmentation techniques
- Explore attention mechanisms
- Add residual connections
- Implement curriculum learning
- Experiment with additional acoustic features
- Optimize batch size and training epochs with better hardware
```bibtex
@article{busso2008iemocap,
  title     = {IEMOCAP: Interactive emotional dyadic motion capture database},
  author    = {Busso, Carlos and Bulut, Murtaza and Lee, Chi-Chun and
               Kazemzadeh, Abe and Mower, Emily and Kim, Samuel and
               Chang, Jeannette and Lee, Sungbok and Narayanan, Shrikanth S.},
  journal   = {Language Resources and Evaluation},
  volume    = {42},
  number    = {4},
  pages     = {335--359},
  year      = {2008},
  publisher = {Springer}
}
```
The current implementation faces two main challenges: severe class imbalance (speaker-change frames are far rarer than non-change frames, which is reflected in the very low precision and F1 scores) and computational constraints. Future improvements should focus on addressing these limitations; one common mitigation for the imbalance is sketched below.
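A minimal sketch of class weighting with scikit-learn and Keras, assuming the binary window labels from the sketches above (a standard remedy, not necessarily what the notebook implements):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# y_train holds the binary window labels; change windows are the rare class.
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y_train)
class_weight = {0: weights[0], 1: weights[1]}

# Passed to Keras so misclassified change windows contribute more to the loss.
model.fit(X_train, y_train, epochs=20, batch_size=64, class_weight=class_weight)
```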