
Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environments - Submitted to EUSIPCO 2025

🔥 Official repository with the PyTorch Lightning code for the paper:
"Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environments"
This work is an extension of our previous paper "Multi-Microphone Speech Emotion Recognition using the Hierarchical Token-semantic Audio Transformer Architecture", incorporating multi-modal learning to further improve robustness.
📄 Read the Paper


🔍 Overview

Human emotions are conveyed through speech and facial expressions, making multi-modal emotion recognition (MER) crucial for robust emotion classification. This work introduces a multi-microphone and multi-modal system combining:

  • HTS-AT Transformer for multi-channel audio processing
  • R(2+1)D ResNet CNN for video-based emotion recognition
  • Late fusion (concatenation) to combine audio and video features

🔮 Key Features

✔️ Multi-Microphone Audio Processing: Robust against reverberation
✔️ Multi-Modal Learning: Combines speech and facial cues
✔️ Tested on RAVDESS convolved with Real-World RIRs (ACE Database)
✔️ Pretrained Models Available for Fine-Tuning and Testing


📸 Model Architecture

Our approach consists of two modality-specific components and a fusion stage:

  1. HTS-AT Transformer (Audio Modality):
    • Processes multi-channel mel-spectrograms
    • Uses Patch-Embed Summation & Averaging strategies
    • Extracts deep features for robust emotion classification
  2. R(2+1)D CNN (Video Modality):
    • Extracts spatiotemporal features from facial expressions
    • Pretrained on the Kinetics dataset, fine-tuned for MER
  3. Feature Fusion & Classification:
    • Late fusion via concatenation of extracted embeddings
    • Fully connected layers for final emotion classification
Figure: the multi-channel, multi-modal architecture, combining the extended HTS-AT (multi-microphone audio) with R(2+1)D (video-based recognition).
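
For intuition, here is a minimal PyTorch sketch of a late-fusion head. It is an illustrative outline only, not the repository's implementation: the embedding dimensions, the classifier head, and the channel handling are assumptions, and the two backbones are passed in as black boxes.

```python
import torch
import torch.nn as nn


class LateFusionMER(nn.Module):
    """Illustrative late-fusion head: concatenate audio and video embeddings,
    then classify with fully connected layers (all dimensions are assumptions)."""

    def __init__(self, audio_backbone, video_backbone,
                 audio_dim=768, video_dim=512, num_classes=8):
        super().__init__()
        self.audio_backbone = audio_backbone   # e.g. an HTS-AT encoder (audio)
        self.video_backbone = video_backbone   # e.g. an R(2+1)D encoder (video)
        self.classifier = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, multi_channel_spec, video_clip):
        # multi_channel_spec: (batch, mics, mel_bins, frames) mel-spectrograms.
        # The paper fuses microphone channels inside HTS-AT (patch-embed
        # summation/averaging); averaging the spectrograms here is a simplification.
        mono_spec = multi_channel_spec.mean(dim=1)
        audio_emb = self.audio_backbone(mono_spec)           # (batch, audio_dim)
        video_emb = self.video_backbone(video_clip)          # (batch, video_dim)
        fused = torch.cat([audio_emb, video_emb], dim=-1)    # late fusion
        return self.classifier(fused)                        # emotion logits
```

Concatenation keeps both modalities' embeddings intact and lets the fully connected layers learn their interaction.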

🔧 Getting Started

1️⃣ Installation

The recommended way to run the code is inside a Docker container.

Pull Docker Image

docker pull ohadico97/mer:v1

Once the image is available, run:

docker run -it --gpus all --shm-size 20G -v $HOME:$HOME --name <container name> ohadico97/mer:v1

Clone the Repository

git clone https://github.com/OhadCohen97/Multi-Microphone-Multi-Modal-Emotion-Recognition-in-Reverberant-Environments.git
cd Multi-Microphone-Multi-Modal-Emotion-Recognition-in-Reverberant-Environments

Use the Virtual Environment

The container ships with the Python environment used for this work: Python 3.8.13 ('base').

2️⃣ Dataset Setup

The previous paper evaluated the models on three datasets: RAVDESS, IEMOCAP, and CREMA-D. In this work, we focus exclusively on the RAVDESS dataset. The training and validation splits are reverberated synthetically using the 'gpuRIR' Python library, while the test sets are reverberated with real-world ACE RIRs recorded in various acoustic environments. You can choose which modality to fine-tune using the multimodal flag in config.py.

Note that the previous work was fine-tuned with different numbers of samples and different splits.
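
For intuition, the sketch below shows one way the synthetic reverberation could be done with gpuRIR: simulate a multi-microphone RIR and convolve it with a clean RAVDESS waveform. The room geometry, microphone layout, and T60 are illustrative assumptions, not the settings used in the paper, and ravdess_sample.wav is a placeholder filename.

```python
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve
import gpuRIR

clean, fs = sf.read("ravdess_sample.wav")          # placeholder clean utterance
if clean.ndim > 1:                                 # fold to mono if the file is stereo
    clean = clean.mean(axis=1)

# Illustrative room geometry and reverberation time (not the paper's settings)
room_sz = [6.0, 5.0, 3.0]                          # room dimensions [m]
pos_src = np.array([[3.0, 2.5, 1.7]])              # a single speaker position
pos_rcv = np.array([[1.0 + 0.1 * m, 1.0, 1.5]      # a 5-microphone linear array
                    for m in range(5)])
T60 = 0.5                                          # reverberation time [s]

beta = gpuRIR.beta_SabineEstimation(room_sz, T60)  # wall reflection coefficients
nb_img = gpuRIR.t2n(T60, room_sz)                  # image-source counts per axis
rirs = gpuRIR.simulateRIR(room_sz, beta, pos_src, pos_rcv, nb_img, T60, fs)

# Convolve the clean signal with each microphone's RIR -> (mics, samples)
reverberant = np.stack([fftconvolve(clean, rir)[: len(clean)] for rir in rirs[0]])
```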

Place the datasets inside the data/ folder:

MER/
│── data/
│     │── <dataset name & type>/
│           │── train.npy
│           │── val.npy
│           │── test.npy
│── ACE/
│     │── <dataset name>/
│           │── MP1/
│                 │── Lecture_room_1_508/
│                       │── train.npy
│                       │── val.npy
│                       │── test.npy
│                 │── Lecture_room_2_403a/
│                       │── train.npy
│                       │── val.npy
│                       │── test.npy
│                 │── lobby/
│                       ...
│                 │── Meeting room_2 _611/
│                 │── Office_1_502/
│                 │── Office_2_803/
3️⃣ Training & Evaluation

Preprocess Data

To preprocess the data, see the provided notebooks.

Fine-tune the Models

First, make sure you have the HTS-AT AudioSet pre-trained checkpoint and set its path in config.py, and set the root path to the dataset you wish to use. To train:

CUDA_VISIBLE_DEVICES=0,1,2,3,4 python3 main.py train
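
For orientation, the config entries mentioned above might look roughly like the following. The variable names are hypothetical placeholders; check config.py in this repository for the real identifiers (only the multimodal flag is named in this README).

```python
# Hypothetical sketch of the relevant config.py entries -- names are placeholders,
# not the actual identifiers used in this repository.
multimodal = True                                         # audio + video vs. a single modality
htsat_pretrained_path = "/path/to/htsat_audioset.ckpt"    # HTS-AT AudioSet checkpoint
dataset_root = "/path/to/MER/data/<dataset name & type>"  # root of the npy splits
```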

Evaluate Performance

To evaluate the models, set the paths to the npy files in the test_ace routine in main.py (lines 219, 236, and 253).

CUDA_VISIBLE_DEVICES=0,1,2,3,4 python3 main.py test_ace

For a single test run on one npy file:

CUDA_VISIBLE_DEVICES=0,1,2,3,4 python3 main.py test

📊 Results

Results figure: see the paper for the detailed evaluation.

🏆 Citation

If you use this work, please cite:

@article{cohen2024multi,
  title={Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment},
  author={Cohen, Ohad and Hazan, Gershon and Gannot, Sharon},
  journal={arXiv preprint arXiv:2409.09545},
  year={2024}
}

🌟 Acknowledgments

This research was supported by the European Union’s Horizon 2020 Program and the Audition Project, Data Science Program, Israel.


👤 Contact

For questions or collaborations, feel free to reach out:
📧 ohad.cohen@biu.ac.il
