PyTorch implementation of the paper "Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition". If you use this code, please cite:
@inproceedings{delbrouck-etal-2020-modulated,
    title = "Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition",
    author = "Delbrouck, Jean-Benoit and
      Tits, No{\'e} and
      Dupont, St{\'e}phane",
    booktitle = "Proceedings of the First International Workshop on Natural Language Processing Beyond Text",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlpbt-1.1",
    doi = "10.18653/v1/2020.nlpbt-1.1",
    pages = "1--10",
    abstract = "This paper aims to bring a new lightweight yet powerful solution for the task of Emotion Recognition and Sentiment Analysis. Our motivation is to propose two architectures based on Transformers and modulation that combine the linguistic and acoustic inputs from a wide range of datasets to challenge, and sometimes surpass, the state-of-the-art in the field. To demonstrate the efficiency of our models, we carefully evaluate their performances on the IEMOCAP, MOSI, MOSEI and MELD dataset. The experiments can be directly replicated and the code is fully open for future researches.",
}
Create a Python 3.6 environment with:
torch 1.2.0
torchvision 0.4.0
numpy 1.18.1
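For example, with conda (the environment name below is ours; any name works):

conda create -n modfusion python=3.6
conda activate modfusion
pip install torch==1.2.0 torchvision==0.4.0 numpy==1.18.1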
We use GloVe vectors from spaCy. These can be installed into your environment with the following commands:
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz
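To sanity-check the installation (this snippet is ours, not part of the original setup), the package should load as a spaCy model exposing 300-dimensional GloVe vectors:

import spacy

# Load the vector package installed above and embed one token.
nlp = spacy.load('en_vectors_web_lg')
print(nlp('emotion')[0].vector.shape)  # expected: (300,)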
Create a data folder and get the data:
mkdir -p data
cd data
wget -O data.zip https://www.dropbox.com/s/tz25q3xxfraw2r3/data.zip?dl=1
unzip data.zip
Here is an example of training a MAT model on IEMOCAP:
mkdir -p ckpt
# Train 10 runs; their checkpoints can later be ensembled at evaluation time.
for i in {1..10}
do
python main.py --dataset IEMOCAP \
--model Model_MAT \
--multi_head 4 \
--ff_size 1024 \
--hidden_size 512 \
--layer 2 \
--batch_size 32 \
--lr_base 0.0001 \
--dropout_r 0.1 \
--dropout_o 0.5 \
--name mymodel
done
Checkpoints will be stored in the folder ckpt/mymodel, one per run of the loop above.
You can evaluate a model by typing:
python ensembling.py --name mymodel --sets test
The task settings are defined in the checkpoint state dict, so the evaluation will be carried out on the dataset you trained your model on.
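If you want to check which settings a checkpoint carries, you can peek at its contents; the file pattern below is an assumption, see ensembling.py for the exact keys:

import glob
import torch

# Load one run's checkpoint on CPU and list its top-level entries.
path = sorted(glob.glob('ckpt/mymodel/*'))[0]
state = torch.load(path, map_location='cpu')
print(state.keys())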
By default, the script globs all training checkpoints inside the folder and performs ensembling over them. To show further details of the evaluation for a specific ensembling, use the --index argument:
python ensembling.py --name mymodel --sets test --index 5
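As a rough sketch of the idea (this is our illustration, not the actual ensembling.py logic), checkpoint ensembling averages the class probabilities predicted by the individually trained runs, so confident runs outweigh uncertain ones:

import torch
import torch.nn.functional as F

def ensemble_predict(models, batch):
    # Each model contributes its class-probability distribution; the
    # ensemble prediction is the argmax of the averaged distribution.
    probs = [F.softmax(model(batch), dim=-1) for model in models]
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)

Here `models` would be the networks rebuilt from the checkpoints in ckpt/mymodel and `batch` one batch of linguistic-acoustic inputs.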
We release pre-trained models to replicate the results reported in the paper. The models should be placed in the ckpt folder.
mkdir -p ckpt
python ensembling.py --name IEMOCAP_pretrained --index 5 --sets test
              precision    recall  f1-score   support

           0       0.70      0.66      0.68       384
           1       0.68      0.75      0.71       278
           2       0.79      0.71      0.75       194
           3       0.78      0.81      0.79       229

    accuracy                           0.73      1085
   macro avg       0.74      0.73      0.73      1085
weighted avg       0.73      0.73      0.73      1085
Max ensemble w-accuracies for test : 72.53456221198157
python ensembling.py --name MOSEI_pretrained --index 9 --sets test
              precision    recall  f1-score   support

           0       0.75      0.57      0.65      1350
           1       0.84      0.92      0.88      3312

    accuracy                           0.82      4662
   macro avg       0.80      0.75      0.77      4662
weighted avg       0.82      0.82      0.81      4662
Max ensemble w-accuracies for test : 82.15358215358215
python ensembling.py --name MOSI_pretrained --index 2 --sets test
              precision    recall  f1-score   support

           0       0.77      0.91      0.84       379
           1       0.84      0.63      0.72       277

    accuracy                           0.79       656
   macro avg       0.81      0.77      0.78       656
weighted avg       0.80      0.79      0.79       656
Max ensemble w-accuracies for test : 79.26829268292683
python ensembling.py --name MELD_pretrained --index 9 --sets test
              precision    recall  f1-score   support

           0       0.64      0.52      0.58      1256
           1       0.36      0.58      0.45       281
           2       0.08      0.18      0.11        50
           3       0.23      0.25      0.24       208
           4       0.44      0.47      0.46       402
           5       0.23      0.24      0.23        68
           6       0.31      0.27      0.29       345

    accuracy                           0.45      2610
   macro avg       0.33      0.36      0.34      2610
weighted avg       0.48      0.45      0.46      2610