Acoustic-to-articulatory inversion for dysarthric speech by using cross-corpus acoustic-articulatory data (ICASSP 2021)
This repository houses the official Keras implementation of our paper accepted to the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). If you find the paper useful or use any of its modules in your research, please consider citing our paper.
@inproceedings{maharana2021acoustic,
title={Acoustic-to-Articulatory Inversion for Dysarthric Speech by Using Cross-Corpus Acoustic-Articulatory Data},
author={Maharana, Sarthak Kumar and Illa, Aravind and Mannem, Renuka and Belur, Yamini and Shetty, Preetie and Kumar, Veeramani Preethish and Vengalil, Seena and Polavarapu, Kiran and Atchayaram, Nalini and Ghosh, Prasanta Kumar},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={6458--6462},
year={2021}
}
All the modules are well-documented and self-explanatory. However, for any clarifications regarding the code published here, please feel free to reach out to me at sarthakmaharana9811@gmail.com with the subject line "(GitHub) ALS-AAI". To get access to the datasets we used, or for any other discussions pertaining to potential research, please contact Dr. Prasanta Kumar Ghosh at prasantg@iisc.ac.in.
In this work, we focus on estimating articulatory movements from acoustic features, known as acoustic-to-articulatory inversion (AAI), for dysarthric patients with amyotrophic lateral sclerosis (ALS). Unlike for healthy subjects, there are two potential challenges in AAI on dysarthric speech. First, due to speech impairment, the pronunciation of dysarthric patients is unclear and inaccurate, which could degrade AAI performance. Second, acoustic-articulatory data from dysarthric patients is limited owing to the difficulty of recording. These challenges motivate us to utilize cross-corpus acoustic-articulatory data. In this study, we propose an AAI model that conditions on speaker information using x-vectors at the input and has multi-target articulatory trajectory outputs, one for each corpus. Results reveal that the proposed AAI model achieves relative improvements in the Pearson correlation coefficient (CC) of ∼13.16% and ∼16.45% over a randomly initialized baseline AAI model trained only on the dysarthric corpus, in the seen and unseen conditions, respectively. In the seen conditions, the proposed AAI model outperforms the three baseline AAI models that utilize the cross-corpus data by ∼3.49%, ∼6.46%, and ∼4.03% in terms of CC.
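For intuition, below is a minimal Keras sketch of the x-vector-conditioned, multi-target AAI architecture described above. All feature dimensions, layer counts, and layer names are illustrative assumptions, not the exact configuration from the paper; see the code in this repository for the actual model.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

N_MFCC, XVEC_DIM, N_ART = 13, 512, 12   # assumed feature dimensions

mfcc_in = layers.Input(shape=(None, N_MFCC), name="mfcc")   # (T, N_MFCC)
xvec_in = layers.Input(shape=(XVEC_DIM,), name="x_vector")

# Tile the utterance-level x-vector across time and concatenate it with
# the frame-level acoustic features (speaker conditioning at the input).
xvec_seq = layers.Lambda(
    lambda z: tf.repeat(z[0][:, None, :], tf.shape(z[1])[1], axis=1)
)([xvec_in, mfcc_in])
h = layers.Concatenate()([mfcc_in, xvec_seq])

# Stacked BLSTM encoder shared across both corpora.
for _ in range(3):
    h = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(h)

# One linear output head per corpus (multi-target training).
als_out = layers.TimeDistributed(layers.Dense(N_ART), name="als_ema")(h)
crs_out = layers.TimeDistributed(layers.Dense(N_ART), name="cross_ema")(h)

model = Model([mfcc_in, xvec_in], [als_out, crs_out])
model.compile(optimizer="adam", loss="mse")
```

At training time, each batch would back-propagate through the output head matching the corpus it came from, while the BLSTM encoder is shared across corpora.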
$ git clone https://github.com/sarthaxxxxx/AAI-ALS.git
$ cd AAI-ALS/
$ pip install -r requirements.txt
To train the xSC and xMC AAI models, set x_vectors: True in the config and change the model name to xSC or xMC. Set the subject condition to "seen" or "unseen" to reproduce the results in the paper. To train the other AAI models (RI/MC), set x_vectors: False and change the model name accordingly. A hypothetical config sketch is shown below.
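For reference, the relevant config entries might look like the sketch below. The key names follow this README, but check the config file shipped with the repo for the exact schema.

```yaml
model: xSC               # one of: RI, MC, xSC, xMC
x_vectors: True          # set to False for RI/MC
subject_condition: seen  # or "unseen", as in the paper
```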
python3 train.py --config (path to config file in your system) --gpu (gpu_id)
The best models and their respective weights will be saved at ./ckpt/
Using ./utils/Get_SPIRE_EMA_data_Full.py, extract the training and validation data from the cross-corpus. Train the RI AAI model on this data (experiment by varying the number of BLSTM units). The resulting best model (with 256 units) is the GBM AAI model. Fine-tune the best GBM weights on the dysarthric data and retrain to obtain the GBM-FT AAI model; a sketch of this step follows.
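Below is a hedged Keras sketch of this GBM-to-GBM-FT step. The checkpoint paths and the data variables (als_train_x, als_train_y, ...) are placeholders, not the repo's API.

```python
import tensorflow as tf
from tensorflow.keras.models import load_model

# Load the generic baseline model (GBM) trained on the cross-corpus data.
gbm = load_model("ckpt/gbm_best.h5")            # assumed checkpoint path

# Recompile with a smaller learning rate, as is common when fine-tuning.
gbm.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")

# Fine-tune on the dysarthric (ALS) acoustic-articulatory data.
gbm.fit(als_train_x, als_train_y,
        validation_data=(als_val_x, als_val_y),
        epochs=50, batch_size=8)
gbm.save("ckpt/gbm_ft.h5")                      # GBM-FT weights
```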
To print speech-task-wise results for healthy controls and patients:
python3 test.py --config (path to config file in your system)
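For completeness, here is a minimal sketch of the evaluation metric reported in the paper: the Pearson correlation coefficient (CC) between predicted and ground-truth articulatory trajectories, averaged over articulators. Shapes and the synthetic data in the usage example are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr

def mean_cc(pred, true):
    """pred, true: (n_frames, n_articulators) arrays for one utterance."""
    return float(np.mean([pearsonr(pred[:, k], true[:, k])[0]
                          for k in range(true.shape[1])]))

# Usage with synthetic data, just to show the call:
rng = np.random.default_rng(0)
true = rng.standard_normal((200, 12))
pred = true + 0.5 * rng.standard_normal((200, 12))
print(f"mean CC: {mean_cc(pred, true):.3f}")
```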
MIT License
We thank all the subjects who participated in the EMA recordings. We also thank the Department of Science and Technology, Government of India, for their support.