
STTATTS

This repository contains the implementation of the paper:

STTATTS: Unified Speech-To-Text and Text-To-Speech Model

MBZUAI  
EMNLP 2024 (Findings)
  • Oct 2024: preprint released on arXiv

Checkpoints

Finetuned checkpoints are available for Arabic and English-small. To finetune on your own dataset, download the pretrained checkpoints, tokenizer, and dictionary from ArTST and SpeechT5.

Finetuning, Installation, and Inference

See the finetuning scripts here. Installation and inference follow the ArTST repo.

Acknowledgements

STTATTS is built on ArTST and SpeechT5. If you use any of the STTATTS models, please cite the following papers:

@misc{toyin2024sttattsunifiedspeechtotexttexttospeech,
      title={STTATTS: Unified Speech-To-Text And Text-To-Speech Model}, 
      author={Hawau Olamide Toyin and Hao Li and Hanan Aldarmaki},
      year={2024},
      eprint={2410.18607},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.18607}, 
}

@inproceedings{toyin2023artst,
  title={ArTST: Arabic Text and Speech Transformer},
  author={Toyin, Hawau and Djanibekov, Amirbek and Kulkarni, Ajinkya and Aldarmaki, Hanan},
  booktitle={Proceedings of ArabicNLP 2023},
  pages={41--51},
  year={2023}
}

@article{ao2021speecht5,
  title={{SpeechT5}: Unified-modal encoder-decoder pre-training for spoken language processing},
  author={Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and others},
  journal={arXiv preprint arXiv:2110.07205},
  year={2021}
}
