This page contains a curated list of papers and resources for Speech Translation, with a focus on end-to-end systems. This list should be considered a starting point for anyone interested in Speech Translation, not a definitive guide.
### Overview

Speech Translation and the End-to-End Promise: Taking Stock of Where We Are; ACL 2020; Paper
Multilingual Speech Translation with Efficient Finetuning of Pretrained Models; ACL 2020; Paper
AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation; ACL 2021; Paper
Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?; ACL 2021; Paper
Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus; ACL 2020; Paper
Highland Puebla Nahuatl–Spanish Speech Translation Corpus for Endangered Language Documentation; ACL 2021; Paper
Self-Training for End-to-End Speech Translation; Interspeech 2020; Paper
Towards Unsupervised Speech-to-text Translation; ICASSP 2019; Paper
Fluent Translations from Disfluent Speech in End-to-End Speech Translation; NAACL 2019; Paper
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation; AAAI 2020; Paper
Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation; ICML 2021; Paper
Direct speech-to-speech translation with a sequence-to-sequence model (Translatotron 1); Interspeech 2019; Paper; Google AI blog post; Google Research Audio Samples
Translatotron 2: Robust direct speech-to-speech translation; arXiv 2021; Paper; Google Research Audio Samples
Assessing Evaluation Metrics for Speech-to-Speech Translation; ASRU 2021; Paper
Transformer-based Direct Speech-to-speech Translation with Transcoder; SLT 2021; Paper
Direct speech-to-speech translation with discrete units; arXiv 2021; Paper
Direct simultaneous speech to speech translation; arXiv 2021; Paper
Speech-to-speech Translation between Untranscribed Unknown Languages; ASRU 2019; Paper
### CoVoST

All data at Facebook Research Github Repo
CoVoST 2: 21 X->En, 15 En->X speech-to-text language pairs; 2880 hours; Paper; MetaAI Announcement; HuggingFace Dataset
CoVoST 1: 11 X->En speech-to-text language pairs; 700 hours; Paper
MTedX: 11 languages into some of En, Es, Fr, It, Pt; 765 hours; Paper; Dataset
Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates; with v1.1 all pairs of 9 European Languages = 72 directions; 1642 hours; Paper; Dataset
MaSS (Multilingual corpus of Sentence-aligned Spoken utterances): 8,130 parallel spoken utterances across 8 languages (56 directions); 172 hours; also provides aligned text; Paper; Dataset
BSTC (Baidu Speech Translation Corpus): 50 hours Zh->En; Paper; Baidu page
Fisher and Callhome Spanish-English Speech Translation: 160 hours Es->En; Paper; Dataset
Most speech-to-speech datasets, however, are produced by synthesizing the target side of speech-to-text translation datasets such as Fisher, Conversational, or CoVoST 2 (as was done for Translatotron).
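The derivation described above can be sketched as follows. This is a minimal illustration, not code from any cited paper: `synthesize` is a placeholder for a real TTS system (the actual pipelines use trained TTS models), and the dataclasses are hypothetical names chosen for the sketch.

```python
# Sketch: derive a pseudo speech-to-speech corpus from a speech-to-text
# translation corpus by pairing each source utterance's audio with
# synthesized audio of its target-language translation text.

from dataclasses import dataclass
from typing import List

@dataclass
class S2TExample:
    """A speech-to-text translation example (hypothetical structure)."""
    source_audio: List[float]   # source-language waveform samples
    translation: str            # target-language text

@dataclass
class S2SExample:
    """A derived speech-to-speech example (hypothetical structure)."""
    source_audio: List[float]
    target_audio: List[float]

def synthesize(text: str) -> List[float]:
    """Placeholder TTS: a real pipeline would run a trained TTS model here.
    Returns one dummy sample per character, just to keep the sketch runnable."""
    return [0.0] * len(text)

def derive_s2s(corpus: List[S2TExample]) -> List[S2SExample]:
    """Pair each source utterance with synthesized target-language speech."""
    return [S2SExample(ex.source_audio, synthesize(ex.translation))
            for ex in corpus]

s2t = [S2TExample(source_audio=[0.1, 0.2, 0.3], translation="hello world")]
s2s = derive_s2s(s2t)
```

The source audio is kept as-is; only the target side is synthetic, which is one reason evaluation of such derived corpora is an open question (see the ASRU 2021 metrics paper above).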
See End-to-End Speech Translation Progress for more papers and datasets by Changhan Wang.
See Awesome speech translation for a very comprehensive set of papers (including Pipeline ST, streaming ST, and other ST problems) compiled by the Chinese Academy of Sciences & ByteDance AI Lab.
See ST Tutorial for a great introduction to Speech Translation with slides and resources, presented at EACL 2021.