
# Speech Translation Resources

This page contains a curated list of papers and resources for Speech Translation, with a focus on end-to-end systems. It is intended as a starting point for anyone interested in Speech Translation, not a definitive guide.

## Papers

### Overview

Speech Translation and the End-to-End Promise: Taking Stock of Where We Are; ACL 2020; Paper

### Speech-to-text Translation

Multilingual Speech Translation with Efficient Finetuning of Pretrained Models; ACL 2020; Paper

AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation; ACL 2021; Paper

Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?; ACL 2021; Paper

Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus; ACL 2020; Paper

Highland Puebla Nahuatl–Spanish Speech Translation Corpus for Endangered Language Documentation; ACL 2021; Paper

Self-Training for End-to-End Speech Translation; Interspeech 2020; Paper

Towards Unsupervised Speech-to-text Translation; ICASSP 2019; Paper

Fluent Translations from Disfluent Speech in End-to-End Speech Translation; NAACL 2019; Paper

Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation; AAAI 2020; Paper

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation; ICML 2021; Paper

### Speech-to-speech Translation

Direct speech-to-speech translation with a sequence-to-sequence model (Translatotron 1); Interspeech 2019; Paper; Google AI blog post; Google Research Audio Samples

Translatotron 2: Robust direct speech-to-speech translation; arXiv 2021; Paper; Google Research Audio Samples

Assessing Evaluation Metrics for Speech-to-Speech Translation; ASRU 2021; Paper (see the ASR-BLEU scoring sketch after this list)

Transformer-based Direct Speech-to-speech Translation with Transcoder; SLT 2021; Paper

Direct speech-to-speech translation with discrete units; arXiv 2021; Paper

Direct simultaneous speech to speech translation; arXiv 2021; Paper

Speech-to-speech Translation between Untranscribed Unknown Languages; ASRU 2019; Paper
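
A common recipe for scoring speech-to-speech output (the setting studied in the evaluation-metrics paper above) is "ASR-BLEU": transcribe the generated target speech with an ASR model and compute BLEU against the reference translations. Below is a minimal sketch assuming the `transformers` and `sacrebleu` packages; the Whisper checkpoint, file names, and reference strings are placeholders, not taken from any of the cited papers.

```python
# ASR-BLEU sketch: transcribe generated target speech, then score with BLEU.
# Assumptions: `transformers` and `sacrebleu` are installed; Whisper is only a
# stand-in ASR model; file paths and reference strings are placeholders.
from transformers import pipeline
import sacrebleu

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

generated_speech = ["out_000.wav", "out_001.wav"]            # system output audio
references = ["good morning everyone", "see you tomorrow"]   # reference translations

hypotheses = [asr(path)["text"] for path in generated_speech]
print(sacrebleu.corpus_bleu(hypotheses, [references]).score)
```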

## Datasets

### CoVoST

All data is available at the Facebook Research GitHub Repo.

CoVoST 2: 21 X->En, 15 En->X speech-to-text language pairs; 2,880 hours; Paper; MetaAI Announcement; HuggingFace Dataset (a loading sketch follows this list)

CoVoST 1: 11 X->En speech-to-text language pairs; 700 hours; Paper
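
A minimal sketch of loading CoVoST 2 through the HuggingFace Dataset linked above, assuming the `datasets` library; the Hub ID, config name, field names, and the local Common Voice path are assumptions to check against the dataset card (CoVoST reuses Common Voice audio, which is not redistributed):

```python
# Sketch: load a French->English CoVoST 2 split via HuggingFace `datasets`.
# Assumptions: Hub ID "facebook/covost2", config "fr_en", and a local download
# of the matching Common Voice release for the source language (French).
from datasets import load_dataset

covost2 = load_dataset(
    "facebook/covost2",
    "fr_en",                                 # French speech -> English text
    data_dir="path/to/common_voice/fr",      # local Common Voice audio
)

sample = covost2["train"][0]
print(sample["translation"])                 # English reference translation
print(sample["audio"]["sampling_rate"])      # source audio metadata
```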

Other speech-to-text datasets:

mTEDx (Multilingual TEDx): 11 languages into some of En, Es, Fr, It, Pt; 765 hours; Paper; Dataset

Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates; v1.1 covers all pairs of 9 European languages (72 directions); 1,642 hours; Paper; Dataset

MaSS (Multilingual corpus of Sentence-aligned Spoken utterances): 8 languages, 56 directions; 172 hours; Paper; Dataset

BSTC (Baidu Speech Translation Corpus): 50 hours Zh->En; Paper; Baidu page

Fisher and Callhome Spanish-English Speech Translation: 160 hours Es->En; Paper; Dataset

Speech-to-speech datasets:

MaSS (Multilingual corpus of Sentence-aligned Spoken utterances): 8,130 parallel spoken utterances across 8 languages (56 language pairs); also provides text; Paper; Dataset

Most speech-to-speech datasets, however, are produced by synthesizing the target side of speech-to-text translation datasets such as Fisher, Conversational, or CoVoST 2 (as is done for Translatotron).
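
A minimal sketch of that synthesis recipe, using the gTTS package purely for illustration (work like Translatotron used much higher-quality TTS); the TSV path and the `translation` column name are placeholders for whatever speech-to-text corpus is being converted:

```python
# Sketch: turn a speech-to-text translation corpus into pseudo speech-to-speech
# data by synthesizing the target-side text. gTTS is only an illustrative
# stand-in for a real TTS system; the input TSV and its columns are placeholders.
import csv
import os
from gtts import gTTS

os.makedirs("tgt_speech", exist_ok=True)

with open("st_corpus_train.tsv", newline="", encoding="utf-8") as f:
    for i, row in enumerate(csv.DictReader(f, delimiter="\t")):
        # Synthesize the English reference translation as target speech.
        gTTS(text=row["translation"], lang="en").save(f"tgt_speech/{i:06d}.mp3")
```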

## Other Resources

See End-to-End Speech Translation Progress, compiled by Changhan Wang, for more papers and datasets.

See Awesome speech translation, compiled by the Chinese Academy of Sciences & ByteDance AI Lab, for a very comprehensive set of papers (covering pipeline ST, streaming ST, and other ST problems).

See the ST Tutorial, presented at EACL 2021, for a great introduction to Speech Translation with slides and resources.

If you have suggestions, spot errors, or want to add anything else, feel free to submit a pull request!