Description
Paper
Link: https://arxiv.org/pdf/1703.10135.pdf
Year: 2017
Summary
- Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters
- trained on <text, audio> pairs; the model takes characters as input and outputs a raw spectrogram
- this is the first paper in the Tacotron line of work; later developments are listed at https://google.github.io/tacotron/
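To make "takes characters as input" concrete, here is a minimal sketch of turning the text side of a <text, audio> pair into integer ids that an embedding layer could consume. The function names, the example pairs, and the choice to reserve id 0 for padding are assumptions made for illustration; the paper's actual preprocessing may differ.

```python
def build_vocab(texts):
    """Map each distinct character to an integer id.

    Id 0 is left unused so it can serve as padding (an assumption, not from the paper).
    """
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Convert a string into a list of character ids."""
    return [vocab[ch] for ch in text]


# Hypothetical <text, audio> pairs; the audio paths are placeholders.
pairs = [("hello", "hello.wav"), ("help", "help.wav")]

vocab = build_vocab(text for text, _ in pairs)
ids = encode("hello", vocab)
print(vocab)
print(ids)
```

No phoneme dictionary or linguistic front end appears anywhere in this sketch, which is the point of a character-level model: the raw string is the input.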
Methods
- attention-based seq2seq encoder-decoder; the building blocks are 1-D convolutions, attention, and GRU layers
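The attention step that links the decoder to the encoder can be sketched in plain Python as follows. This is a simplified illustration, not the paper's implementation: it scores encoder states with a dot product for brevity, which may not match the scoring function Tacotron uses, and the vectors are made-up toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Return attention weights and the weighted sum of encoder states.

    Uses dot-product scoring (an assumption made for simplicity).
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy values: three encoder states (one per input character) and one decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]

weights, context = attend(query, encoder_states)
print(weights)
print(context)
```

At each decoder step the context vector summarizes the characters most relevant to the frames being generated, so the model learns the text-to-audio alignment rather than being given one.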
Results
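- outperforms a production parametric system in naturalness (3.82 mean opinion score on a 5-point scale, US English)
- generates speech at the frame level, so it is substantially faster than sample-level autoregressive methods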
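A rough way to see why frame-level generation is faster is to count autoregressive steps per second of audio. The sketch below is back-of-the-envelope arithmetic under illustrative assumptions (24 kHz audio, a 12.5 ms frame shift, and 2 frames emitted per decoder step); these values are not stated in the summary above, and the count ignores per-step cost and the final spectrogram-to-waveform conversion.

```python
import math


def sample_level_steps(duration_s, sample_rate_hz):
    """Steps needed when the model emits one audio sample per step."""
    return int(duration_s * sample_rate_hz)


def frame_level_steps(duration_s, frame_shift_ms, frames_per_step):
    """Steps needed when the model emits one or more spectrogram frames per step."""
    frames = math.ceil(duration_s * 1000 / frame_shift_ms)
    return math.ceil(frames / frames_per_step)


# Illustrative settings (assumptions, see the paragraph above).
duration_s = 1.0
per_sample = sample_level_steps(duration_s, sample_rate_hz=24000)
per_frame = frame_level_steps(duration_s, frame_shift_ms=12.5, frames_per_step=2)

print(per_sample, per_frame, per_sample / per_frame)
```

Under these assumptions the frame-level decoder runs a few hundred times fewer sequential steps per second of audio than a sample-level model would.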