Tacotron: Towards End-to-End Speech Synthesis #24

Open
@jinglescode

Paper

Link: https://arxiv.org/pdf/1703.10135.pdf
Year: 2017

Summary

  • Tacotron is an end-to-end generative text-to-speech model that synthesizes speech directly from characters.
  • It is trained on <text, audio> pairs. The model takes characters as input and outputs a raw spectrogram, which is then converted to a waveform with the Griffin-Lim algorithm.
  • This is the first paper in the Tacotron line of work: https://google.github.io/tacotron/
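
To make "characters as input" concrete, here is a minimal sketch of turning a transcript into the integer ID sequence a character-level model could consume. The symbol set, the `PAD`/`EOS` markers, and the function name `text_to_ids` are assumptions for illustration; the paper's exact vocabulary may differ.

```python
# Hypothetical symbol table (an assumption, not the paper's vocabulary).
PAD = "_"  # padding marker, ID 0
EOS = "~"  # end-of-sequence marker, ID 1
SYMBOLS = [PAD, EOS] + list("abcdefghijklmnopqrstuvwxyz !',.?-")
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def text_to_ids(text):
    """Map a transcript to character IDs and append the EOS ID.

    Text is lowercased; characters outside SYMBOLS are silently dropped.
    """
    ids = [
        SYMBOL_TO_ID[ch]
        for ch in text.lower()
        if ch in SYMBOL_TO_ID and ch not in (PAD, EOS)
    ]
    ids.append(SYMBOL_TO_ID[EOS])
    return ids


if __name__ == "__main__":
    print(text_to_ids("Hello, world!"))
```

Each ID would typically index a learned embedding table before reaching the encoder. Dropping unknown characters keeps the sketch short; a real front end would more likely normalize or map them.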

Methods

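  • Sequence-to-sequence model with attention: the encoder uses a CBHG module (1-D convolution bank, highway network, bidirectional GRU), and the decoder is an attention-based GRU network that predicts spectrogram frames.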
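
The attention step can be sketched as additive (tanh) content-based attention, one common form for this kind of decoder. This is a minimal pure-Python illustration, not the authors' implementation: the function names and the weight arguments `W_q`, `W_m`, and `v` are hypothetical, and in a real model they would be learned parameters.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def tanh_attention(query, memory, W_q, W_m, v):
    """One additive-attention step.

    query:  decoder state, a list of floats
    memory: encoder outputs, a list of equal-length vectors
    Returns (weights, context), where weights sum to 1 and context is
    the weighted average of the memory vectors.
    """
    q_proj = matvec(W_q, query)
    scores = []
    for h in memory:
        m_proj = matvec(W_m, h)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, m_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Numerically stable softmax over the encoder positions.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(memory[0])
    context = [sum(w * h[d] for w, h in zip(weights, memory)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    weights, context = tanh_attention(
        [0.0, 0.0], [[5.0, 5.0], [0.0, 0.0]], identity, identity, [1.0, 1.0]
    )
    print(weights, context)
```

At every decoder step the weights indicate which input characters the model is attending to, and the context vector is what the decoder GRU conditions on.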

Results

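  • Achieves a 3.82 mean opinion score (5-point scale) on US English, outperforming a production parametric system in terms of naturalness.
  • Generates speech at the frame level, so it is substantially faster than sample-level autoregressive methods.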
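
A rough way to see the speed argument is to count autoregressive steps. The sketch below compares a sample-level model (one step per audio sample) with a frame-level decoder that emits several spectrogram frames per step. The default values (24 kHz audio, 12.5 ms frame shift, 2 frames per decoder step) are assumptions chosen for illustration and should be checked against the paper's hyperparameter table; the function name is hypothetical.

```python
import math


def autoregressive_steps(duration_s, sample_rate_hz=24000,
                         frame_shift_ms=12.5, frames_per_step=2):
    """Return (sample_level_steps, frame_level_steps) for a clip.

    sample_level_steps: one step per waveform sample.
    frame_level_steps:  one step per group of `frames_per_step` frames.
    All defaults are illustrative assumptions.
    """
    sample_level_steps = int(duration_s * sample_rate_hz)
    frames = math.ceil(duration_s * 1000 / frame_shift_ms)
    frame_level_steps = math.ceil(frames / frames_per_step)
    return sample_level_steps, frame_level_steps


if __name__ == "__main__":
    samples, steps = autoregressive_steps(3.0)
    print(f"sample-level: {samples} steps, frame-level: {steps} steps")
```

This counts sequential steps only. It ignores the cost of each step and the time spent converting the spectrogram to a waveform, so it is an intuition for the claim rather than a benchmark.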
