Skip to content

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions #74

@jinglescode

Description

@jinglescode

Paper

Link: https://arxiv.org/abs/1712.05884
Year: 2018

Summary

  • text to speech synthesis, sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms

Methods

image

network is composed of an encoder and a decoder with attention

  • encoder converts a character sequence into representations
  • decoder consumes to predict a spectrogram
  • attention network which summarizes the full encoded sequence as a fixed-length context vector for each decoder output step
  • concatenation of the LSTM output and the attention context vector is projected through a linear transform to predict the target spectrogram frame

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions