Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

# Paper
Link: https://arxiv.org/abs/1712.05884
Year: 2018

# Summary
- text to speech synthesis, sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms

### Methods

![image](https://user-images.githubusercontent.com/1694368/123219816-e776be80-d4ff-11eb-8288-a3057a469c6d.png)

network is composed of an encoder and a decoder with attention
- encoder converts a character sequence into representations
- decoder consumes to predict a spectrogram
- attention network which summarizes the full encoded sequence as a fixed-length context vector for each decoder output step
- concatenation of the LSTM output and the attention context vector is projected through a linear transform to predict the target spectrogram frame


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions #74

Paper

Summary

Methods

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions #74

Description

Paper

Summary

Methods

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions