WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

# Paper
Link: https://arxiv.org/pdf/2106.09660.pdf
Year: 2021

# Summary
- End-to-end text-to-speech synthesis: the model generates the waveform directly, without hand-designed intermediate features (e.g., spectrograms)

### Methods
The model has 3 modules:
- encoder: takes the input sequence and extracts hidden representations
- resampling: matches the length of the encoder representations to the output time scale
- decoder: generates the waveform

encoder:
- 3 convolution layers with batch normalization and dropout
- followed by an LSTM
- zoneout regularization on the LSTM
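
As a reminder of what zoneout does, here is a minimal NumPy sketch of the general technique (not code from the paper). During training each hidden unit keeps its previous value with probability `rate`; at inference the expected value is used instead. The function name `zoneout`, its arguments, and the rate of 0.1 are illustrative assumptions.

```python
import numpy as np

def zoneout(h_prev, h_new, rate, training, rng=None):
    """Mix the previous and the newly computed recurrent state.

    Hypothetical helper for illustration: `rate` is the probability that a
    unit keeps its previous value instead of taking the new one.
    """
    h_prev = np.asarray(h_prev, dtype=float)
    h_new = np.asarray(h_new, dtype=float)
    if training:
        rng = np.random.default_rng() if rng is None else rng
        # Sample a per-unit mask: True means "zone out" (keep the old value).
        keep_prev = rng.random(h_prev.shape) < rate
        return np.where(keep_prev, h_prev, h_new)
    # At inference, use the expectation of the stochastic update.
    return rate * h_prev + (1.0 - rate) * h_new

h_prev = np.zeros(8)
h_new = np.ones(8)
print(zoneout(h_prev, h_new, rate=0.1, training=True, rng=np.random.default_rng(0)))
print(zoneout(h_prev, h_new, rate=0.1, training=False))
```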

resampling:
- Gaussian upsampling, as introduced in Non-Attentive Tacotron
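
A rough NumPy sketch of Gaussian upsampling, to make the idea concrete. Each token gets a Gaussian centered at the middle of its duration span, and every output frame is a weighted average of the token representations, with weights given by the normalized Gaussian densities. The function name, the frame-midpoint convention (`t + 0.5`), and the toy durations and sigmas are assumptions for illustration; in the real model the durations and ranges are predicted by the network.

```python
import numpy as np

def gaussian_upsample(h, durations, sigmas):
    """Upsample token representations h (tokens x dim) to frame rate.

    Illustrative sketch: durations and sigmas are given here, whereas a
    trained model would predict them.
    """
    h = np.asarray(h, dtype=float)
    d = np.asarray(durations, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    ends = np.cumsum(d)
    centers = ends - d / 2.0              # middle of each token's span
    num_frames = int(round(ends[-1]))
    t = np.arange(num_frames) + 0.5       # frame midpoints (assumed convention)
    # Log of the Gaussian density, up to a shared constant.
    logp = -0.5 * ((t[:, None] - centers[None, :]) / s[None, :]) ** 2 - np.log(s[None, :])
    logp -= logp.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logp)
    w /= w.sum(axis=1, keepdims=True)     # normalize over tokens for each frame
    return w @ h, w

tokens = np.random.default_rng(0).standard_normal((3, 4))
frames, weights = gaussian_upsample(tokens, durations=[2, 2, 2], sigmas=[0.5, 0.5, 0.5])
print(frames.shape)
```

Unlike hard repetition of each token for its duration, the weights vary smoothly with frame position, so the operation is differentiable with respect to the durations.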

decoder:
- consists of upsampling blocks and downsampling blocks

### Results
- The number of refinement steps trades off fidelity against generation speed
- Experiments show that WaveGrad 2 can generate high-fidelity audio, comparable to strong baselines
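
To illustrate why the number of refinement steps controls the speed/fidelity tradeoff, here is a hedged sketch of a generic DDPM-style sampling loop of the kind used by iterative-refinement vocoders. It is not the paper's implementation: `refine`, `linear_beta_schedule`, the schedule values, and the stand-in noise predictor are all assumptions. The point is that the (expensive) network is called once per step, so fewer steps means faster synthesis.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    # Illustrative noise schedule; the values are assumptions, not the paper's.
    return np.linspace(beta_start, beta_end, num_steps)

def refine(noise_predictor, conditioning, num_samples, betas, seed=0):
    """Start from Gaussian noise and denoise it step by step."""
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    y = rng.standard_normal(num_samples)
    for n in range(len(betas) - 1, -1, -1):
        noise_level = np.sqrt(alpha_bars[n])
        # One network call per refinement step.
        eps = noise_predictor(y, conditioning, noise_level)
        y = (y - betas[n] / np.sqrt(1.0 - alpha_bars[n]) * eps) / np.sqrt(alphas[n])
        if n > 0:
            # Add fresh noise on every step except the last one.
            sigma = np.sqrt((1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n])
            y = y + sigma * rng.standard_normal(num_samples)
    return y

# Stand-in for the trained decoder: predicts zero noise everywhere.
dummy_predictor = lambda y, c, level: np.zeros_like(y)
fast = refine(dummy_predictor, None, num_samples=16, betas=linear_beta_schedule(6))
slow = refine(dummy_predictor, None, num_samples=16, betas=linear_beta_schedule(1000))
print(fast.shape, slow.shape)
```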

