
Parallel WaveNet: Fast High-Fidelity Speech Synthesis #23


Description

Paper

Link: https://arxiv.org/pdf/1711.10433.pdf
Year: 2017

Summary

  • high-fidelity speech synthesis based on WaveNet, trained using Probability Density Distillation

Contributions and Distinctions from Previous Works

  • generates high-fidelity speech samples at more than 20 times faster than real-time, with no significant difference in quality compared to the original autoregressive WaveNet

Methods

  • modifies WaveNet into a feed-forward ‘student’ network based on inverse-autoregressive flows (IAFs), so samples can be generated in parallel rather than one at a time
  • uses an already-trained WaveNet as a ‘teacher’ from which the parallel WaveNet ‘student’ can efficiently learn
  • the ‘student’ cooperates with the teacher, attempting to match the teacher’s probabilities on its own samples
  • training minimises the KL divergence between the student’s distribution and the teacher’s, which amounts to maximising the log-likelihood of the student’s samples under the teacher while simultaneously maximising the student’s own entropy (see the equation below)
  • introduces three additional loss terms alongside the distillation loss: power loss, perceptual loss, contrastive loss (a sketch of the power loss follows the equation below)
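
With student distribution P_S and teacher distribution P_T, the distillation objective used in the paper decomposes into a cross-entropy term and an entropy term:

```latex
D_{\mathrm{KL}}\left(P_S \,\|\, P_T\right) = H(P_S, P_T) - H(P_S)
```

Minimising this pushes the student’s samples towards regions where the teacher assigns high likelihood (low cross-entropy H(P_S, P_T)) while keeping the student’s own entropy H(P_S) high, which prevents it from collapsing onto a single high-likelihood mode.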

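As a rough illustration of one of the extra terms, below is a minimal sketch of the power loss: it matches the time-averaged STFT power of generated and reference speech, so the student cannot collapse to whispery, low-energy output. The function names and STFT parameters are assumptions for illustration, not taken from the paper’s implementation.

```python
# Hypothetical sketch of the power loss (PyTorch-style; names/parameters assumed).
# phi(x) = |STFT(x)|^2 averaged over time; the loss is the squared difference
# between phi(generated) and phi(reference).
import torch

def power_loss(generated: torch.Tensor, reference: torch.Tensor,
               n_fft: int = 512, hop_length: int = 128) -> torch.Tensor:
    window = torch.hann_window(n_fft, device=generated.device)

    def avg_power(x: torch.Tensor) -> torch.Tensor:
        spec = torch.stft(x, n_fft=n_fft, hop_length=hop_length,
                          window=window, return_complex=True)
        return (spec.abs() ** 2).mean(dim=-1)  # average power per frequency bin

    return ((avg_power(generated) - avg_power(reference)) ** 2).mean()
```

In the paper the power loss is written as the squared difference between phi applied to the generated and the reference waveform, with phi the time-averaged squared STFT magnitude, which is what the sketch computes.
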
Results

  • deployed online by Google Assistant to serve queries, including multiple English and Japanese voices
  • models audio at a 24kHz sample rate instead of 16kHz
