SampleRNN: An unconditional end-to-end neural audio generation model #37

@jinglescode

Description

Paper

Link: https://arxiv.org/pdf/1612.07837.pdf
Year: 2017

Summary

  • Captures underlying sources of variation in temporal sequences over very long time spans.
  • Uses a hierarchy of modules, each operating at a different temporal resolution: the lowest module processes individual samples, while each higher module operates on an increasingly longer timescale and a lower temporal resolution.
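The hierarchy described above can be sketched in a toy form: a frame-level tier runs once per frame of samples and emits a conditioning vector, and a sample-level tier runs once per sample, reusing that vector. This is a minimal numpy sketch; the frame size, hidden sizes, and plain tanh recurrences are illustrative assumptions, not the paper's actual GRU-based tiers or hyperparameters.

```python
import numpy as np

# Toy two-tier hierarchy in the spirit of SampleRNN.
# FRAME_SIZE, HIDDEN, and the tanh recurrences are illustrative assumptions.
FRAME_SIZE = 16   # samples per frame at the higher (coarser) tier
HIDDEN = 8        # hidden state size for both tiers

rng = np.random.default_rng(0)
W_frame = rng.normal(size=(HIDDEN, FRAME_SIZE + HIDDEN)) * 0.1
W_sample = rng.normal(size=(1, 1 + HIDDEN)) * 0.1

def frame_tier(frames, h=None):
    """Runs once per FRAME_SIZE samples: low temporal resolution."""
    h = np.zeros(HIDDEN) if h is None else h
    conditioning = []
    for frame in frames:
        h = np.tanh(W_frame @ np.concatenate([frame, h]))
        conditioning.append(h)  # one conditioning vector per frame
    return conditioning, h

def sample_tier(prev_sample, cond):
    """Runs once per sample, conditioned on its frame's vector."""
    return np.tanh(W_sample @ np.concatenate([[prev_sample], cond]))[0]

audio = rng.normal(size=4 * FRAME_SIZE)  # 4 frames of toy "audio"
frames = audio.reshape(4, FRAME_SIZE)
cond, _ = frame_tier(frames)

# The sample tier runs FRAME_SIZE times per frame-tier step,
# reusing (i.e. upsampling) the same conditioning vector.
preds = [sample_tier(audio[f * FRAME_SIZE + t - 1] if f or t else 0.0,
                     cond[f])
         for f in range(4) for t in range(FRAME_SIZE)]
print(len(cond), len(preds))  # 4 conditioning vectors, 64 sample predictions
```

The point of the structure is that the expensive frame-level recurrence fires 16× less often than the sample-level one, which is how higher tiers cover longer timescales cheaply.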

Contributions and Distinctions from Previous Works

Hence, our contribution is threefold:

  1. We present a novel method that utilizes RNNs at different scales to model longer-term dependencies in audio waveforms while training on short sequences, which results in memory efficiency during training.
  2. We extensively explore and compare variants of models achieving the above effect.
  3. We study and empirically evaluate the impact of different components of our model on three audio datasets. Human evaluation has also been conducted to test these generative models.
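The "training on short sequences" in contribution 1 refers to truncated BPTT: the waveform is split into short subsequences, the hidden state is carried across chunk boundaries, but gradients only flow within a chunk. A minimal sketch of that chunking pattern (chunk length, hidden size, and the plain tanh cell are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Truncated-BPTT-style chunking: state flows across chunks, gradients
# (in a real framework) would be cut at each chunk boundary.
CHUNK = 32   # illustrative subsequence length
HIDDEN = 4   # illustrative hidden size

rng = np.random.default_rng(1)
W = rng.normal(size=(HIDDEN, 1 + HIDDEN)) * 0.1

def run_chunk(samples, h):
    """Process one short subsequence, returning the final hidden state."""
    for s in samples:
        h = np.tanh(W @ np.concatenate([[s], h]))
    return h

audio = rng.normal(size=4 * CHUNK)
h = np.zeros(HIDDEN)
states = []
for start in range(0, len(audio), CHUNK):
    h = run_chunk(audio[start:start + CHUNK], h)  # state flows forward
    states.append(h.copy())  # a real trainer would detach/stop gradients here

print(len(states))  # 4 chunks processed with one persistent hidden state
```

Because each backward pass only spans one chunk, memory during training is bounded by the chunk length rather than the full waveform length.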

Methods

  • Frame-level modules: RNNs operating on non-overlapping frames of samples at coarser temporal resolutions
  • Sample-level module: predicts one sample at a time, conditioned on the frame-level output
    • Output quantization: each sample is treated as a discrete value, predicted with a softmax over quantization levels
    • Conditionally independent sample outputs
    • Truncated BPTT: training on short subsequences while carrying hidden state across them
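The output-quantization step turns regression over continuous amplitudes into classification over discrete bins, so the sample-level module can emit a softmax distribution. A minimal sketch of linear quantization into 256 levels (the bin count matches the paper; the helper names and the linear mapping details are this note's assumptions):

```python
import numpy as np

Q_LEVELS = 256  # number of discrete amplitude bins

def quantize(audio):
    """Map audio in [-1, 1] to integer bins {0, ..., Q_LEVELS - 1}."""
    audio = np.clip(audio, -1.0, 1.0)
    return ((audio + 1.0) / 2.0 * (Q_LEVELS - 1)).round().astype(np.int64)

def dequantize(bins):
    """Map bins back to [-1, 1] (lossy: quantization error remains)."""
    return bins.astype(np.float64) / (Q_LEVELS - 1) * 2.0 - 1.0

x = np.linspace(-1, 1, 5)
q = quantize(x)
print(q.tolist())  # [0, 64, 128, 191, 255]
print(np.abs(dequantize(q) - x).max())  # small round-trip error, < one bin width
```

With this encoding, training reduces to cross-entropy between the softmax over `Q_LEVELS` classes and the true bin index of the next sample.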

Results

  • Outperformed the baseline RNN and WaveNet models in human preference evaluations on the three audio datasets
