Description
Paper
Title: SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Link: https://arxiv.org/pdf/1612.07837.pdf
Year: 2017
Summary
- captures underlying sources of variation in temporal sequences over very long time spans
- uses a hierarchy of modules, each operating at a different temporal resolution: the lowest module processes individual samples, while each higher module operates on an increasingly longer timescale and a lower temporal resolution
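The hierarchy can be illustrated with a toy sketch: a frame-level stage that runs once per frame of samples, and a sample-level stage that runs at full audio resolution conditioned on the frame summaries. This is a deliberately simplified stand-in (plain averaging and addition instead of the paper's RNNs and MLP); the frame size and function names are illustrative assumptions, not from the paper's code.

```python
# Toy sketch of SampleRNN's multi-tier idea: higher tier = lower temporal
# resolution. Real modules are RNNs/MLPs; here they are trivial functions.

FRAME_SIZE = 4  # samples per frame-level step (illustrative value)

def frame_level(samples):
    """Summarize each non-overlapping frame into one conditioning value.
    Stands in for a frame-level RNN that steps once per FRAME_SIZE samples."""
    return [sum(samples[i:i + FRAME_SIZE]) / FRAME_SIZE
            for i in range(0, len(samples), FRAME_SIZE)]

def sample_level(samples, conditioning):
    """Produce one output per input sample, conditioned on its frame's
    summary. Stands in for the sample-level module at full resolution."""
    return [s + conditioning[i // FRAME_SIZE]
            for i, s in enumerate(samples)]

audio = [0.0, 0.2, 0.4, 0.6, 1.0, 1.0, 1.0, 1.0]
cond = frame_level(audio)        # one value per frame (coarse timescale)
out = sample_level(audio, cond)  # one value per sample (full timescale)
```

Note how `cond` has one entry per frame while `out` has one entry per sample, mirroring the paper's different clock rates per tier.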
Contributions and Distinctions from Previous Works
The authors' contribution is threefold:
- They present a novel method that utilizes RNNs at different scales to model longer-term dependencies in audio waveforms while training on short sequences, which results in memory efficiency during training.
- They extensively explore and compare variants of models achieving the above effect.
- They study and empirically evaluate the impact of different components of the model on three audio datasets; human evaluation was also conducted to compare the generative models.
Methods
- frame-level modules (RNNs that each summarize a frame of past samples at a coarser timescale)
- sample-level module (predicts one sample at a time, conditioned on the frame-level output)
- output quantization (each sample is treated as a discrete class with a softmax output)
- conditionally independent sample outputs
- truncated BPTT (training on short subsequences to keep memory usage low)
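The output-quantization step can be sketched numerically: each continuous sample in [-1, 1] is mapped to one of 256 discrete bins, turning next-sample prediction into a 256-way classification problem. This is a minimal linear-quantization sketch assuming 256 levels; the constant and function names are illustrative, not taken from the paper's code.

```python
# Sketch of 8-bit output quantization: a sample in [-1.0, 1.0] becomes an
# integer class index in [0, Q_LEVELS - 1], and can be mapped back.

Q_LEVELS = 256  # number of discrete output classes (assumed 8-bit)

def quantize(x):
    """Map a sample in [-1.0, 1.0] to a bin index in [0, Q_LEVELS - 1]."""
    bin_idx = int((x + 1.0) / 2.0 * (Q_LEVELS - 1) + 0.5)  # round to bin
    return min(max(bin_idx, 0), Q_LEVELS - 1)              # clamp range

def dequantize(q):
    """Map a bin index back to an approximate sample in [-1.0, 1.0]."""
    return q / (Q_LEVELS - 1) * 2.0 - 1.0
```

Round-tripping a sample through `quantize` then `dequantize` introduces at most one bin width of error, which is the cost of trading regression for a softmax over discrete classes.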
Results
- outperformed RNN and WaveNet baselines in the reported evaluations