Music Transformer: Generating Music with Long-Term Structure #42

Paper

Link: https://openreview.net/forum?id=rJe4ShAcF7
Year: 2018

Summary

  • Relative attention is very well-suited for generative modeling of symbolic music
  • The memory-efficient formulation makes it practical to apply relative attention to much longer sequences, such as long texts or even audio waveforms

Contributions and Distinctions from Previous Works

  • Prior Transformers struggled with long sequences such as minute-long musical compositions: the existing implementation of relative attention is memory-prohibitive at these lengths, and this work removes that bottleneck

Methods

  • Takes a language-modeling approach to training generative models for symbolic music: music is represented as a sequence of discrete tokens, with the vocabulary determined by the dataset. Datasets in different genres call for different ways of serializing polyphonic music into a single stream and of discretizing time (see the token-vocabulary sketch after this list)
  • Performs "skewing" to obtain a memory-efficient implementation of relative-position-based attention (see the second sketch below)

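The "skewing" step itself fits in a few lines. Below is a minimal PyTorch sketch of the procedure described in the paper (pad a dummy column, reshape, slice); the paper's own implementation is in TensorFlow/Tensor2Tensor, and the function and variable names here are mine:

```python
import torch
import torch.nn.functional as F

def relative_logits_via_skewing(q, rel_emb):
    """Compute relative-position logits S_rel without the O(L^2 * D) intermediate.

    q:       (batch, heads, L, d) queries
    rel_emb: (L, d) relative-position embeddings, where rel_emb[L-1] is
             distance 0 and rel_emb[0] is distance -(L-1)
    returns: (batch, heads, L, L) relative logits, aligned so that entry
             (i, j) uses the embedding for distance j - i
    """
    L = q.size(-2)
    # Dot every query with every relative embedding: (batch, heads, L, L)
    qe = torch.matmul(q, rel_emb.transpose(0, 1))
    # 1. Pad a dummy column on the left: (batch, heads, L, L+1)
    qe = F.pad(qe, (1, 0))
    # 2. Reshape so the padding "skews" each row into its aligned position
    qe = qe.reshape(*qe.shape[:-2], L + 1, L)
    # 3. Drop the first row, keeping the last L rows: (batch, heads, L, L)
    return qe[..., 1:, :]
```

The result is added to the usual QKᵀ logits before the softmax; the entries that land above the diagonal correspond to future positions and are removed by the causal mask anyway.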
Results

  • The memory-efficient relative self-attention mechanism dramatically reduces intermediate memory requirements from O(L^2 D) to O(LD). For example, memory consumption per layer is reduced from 8.5 GB to 4.2 MB (per head, from 1.1 GB to 0.52 MB) for a sequence of length L = 2048 and hidden-state size D = 512 (the arithmetic is worked out after this list)
  • In listening tests, samples were perceived as more coherent than those from the baseline Transformer model
  • The model generalizes, generating in a consistent fashion beyond the lengths it was trained on
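
As a quick check on those memory figures, assuming float32 (4-byte) activations and a per-head size of 64 (i.e., 8 heads) — assumptions of mine that are consistent with the per-head numbers quoted above:

$$
\begin{aligned}
O(L^2 D)\ \text{per layer} &: 2048^2 \times 512 \times 4\ \text{bytes} \approx 8.6\ \text{GB}\\
O(LD)\ \text{per layer} &: 2048 \times 512 \times 4\ \text{bytes} \approx 4.2\ \text{MB}\\
O(L^2 d_h)\ \text{per head} &: 2048^2 \times 64 \times 4\ \text{bytes} \approx 1.1\ \text{GB}\\
O(L d_h)\ \text{per head} &: 2048 \times 64 \times 4\ \text{bytes} \approx 0.52\ \text{MB}
\end{aligned}
$$

which reproduces the reported reductions up to rounding.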
