
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context #39

@jinglescode

Description


Paper

Link: https://arxiv.org/abs/1901.02860
Year: 2019

Summary

  • enables learning dependency beyond a fixed length without disrupting temporal coherence
  • resolves the context fragmentation problem

Contributions and Distinctions from Previous Works

  • Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling.

Methods

  • main technical contributions are introducing the notion of segment-level recurrence in a purely self-attentive model and deriving a novel relative positional encoding scheme (see the sketch below)
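
A minimal, single-head sketch of those two ideas: segment-level recurrence (the previous segment's hidden states are cached as gradient-free memory and attended to as extra keys/values) and relative positional attention with the global content/position biases u and v. This is not the paper's implementation; the names here are hypothetical, and it omits multi-head splitting, dropout, layer normalization, and the efficient relative-shift trick used in the official code.

```python
import math
import torch
import torch.nn as nn


def sinusoid_embedding(positions, d_model):
    """Standard sinusoidal embedding for a 1-D tensor of (relative) positions."""
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d_model, 2).float() / d_model))
    angles = positions.float().unsqueeze(-1) * inv_freq        # (len, d/2)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)     # (len, d)


class RelativeSelfAttentionSketch(nn.Module):
    """Single-head, unbatched sketch of Transformer-XL-style attention."""

    def __init__(self, d_model):
        super().__init__()
        self.d_model = d_model
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_ke = nn.Linear(d_model, d_model, bias=False)   # content key projection
        self.w_kr = nn.Linear(d_model, d_model, bias=False)   # position key projection
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        self.u = nn.Parameter(torch.zeros(d_model))            # global content bias
        self.v = nn.Parameter(torch.zeros(d_model))            # global position bias

    def forward(self, h, mem):
        # h:   current segment hidden states, shape (q_len, d_model)
        # mem: cached hidden states of the previous segment, shape (m_len, d_model)
        m_len, q_len = mem.size(0), h.size(0)

        # Segment-level recurrence: keys/values also cover the cached,
        # gradient-free memory of the previous segment.
        h_ext = torch.cat([mem.detach(), h], dim=0)            # (k_len, d)
        k_len = h_ext.size(0)

        q = self.w_q(h)                                        # (q_len, d)
        k = self.w_ke(h_ext)                                   # (k_len, d)
        v = self.w_v(h_ext)                                    # (k_len, d)

        # Relative position embeddings R_d for distances d = 0 .. k_len - 1.
        rel = self.w_kr(sinusoid_embedding(torch.arange(k_len), self.d_model))

        # Query i sits at absolute position m_len + i, so its distance to
        # key j is m_len + i - j (future positions are masked out below).
        dist = (m_len + torch.arange(q_len).unsqueeze(1)
                - torch.arange(k_len).unsqueeze(0)).clamp(min=0)  # (q_len, k_len)

        content_score = (q + self.u) @ k.t()                      # (q_len, k_len)
        position_score = ((q + self.v) @ rel.t()).gather(1, dist) # (q_len, k_len)
        scores = (content_score + position_score) / math.sqrt(self.d_model)

        # Causal mask: a query may not attend to keys to its right.
        future = (m_len + torch.arange(q_len).unsqueeze(1)) < torch.arange(k_len)
        scores = scores.masked_fill(future, float("-inf"))

        out = torch.softmax(scores, dim=-1) @ v                # (q_len, d)
        new_mem = h.detach()                                   # cache for the next segment
        return out, new_mem
```

A long sequence would then be processed segment by segment, feeding the returned `new_mem` back in as `mem` for the next call, which is what lets the effective context grow beyond a single fixed-length segment.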

Results

  • Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers
  • achieves better performance on both short and long sequences
  • in experiments on enwik8, Transformer-XL is up to 1,800+ times faster than the vanilla model during evaluation
  • able to generate relatively coherent long text articles with thousands of tokens
  • first self-attention model that achieves substantially better results than RNNs on both character-level and word-level language modeling
