Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention #49

Paper

Link: http://proceedings.mlr.press/v119/katharopoulos20a.html
Year: 2020

Summary

  • Reformulates the attention mechanism in terms of kernel functions and obtains a linear formulation, removing the quadratic time and memory requirements of softmax attention. This formulation also surfaces an interesting connection between autoregressive transformers and RNNs (see the sketch below)
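
A minimal NumPy sketch of the linear formulation (illustrative, not the authors' implementation; the `elu(x) + 1` feature map is the one proposed in the paper, while the function names and toy shapes here are made up):

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, the positive feature map proposed in the paper
    # (np.minimum keeps np.exp from overflowing on the discarded branch)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0)))

def linear_attention(Q, K, V, eps=1e-6):
    # Q, K: (N, d); V: (N, d_v). Cost is O(N * d * d_v), linear in N;
    # the N x N attention matrix is never materialized.
    Qp = elu_feature_map(Q)            # (N, d)
    Kp = elu_feature_map(K)            # (N, d)
    KV = Kp.T @ V                      # (d, d_v): sum_j phi(k_j) v_j^T, computed once
    Z = Kp.sum(axis=0)                 # (d,):     sum_j phi(k_j), the normalizer
    return (Qp @ KV) / (Qp @ Z + eps)[:, None]

# Toy usage: 1000 tokens, 64-dim heads
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(1000, 64)) for _ in range(3))
out = linear_attention(Q, K, V)        # (1000, 64)
```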

Contributions and Distinctions from Previous Works

  • Reduces attention complexity from O(N^2) to O(N) in both time and memory with respect to sequence length N; in the causal setting this follows from rewriting attention as a recurrence (see the sketch below)
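
With a causal mask, the same computation can be written as a recurrence over a fixed-size state, which is the transformers-are-RNNs connection. A hedged sketch under the same assumed feature map as above (names are illustrative):

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, as in the first sketch
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0)))

def causal_linear_attention(Q, K, V, eps=1e-6):
    # Causal (autoregressive) linear attention written as an RNN:
    # the pair (S, z) is a fixed-size hidden state updated once per token,
    # so memory per decoding step is O(d * d_v), constant in N.
    N, d = Q.shape
    d_v = V.shape[1]
    Qp, Kp = elu_feature_map(Q), elu_feature_map(K)
    S = np.zeros((d, d_v))   # running sum of phi(k_j) v_j^T for j <= i
    z = np.zeros(d)          # running sum of phi(k_j)       for j <= i
    out = np.empty((N, d_v))
    for i in range(N):
        S += np.outer(Kp[i], V[i])
        z += Kp[i]
        out[i] = (Qp[i] @ S) / (Qp[i] @ z + eps)
    return out
```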

Results

  • In terms of task performance, linear attention outperforms softmax attention on some tasks but falls short on others
  • Clearly faster than the vanilla transformer and slightly faster than Reformer
