
Are Transformers universal approximators of sequence-to-sequence functions? #56

Description

@jinglescode

Paper

Link: https://arxiv.org/abs/1912.10077
Year: 2020

Summary

  • multi-head self-attention layers can compute contextual mappings of the input sequences
  • Transformers are universal approximators of continuous, permutation-equivariant sequence-to-sequence functions with compact support, and can therefore represent any such sequence-to-sequence function to arbitrary accuracy (a numerical check of the permutation-equivariance property is sketched below)
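
To make the permutation-equivariance property concrete, here is a minimal NumPy sketch (not from the paper or its code; the layer sizes and random weights are arbitrary assumptions) checking numerically that a single self-attention head without positional encodings satisfies Attn(PX) = P·Attn(X) for a row permutation P:

```python
# Minimal sketch, assuming a single dot-product self-attention head with
# arbitrary random weights; not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                      # sequence length, model width (arbitrary)

# Random projections standing in for learned W_Q, W_K, W_V
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Single-head dot-product self-attention over the rows of X (n x d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (n, n) attention weights
    return scores @ V                                  # (n, d) output

X = rng.normal(size=(n, d))
perm = rng.permutation(n)

# Permuting the input rows permutes the output rows the same way.
print(np.allclose(self_attention(X)[perm], self_attention(X[perm])))  # True
```

This equivariance is why the approximation result above is stated for permutation-equivariant target functions when no positional encodings are used.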
