Description
Paper
Link: http://proceedings.mlr.press/v119/katharopoulos20a.html
Year: 2020
Summary
- reformulates the attention mechanism in terms of kernel functions and obtains a linear formulation, which removes the quadratic time and memory requirements of softmax attention. Surprisingly, this formulation also surfaces an interesting connection between autoregressive transformers and RNNs (see the sketch below)
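
A minimal sketch of the linear attention idea, not the authors' reference implementation: the softmax kernel is replaced by a feature map phi (here elu(x) + 1, the map used in the paper), so the key-value product can be computed once and reused for every query. Function names and tensor shapes are illustrative assumptions.

```python
import torch

def elu_feature_map(x):
    # feature map phi(x) = elu(x) + 1 (positive-valued), as used in the paper
    return torch.nn.functional.elu(x) + 1

def linear_attention(Q, K, V, eps=1e-6):
    # Q, K: (N, d_k), V: (N, d_v); replaces softmax(Q K^T) V with
    # phi(Q) (phi(K)^T V), normalized by phi(Q) sum_j phi(K_j)
    Qf = elu_feature_map(Q)                     # (N, d_k)
    Kf = elu_feature_map(K)                     # (N, d_k)
    KV = Kf.t() @ V                             # (d_k, d_v), computed once for all queries
    Z = Qf @ Kf.sum(dim=0)                      # (N,) normalizer
    return (Qf @ KV) / (Z.unsqueeze(-1) + eps)  # (N, d_v)
```

For example, with N = 1024 and d_k = d_v = 64 this builds a 64x64 key-value matrix instead of a 1024x1024 attention matrix, which is where the complexity reduction comes from.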
Contributions and Distinctions from Previous Works
- reduces attention from O(N^2) to O(N) in both time and memory with respect to sequence length (see the sketch below)
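
A hedged sketch of why the autoregressive case is O(N) and behaves like an RNN: the per-position key-value contributions are accumulated into a fixed-size state, so each generation step is an O(1) update, like an RNN cell. The function name and shapes here are assumptions for illustration, not the paper's API.

```python
import torch

def causal_linear_attention(Q, K, V, eps=1e-6):
    # Autoregressive (causal) case: position i attends only to j <= i.
    # A fixed-size state S = sum_{j<=i} phi(K_j) V_j^T and normalizer
    # z = sum_{j<=i} phi(K_j) are updated once per step -> the RNN view.
    phi = lambda x: torch.nn.functional.elu(x) + 1
    Qf, Kf = phi(Q), phi(K)                     # (N, d_k)
    d_k, d_v = K.shape[-1], V.shape[-1]
    S = torch.zeros(d_k, d_v)                   # recurrent "memory" state
    z = torch.zeros(d_k)                        # recurrent normalizer state
    out = []
    for i in range(Q.shape[0]):                 # one O(1) update per position
        S = S + torch.outer(Kf[i], V[i])
        z = z + Kf[i]
        out.append((Qf[i] @ S) / (Qf[i] @ z + eps))
    return torch.stack(out)                     # (N, d_v)
```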
Results
- in terms of performance, outperforms the vanilla transformer on some tasks but not on others
- clearly faster than the vanilla transformer and slightly faster than Reformer