Paper
Title: Linformer: Self-Attention with Linear Complexity
Link: https://arxiv.org/abs/2006.04768
Year: 2020
Summary
- The self-attention mechanism can be approximated by a low-rank matrix; exploiting this reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space.
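
A minimal numerical sketch of what "low rank" means here (the sizes n, d, k are arbitrary assumptions, and this random attention map decays far more slowly than the trained RoBERTa maps analyzed in the paper): it computes a softmax attention map, measures how much spectral energy its top-k singular values carry, and forms the best rank-k approximation via truncated SVD.

```python
import torch

# Build a softmax attention map from random features and inspect its spectrum.
# n = sequence length, d = feature dim, k = candidate rank (all illustrative).
n, d, k = 512, 64, 128
x = torch.randn(n, d)
P = torch.softmax((x @ x.T) / d ** 0.5, dim=-1)  # (n, n) attention map

# Fraction of spectral energy captured by the top-k singular values.
S = torch.linalg.svdvals(P)
energy = (S[:k] ** 2).sum() / (S ** 2).sum()
print(f"energy in top {k} of {n} singular values: {energy:.3f}")

# Eckart-Young: the truncated SVD is the best rank-k approximation of P.
U, S, Vh = torch.linalg.svd(P)
P_k = (U[:, :k] * S[:k]) @ Vh[:k, :]
err = torch.linalg.norm(P - P_k) / torch.linalg.norm(P)
print(f"relative Frobenius error of rank-{k} approximation: {err:.3f}")
```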
Contributions and Distinctions from Previous Works
- The standard self-attention mechanism of the Transformer uses O(n^2) time and space with respect to sequence length; Linformer replaces it with a linear self-attention that requires only O(n) for both.
Methods
- Since the attention weights can be approximated from a compressed view of the sequence, the number of computed entries can be reduced: two learned linear projections map the n x d key and value matrices down to k x d, so the attention map becomes n x k instead of n x n, and a fixed k independent of n yields linear time and space (see the sketch below).
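
A minimal single-head PyTorch sketch of that projection step, assuming one pair of projections E, F per module (the class name, initialization, and shapes are illustrative; the paper also studies sharing the projections across heads and layers):

```python
import torch
import torch.nn as nn


class LinformerSelfAttention(nn.Module):
    """Single-head Linformer-style attention: keys and values are compressed
    along the sequence dimension from n down to a fixed k, so the attention
    map is (n x k) instead of (n x n)."""

    def __init__(self, d_model: int, seq_len: int, k: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # E and F project the length dimension: (n x d) -> (k x d).
        self.E = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)
        self.F = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d_model), where n == seq_len used at construction.
        q = self.q_proj(x)                                         # (b, n, d)
        k_ = torch.einsum('kn,bnd->bkd', self.E, self.k_proj(x))   # (b, k, d)
        v = torch.einsum('kn,bnd->bkd', self.F, self.v_proj(x))    # (b, k, d)
        attn = torch.softmax(q @ k_.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v                                            # (b, n, d)


# Example: projecting 512 tokens down to k=128 compressed positions.
attn = LinformerSelfAttention(d_model=64, seq_len=512, k=128)
out = attn(torch.randn(2, 512, 64))  # -> shape (2, 512, 64)
```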
Results
- For the standard Transformer, increasing sequence length increases inference time, whereas Linformer's stays nearly constant; only increasing the projected dimension k increases its inference time (see the timing sketch after this list).
- The results show that Linformer performs well on some tasks but not on others, and the various Linformer variants also differ in performance relative to one another.
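
A rough, self-contained timing sketch of that comparison (single head, no learned Q/K/V projections, CPU wall-clock, and all sizes are assumptions, so absolute numbers are crude): full attention scales as O(n^2 d), while the projected form scales as O(n k d), so its time grows only linearly in n.

```python
import time

import torch


def full_attention(x: torch.Tensor) -> torch.Tensor:
    # Standard O(n^2) self-attention (Q = K = V = x for brevity).
    n, d = x.shape
    return torch.softmax((x @ x.T) / d ** 0.5, dim=-1) @ x


def linformer_attention(x: torch.Tensor, E: torch.Tensor,
                        F: torch.Tensor) -> torch.Tensor:
    # O(n * k) attention: keys/values compressed along the length dimension.
    d = x.shape[1]
    K, V = E @ x, F @ x                                      # (k, d) each
    return torch.softmax((x @ K.T) / d ** 0.5, dim=-1) @ V   # (n, k) @ (k, d)


def bench(fn, reps: int = 20) -> float:
    fn()  # warm-up
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps


d, k = 64, 128
for n in (512, 1024, 2048, 4096):
    x = torch.randn(n, d)
    E = torch.randn(k, n) / n ** 0.5
    F = torch.randn(k, n) / n ** 0.5
    print(f"n={n}: full={bench(lambda: full_attention(x)):.4f}s  "
          f"linformer={bench(lambda: linformer_attention(x, E, F)):.4f}s")
```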