Description
Paper
Link: https://arxiv.org/pdf/2010.11929.pdf
Year: 2020
Summary
- global self-attention over the whole image, operating on patches
- learns to attend to patches far away even in the lower layers, which a convnet cannot do because its receptive field is limited by kernel size
Contributions and Distinctions from Previous Works
- just as Transformers have replaced LSTMs in NLP tasks, this work attempts to replace convolutions with attention
- a convnet has strong inductive biases (locality, translation equivariance): it extracts features from immediate neighbors, which is sensible for image data; ViT lacks most of these priors
Methods
- split the image into fixed-size patches, flatten each patch and linearly project it to an embedding, add position embeddings, prepend a learnable class token, and feed the resulting sequence to a standard Transformer encoder (sketch below)
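A minimal sketch of that patchify-and-embed step, assuming PyTorch; the dimensions (16x16 patches, 768-d embeddings, 224x224 input) follow the ViT-Base config, and names like PatchEmbed / ViTInput are my own, not from the paper's reference code.

```python
# Sketch: patchify -> linear projection -> class token + position embeddings.
# Illustrative only; module and variable names are assumptions, not the official code.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided conv with kernel = stride = patch_size is equivalent to
        # flattening each patch and applying a shared linear projection.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                        # x: (B, 3, 224, 224)
        x = self.proj(x)                         # (B, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)      # (B, 196, 768) sequence of patch tokens

class ViTInput(nn.Module):
    def __init__(self, embed_dim=768, num_patches=196):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

    def forward(self, patch_tokens):             # (B, 196, 768)
        B = patch_tokens.shape[0]
        cls = self.cls_token.expand(B, -1, -1)   # prepend learnable [class] token
        x = torch.cat([cls, patch_tokens], dim=1)
        return x + self.pos_embed                # (B, 197, 768), ready for the Transformer encoder

tokens = ViTInput()(PatchEmbed()(torch.randn(2, 3, 224, 224)))
print(tokens.shape)  # torch.Size([2, 197, 768])
```

The resulting token sequence goes through an unmodified Transformer encoder; the class token's output is used for classification.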
Results
- requires less compute to pre-train than comparably large convnets
- the learned patch-embedding filters resemble traditional low-level conv filters