Open
Description
Paper
Title: Bottleneck Transformers for Visual Recognition (BoTNet)
Link: https://arxiv.org/pdf/2101.11605.pdf
Year: 2021
Summary
- Incorporates self-attention into ResNet's bottleneck blocks by replacing the 3×3 convolution with multi-head self-attention; this improves instance segmentation and object detection while reducing the parameter count.
- Hybrid convolution-plus-self-attention models reach strong ImageNet accuracy; pure-attention ViT models struggle in the small-data regime but shine when large-scale data is available.
Methods
Uses 2D relative position encodings in the self-attention layers, which give consistent gains over absolute position encodings.
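A minimal sketch of the kind of 2D self-attention layer described above, using factorized height/width position embeddings added as a content-position term. This is an illustrative PyTorch approximation, not the paper's exact implementation; the class name `MHSA2d` and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class MHSA2d(nn.Module):
    """Multi-head self-attention over an H x W feature map with
    factorized 2D position embeddings (illustrative sketch)."""

    def __init__(self, dim, heads=4, h=14, w=14):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        # 1x1 conv producing queries, keys, and values in one pass.
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)
        # Learned position embeddings, factorized into height and width parts.
        self.rel_h = nn.Parameter(torch.randn(1, heads, dim // heads, h, 1))
        self.rel_w = nn.Parameter(torch.randn(1, heads, dim // heads, 1, w))

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)

        def split(t):  # (b, c, h, w) -> (b, heads, head_dim, h*w)
            return t.reshape(b, self.heads, c // self.heads, h * w)

        q, k, v = split(q), split(k), split(v)
        # Content-content attention logits.
        logits = torch.einsum('bhdi,bhdj->bhij', q, k)
        # Content-position logits from the broadcast sum of the embeddings.
        pos = (self.rel_h + self.rel_w).reshape(1, self.heads, c // self.heads, h * w)
        logits = logits + torch.einsum('bhdi,bhdj->bhij', q, pos.expand(b, -1, -1, -1))
        attn = (logits * self.scale).softmax(dim=-1)
        out = torch.einsum('bhij,bhdj->bhdi', attn, v)
        return out.reshape(b, c, h, w)
```

Because the layer maps a `(B, C, H, W)` tensor to the same shape, it can stand in for the 3×3 convolution inside a bottleneck block without changing the surrounding 1×1 convolutions.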