This is a list of awesome attention mechanisms used in computer vision, together with a collection of plug-and-play modules. Because our capacity and energy are limited, many modules may not be included. If you have suggestions or improvements, feel free to submit an issue or PR.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021, ViT
If you know of any other awesome resources on attention mechanisms in computer vision, please add them through a PR or an issue.
Additional papers and links to their code are also welcome in the issues.
Thanks to @dedekinds for pointing out the problem in the DIANet description.