Commit e8f4554

Yinxiaoli authored and tensorflower-gardener committed

Internal change
PiperOrigin-RevId: 528905375
1 parent f8ebcc2 commit e8f4554

File tree

1 file changed: +15 −1 lines changed

official/projects/maxvit/README.md

@@ -9,7 +9,8 @@
 [MaxViT](https://arxiv.org/abs/2204.01697) is a family of hybrid (CNN + ViT)
 vision backbone models that achieve better performance across the board,
 in both parameter and FLOPs efficiency, than state-of-the-art ConvNets and
-Transformers. They can also scale well on large dataset sizes like ImageNet-21K.
+Transformers ([Blog](https://ai.googleblog.com/2022/09/a-multi-axis-approach-for-vision.html)).
+They also scale well to large datasets like ImageNet-21K.
 Notably, due to the linear complexity of the grid attention used, MaxViT scales
 well on tasks requiring large image sizes, such as object detection and
 segmentation.
@@ -99,3 +100,16 @@ MaxViT-Base | 896x896 | 28x28 | 200 | 54.31 (+0.91) | 53.4
 MaxViT-Large | 896x896 | 28x28 | 200 | 54.69 | - | 46.59 | [config](configs/experiments/coco_maxvitl_i896_crcnn.yaml)

 </section>
+
+### Citation
+
+Should you find this repository useful, please consider citing:
+
+```
+@article{tu2022maxvit,
+  title={MaxViT: Multi-Axis Vision Transformer},
+  author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
+  journal={ECCV},
+  year={2022},
+}
+```
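The README text above credits MaxViT's scalability to the linear complexity of its grid attention. As a rough illustration of why the cost is linear, here is a minimal, single-head NumPy sketch of grid-partitioned self-attention, reconstructed from the paper's description rather than taken from the repository's TensorFlow implementation; the function name, shapes, and the omission of projections and multiple heads are simplifying assumptions. Every token attends only within a fixed G×G window of grid-spaced tokens, so the cost is O(HW·G²) for fixed G instead of O((HW)²) for full attention.

```
import numpy as np

def grid_attention(x, grid_size=2):
    """Sparse global ("grid") self-attention over a fixed GxG grid (sketch).

    Tokens spaced H/G (resp. W/G) apart attend to each other, so every
    attention window has constant size G*G and the total cost is
    O(H * W * G^2): linear in the number of pixels for fixed G.
    x: (H, W, C) feature map; H and W must be divisible by grid_size.
    """
    H, W, C = x.shape
    G = grid_size
    # (H, W, C) -> (G, H/G, G, W/G, C) -> (H/G * W/G, G*G, C):
    # the two G axes become the attention window.
    t = x.reshape(G, H // G, G, W // G, C)
    t = t.transpose(1, 3, 0, 2, 4).reshape(-1, G * G, C)
    # Plain scaled dot-product self-attention within each window.
    scores = t @ t.transpose(0, 2, 1) / np.sqrt(C)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ t
    # Undo the grid partitioning.
    out = out.reshape(H // G, W // G, G, G, C).transpose(2, 0, 3, 1, 4)
    return out.reshape(H, W, C)

# Example: a 32x32 feature map with 8 channels (shapes are illustrative).
y = grid_attention(np.random.rand(32, 32, 8), grid_size=4)
assert y.shape == (32, 32, 8)
```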
