diff --git a/README.md b/README.md
index 4ba480f..d11685c 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,4 @@
-# Contents
-
-- [Contents](#contents)
-    - [Focus-DETR Description](#focus-detr-description)
-    - [Model architecture](#model-architecture)
-    - [Dataset](#dataset)
-    - [Environment Requirements](#environment-requirements)
-    - [Eval process](#eval-process)
-    - [Usage](#usage)
-    - [Launch](#launch)
-    - [Result](#result)
-    - [ModelZoo Homepage](#modelzoo-homepage)
-
-## [Focus-DETR Description](#contents)
+# [Focus-DETR](#contents)
 
 Focus-DETR is a model that focuses attention on more informative tokens for a better trade-off between computation efficiency and model accuracy. Compared with the state-of-the-art sparse transformer-based detector under the same setting,
 our Focus-DETR gets comparable complexity while achieving 50.4AP (+2.2) on COCO.
@@ -19,6 +6,7 @@ our Focus-DETR gets comparable complexity while achieving 50.4AP (+2.2) on COCO.
 > [Paper](https://openreview.net/pdf?id=iuW96ssPQX): Less is More: Focus Attention for Efficient DETR.
 > Dehua Zheng*, Wenhui Dong*, Hailin Hu, Xinghao Chen, Yunhe Wang.
+
 ## [Model architecture](#contents)
 
 Our Focus-DETR comprises a backbone network, a Transformer encoder, and a Transformer decoder. We design a foreground token selector (FTS) based on top-down score modulations across multi-scale features.
 The tokens selected by a multi-category score predictor, together with the foreground tokens, then go through the Pyramid Encoder to remedy the limitation of deformable attention in distant information mixing.
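
As a concrete illustration of the token-selection idea in the architecture description above, here is a minimal, self-contained PyTorch sketch. It is not the repository's implementation: the class name `ForegroundTokenSelector`, the `keep_ratio` parameter, the linear scoring head, and the mean-pooled top-down modulation are all simplifying assumptions made for this example; the actual FTS uses richer multi-scale score modulation together with a multi-category score predictor.

```python
import torch
import torch.nn as nn


class ForegroundTokenSelector(nn.Module):
    """Toy sketch of a foreground token selector with top-down score modulation.

    Scores the tokens of each feature level, modulates a finer level's scores
    with the (pooled) foreground confidence of the coarser level above it,
    and keeps the top-k tokens per level.
    """

    def __init__(self, embed_dim: int, keep_ratio: float = 0.3):
        super().__init__()
        self.score_head = nn.Linear(embed_dim, 1)  # per-token foreground score
        self.keep_ratio = keep_ratio               # fraction of tokens kept per level

    def forward(self, multi_scale_tokens):
        # multi_scale_tokens: list of (batch, num_tokens, embed_dim), coarsest level first
        selected, prev_score = [], None
        for tokens in multi_scale_tokens:
            score = self.score_head(tokens).squeeze(-1)          # (B, N)
            if prev_score is not None:
                # Top-down modulation: weight this level's scores by the mean
                # foreground confidence of the coarser level above it.
                score = score * prev_score.sigmoid().mean(dim=1, keepdim=True)
            k = max(1, int(tokens.shape[1] * self.keep_ratio))
            top_idx = score.topk(k, dim=1).indices               # (B, k)
            gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
            selected.append(tokens.gather(1, gather_idx))        # (B, k, C)
            prev_score = score
        return selected


# Example: three feature levels with 49, 196 and 784 tokens of width 256.
fts = ForegroundTokenSelector(embed_dim=256)
feats = [torch.randn(2, n, 256) for n in (49, 196, 784)]
kept = fts(feats)
print([t.shape for t in kept])  # roughly 30% of tokens kept per level
```

In the sketch, only the kept tokens would be passed on to the encoder, which is what gives the favorable trade-off between computation and accuracy described above.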