This code is licensed for non-commercial research purposes only.
Existing salient object detection (SOD) methods mainly rely on CNN-based U-shaped structures with skip connections to combine the global contexts and local spatial details that are crucial for locating salient objects and refining object details, respectively. Despite their great success, CNNs have a limited ability to learn global contexts. Recently, the vision transformer has achieved revolutionary progress in computer vision owing to its powerful modeling of global dependencies. However, directly applying the transformer to SOD is suboptimal because the transformer lacks the ability to learn local spatial representations. To this end, this paper explores the combination of transformer and CNN to learn both global and local representations for SOD. We propose a transformer-based Asymmetric Bilateral U-Net (ABiU-Net). The asymmetric bilateral encoder has a transformer path and a lightweight CNN path, where the two paths communicate at each encoder stage to learn complementary global contexts and local spatial details, respectively. The asymmetric bilateral decoder also consists of two paths to process features from the transformer and CNN encoder paths, with communication at each decoder stage for decoding coarse salient object locations and fine-grained object details, respectively. Such communication between the two encoder/decoder paths enables ABiU-Net to learn complementary global and local representations, taking advantage of the natural properties of transformer and CNN, respectively. Hence, ABiU-Net provides a new perspective for transformer-based SOD. Extensive experiments demonstrate that ABiU-Net performs favorably against previous state-of-the-art SOD methods.
Fig. 1. Illustration of various encoder-decoder architectures. (a) ∼ (e) indicate the architectures of Hypercolumn, U-shape, BiSeNet, DSS, and our ABiU-Net, respectively.
Fig. 2. Framework of the proposed Asymmetric Bilateral U-Net (ABiU-Net).
If you are using the code/model provided here in a publication, please consider citing:
@article{qiu2021boosting,
title={Boosting Salient Object Detection with Transformer-based Asymmetric Bilateral U-Net},
author={Qiu, Yu and Liu, Yun and Zhang, Le and Xu, Jing},
journal={arXiv preprint arXiv:2108.07851},
year={2021}
}
The code is built with the following dependencies:
- Python 3.6 or higher
- CUDA 10.0 or higher
- PyTorch 1.2 or higher
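The minimum versions listed above can be checked before running anything else. The snippet below is a minimal sketch and is not part of this repository: `version_tuple`, `at_least`, and `check_environment` are hypothetical helper names, and the check relies only on `sys.version_info`, `torch.__version__`, and `torch.version.cuda`.

```python
import re
import sys


def version_tuple(text):
    """Extract the leading numeric components, e.g. '1.2.0+cu100' -> (1, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number found in {!r}".format(text))
    return tuple(int(part) for part in match.group().split("."))


def at_least(found, required):
    """Return True if version string `found` is >= version string `required`."""
    a, b = version_tuple(found), version_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '10' and '10.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or higher is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not at_least(torch.__version__, "1.2"):
        problems.append("PyTorch 1.2 or higher is required")
    cuda = torch.version.cuda  # None for CPU-only builds
    if cuda is None or not at_least(cuda, "10.0"):
        problems.append("a PyTorch build with CUDA 10.0 or higher is required")
    return problems
```

Calling `check_environment()` in an interactive session and printing the returned list is enough to see what, if anything, needs upgrading.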
The Saliency dataset is organized into the following tree structure:
dataset
│
└───DUTS-TR
└───DUTS-TR.lst
└───SOD
└───SOD.lst
└───HKU-IS
└───HKU-IS.lst
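Before training or testing, it can help to confirm that every file named in a `.lst` file actually exists. The sketch below assumes, without confirmation from this repository, that each line of a `.lst` file holds an image path and a ground-truth path separated by whitespace, both relative to a common root. `read_pair_list` and `missing_files` are hypothetical helper names.

```python
from pathlib import Path


def read_pair_list(lst_path, root=None):
    """Parse a list file into (image_path, mask_path) pairs.

    Assumed format (not confirmed by the repository): one sample per line,
    with the image path and the ground-truth path separated by whitespace.
    Paths are resolved against `root`, which defaults to the list file's folder.
    """
    lst_path = Path(lst_path)
    root = Path(root) if root is not None else lst_path.parent
    pairs = []
    for line_no, line in enumerate(lst_path.read_text().splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                "{}:{}: expected 2 fields, got {}".format(lst_path, line_no, len(fields))
            )
        pairs.append((root / fields[0], root / fields[1]))
    return pairs


def missing_files(pairs):
    """Return every path in `pairs` that does not exist on disk."""
    return [path for pair in pairs for path in pair if not path.exists()]
```

If the list files use a different layout (for example, a single column of image names), only the field check and the pair construction in `read_pair_list` need to change.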
Run the following scripts to test the model:
CUDA_VISIBLE_DEVICES=0 python test.py [--model_name 'ABiU-Net']
[--savedir 'outputs']
[--pretrained './result_epoch50/ABiU_Net_50.pth']
The output saliency maps can be downloaded:
Run the following scripts to evaluate the model:
python evaluate.py
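This README does not list the metrics that `evaluate.py` reports. As an illustration only, the sketch below computes two metrics that are widely used in SOD evaluation, mean absolute error (MAE) and the F-measure with β² = 0.3, on flattened saliency maps. The function names, the fixed threshold of 0.5, and the use of plain Python sequences instead of arrays are assumptions made to keep the example dependency-free.

```python
def mean_absolute_error(pred, gt):
    """Average absolute difference between a predicted saliency map and its
    ground truth; both are equal-length flat sequences with values in [0, 1]."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure of the prediction binarized at `threshold`.

    beta2 = 0.3 weights precision more heavily than recall, which is the
    conventional setting in the SOD literature.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    binary = [p >= threshold for p in pred]
    truth = [g >= 0.5 for g in gt]
    true_positives = sum(1 for b, t in zip(binary, truth) if b and t)
    if true_positives == 0:
        return 0.0
    precision = true_positives / sum(binary)
    recall = true_positives / sum(truth)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

Lower MAE and higher F-measure indicate better agreement with the ground truth. Reported results often sweep the threshold rather than fixing it, so numbers from this sketch are not directly comparable to published tables.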
The pretrained PVT-Tiny can be downloaded:
Run the following scripts to train the model:
CUDA_VISIBLE_DEVICES=0 python train.py [--model_name 'ABiU-Net']
[--max_epochs 50]
[--batch_size 16]
[--base_lr 5e-5]
[--img_size 384]
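The parser inside `train.py` is not reproduced here. As a rough sketch, the options in the command above could be declared with `argparse` as follows, assuming the bracketed values are the defaults. `build_parser` is a hypothetical name and the help strings are illustrative.

```python
import argparse


def build_parser():
    """Build a parser mirroring the training flags shown in the command above.

    The defaults are taken from the bracketed values; whether train.py uses
    the same defaults is an assumption.
    """
    parser = argparse.ArgumentParser(description="Training options (illustrative)")
    parser.add_argument("--model_name", type=str, default="ABiU-Net",
                        help="name of the model to train")
    parser.add_argument("--max_epochs", type=int, default=50,
                        help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=16,
                        help="samples per training batch")
    parser.add_argument("--base_lr", type=float, default=5e-5,
                        help="initial learning rate")
    parser.add_argument("--img_size", type=int, default=384,
                        help="side length the input images are resized to")
    return parser


# Example: override one option and keep the remaining defaults.
args = build_parser().parse_args(["--batch_size", "8"])
print(args.batch_size, args.base_lr, args.img_size)
```

Each long option takes exactly two leading dashes, so a smaller input size, for instance, would be passed as `--img_size 352`.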
For any questions, please contact me via e-mail: yqiu@mail.nankai.edu.cn.