Source-free domain adaptive semantic segmentation aims to adapt a pre-trained source model to an unlabeled target domain without accessing the private source data. Previous methods usually fine-tune the entire network, which makes parameter tuning expensive. To avoid this problem, we propose to utilize visual prompt tuning for parameter-efficient adaptation. However, existing visual prompt tuning methods are unsuitable for source-free domain adaptive semantic segmentation for two reasons: (1) commonly used visual prompts, such as input tokens or pixel-level perturbations, cannot reliably learn informative knowledge beneficial for semantic segmentation; (2) visual prompts require sufficient labeled data to bridge the gap between the pre-trained model and downstream tasks. To alleviate these problems, we propose a universal unsupervised visual prompt tuning (Uni-UVPT) framework that is applicable to various transformer-based backbones. Specifically, we divide the source pre-trained backbone, whose parameters are frozen, into multiple stages and propose a lightweight prompt adapter that progressively encodes informative knowledge into prompts and enhances the generalization of target features between adjacent backbone stages. Cooperatively, a novel adaptive pseudo-label correction strategy with a multiscale consistency loss is designed to alleviate the negative effect of target samples with noisy pseudo labels and to improve the robustness of visual prompts to spatial perturbations.
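For intuition, below is a minimal PyTorch sketch of the core idea: the frozen, source pre-trained backbone is split into stages, and a small trainable prompt adapter refines the features passed between adjacent stages. All class and argument names are illustrative assumptions, not the actual Uni-UVPT implementation.

```python
# Conceptual sketch only: a frozen backbone interleaved with trainable prompt adapters.
import torch
import torch.nn as nn

class PromptAdapter(nn.Module):
    """Lightweight bottleneck that injects a learnable prompt into the
    intermediate token features of a frozen backbone stage."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(1, 1, dim))  # learnable prompt
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: (B, N, dim) tokens
        return x + self.up(self.act(self.down(x + self.prompt)))

class PromptedBackbone(nn.Module):
    """Frozen backbone stages with a prompt adapter between adjacent stages;
    only the adapters are updated during adaptation."""
    def __init__(self, stages: nn.ModuleList, dims: list):
        super().__init__()
        self.stages = stages
        for p in self.stages.parameters():
            p.requires_grad_(False)                          # keep the backbone frozen
        self.adapters = nn.ModuleList(PromptAdapter(d) for d in dims)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for stage, adapter in zip(self.stages, self.adapters):
            x = adapter(stage(x))                            # refine features between stages
        return x
```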
Please check your CUDA version and install the requirements with:
pip install -r requirements.txt
pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/{cu_version}/torch1.10.0/index.html
git clone https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch.git
cd Deformable-Convolution-V2-PyTorch
sh install.sh
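After installation, a quick sanity check (assuming the packages above installed correctly) can confirm that the PyTorch, CUDA, and mmcv-full versions match:

```python
# Quick environment check: verify torch, CUDA, and mmcv-full versions.
import torch
import mmcv

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda,
      "| GPU available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)  # expected: 1.7.0
```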
Cityscapes: Please download leftImg8bit_trainvaltest.zip and gt_trainvaltest.zip from here and extract them to data/cityscapes.
GTA: Download all image and label packages from here and extract them to data/gta.
Synthia: Please download SYNTHIA-RAND-CITYSCAPES from here and extract it to data/synthia.
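As a quick check, you can verify that the extraction targets exist. The root folders below come from the instructions above; the commented subfolder names are the usual archive contents and may differ.

```python
# Verify that the dataset roots referenced above exist after extraction.
import os

expected = [
    "data/cityscapes",  # typically contains leftImg8bit/ and the ground-truth folder
    "data/gta",         # typically contains images/ and labels/
    "data/synthia",     # typically contains RGB/ and GT/
]
for root in expected:
    print(f"{root}: {'found' if os.path.isdir(root) else 'MISSING'}")
```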
Data Preprocessing: Finally, please run the following commands to convert the label IDs to the train IDs:
python tools/convert_datasets/gta.py data/gta --nproc 8
python tools/convert_datasets/cityscapes.py data/cityscapes --nproc 8
python tools/convert_datasets/synthia.py data/synthia/ --nproc 8
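For intuition, the conversion replaces dataset-specific label IDs with the compact train IDs used during training (0-18 for the 19 evaluated classes, 255 for ignored pixels). A few well-known Cityscapes mappings are shown below for illustration only; the scripts above handle the full mapping.

```python
# Illustrative subset of the Cityscapes label-ID -> train-ID mapping.
LABEL_ID_TO_TRAIN_ID = {
    7: 0,    # road
    8: 1,    # sidewalk
    11: 2,   # building
    26: 13,  # car
    # IDs that are not evaluated map to 255 and are ignored by the loss
}
```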
(1) Download models pre-trained on ImageNet-1K and put them in model/:
Swin-B
MiT-B5
(2) Then, a pre-training job can be launched as follows:
python pretrain.py <config_dir>
Please refer to launcher_pretrain.py for all pretraining jobs.
Alternatively, all source models can be downloaded from Baidu and Google Drive, and must be placed in model/.
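As an illustration of how several pretraining jobs could be scripted, here is a hypothetical sketch; the actual launcher_pretrain.py may differ, and the config paths below are made up.

```python
# Hypothetical example of launching multiple pretraining jobs sequentially.
import subprocess

configs = [
    "configs/pretrain/gta2cs_swin_b.py",  # hypothetical config path
    "configs/pretrain/gta2cs_mit_b5.py",  # hypothetical config path
]
for cfg in configs:
    subprocess.run(["python", "pretrain.py", cfg], check=True)
```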
Then we can generate pseudo labels using:
python generate_pseudo_label.py --config <config_dir> --checkpoint <source_model_dir> --pseudo_label_dir <cityscapes_dir>/pretrain/<source_model_name>/train/
Please refer to launcher_pseudo_label.py for all jobs.
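Conceptually, pseudo-label generation saves the source model's per-pixel predictions on the target training images, typically ignoring low-confidence pixels. A minimal sketch of that idea follows; it is not the repository's generate_pseudo_label.py, and the threshold is illustrative.

```python
# Conceptual sketch: confidence-thresholded pseudo labels from softmax outputs.
import torch

def make_pseudo_label(probs: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """probs: (C, H, W) per-pixel class probabilities from the source model.
    Returns an (H, W) label map; unreliable pixels are set to 255 (ignore index)."""
    conf, label = probs.max(dim=0)
    label[conf < threshold] = 255
    return label
```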
After all preparations, we can train the final model by running:
python train.py <config_dir>
Please refer to launcher_train.py for all training jobs.
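The multiscale consistency loss mentioned in the abstract encourages predictions to stay stable under rescaling. Below is a rough, hypothetical sketch of that idea; the actual loss used in the paper and configs may be defined differently.

```python
# Hypothetical sketch of a prediction-consistency loss across scales.
import torch
import torch.nn.functional as F

def multiscale_consistency(logits: torch.Tensor, scales=(0.5, 0.75)) -> torch.Tensor:
    """logits: (N, C, H, W) segmentation logits at full resolution."""
    target = F.softmax(logits, dim=1)
    loss = logits.new_zeros(())
    for s in scales:
        # Downsample and upsample the logits to mimic a rescaled view, then compare.
        small = F.interpolate(logits, scale_factor=s, mode="bilinear", align_corners=False)
        back = F.interpolate(small, size=logits.shape[-2:], mode="bilinear", align_corners=False)
        loss = loss + F.kl_div(F.log_softmax(back, dim=1), target, reduction="batchmean")
    return loss / len(scales)
```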
We have provided all checkpoints for fast evaluation. The checkpoints can be downloaded from Baidu and Google Drive, and must be placed in model/.
python test.py <config_dir> <checkpoint_dir> --eval mIoU
Please refer to launcher_test.py for all testing jobs.
Model | Pretraining | Backbone | GTA5 -> Cityscapes (mIoU, 19 classes) | Synthia -> Cityscapes (mIoU, 16 classes) | Synthia -> Cityscapes (mIoU, 13 classes) |
---|---|---|---|---|---|
Ours | Standard Single Source | Swin-B | 56.2 | 52.6 | 59.4 |
Ours | Standard Single Source | MiT-B5 | 54.2 | 52.6 | 59.3 |
Ours | Source-GtA | Swin-B | 56.9 | 53.8 | 60.4 |
Ours | Source-GtA | MiT-B5 | 56.1 | 53.8 | 60.1 |
If this codebase is useful to you, please cite our work:
@article{ma2023uniuvpt,
title={When Visual Prompt Tuning Meets Source-Free Domain Adaptive Semantic Segmentation},
author={Xinhong Ma and Yiming Wang and Hao Liu and Tianyu Guo and Yunhe Wang},
journal={Advances in Neural Information Processing Systems},
year={2023},
}