Shuguo Jiang1, Fang Xu2, Chuandong Liu1, Hong Tan3,4, Shengyang Li3,4, Lei Yu2, Wen Yang5, Sen Jia6, Gui-Song Xia2
1School of Computer Science, Wuhan University; 2School of Artificial Intelligence, Wuhan University; 3Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences; 4Key Laboratory of Space Utilization, Chinese Academy of Sciences; 5School of Electronic Information, Wuhan University; 6College of Computer Science and Software Engineering, Shenzhen University
Remote sensing change detection based on a map reference and an up-to-date image boosts timely observation of the Earth's surface when earlier images are lacking for comparison. However, the semantic gap between high-level map categories and low-level image details hinders the extraction of homogeneous features for robust temporal association in change detection.
Unlike conventional approaches that either compare pixel-level visual similarity or propagate segmentation errors, we propose LaVIDE, a novel language-vision discriminator that uses language as an intermediary between map categories and image details. Specifically, we introduce *restricted prompt learning* to generate context-aware textual prompts that align map semantics with image content, and an *object-aware embedding enhancement* strategy to integrate object-level attributes (e.g., shape, boundary) into map representations. Together, these components enable robust cross-modal alignment within a unified language-vision feature space. Extensive experiments on four benchmarks (DynamicEarthNet, HRSCD, BANDON, and SECOND) demonstrate that LaVIDE outperforms state-of-the-art methods by significant margins, achieving
- Python >= 3.9
- See requirements.txt
We provide pre-processing scripts for DynamicEarthNet, HRSCD, BANDON, and SECOND.
Download the datasets and place them in the ./data folder before running the scripts.
- DynamicEarthNet
python tools/convert_datasets/create_dynearthnet_tiles.py --data_dir ./data/DynamicEarthNet --out_dir ./data/DynamicEarthNet/tile512 --tile_size 512
- HRSCD
python tools/convert_datasets/create_hrscd_tiles.py --data_dir ./data/HRSCD --out_dir ./data/HRSCD/tile512 --tile_size 512
- BANDON
python tools/convert_datasets/create_bandon_tiles.py --data_dir ./data/BANDON --out_dir ./data/BANDON/tile512 --tile_size 512
- SECOND
python tools/convert_datasets/create_second_tiles.py --data_dir ./data/SECOND --out_dir ./data/SECOND/tile512 --tile_size 512
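For readers who want a feel for what a `--tile_size 512` step does, the following is a minimal sketch of non-overlapping tiling with NumPy. It is not the code in tools/convert_datasets; the function name `split_into_tiles` and the zero-padding of edge tiles are assumptions made for illustration, and the repository scripts may handle borders, labels, and file output differently.

```python
import numpy as np


def split_into_tiles(image: np.ndarray, tile_size: int = 512):
    """Yield (row_index, col_index, tile) for non-overlapping square tiles.

    Hypothetical helper: edge tiles are zero-padded to tile_size x tile_size,
    which is an assumption, not necessarily what the repository scripts do.
    """
    height, width = image.shape[:2]
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tile = image[top:top + tile_size, left:left + tile_size]
            pad_h = tile_size - tile.shape[0]
            pad_w = tile_size - tile.shape[1]
            if pad_h or pad_w:
                # Pad only the two spatial axes; leave any channel axis alone.
                pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
                tile = np.pad(tile, pad, mode="constant")
            yield top // tile_size, left // tile_size, tile


# Example: a synthetic 1024 x 1024 RGB image splits into a 2 x 2 grid of tiles.
image = np.zeros((1024, 1024, 3), dtype=np.uint8)
tiles = list(split_into_tiles(image, tile_size=512))
```

Each yielded tile carries its grid position, so the tiles can be written out with position-based file names and stitched back together after inference.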
- Training
python tools/train.py configs/cross_modal_bcd/dynamicearthnet/lavide.yaml --work-dir runs/cross_modal_bcd/dynamicearthnet/lavide
- Testing
python tools/test.py configs/cross_modal_bcd/dynamicearthnet/lavide.yaml --checkpoint ./path/to/checkpoint.pth --eval BC BC_precision BC_recall SC SCS mIoU --samples-per-gpu=1
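The `--eval` names above are defined by the repository. As a rough illustration only, the sketch below computes precision, recall, and IoU for a pair of binary change masks, assuming that `BC`, `BC_precision`, and `BC_recall` refer to binary-change IoU, precision, and recall. The function `binary_change_scores` is hypothetical and is not part of tools/test.py; the repository's own definitions take precedence.

```python
import numpy as np


def binary_change_scores(pred, target):
    """Precision, recall, and IoU of the 'change' class for two binary masks.

    Hypothetical helper for illustration; returns 0.0 when a denominator is 0.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    true_pos = np.logical_and(pred, target).sum()
    false_pos = np.logical_and(pred, ~target).sum()
    false_neg = np.logical_and(~pred, target).sum()

    def safe_div(numerator, denominator):
        return float(numerator) / float(denominator) if denominator else 0.0

    return {
        "precision": safe_div(true_pos, true_pos + false_pos),
        "recall": safe_div(true_pos, true_pos + false_neg),
        "iou": safe_div(true_pos, true_pos + false_pos + false_neg),
    }


# Toy masks: 1 marks a changed pixel, 0 an unchanged pixel.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
scores = binary_change_scores(pred, target)
```

In this toy case two changed pixels are detected correctly, one is a false alarm, and one is missed, so precision and recall are both 2/3 and IoU is 0.5.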
LaVIDE is built on top of several outstanding open-source projects. We are grateful to these projects and their communities, whose work has advanced the field and made ours possible.

