Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation

Zhe Dong, Yuzhe Sun, Yanfeng Gu, Tianzhu Liu
Harbin Institute of Technology

arXiv: 2410.08613

🗓️ TODO

  • Release code and models of our methods.
  • [2024.10.11] We release RISBench, a large-scale Vision-Language Benchmark for Referring Remote Sensing Image Segmentation.

📖 Abstract

[Figure: Flowchart of CroBIM]

Given a natural language expression and a remote sensing image, the goal of referring remote sensing image segmentation (RRSIS) is to generate a pixel-level mask of the target object identified by the referring expression. In contrast to natural scenarios, expressions in RRSIS often involve complex geospatial relationships, with target objects that vary significantly in scale and lack visual saliency, making precise segmentation more difficult. To address these challenges, we propose a novel RRSIS framework, termed the cross-modal bidirectional interaction model (CroBIM). Specifically, a context-aware prompt modulation (CAPM) module is designed to integrate spatial positional relationships and task-specific knowledge into the linguistic features, thereby enhancing the ability to capture the target object. Additionally, a language-guided feature aggregation (LGFA) module is introduced to integrate linguistic information into multi-scale visual features, incorporating an attention deficit compensation mechanism to enhance feature aggregation. Finally, a mutual-interaction decoder (MID) is designed to enhance cross-modal feature alignment through cascaded bidirectional cross-attention, thereby enabling precise segmentation mask prediction. To further foster research on RRSIS, we also construct RISBench, a new large-scale benchmark dataset comprising 52,472 image-language-label triplets. Extensive benchmarking on RISBench and two other prevalent datasets demonstrates the superior performance of the proposed CroBIM over existing state-of-the-art (SOTA) methods.
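
The official code has not been released yet (see the TODO above). As a rough illustration of the cascaded bidirectional cross-attention described for the mutual-interaction decoder (MID), here is a minimal PyTorch sketch; the module name MutualInteractionBlock and all shapes and hyperparameters are our own assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MutualInteractionBlock(nn.Module):
    """One bidirectional cross-attention step: vision attends to language
    and language attends to vision (illustrative sketch, not official code)."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.vis_to_lang = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.lang_to_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_vis = nn.LayerNorm(dim)
        self.norm_lang = nn.LayerNorm(dim)

    def forward(self, vis, lang):
        # vis: (B, N_pixels, dim) visual tokens; lang: (B, N_words, dim) linguistic tokens
        vis2, _ = self.vis_to_lang(query=vis, key=lang, value=lang)
        lang2, _ = self.lang_to_vis(query=lang, key=vis, value=vis)
        return self.norm_vis(vis + vis2), self.norm_lang(lang + lang2)

# Cascading several such blocks progressively aligns the two modalities
# before the final mask-prediction head.
vis, lang = torch.randn(2, 32 * 32, 256), torch.randn(2, 20, 256)
for blk in [MutualInteractionBlock() for _ in range(3)]:
    vis, lang = blk(vis, lang)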

📗 Datasets

VRSBench is a Versatile Vision-Language Benchmark for Remote Sensing Image Understanding.

RISBench is a large-scale Vision-Language Benchmark for Referring Remote Sensing Image Segmentation. It comprises 52,472 high-quality image-language-label triplets. Each image is uniformly sized at 512×512 pixels, and spatial resolutions span from 0.1 m to 30 m, covering a diverse range of scales and details. The semantic labels are categorized into 26 distinct classes, each annotated with 8 attributes, facilitating comprehensive and nuanced semantic segmentation analysis.

The dataset can be downloaded from Baidu Netdisk (access code: wnxg).
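
Once downloaded, the triplets can be iterated with a standard PyTorch Dataset. The sketch below is only illustrative: the directory layout (images/, masks/) and the annotation file name expressions.json are assumptions, since the official loading code has not been released.

import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class ReferringSegTriplets(Dataset):
    """Iterates image-language-label triplets (hypothetical directory layout)."""
    def __init__(self, root: str):
        self.root = Path(root)
        # One record per triplet: {"image": ..., "mask": ..., "expression": ...}
        with open(self.root / "expressions.json") as f:
            self.records = json.load(f)

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        image = Image.open(self.root / "images" / rec["image"]).convert("RGB")
        mask = Image.open(self.root / "masks" / rec["mask"])  # pixel-level label
        return image, rec["expression"], mask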

🍺 Visualizations

[Figure: Flowchart of CroBIM]

❤️ Licensing Information

The dataset is released under the CC-BY-4.0 license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

📜 Citation

If you find our work helpful, please cite:

@article{dong2024cross,
  title={Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation},
  author={Dong, Zhe and Sun, Yuzhe and Gu, Yanfeng and Liu, Tianzhu},
  journal={arXiv preprint arXiv:2410.08613},
  year={2024}
}

🙏 Acknowledgement

Our RISBench dataset is built upon the VRSBench, DOTA-v2, and DIOR datasets.

We are grateful to the authors of LAVT and RMSIN for releasing their models and code as open-source contributions.
