Zhe Dong1, Yuzhe Sun1, Yanfeng Gu1, Tianzhu Liu1
1Harbin Institute of Technology
- Release code and models of our methods.
- [2024.10.11] We release the RISBench, a large-scale Vision-Language Benchmark for Referring Remote Sensing Image Segmentation.
RISBench is a large-scale Vision-Language Benchmark for Referring Remote Sensing Image Segmentation. It comprises 52,472 high-quality image-language-label triplets. Each image in RISBench is uniformly sized at 512x512 pixels, maintaining consistency across the dataset. The spatial resolution of the images spans from 0.1m to 30m, covering a diverse range of scales and details. The semantic labels fall into 26 distinct classes, each annotated with 8 attributes, facilitating comprehensive and fine-grained semantic segmentation analysis.
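For illustration, a single RISBench sample can be thought of as a triplet of a 512x512 RGB image, a natural-language referring expression, and a binary segmentation mask for the referred object. The sketch below builds a synthetic triplet to show the expected shapes; the field names and the example expression are assumptions for illustration, not the actual release format.

```python
import numpy as np

def make_dummy_triplet():
    """Build a synthetic image-language-label triplet with RISBench-style shapes.

    The 512x512 size matches the dataset description; the file layout,
    field names, and expression text here are hypothetical.
    """
    image = np.zeros((512, 512, 3), dtype=np.uint8)  # RGB remote sensing image
    mask = np.zeros((512, 512), dtype=np.uint8)      # binary mask: 0 = background, 1 = target
    mask[100:200, 150:300] = 1                       # region of the referred object
    expression = "the large storage tank near the harbor"  # hypothetical referring expression
    return image, expression, mask

image, expression, mask = make_dummy_triplet()
print(image.shape, mask.shape, int(mask.sum()))
```

Evaluation for referring segmentation is typically per-expression: the model receives one image plus one expression and predicts one binary mask, so each of the 52,472 triplets is an independent sample.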
The dataset can be downloaded from Baidu Netdisk (access code: wnxg).
The dataset is released under the CC-BY-4.0 license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
If you find our work helpful, please cite:
@article{dong2024cross,
title={Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation},
author={Zhe Dong and Yuzhe Sun and Yanfeng Gu and Tianzhu Liu},
journal={arXiv preprint arXiv:2410.08613},
year={2024}
}

Our RISBench dataset is built upon the VRSBench, DOTA-v2, and DIOR datasets.
We thank the authors of LAVT and RMSIN for open-sourcing their models and code.