Linrui Xu, Ling Zhao, Wang Guo, Qiujun Li, Kewang Long, Kaiqi Zou, Yuhan Wang, Haifeng Li☨
We will be releasing the complete dataset, scripts, and model weights soon!
- [2024/06/18]: 🔥 Our paper is now available on arXiv.
- [2024/06/25]: 🔥 Our data will be released on Hugging Face soon.
- [2024/07/01]: 🔥 Our data has been released on OneDrive.
RS-GPT4V unifies vision and language data across a range of remote sensing tasks. The dataset supports complex reasoning and fine-grained understanding of remote sensing images through multimodal instruction-following formats. Below are visual overviews of the dataset's principles and structure:
Evolution from simple remote sensing tasks to complex instruction-based tasks using multimodal data.
Illustrates the dataset's design principles focusing on unity, diversity, correctness, complexity, richness, and robustness.
The construction process follows a structured approach integrating data collection, instruction-response generation, and instruction-annotation adaptation.
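To make the instruction-following format concrete, below is a minimal sketch of what one instruction-annotation record might look like, following the common LLaVA-style conversation schema. The field names, file paths, and example text here are illustrative assumptions, not the actual RS-GPT4V schema:

```python
import json

# Hypothetical LLaVA-style multimodal instruction-following record.
# Field names ("id", "image", "conversations") and contents are assumptions
# for illustration; consult the released data for the actual schema.
record = {
    "id": "example_0001",
    "image": "images/scene_0001.png",
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nDescribe the land-use types visible in this remote sensing image.",
        },
        {
            "from": "gpt",
            "value": "The image shows an urban area with residential blocks, a river, and adjacent farmland.",
        },
    ],
}

def validate(rec):
    """Minimal sanity check: required keys and alternating human/gpt turns."""
    assert {"id", "image", "conversations"} <= rec.keys()
    roles = [turn["from"] for turn in rec["conversations"]]
    # Even-indexed turns come from the human, odd-indexed from the model.
    assert roles[::2] == ["human"] * len(roles[::2])
    assert roles[1::2] == ["gpt"] * len(roles[1::2])
    return True

validate(record)
print(json.dumps(record, indent=2))
```

A loader would typically read a list of such records from a JSON file and pair each one with its image for training.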
If you find RS-GPT4V useful for your research and applications, please cite using this BibTeX:
@article{10197260,
  author  = {Xu, Linrui and Zhao, Ling and Guo, Wang and Li, Qiujun and Long, Kewang and Zou, Kaiqi and Wang, Yuhan and Li, Haifeng},
  title   = {RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding},
  journal = {arXiv preprint arXiv:2406.12479},
  year    = {2024},
  url     = {https://arxiv.org/abs/2406.12479}
}