🔥🔥🔥 Multimodal Large Language Models for Remote Sensing: A Survey
School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University
✨✨✨ Behold our meticulously curated trove of RS-MLLM resources!!!
🎉🚀💡 The website will be updated continuously to track the latest progress in RS-MLLMs!!!
📑📚🔍 Feast your eyes on an assortment of model architectures, training pipelines, datasets, comprehensive evaluation benchmarks, intelligent agents for remote sensing, instruction-tuning techniques, and much more.
🌟🔥📢 A collection of remote sensing multimodal large language model papers focusing on the vision-language domain.
In this repository, we collect and document researchers and their outstanding work on remote sensing multimodal large language models (vision-language).
- The list will be continuously updated 🔥🔥
- 📦 coming soon! 🚀
- May-22-2024: The first RS-MLLM review manuscript has been submitted for review. 🔥🔥
Table of Contents
- Awesome Papers
- Awesome Datasets
- Latest Evaluation Benchmarks for Remote Sensing Vision-Language Tasks
Title | Venue | Date | Code | Note |
---|---|---|---|---|
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents<br>W. Xu, Z. Yu, Y. Wang, J. Wang, and M. Peng. | arXiv | 2024-06-11 | - | - |
GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots<br>S. Singh, M. Fore, D. Stamoulis, and D. Group. | arXiv | 2024-04-23 | - | - |
Evaluating Tool-Augmented Agents in Remote Sensing Platforms<br>S. Singh, M. Fore, and D. Stamoulis. | arXiv | 2024-04-23 | - | - |
Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis<br>C. Liu, K. Chen, H. Zhang, Z. Qi, Z. Zou, and Z. Shi. | arXiv | 2024-04-01 | Github | - |
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models<br>H. Guo, X. Su, C. Wu, B. Du, L. Zhang, and D. Li. | arXiv | 2024-01-17 | Github | - |
Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis<br>S. Du, S. Tang, W. Wang, X. Li, and R. Guo. | arXiv | 2023-10-07 | - | - |
Title | Venue | Date | Code | Note |
---|---|---|---|---|
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing<br>Z. Zhang, T. Zhao, Y. Guo, and J. Yin. | arXiv | 2024-01-02 | Github | - |
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing<br>F. Liu, D. Chen, Z. Guan, X. Zhou, J. Zhu, and J. Zhou. | T-GRS | 2024-04-18 | Github | arXiv |
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment<br>U. Mall, C. P. Phoo, M. K. Liu, C. Vondrick, B. Hariharan, and K. Bala. | ICLR | 2024-01-16 | Project | arXiv |
RS-CLIP: Zero Shot Remote Sensing Scene Classification via Contrastive Vision-Language Supervision<br>X. Li, C. Wen, Y. Hu, and N. Zhou. | JAG | 2023-09-18 | Github | - |
Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval<br>Y. Yuan, Y. Zhan, and Z. Xiong. | T-GRS | 2023-08-28 | Github | arXiv |
Title | Venue | Date | Code | Note |
---|---|---|---|---|
Towards Vision-Language Geo-Foundation Model: A Survey<br>Y. Zhou, L. Feng, Y. Ke, X. Jiang, J. Yan, and W. Zhang. | arXiv | 2024-06-13 | Github | arXiv |
Vision-Language Models in Remote Sensing: Current progress and future trends<br>X. Li, C. Wen, Y. Hu, Z. Yuan, and X. X. Zhu. | MGRS | 2024-04-22 | - | - |
Language Integration in Remote Sensing: Tasks, datasets, and future directions<br>L. Bashmal, Y. Bazi, F. Melgani, M. M. Al Rahhal, and M. A. Al Zuair. | MGRS | 2023-10-11 | - | - |
Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey<br>L. Jiao et al. | JSTARS | 2023-09-18 | - | - |
Title | Venue | Date | Code | Note |
---|---|---|---|---|
On the Foundations of Earth and Climate Foundation Models<br>X. X. Zhu et al. | arXiv | 2024-05-07 | Github | - |
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications<br>C. Tan et al. | arXiv | 2023-12-23 | - | - |
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs<br>J. Roberts, T. Lüddecke, R. Sheikh, K. Han, and S. Albanie. | arXiv | 2023-11-24 | Github | - |
The Potential of Visual ChatGPT for Remote Sensing<br>L. P. Osco, E. L. de Lemos, W. N. Gonçalves, A. P. M. Ramos, and J. Marcato Junior. | Remote Sensing | 2023-06-22 | - | - |
Title | Venue | Date | Code | Note |
---|---|---|---|---|
ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing<br>Z. Yuan, Z. Xiong, L. Mou, and X. X. Zhu. | arXiv | 2024-02-17 | - | - |
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing<br>Z. Zhang, T. Zhao, Y. Guo, and J. Yin. | arXiv | 2024-01-02 | Github | - |
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing<br>Z. Wang, R. Prabha, T. Huang, J. Wu, and R. Rajagopal. | AAAI | 2024-03-24 | Github | arXiv |
If you have any questions about this project, please feel free to contact zhanyangnwpu@gmail.com.