SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University
This is the official repository for the paper "SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model". [paper] [SkyEye-968k]
You can find resources on remote sensing multimodal large language models (Vision-Language) here.
This is an ongoing project; we will keep improving it.
- 📦 Chatbot, codebase, datasets, and models coming soon! 🚀
- Jun-12-2024: The RS instruction dataset SkyEye-968k is released. [huggingface] 🔥🔥
- Jan-18-2024: The paper is released. 🔥🔥
- Jan-17-2024: A curated list of remote sensing multimodal large language models (Vision-Language) is created. 🔥🔥
The online demo will be released.
The model and checkpoint are coming soon! 🚀
The unified remote sensing vision-language instruction dataset, SkyEye-968k, is available for download! 🚀
Download link: https://huggingface.co/datasets/ZhanYang-nwpu/SkyEye-968k
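For convenience, below is a minimal sketch of one way to fetch the dataset files with the `huggingface_hub` library (an assumption on tooling; any Hub client works). The repo id comes from the link above; the local directory name is an arbitrary choice, and no particular file layout inside the dataset repo is assumed.

```python
# Minimal sketch: download the SkyEye-968k dataset repo from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; the target directory below is arbitrary.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ZhanYang-nwpu/SkyEye-968k",  # dataset repo from the link above
    repo_type="dataset",                  # needed for dataset (vs. model) repos
    local_dir="./SkyEye-968k",            # where to place the downloaded files
)
print(f"Dataset files downloaded to: {local_dir}")
```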
@misc{zhan2024skyeyegpt,
title={SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model},
author={Yang Zhan and Zhitong Xiong and Yuan Yuan},
year={2024},
eprint={2401.09712},
archivePrefix={arXiv}
}
Our code is based on MiniGPT-4, shikra, and MiniGPT-v2. We sincerely appreciate the authors for releasing their source code, and we are grateful to EVA and LLaMA2 for open-sourcing their models. We thank Zhitong Xiong and Yuan Yuan for their help with the manuscript, and the School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University, for supporting this work.
If you have any questions about this project, please feel free to contact zhanyangnwpu@gmail.com.