This repository provides the official PyTorch source code for our paper:
LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery
🔗 Project page: https://lisat-bair.github.io/LISAt/
Authors:
Jerome Quenum*, Wen-Han Hsieh*, Tsung-Han Wu, Ritwik Gupta, Trevor Darrell, David M. Chan
(* equal contribution, UC Berkeley)
Reading satellite images isn't just about identifying objects—it's about understanding their context, relationships, and sometimes even the absurdity of what humans ask AI to locate.
Enter LISAT, your AI-powered geospatial detective, trained to not only recognize but also reason about objects in satellite imagery. Whether it’s detecting urban expansion or identifying a suspiciously duck-shaped lake, LISAT delivers intelligent, nuanced segmentation and captioning from satellite views.
LISAT introduces two new datasets:

- GRES (Geospatial Reasoning Segmentation): 27,615 segmentation annotations over 9,205 images.
- PreGRES: a large-scale multimodal pretraining dataset with over 1 million QA pairs grounded in satellite imagery.
LISAT outperforms prior models like RS-GPT4V with:
- +10.04% improvement in BLEU-4 (image captioning)
- +143.36% improvement in gIoU (segmentation)
- 2025-03-22: Released training, evaluation, demo scripts, pretrained checkpoints, and full datasets.
- OS: Linux
- GPU: NVIDIA A100 recommended (for FlashAttention)
- Python: 3.9
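A quick, generic way to sanity-check your machine against these requirements (standard commands, not repo-specific):

```bash
nvidia-smi        # confirm an NVIDIA GPU (ideally an A100) and a recent driver are visible
python --version  # the conda environment created below pins Python 3.9
```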
```bash
# Step 1: Create the Python environment
conda create -n lisat python=3.9
conda activate lisat

# Step 2: Install dependencies
pip install pybind11==2.11.1
# Install torch and torchvision builds that best fit your system
pip install -r requirements.txt
pip install flash-attn --no-build-isolation  # required for FlashAttention

# Step 3: Install evaluation metrics for image captioning and VQA
pip install pycocoevalcap  # https://pypi.org/project/pycocoevalcap/
```

LISAT-7B is specifically trained for geospatial reasoning segmentation tasks. The gIoU & cIoU scores of LISAT-7B are listed below.
| Model Name | LMM | HG-ckpt URL | gIoU | cIoU |
|---|---|---|---|---|
| LISAt-7B | LISAT-PRE | jquenum/LISAt-7b | 27.5 | 24.5 |
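For reference, gIoU and cIoU are the standard reasoning-segmentation metrics: gIoU averages per-image IoU, while cIoU is the cumulative intersection over the cumulative union across the whole evaluation set. The sketch below illustrates these standard definitions only; it is not the repository's evaluation code.

```python
import numpy as np

def giou_ciou(pred_masks, gt_masks):
    """Compute gIoU (mean per-image IoU) and cIoU (cumulative IoU) over binary masks."""
    per_image_iou, total_inter, total_union = [], 0, 0
    for pred, gt in zip(pred_masks, gt_masks):
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        # Treat two empty masks as a perfect match (convention assumed here)
        per_image_iou.append(inter / union if union > 0 else 1.0)
        total_inter += inter
        total_union += union
    giou = float(np.mean(per_image_iou))
    ciou = total_inter / total_union if total_union > 0 else 1.0
    return giou, ciou
```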
LISAT_PRE-7B is specifically trained for geospatial image captioning and visual question answering tasks. The BLEU-4 scores of LISAT_PRE-7B are listed below.
| Model Name | HG-ckpt URL | UCM-Captions | NWPU-Captions | Sydney-Captions | Sydney-Captions |
|---|---|---|---|---|---|
| LISAT_PRE-7B | jquenum/LISAt_PRE-7B | 72.3 | 65.8 | 54.2 | 36.1 |
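BLEU-4 on these captioning benchmarks can be computed with the pycocoevalcap package installed in Step 3. The snippet below is a minimal illustration with made-up captions, not the repository's evaluation script.

```python
from pycocoevalcap.bleu.bleu import Bleu

# Keys are image IDs; values are lists of caption strings (several references, one hypothesis).
references = {"img_001": ["an airport with several parked airplanes",
                          "many airplanes are parked at the airport"]}
hypotheses = {"img_001": ["several airplanes parked at an airport"]}

bleu_scores, _ = Bleu(4).compute_score(references, hypotheses)
print(f"BLEU-4: {bleu_scores[3]:.3f}")  # compute_score returns BLEU-1 through BLEU-4
```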
RemoteCLIP is required for both LISAT-7B and LISAT_PRE-7B: wen-han/remote_clip_vit_l_14
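If you prefer fetching the checkpoints programmatically, a minimal sketch with huggingface_hub is shown below; the local_dir paths are illustrative, so point the training/evaluation scripts at wherever you place the weights.

```python
from huggingface_hub import snapshot_download

# Segmentation and captioning/VQA checkpoints (repo IDs from the tables above)
snapshot_download(repo_id="jquenum/LISAt-7b", local_dir="checkpoints/LISAt-7b")
snapshot_download(repo_id="jquenum/LISAt_PRE-7B", local_dir="checkpoints/LISAt_PRE-7B")

# RemoteCLIP vision backbone required by both models
snapshot_download(repo_id="wen-han/remote_clip_vit_l_14", local_dir="checkpoints/remote_clip_vit_l_14")
```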
Visit our Dataset page for more details.
```bash
bash train_lisat.sh [ReferSeg or ReasonSeg] [Deepspeed GPU Settings] [MASTERPORT]

# Example:
bash train_lisat.sh ReasonSeg localhost:0,1 15990

# Merge the trained LoRA weights into the base model
bash merge_lora_weight.sh

# Evaluate reasoning segmentation
bash eval_lisat.sh

# Generate VQA predictions
bash pred_lisat_vqa.py

# Evaluate image captioning
cd eval_lisat_pre
bash eval_captioning.sh
```

LISAT builds upon foundational work from prior open-source projects.
We thank the open-source community for its contributions.
If you use LISAT, its datasets, or any part of this repository in your work, please consider citing our paper:
```bibtex
@article{quenum2025lisat,
  title={LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery},
  author={Quenum, Jerome and Hsieh, Wen-Han and Wu, Tsung-Han and Gupta, Ritwik and Darrell, Trevor and Chan, David M},
  journal={arXiv preprint arXiv:2505.02829},
  year={2025}
}
```