This is the official repository for the paper "Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data".
Authors: Haozhe Si, Yuxuan Wan, Minh Do, Deepak Vasisht, Han Zhao, Hendrik F. Hamann.
This repository provides the implementation of the Low-rank Efficient Spatial-Spectral Vision Transformer (LESS ViT). LESS ViT is a scalable, efficient vision transformer designed specifically for multi-modal and hyperspectral geospatial raster data.
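As a rough intuition for the "low-rank" in the name: instead of attending over every spatial-spectral token pair (quadratic in the product of patches and bands), spatial and spectral attention can be computed separately and combined multiplicatively. The sketch below is our own conceptual illustration of that idea only; the pooling choices and shapes are assumptions, not the released implementation:

```python
import torch

def less_style_attention(x):
    """Conceptual low-rank spatial-spectral attention (illustrative only).

    x: (batch, spectral_tokens, spatial_tokens, dim)
    Full attention over all S*P tokens would cost O((S*P)^2); here spatial
    and spectral attention are computed separately and applied in sequence
    (a Kronecker-structured mixing), costing O(S^2 + P^2) attention maps.
    """
    b, s, p, d = x.shape
    scale = d ** -0.5

    # Spectral attention: pool over spatial positions, attend across bands.
    spec = x.mean(dim=2)                                                    # (b, s, d)
    a_spec = torch.softmax(spec @ spec.transpose(-1, -2) * scale, dim=-1)   # (b, s, s)

    # Spatial attention: pool over bands, attend across patches.
    spat = x.mean(dim=1)                                                    # (b, p, d)
    a_spat = torch.softmax(spat @ spat.transpose(-1, -2) * scale, dim=-1)   # (b, p, p)

    # Apply spectral mixing, then spatial mixing.
    out = torch.einsum('bst,btpd->bspd', a_spec, x)    # mix across bands
    out = torch.einsum('bpq,bsqd->bspd', a_spat, out)  # mix across patches
    return out

x = torch.randn(2, 12, 64, 32)   # 12 spectral groups, 64 spatial patches
y = less_style_attention(x)
print(y.shape)                   # torch.Size([2, 12, 64, 32])
```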
We also constructed GFM-Bench, a comprehensive benchmark for geospatial raster data that incorporates 3 classification datasets and 4 semantic segmentation datasets. For more detailed information about GFM-Bench, please see our paper and the GFM-Bench GitHub repository.
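If the benchmark datasets are distributed through the HuggingFace `datasets` library, loading a dataset would look like the following sketch; the dataset ID here is a placeholder, so please check the GFM-Bench repository for the actual identifiers:

```python
from datasets import load_dataset

# Hypothetical dataset ID -- consult the GFM-Bench repository for real IDs.
dataset = load_dataset("GFM-Bench/EuroSAT")
print(dataset)  # inspect available splits and features
```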
We pre-trained the LESS ViT model using Hyper-MAE on the SSL4EO-S12 dataset for 300 epochs (on 8 × NVIDIA A6000 GPUs).
To launch pre-training, run the `launch_train.sh` script:

```bash
bash launch_train.sh
```

Please refer to `GeospatialFM/finetune/args.py` for more detailed descriptions of the arguments used in the script.
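Hyper-MAE follows the masked-autoencoder recipe of reconstructing masked tokens from a small visible subset. As background, a generic MAE-style random-masking step looks like the sketch below; this is our illustration of the standard technique, not this repository's implementation:

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    """Generic MAE-style random masking (illustrative, not the repo's code).

    tokens: (batch, num_tokens, dim); keeps a random subset of tokens so
    the encoder only sees visible tokens, as in standard MAE pre-training.
    """
    b, n, d = tokens.shape
    n_keep = int(n * (1 - mask_ratio))

    noise = torch.rand(b, n)                    # random score per token
    ids_shuffle = torch.argsort(noise, dim=1)   # tokens with low scores are kept
    ids_keep = ids_shuffle[:, :n_keep]

    visible = torch.gather(
        tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))

    # Binary mask for the loss: 1 = masked (to be reconstructed), 0 = visible.
    mask = torch.ones(b, n)
    mask.scatter_(1, ids_keep, 0)
    return visible, mask, ids_shuffle

tokens = torch.randn(4, 196, 768)
visible, mask, _ = random_masking(tokens)
print(visible.shape, mask.sum(dim=1))  # (4, 49, 768); 147 masked per sample
```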
We fine-tuned and evaluated our pre-trained LESS ViT on GFM-Bench. For the detailed implementation of the fine-tuning experiments, please refer to Appendix C of our paper.
To launch fine-tuning experiments, run the following command:
```bash
python3 sweep_finetune.py \
    --dataset ${DATASET_NAME} \
    --root_dir ${ROOT_DIR} \
    --modal ${MODAL}
```
To launch linear probing experiments, run the following command:
```bash
python3 sweep_finetune.py \
    --dataset ${DATASET_NAME} \
    --root_dir ${ROOT_DIR} \
    --modal ${MODAL} \
    --lp
```
- `--dataset ${DATASET_NAME}`: Name of the dataset to use.
- `--root_dir ${ROOT_DIR}`: Directory to save checkpoints and results.
- `--modal ${MODAL}`: The fine-tuning modality (`radar`, `optical`, or `multi`). Note: currently only the BigEarthNet and DFC2020 datasets support `radar` or `multi` fine-tuning in GFM-Bench.
- `--lp`: Run linear probing instead of full fine-tuning (see the sketch below).
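For reference, linear probing conventionally means freezing the pre-trained encoder and training only the task head. A generic PyTorch sketch of that setup follows; the attribute name `head` is hypothetical, and the actual module names depend on this repository's model definition:

```python
import torch.nn as nn

def setup_linear_probe(model: nn.Module, head_name: str = "head"):
    """Freeze everything except the task head (illustrative sketch).

    `head_name` is a hypothetical attribute name; substitute the real
    classifier/segmentation head name from the model definition.
    """
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_name)
    trainable = [n for n, p in model.named_parameters() if p.requires_grad]
    print(f"Training {len(trainable)} parameter tensors: {trainable}")
```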
We will be uploading pre-trained model checkpoints soon. Stay tuned! 😀
If you find our project helpful, please cite our paper:
```bibtex
@misc{si2025scalablefoundationmodelmultimodal,
      title={Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data},
      author={Haozhe Si and Yuxuan Wan and Minh Do and Deepak Vasisht and Han Zhao and Hendrik F. Hamann},
      year={2025},
      eprint={2503.12843},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.12843},
}
```