Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data

Project Page · arXiv · Hugging Face · GitHub

This is the official repository for the paper "Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data".

Authors: Haozhe Si, Yuxuan Wan, Minh Do, Deepak Vasisht, Han Zhao, Hendrik F. Hamann.

Overview

This repository provides the implementation of the Low-rank Efficient Spatial-Spectral Vision Transformer (LESS ViT). LESS ViT is a scalable, efficient vision transformer designed specifically for multi-modal and hyperspectral geospatial raster data.

(Figure: overview of the LESS ViT workflow.)

LESS ViT
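
As a rough intuition for the spatial-spectral design, the sketch below factorizes attention over a hyperspectral token grid into separate spatial and spectral passes. It is a generic illustration under an assumed (batch, spectral groups, spatial patches, dim) token layout with standard multi-head attention; it is not the low-rank formulation used by LESS ViT, whose actual layers live in this repository.

# Illustrative factorized spatial-spectral attention (a generic sketch, not the LESS ViT layers).
import torch
import torch.nn as nn

class FactorizedSpatialSpectralAttention(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.spectral_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, S, P, D) -- S spectral groups, P spatial patches, D token dimension
        B, S, P, D = x.shape

        # Spatial pass: attend across patches within each spectral group.
        xs = x.reshape(B * S, P, D)
        xs, _ = self.spatial_attn(xs, xs, xs)
        x = xs.reshape(B, S, P, D)

        # Spectral pass: attend across spectral groups at each spatial location.
        xc = x.permute(0, 2, 1, 3).reshape(B * P, S, D)
        xc, _ = self.spectral_attn(xc, xc, xc)
        return xc.reshape(B, P, S, D).permute(0, 2, 1, 3)

tokens = torch.randn(2, 4, 196, 768)   # 4 spectral groups x 196 patches of 768-dim tokens
out = FactorizedSpatialSpectralAttention(768)(tokens)
print(out.shape)                       # torch.Size([2, 4, 196, 768])

Attending along each axis separately keeps the quadratic attention cost per pass over only P or S tokens instead of all S·P tokens at once, which is the usual motivation for this kind of factorization on hyperspectral inputs.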

GFM-Bench

We also constructed GFM-Bench, a comprehensive benchmark for geospatial raster data that comprises 3 classification datasets and 4 semantic segmentation datasets. For more detailed information about GFM-Bench, please see its Hugging Face page and its GitHub repository.
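
Since the benchmark is hosted on the Hugging Face Hub, its datasets can be pulled with the datasets library. A minimal sketch, assuming a hypothetical repository id (check the GFM-Bench Hugging Face page for the actual dataset names):

# Minimal sketch: load a GFM-Bench dataset from the Hugging Face Hub.
# "GFM-Bench/EuroSAT" is a placeholder id; substitute the dataset name
# listed on the GFM-Bench Hugging Face page.
from datasets import load_dataset

dataset = load_dataset("GFM-Bench/EuroSAT")
print(dataset)   # prints the available splits and features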

Pre-training

We pre-trained the LESS ViT model using Hyper-MAE on the SSL4EO-S12 dataset for 300 epochs (on 8 × NVIDIA A6000 GPUs).
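
For background, Hyper-MAE follows the masked-autoencoding recipe: most patch tokens are hidden, the encoder only sees the visible tokens, and a decoder reconstructs the masked content. The snippet below sketches only the generic random-masking step; the mask ratio, function name, and tensor layout are illustrative assumptions, not the repository's implementation.

# Generic MAE-style random masking of patch tokens (illustrative; not the exact Hyper-MAE code).
import torch

def random_masking(tokens, mask_ratio=0.75):
    # tokens: (batch, num_patches, dim); keep a random subset of patches per sample
    B, N, D = tokens.shape
    num_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=tokens.device)   # random score per token
    ids_shuffle = torch.argsort(noise, dim=1)        # token order after shuffling
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

    ids_keep = ids_shuffle[:, :num_keep]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    # mask: 0 = visible to the encoder, 1 = masked and reconstructed by the decoder
    mask = torch.ones(B, N, device=tokens.device)
    mask[:, :num_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return visible, mask, ids_restore

visible, mask, ids_restore = random_masking(torch.randn(2, 196, 768))
print(visible.shape, int(mask.sum()))  # torch.Size([2, 49, 768]) 294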

To launch pre-training, run the launch_train.sh script:

bash launch_train.sh

Please refer to GeospatialFM/finetune/args.py for more detailed descriptions of the arguments used in the script.

Fine-tuning

We fine-tuned and evaluated our pre-trained LESS ViT on GFM-Bench. For the detailed implementation of the fine-tuning experiments, please refer to Appendix C of our paper.

To launch fine-tuning experiments, run the following command:

python3 sweep_finetune.py \
    --dataset ${DATASET_NAME} \
    --root_dir ${ROOT_DIR} \
    --modal ${MODAL}

To launch linear probing experiments, run the following command:

python3 sweep_finetune.py \
    --dataset ${DATASET_NAME} \
    --root_dir ${ROOT_DIR} \
    --modal ${MODAL} \
    --lp

  • --dataset ${DATASET_NAME}: Name of the dataset to use.
  • --root_dir ${ROOT_DIR}: Directory to save checkpoints and results.
  • --modal ${MODAL}: The fine-tuning modality (radar, optical, or multi). Note: currently only the BigEarthNet and DFC2020 datasets support radar or multi fine-tuning in GFM-Bench.
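
For example, a concrete fine-tuning invocation might look like the following (the dataset identifier and output path are placeholders; use the names accepted by sweep_finetune.py):

python3 sweep_finetune.py \
    --dataset bigearthnet \
    --root_dir ./outputs/bigearthnet_optical \
    --modal optical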

Model Weights

We will be uploading pre-trained model checkpoints soon. Stay tuned! 😀

Citation

If you find our project helpful, please cite our paper:

@misc{si2025scalablefoundationmodelmultimodal,
      title={Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data}, 
      author={Haozhe Si and Yuxuan Wan and Minh Do and Deepak Vasisht and Han Zhao and Hendrik F. Hamann},
      year={2025},
      eprint={2503.12843},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.12843}, 
}

Contact Authors

Haozhe Si, Han Zhao
