Di Wang1 ∗, Meiqi Hu1 ∗, Yao Jin1 ∗, Yuchun Miao1 ∗, Jiaqi Yang1 ∗, Yichu Xu1 ∗, Xiaolei Qin1 ∗, Jiaqi Ma1 ∗, Lingyu Sun1 ∗, Chenxing Li1 ∗, Chuan Fu2, Hongruixuan Chen3, Chengxi Han1 †, Naoto Yokoya3, Jing Zhang1 †, Minqiang Xu4, Lin Liu4, Lefei Zhang1, Chen Wu1 †, Bo Du1 †, Dacheng Tao5, Liangpei Zhang1 †
1 Wuhan University, 2 Chongqing University, 3 The University of Tokyo, 4 National Engineering Research Center of Speech and Language Information Processing, 5 Nanyang Technological University.
∗ Equal contribution, † Corresponding author
Update | Overview | Datasets | Pretrained Models | Usage | Statement
2024.10.22
- Scripts for Image Super-Resolution.
- Checkpoints for Image Denoising.
2024.07.18
- Models can be downloaded from both Baidu Drive (百度网盘) and Hugging Face 🤗.
- Datasets for HSI denoising have been released for research use only. Please check them here.
2024.06.18
- The paper has been posted on arXiv! (arXiv 2406.11519)
HyperSIGMA is the first billion-level foundation model specifically designed for hyperspectral image (HSI) interpretation. To tackle the spectral and spatial redundancy in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module.
Figure 1. Framework of HyperSIGMA.
Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA’s versatility and superior representational capability compared to current state-of-the-art methods. It outperforms advanced models such as SpectralGPT, as well as methods designed specifically for these tasks.
Figure 2. HyperSIGMA demonstrates superior performance across 16 datasets and 7 tasks, including both high-level and low-level hyperspectral tasks, as well as multispectral scenes.
To train the foundation model, we collected hyperspectral remote sensing image samples from around the globe and constructed a large-scale hyperspectral dataset named HyperGlobal-450K for pre-training. HyperGlobal-450K contains over 20 million three-band images, far exceeding the scale of existing hyperspectral datasets.
Figure 3. The distribution of HyperGlobal-450K samples across the globe, comprising 1,701 images (1,486 EO-1 and 215 GF-5B) with hundreds of spectral bands.
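One way to reconcile the "450K" in the dataset name with the "over 20 million three-band images" figure is to count how many non-overlapping three-band groups each hyperspectral sample yields. The sketch below is illustrative only: the sample count of 450,000 is read off the dataset name, and 150 usable bands per sample is an assumed round number, not a value stated in this README.

```python
def three_band_equivalents(num_samples: int, bands_per_sample: int) -> int:
    """Count non-overlapping three-band images obtainable from hyperspectral samples."""
    groups_per_sample = bands_per_sample // 3  # leftover bands are dropped
    return num_samples * groups_per_sample


# Illustrative assumptions, not figures from the dataset documentation.
ASSUMED_SAMPLES = 450_000
ASSUMED_BANDS = 150

print(three_band_equivalents(ASSUMED_SAMPLES, ASSUMED_BANDS))
```

Under these assumed values the count comes to 22.5 million, which is consistent with the "over 20 million" claim; the real figure depends on the actual band counts of the EO-1 and GF-5B samples.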
Pretrain | Backbone | Model Weights |
---|---|---|
Spatial_MAE | ViT-B | Baidu Drive & Hugging Face |
Spatial_MAE | ViT-L | Baidu Drive & Hugging Face |
Spatial_MAE | ViT-H | Baidu Drive & Hugging Face |
Spectral_MAE | ViT-B | Baidu Drive & Hugging Face |
Spectral_MAE | ViT-L | Baidu Drive & Hugging Face |
Spectral_MAE | ViT-H | Baidu Drive & Hugging Face |
We pretrain HyperSIGMA with SLURM. Here is an example of pretraining the large version of the Spatial ViT:
srun -J spatmae -p xahdnormal --gres=dcu:4 --ntasks=64 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python main_pretrain_Spat.py \
--model 'spat_mae_l' --norm_pix_loss \
--data_path [pretrain data path] \
--output_dir [model saved path] \
--log_dir [log saved path] \
--blr 1.5e-4 --batch_size 32 --gpu_num 64 --port 60001
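The `--blr` flag is a base learning rate. In the original MAE code, which this project builds on, the actual learning rate is derived as `blr * effective_batch_size / 256`. Whether `main_pretrain_Spat.py` applies exactly the same rule is an assumption here, as is reading `--batch_size` as a per-device value with no gradient accumulation; check the script before relying on these numbers. Under those assumptions, the command above works out as follows (`effective_lr` is a hypothetical helper name):

```python
from typing import Tuple


def effective_lr(blr: float, batch_size_per_gpu: int, num_gpus: int,
                 accum_iter: int = 1) -> Tuple[int, float]:
    """MAE-style linear scaling: lr = blr * effective_batch_size / 256."""
    eff_batch = batch_size_per_gpu * num_gpus * accum_iter
    return eff_batch, blr * eff_batch / 256


# Values taken from the command above: --blr 1.5e-4 --batch_size 32 --gpu_num 64
eff_batch, lr = effective_lr(1.5e-4, 32, 64)
print(f"effective batch size: {eff_batch}, learning rate: {lr:.2e}")
```

With 32 samples on each of 64 devices, the effective batch size is 2048 and the scaled learning rate is about 1.2e-3. The same arithmetic applies to any other `--batch_size` and `--gpu_num` combination.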
Another example, pretraining the huge version of the Spectral ViT:
srun -J specmae -p xahdnormal --gres=dcu:4 --ntasks=128 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python main_pretrain_Spec.py \
--model 'spec_mae_h' --norm_pix_loss \
--data_path [pretrain data path] \
--output_dir [model saved path] \
--log_dir [log saved path] \
--blr 1.5e-4 --batch_size 16 --gpu_num 128 --port 60004 --epochs 1600 --mask_ratio 0.75 \
--use_ckpt 'True'
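`--mask_ratio 0.75` follows the MAE convention of hiding 75% of the tokens from the encoder. The following is a minimal sketch of per-sample random masking, written in plain Python rather than PyTorch and not taken from this repository's code. `random_masking`, its arguments, and the 64-token example are hypothetical choices for illustration.

```python
import random
from typing import List, Tuple


def random_masking(num_tokens: int, mask_ratio: float,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (kept_indices, masked_indices) for one sample."""
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)  # random permutation of token positions
    len_keep = int(num_tokens * (1 - mask_ratio))
    kept = sorted(order[:len_keep])     # tokens the encoder sees
    masked = sorted(order[len_keep:])   # tokens the decoder must reconstruct
    return kept, masked


# Hypothetical example: 64 tokens, of which a quarter stay visible.
kept, masked = random_masking(num_tokens=64, mask_ratio=0.75)
print(len(kept), len(masked))
```

With 64 tokens and a 0.75 ratio, 16 tokens remain visible and 48 are masked. A higher ratio makes the reconstruction task harder and reduces the number of tokens the encoder has to process.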
Training can be resumed from a saved checkpoint by setting --resume:
--resume [path of saved model]
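When pretraining jobs are launched from a script, the argument list can be assembled programmatically so that `--resume` is appended only when a checkpoint is available. The helper below is a hypothetical sketch, not part of this repository. It uses only flags that appear in the commands above, and the paths in the example are placeholders.

```python
from typing import List, Optional


def build_pretrain_args(model: str, data_path: str, output_dir: str,
                        log_dir: str, resume: Optional[str] = None) -> List[str]:
    """Assemble the python command for spatial pretraining as an argument list."""
    args = [
        "python", "main_pretrain_Spat.py",
        "--model", model, "--norm_pix_loss",
        "--data_path", data_path,
        "--output_dir", output_dir,
        "--log_dir", log_dir,
    ]
    if resume is not None:
        # Append the checkpoint path only when restarting an interrupted run.
        args += ["--resume", resume]
    return args


# Placeholder paths for illustration only.
cmd = build_pretrain_args("spat_mae_l", "/path/to/data", "/path/to/output",
                          "/path/to/logs", resume="/path/to/saved_model")
print(" ".join(cmd))
```

The resulting list can be appended to the `srun` options shown earlier or passed to `subprocess.run`.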
Image Classification:
Please refer to ImageClassification-README.
Target Detection & Anomaly Detection:
Please refer to HyperspectralDetection-README.
Change Detection:
Please refer to ChangeDetection-README.
Spectral Unmixing:
Please refer to HyperspectralUnmixing-README.
Denoising:
Please refer to Denoising-README.
Super-Resolution:
Please refer to SR-README.
Multispectral Change Detection:
Please refer to MultispectralCD-README.
If you find HyperSIGMA helpful, please consider giving this repo a ⭐ and citing:
@article{hypersigma,
title={HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model},
author={Wang, Di and Hu, Meiqi and Jin, Yao and Miao, Yuchun and Yang, Jiaqi and Xu, Yichu and Qin, Xiaolei and Ma, Jiaqi and Sun, Lingyu and Li, Chenxing and Fu, Chuan and Chen, Hongruixuan and Han, Chengxi and Yokoya, Naoto and Zhang, Jing and Xu, Minqiang and Liu, Lin and Zhang, Lefei and Wu, Chen and Du, Bo and Tao, Dacheng and Zhang, Liangpei},
journal={arXiv preprint arXiv:2406.11519},
year={2024}
}
For any other questions, please contact di.wang at gmail.com or whu.edu.cn, and chengxi.han at whu.edu.cn.
This project is based on MMCV, MAE, Swin Transformer, VSA, RVSA, DAT, HTD-IRN, GT-HAD, MSDformer, SST-Former, SST, CNNAEU and DeepTrans. Thanks for their wonderful work!