ClimateBench-M is the first multi-modal climate benchmark designed to support the development of artificial general intelligence (AGI) in climate applications. It aligns data across three critical modalities at a unified spatio-temporal resolution:
- Time-series climate variables from ERA5.
- Extreme weather event records from NOAA.
- Satellite imagery from NASA HLS.
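To illustrate what aligning event records with climate time series at a shared resolution involves, the sketch below places event intervals on an hourly grid and produces one binary label per (location, time step). It is a minimal standard-library sketch. The hourly step, the `(event_start, event_end)` record layout, the cell identifiers, and the function names `label_time_steps` and `label_grid` are illustrative assumptions. They are not the ClimateBench-M schema or its processing code.

```python
from datetime import datetime, timedelta


def label_time_steps(start, end, step, events):
    """Return one 0/1 label per time step in [start, end).

    Hypothetical helper: `events` is a list of (event_start, event_end)
    datetime pairs for a single location. A step [t, t + step) is labeled 1
    if it overlaps any event interval. `step` must be a positive timedelta.
    """
    labels = []
    t = start
    while t < end:
        t_next = t + step
        hit = any(ev_start < t_next and ev_end > t for ev_start, ev_end in events)
        labels.append(1 if hit else 0)
        t = t_next
    return labels


def label_grid(cells, start, end, step, events_by_cell):
    """Label every (cell, time step) pair; cells without events get all zeros."""
    return {
        cell: label_time_steps(start, end, step, events_by_cell.get(cell, []))
        for cell in cells
    }


# Example: six hourly steps, one event from 02:30 to 04:00 in cell "A".
start = datetime(2020, 7, 1, 0)
end = datetime(2020, 7, 1, 6)
events = {"A": [(datetime(2020, 7, 1, 2, 30), datetime(2020, 7, 1, 4, 0))]}
labels = label_grid(["A", "B"], start, end, timedelta(hours=1), events)
print(labels)  # {'A': [0, 0, 1, 1, 0, 0], 'B': [0, 0, 0, 0, 0, 0]}
```

Using interval overlap rather than only the event start time means an event that spans several steps labels all of them.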
To facilitate broad accessibility and reproducibility, ClimateBench-M is hosted on 🤗 Hugging Face Datasets. Due to its size and multi-modal nature, the dataset is split into two parts:
- ClimateBench-M-TS: Climate time series with extreme events aligned and labeled.
- ClimateBench-M-IMG: Satellite imagery data.
This separation allows users to download only the modalities relevant to their tasks. To download, please ensure you have the Hugging Face CLI installed:
```bash
huggingface-cli login
python scripts/dataset_download.py
```

The data is downloaded into the `Data/` folder by default. If you download it to a different path, update the data-loading paths of the downstream tasks accordingly.
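One way to keep the download location and your own loading code in sync is to resolve the data root in a single place. The sketch below is an assumption and is not part of the repository. The environment variable `CLIMATEBENCH_DATA_ROOT` and the helpers `data_root` and `data_path` are hypothetical. Only the default `Data/` folder comes from the instructions above.

```python
import os
from pathlib import Path

# Hypothetical variable name: the repository's scripts do not read it.
DATA_ROOT_ENV = "CLIMATEBENCH_DATA_ROOT"
# Default download folder described above.
DEFAULT_DATA_ROOT = "Data"


def data_root() -> Path:
    """Return the data root: the env var if set, otherwise the default Data/ folder."""
    return Path(os.environ.get(DATA_ROOT_ENV, DEFAULT_DATA_ROOT))


def data_path(*parts: str) -> Path:
    """Build a path under the data root, e.g. data_path("subfolder", "file.csv")."""
    return data_root().joinpath(*parts)
```

If your own downstream code loads files through a helper like this, moving the data only requires setting one variable. The provided task scripts keep their own paths and still need to be edited as described above.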
We provide a pre-built Docker image for ease of use. With Docker installed,
```bash
docker pull violet24k/climatebench-m-ts:latest
docker run --gpus all -it -v .:/workspace -w /workspace violet24k/climatebench-m-ts:latest bash
```

Alternatively, you can set up the environment manually:
```bash
conda create -n climatebench-m-ts python=3.11.11
conda activate climatebench-m-ts
# install torch, for example (CUDA 12.1 build):
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
# install other dependencies
pip install pandas geopandas scikit-learn pyarrow rasterio matplotlib huggingface_hub
```

The previous download step retrieves the raw data, which supports more flexible use cases. For the weather forecasting task, you can either download our processed data or process the downloaded raw data yourself:
```bash
# download our processed data
python scripts/dataset_download_ts_processed.py
# OR process the raw data yourself
python Task/weather_forecasting/weather_data_processing.py
```

Then, you can create your own model under `Task/weather_forecasting/YOUR_MODEL_NAME`. We provide a generative baseline, `SGM_Time_Series`, for reference. It first discovers temporally fine-grained causal relationships from the processed data:
```bash
# May take several hours to execute
python Task/weather_forecasting/SGM_Time_Series/finding_causality.py
# OR download our trained causality files
python scripts/dataset_download_ts_savedmodules.py
```

Once the causality files (`0_K_time_conv.pt`, `0_K_feature_encoder.pt`, `0_K_feature_decoder.pt`, and `0_K_best_ELBO_graph_seq.npy`) are stored in the `saved_modules` folder, you can train and evaluate the SGM model:
```bash
python Task/weather_forecasting/SGM_Time_Series/forecasting.py
python Task/weather_forecasting/SGM_Time_Series/evaluation.py
```

The anomaly detection task uses the same climate time series data as the weather forecasting task, with additional anomaly labels. Running `Task/weather_forecasting/SGM_Time_Series/evaluation.py` evaluates performance on both the weather forecasting and anomaly detection tasks.
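This README does not state which anomaly detection metrics `evaluation.py` reports. As a generic illustration only, the sketch below scores binary anomaly predictions against binary anomaly labels using precision, recall, and F1. The function name `precision_recall_f1` and the choice of metrics are assumptions. They do not describe the repository's evaluation code.

```python
def precision_recall_f1(labels, preds):
    """Compute precision, recall, and F1 for binary anomaly labels (1 = anomaly).

    Hypothetical helper: `labels` and `preds` are equal-length sequences of 0/1.
    Undefined ratios (no predicted or no true anomalies) are reported as 0.0.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Example: 2 true positives, 1 false positive, 1 false negative.
p, r, f1 = precision_recall_f1([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
```

Precision and recall are used here because anomalies are usually rare, and plain accuracy would be dominated by the normal time steps.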
For image tasks, following the NASA IMPACT repo, we need openmim, mmcv, and mmsegmentation. We also provide a pre-built Docker image for ease of use:
```bash
docker pull violet24k/climatebench-m-img:latest
docker run --gpus all -it -v .:/workspace -w /workspace violet24k/climatebench-m-img:latest bash
```

Alternatively, to set up the environment manually:
```bash
cd Task/crop_segmentation/SGM_Image
conda create -n climatebench-m-img python==3.9
conda activate climatebench-m-img
pip install torch==1.11.0+cu115 torchvision==0.12.0+cu115 --extra-index-url https://download.pytorch.org/whl/cu115
pip install -e .
pip install -U openmim
mim install mmcv-full==1.6.2 -f https://download.openmmlab.com/mmcv/dist/cu115/torch1.11.0/index.html
pip install numpy==1.24.0
```

For more details on the installed packages, please refer to the documentation of PyTorch, openmim, MMCV, and MMSegmentation. Note: the original NASA IMPACT repo uses mmcv 1.x and mmsegmentation 0.x. Although these are older versions, upgrading to newer releases such as mmcv 2.x requires significant code changes because of the introduction of MMEngine.
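Because the image code targets mmcv 1.x and mmsegmentation 0.x, a quick check of the installed versions can catch an accidental upgrade before training fails with a harder-to-read error. The sketch below is an assumption and is not a script shipped with the repository. It uses the standard-library `importlib.metadata` module. The distribution names `mmcv-full` and `mmsegmentation` correspond to the packages installed above.

```python
from importlib import metadata

# Expected major versions for the NASA IMPACT-style setup described above.
EXPECTED_MAJOR = {"mmcv-full": 1, "mmsegmentation": 0}


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.6.2'."""
    return int(version.split(".")[0])


def check_versions(expected=EXPECTED_MAJOR):
    """Map each distribution name to (installed_version_or_None, ok_flag)."""
    report = {}
    for dist, major in expected.items():
        try:
            version = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = (None, False)  # not installed in this environment
            continue
        report[dist] = (version, major_version(version) == major)
    return report


if __name__ == "__main__":
    for dist, (version, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH or missing"
        print(f"{dist}: {version} -> {status}")
```

Run it inside the `climatebench-m-img` environment. A `MISMATCH` entry for mmcv-full suggests that an mmcv 2.x build was pulled in.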
ClimateBench-M-IMG provides raw satellite image data by default for maximal flexibility. To generate the train/validation splits and prepare the model inputs, run:

```bash
# in working directory Task/crop_segmentation/SGM_Image
python image_data_processing.py
```

To improve performance, we recommend initializing the MAE-based generative model with pretrained weights from IBM's Prithvi foundation model:
```bash
# the weight files are stored with Git LFS, so git-lfs must be installed
git clone https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-1.0-100M
```

Before running, please double-check `Task/crop_segmentation/SGM_Image/configs/multi_temporal_crop_classification.py`, especially lines 15, 49, and 59, because mim may not properly resolve paths built with `os.path`.
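Because mim may not resolve paths built with `os.path` inside the config, one workaround is to compute absolute paths once and write them into the config as plain strings. The sketch below only prints candidate absolute paths. The entries it uses (the cloned `Prithvi-EO-1.0-100M` folder and a `data` folder, both relative to `Task/crop_segmentation/SGM_Image`) and the helper name `absolute_paths` are hypothetical. Adjust them to match what lines 15, 49, and 59 of the config actually reference.

```python
import os


def absolute_paths(base_dir, relative_paths):
    """Resolve each relative path against base_dir into an absolute, normalized path."""
    base = os.path.abspath(base_dir)
    return {
        name: os.path.normpath(os.path.join(base, rel))
        for name, rel in relative_paths.items()
    }


if __name__ == "__main__":
    # Hypothetical entries: replace with the paths your config actually needs.
    candidates = {
        "pretrained_weights_dir": "Prithvi-EO-1.0-100M",
        "data_root": "data",
    }
    # Run from Task/crop_segmentation/SGM_Image so "." is the working directory.
    for name, path in absolute_paths(".", candidates).items():
        print(f"{name} = {path!r}")
```

Paste the printed string literals into the config in place of the `os.path` expressions. The config then no longer depends on how mim evaluates it.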
To train and evaluate the MAE-backbone generative model:

```bash
# Train the model
mim train mmsegmentation configs/multi_temporal_crop_classification.py
# Evaluate the model (replace with the actual checkpoint path)
mim test mmsegmentation configs/multi_temporal_crop_classification.py \
    --checkpoint path_to_checkpoint_model.pth --eval "mIoU"
```

If you find this repository useful in your research, please consider citing the following paper:
```bibtex
@article{fu2025climatebench,
  title={ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method},
  author={Fu, Dongqi and Zhu, Yada and Liu, Zhining and Zheng, Lecheng and Lin, Xiao and Li, Zihao and Fang, Liri and Tieu, Katherine and Bhardwaj, Onkar and Weldemariam, Kommy and others},
  journal={arXiv preprint arXiv:2504.07394},
  year={2025}
}
```
