ClimateBench-M: A Multi-modal Climate Data Benchmark

|🗺️ Climate Time Series Data | 🛰️ Satellite Image Data | 📖 Paper |

ClimateBench-M is the first multi-modal climate benchmark designed to support the development of artificial general intelligence (AGI) in climate applications. It aligns data across three critical modalities at a unified spatio-temporal resolution:

Time-series climate variables from ERA5.
Extreme weather event records from NOAA.
Satellite imagery from NASA HLS.

📦 Dataset Download

To facilitate broad accessibility and reproducibility, ClimateBench-M is hosted on 🤗 Hugging Face Datasets. Due to its size and multi-modal nature, the dataset is split into two parts:

ClimateBench-M-TS: Climate time series with extreme events aligned and labeled.
ClimateBench-M-IMG: Satellite imagery data.
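To make "aligned and labeled" concrete, here is a minimal sketch of turning extreme-event records into per-day labels for one location's daily series. It assumes daily resolution and events given as inclusive (start, end) dates. The function name `label_days` is hypothetical, and the actual label format in ClimateBench-M-TS may differ.

```python
from datetime import date, timedelta


def label_days(start, num_days, events):
    """Return one 0/1 label per day starting at `start`.

    A day gets label 1 if any event interval (inclusive start/end dates)
    covers it. Daily resolution is an assumption of this sketch.
    """
    labels = []
    for offset in range(num_days):
        day = start + timedelta(days=offset)
        hit = any(ev_start <= day <= ev_end for ev_start, ev_end in events)
        labels.append(int(hit))
    return labels


if __name__ == "__main__":
    # One hypothetical event covering Jan 3 and Jan 4.
    events = [(date(2020, 1, 3), date(2020, 1, 4))]
    print(label_days(date(2020, 1, 1), 6, events))  # [0, 0, 1, 1, 0, 0]
```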

This separation lets users download only the modalities relevant to their tasks. To download, make sure the Hugging Face CLI is installed, then log in and run the download script:

huggingface-cli login
python scripts/dataset_download.py

By default, the data is downloaded into the Data/ folder. If you download it to a different path, update the data-loading paths of the downstream tasks accordingly.
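One way to keep the download path and the loading paths in sync is to resolve both from a single setting. The sketch below assumes a hypothetical environment variable `CLIMATEBENCH_DATA`; the scripts in this repository do not read it, so you would need to use these helpers in your own loaders.

```python
import os
from pathlib import Path

# Default used by the download script, per the note above.
DEFAULT_DATA_ROOT = "Data"


def data_root():
    """Resolve the data folder from the hypothetical CLIMATEBENCH_DATA variable, else Data/."""
    return Path(os.environ.get("CLIMATEBENCH_DATA", DEFAULT_DATA_ROOT))


def data_path(*parts):
    """Build a path inside the data folder, e.g. data_path("some_subdir", "file.parquet")."""
    return data_root().joinpath(*parts)
```

Routing every loader through `data_path` means a custom download location only has to be set once.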

🌤️ Weather Forecasting

🛠 Environment Setup

We provide a pre-built Docker image for ease of use. With Docker installed, run:

docker pull violet24k/climatebench-m-ts:latest
docker run --gpus all -it -v .:/workspace -w /workspace violet24k/climatebench-m-ts:latest bash

Alternatively, set up the environment manually:

conda create -n climatebench-m-ts python=3.11.11
conda activate climatebench-m-ts
# install torch (example for CUDA 12.1; choose the wheel that matches your CUDA version):
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
# install other dependencies
pip install pandas geopandas scikit-learn pyarrow rasterio matplotlib huggingface_hub

🧹 Data Preprocessing

The download above provides the raw data for flexible use. For the weather forecasting task, you can either download our processed data or process the downloaded raw data yourself.

# download our processed data
python scripts/dataset_download_ts_processed.py
# OR process the data by yourself
python Task/weather_forecasting/weather_data_processing.py
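The exact windowing used by weather_data_processing.py (input length, horizon, stride, normalization) is not documented here. As a generic illustration of how forecasting inputs are often prepared, this sketch cuts a 1-D series into (input, target) pairs with a stride-1 sliding window. The function name and parameters are hypothetical.

```python
def make_windows(series, input_len, horizon):
    """Split a 1-D sequence into (input, target) pairs using a stride-1 sliding window.

    Each pair holds `input_len` past values and the next `horizon` values.
    Returns an empty list if the series is too short for a single pair.
    """
    if input_len <= 0 or horizon <= 0:
        raise ValueError("input_len and horizon must be positive")
    pairs = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + horizon]
        pairs.append((x, y))
    return pairs
```

For example, `make_windows([1, 2, 3, 4, 5, 6], 3, 1)` yields three pairs, the first being `([1, 2, 3], [4])`.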

🔮 Our Generative Model For Weather Forecasting

You can then create your own model under Task/weather_forecasting/YOUR_MODEL_NAME. For reference, we provide a generative baseline, SGM_Time_Series, which first discovers temporally fine-grained causal relationships from the processed data.

# Might take several hours to execute
python Task/weather_forecasting/SGM_Time_Series/finding_causality.py
# OR download our trained causality files
python scripts/dataset_download_ts_savedmodules.py
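Whichever route you take, training needs the four causality files named in this README. A minimal sketch to confirm they are present before training: it takes the saved_modules folder as an argument because its exact location is not stated here, and it assumes the file names (including `0_K_`) are literal.

```python
from pathlib import Path

# File names copied from this README; assumed to be literal names.
CAUSALITY_FILES = (
    "0_K_time_conv.pt",
    "0_K_feature_encoder.pt",
    "0_K_feature_decoder.pt",
    "0_K_best_ELBO_graph_seq.npy",
)


def missing_causality_files(saved_modules_dir):
    """Return the expected causality files that are absent from saved_modules_dir."""
    folder = Path(saved_modules_dir)
    return [name for name in CAUSALITY_FILES if not (folder / name).is_file()]
```

An empty list means all four files were found; otherwise rerun finding_causality.py or the download script.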

Once the causality files (0_K_time_conv.pt, 0_K_feature_encoder.pt, 0_K_feature_decoder.pt, and 0_K_best_ELBO_graph_seq.npy) are stored in the saved_modules folder, train and evaluate the SGM model with:

python Task/weather_forecasting/SGM_Time_Series/forecasting.py
python Task/weather_forecasting/SGM_Time_Series/evaluation.py
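The metrics that evaluation.py actually reports are defined in that script and are not listed in this README. As a reference point only, here is a sketch of two common point-forecast errors, MAE and RMSE, over aligned lists of true and predicted values for a single variable.

```python
import math


def _check_lengths(y_true, y_pred):
    """Require non-empty sequences of equal length."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")


def mae(y_true, y_pred):
    """Mean absolute error."""
    _check_lengths(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    _check_lengths(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```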

⛈️ Anomaly Detection

The anomaly detection task uses the same climate time series data as the weather forecasting task, plus anomaly labels. Running Task/weather_forecasting/SGM_Time_Series/evaluation.py evaluates performance on both the weather forecasting and anomaly detection tasks.
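For intuition on scoring binary anomaly labels, here is a sketch of precision, recall, and F1 computed from per-timestep 0/1 labels and predictions. Whether evaluation.py uses these exact metrics, or point-wise rather than event-wise scoring, is an assumption not confirmed by this README.

```python
def precision_recall_f1(labels, preds):
    """Point-wise precision, recall, and F1 for binary (0/1) anomaly labels."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```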

🌾 Crop Segmentation

🛠 Environment Setup

Following the NASA IMPACT repo, the image tasks require openmim, mmcv, and mmsegmentation. We also provide a pre-built Docker image for ease of use:

docker pull violet24k/climatebench-m-img:latest
docker run --gpus all -it -v .:/workspace -w /workspace violet24k/climatebench-m-img:latest bash

Alternatively, set up the environment manually:

cd Task/crop_segmentation/SGM_Image
conda create -n climatebench-m-img python=3.9
conda activate climatebench-m-img
pip install torch==1.11.0+cu115 torchvision==0.12.0+cu115 --extra-index-url https://download.pytorch.org/whl/cu115
pip install -e .
pip install -U openmim
mim install mmcv-full==1.6.2 -f https://download.openmmlab.com/mmcv/dist/cu115/torch1.11.0/index.html
pip install numpy==1.24.0

For more details on the installed packages, please refer to the PyTorch, openmim, MMCV, and MMSegmentation documentation. Note: the original NASA IMPACT repo uses mmcv 1.x and mmsegmentation 0.x. Although these are older versions, upgrading to newer releases such as mmcv 2.x requires significant code changes because of the introduction of MMEngine.
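Because the setup depends on the mmcv 1.x and mmsegmentation 0.x lines, a quick check of the installed major versions can catch an accidental upgrade. This sketch uses the standard-library `importlib.metadata`; the distribution names `mmcv-full` and `mmsegmentation` are assumptions and may need adjusting to how the packages were installed in your environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Major versions this setup expects: mmcv 1.x and mmsegmentation 0.x.
EXPECTED_MAJOR = {"mmcv-full": 1, "mmsegmentation": 0}


def major(version_string):
    """Return the leading integer of a version string such as '1.6.2'."""
    return int(version_string.split(".")[0])


def check_openmmlab_versions(expected=None):
    """Map each distribution that is missing or has the wrong major version to a message."""
    if expected is None:
        expected = EXPECTED_MAJOR
    problems = {}
    for dist, want in expected.items():
        try:
            found = version(dist)
        except PackageNotFoundError:
            problems[dist] = "not installed"
            continue
        if major(found) != want:
            problems[dist] = f"found {found}, expected {want}.x"
    return problems
```

An empty dictionary from `check_openmmlab_versions()` means both packages are on the expected major versions.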

📂 Data Split

By default, ClimateBench-M-IMG provides raw satellite image data for maximal flexibility. To generate the train/validation splits and prepare the model inputs, run:

# in working directory Task/crop_segmentation/SGM_Image
python image_data_processing.py
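The split ratio and strategy used by image_data_processing.py are not stated in this README. If you build your own split, a reproducible approach is to sort the sample IDs, shuffle them with a fixed seed, and cut off a validation fraction. In this sketch, `val_fraction=0.2` and `seed=0` are hypothetical defaults.

```python
import random


def split_ids(sample_ids, val_fraction=0.2, seed=0):
    """Return (train_ids, val_ids) as a deterministic random split.

    Sorting first makes the result independent of file-listing order;
    the seeded RNG makes it reproducible across runs.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    ids = sorted(sample_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]
```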

To improve performance, we recommend initializing the MAE-based generative model with pretrained weights from IBM’s Prithvi foundation model:

git clone https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-1.0-100M

⚠️ Important: After cloning, check that the .pt files are ~400MB. If they are tiny, they may be Git LFS pointers — in that case, install Git LFS and run git lfs pull.
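To automate the size check above: a Git LFS pointer is a small text file (well under 1 KB) whose first line is `version https://git-lfs.github.com/spec/v1`. This sketch flags `.pt` files in a folder that look like pointers rather than real weights. The function names are hypothetical; the folder is whatever directory `git clone` created.

```python
from pathlib import Path

LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path, max_pointer_bytes=1024):
    """True if the file is small and starts with the Git LFS pointer header."""
    p = Path(path)
    if p.stat().st_size > max_pointer_bytes:
        return False
    with p.open("rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(folder, pattern="*.pt"):
    """List files matching `pattern` in `folder` that are still LFS pointers."""
    return sorted(str(p) for p in Path(folder).glob(pattern) if is_lfs_pointer(p))
```

If `find_lfs_pointers("Prithvi-EO-1.0-100M")` returns anything, run `git lfs pull` inside that folder.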

⚙️ Training & Evaluation

Before running, please double-check Task/crop_segmentation/SGM_Image/configs/multi_temporal_crop_classification.py, especially lines 15, 49, and 59, because mim may not resolve os.path correctly.
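A rough way to spot broken paths is to scan the config for quoted, path-like strings and report, with line numbers, those that do not exist relative to a base directory. This is a heuristic sketch: it only sees literal strings (not paths built with os.path at runtime), may flag non-path strings that contain "/", and the helper name is hypothetical.

```python
import re
from pathlib import Path

# Quoted strings on one line that contain at least one "/".
QUOTED_PATH = re.compile(r"""["']([^"'\n]*/[^"'\n]*)["']""")


def unresolved_paths(config_file, base_dir="."):
    """Return (line_number, string) for quoted path-like strings that do not exist."""
    missing = []
    text = Path(config_file).read_text()
    for lineno, line in enumerate(text.splitlines(), start=1):
        for candidate in QUOTED_PATH.findall(line):
            if candidate.startswith(("http://", "https://")):
                continue  # URLs are not local paths
            # Absolute candidates override base_dir when joined.
            if not (Path(base_dir) / candidate).exists():
                missing.append((lineno, candidate))
    return missing
```

Running it with `base_dir` set to Task/crop_segmentation/SGM_Image lets you compare any reported line numbers against lines 15, 49, and 59.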

To train and evaluate the MAE-backbone generative model, run:

# Train the model
mim train mmsegmentation configs/multi_temporal_crop_classification.py

# Evaluate the model (replace with actual checkpoint path)
mim test mmsegmentation configs/multi_temporal_crop_classification.py \
    --checkpoint path_to_checkpoint_model.pth --eval "mIoU"
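For reference, mIoU averages the per-class intersection over union, IoU = TP / (TP + FP + FN), taken from a confusion matrix. This sketch works on flat lists of integer class labels. Like MMSegmentation, it skips classes that appear in neither labels nor predictions and supports an ignore index (MMSegmentation's default is 255). It is an illustration, not the evaluation code that `mim test` runs.

```python
def confusion_matrix(labels, preds, num_classes, ignore_index=None):
    """Count (true class, predicted class) pairs, skipping ignored pixels."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for y, p in zip(labels, preds):
        if ignore_index is not None and y == ignore_index:
            continue
        cm[y][p] += 1
    return cm


def mean_iou(cm):
    """Average IoU over classes that occur in labels or predictions."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(len(cm))) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp  # truly c, predicted another class
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else float("nan")
```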

📖 Cite

If you find this repository useful in your research, please consider citing the following paper:

@article{fu2025climatebench,
  title={ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method},
  author={Fu, Dongqi and Zhu, Yada and Liu, Zhining and Zheng, Lecheng and Lin, Xiao and Li, Zihao and Fang, Liri and Tieu, Katherine and Bhardwaj, Onkar and Weldemariam, Kommy and others},
  journal={arXiv preprint arXiv:2504.07394},
  year={2025}
}
