MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning

🤗 Dataset (Hugging Face) 📑 Paper (arXiv:2510.14265)

Xukai Wang*, Xuanbo Liu*, Mingrui Chen*, Haitian Zhong*, Xuanlin Yang*, Bohan Zeng, Jinbo Hu, Hao Liang, Junbo Niu, Xuchen Li, Ruitao Wu, Ruichuan An, Yang Shi, Liu Liu, Xu-Yao Zhang, Qiang Liu, Zhouchen Lin, Wentao Zhang, Bin Dong

📣 Overview

[Figure: MorphoBench overview]

MorphoBench is an adaptive reasoning benchmark for large-scale models. It curates over 1,300 multidisciplinary questions and dynamically adjusts task difficulty based on model reasoning traces, providing a scalable and reliable framework for evaluating the reasoning performance of advanced models like o3 and GPT-5.

🎓 Dataset

The MorphoBench dataset is available on Hugging Face: OpenDCAI/MorphoBench

from datasets import load_dataset
dataset = load_dataset("OpenDCAI/MorphoBench")

After downloading, create a data/ folder inside your local project directory and place the datasets there:

MorphoBench/
├── adaption/
├── asset/
├── data/
│   ├── Morpho_P_Perturbed/
│   ├── Morpho_P_v0/
│   ├── Morpho_R_Complex/
│   ├── Morpho_R_Lite/
│   └── Morpho_R_v0/
├── scripts/
├── output/
└── ...
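
Before running the scripts, it can help to confirm that the data/ folder matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_subsets` is hypothetical, and it only assumes the five subfolder names shown in the tree.

```python
from pathlib import Path

# Subfolder names copied from the directory layout above.
EXPECTED_SUBSETS = [
    "Morpho_P_Perturbed",
    "Morpho_P_v0",
    "Morpho_R_Complex",
    "Morpho_R_Lite",
    "Morpho_R_v0",
]


def missing_subsets(project_root="."):
    """Return the expected data/ subfolders that are not present yet.

    Hypothetical helper for illustration; it checks folder names only,
    not the files inside them.
    """
    data_dir = Path(project_root) / "data"
    return [name for name in EXPECTED_SUBSETS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subsets(".")
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("All expected dataset folders are present.")
```

Run it from the MorphoBench/ project root; an empty result means every expected folder exists.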

⚙️ Usage

Environment Setup

cd MorphoBench
pip install -r requirements.txt

Run Inference

Generate model predictions for all datasets:

bash scripts/run_batch.sh

Predictions will be saved under:

output/infer_result/

Evaluate Model Results

Evaluate the reasoning performance:

bash scripts/evaluate_batch.sh

Evaluation metrics will be stored in:

output/eval_result/
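
To check what the two scripts produced, the output folders can be listed with a few lines of Python. This is a rough sketch under stated assumptions: the helper name `list_outputs` is hypothetical, and since the README does not describe the file names or formats written to these folders, the code only enumerates whatever files it finds.

```python
from pathlib import Path


def list_outputs(project_root="."):
    """Map each output subfolder to a sorted list of the files inside it.

    Hypothetical helper for illustration. Folder names come from the
    README; a folder that does not exist yet maps to an empty list.
    """
    results = {}
    for sub in ("infer_result", "eval_result"):
        folder = Path(project_root) / "output" / sub
        if folder.is_dir():
            results[sub] = sorted(
                p.relative_to(folder).as_posix()
                for p in folder.rglob("*")
                if p.is_file()
            )
        else:
            results[sub] = []
    return results


if __name__ == "__main__":
    for sub, files in list_outputs(".").items():
        print(f"output/{sub}/: {len(files)} file(s)")
        for name in files:
            print("  " + name)
```

An empty list for infer_result after running scripts/run_batch.sh suggests inference did not write any predictions.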

📊 Evaluation Results

The following figure summarizes the evaluation results on MorphoBench:

[Figure: MorphoBench evaluation results]

🙏 Acknowledgements

This repository adapts the evaluation script from Humanity's Last Exam. We sincerely thank the authors for their valuable contributions to the research community.
