Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving
Accepted to Findings of EMNLP 2025
This repository will host the PhoPile dataset and benchmarking code for evaluating foundation models with retrieval-augmented generation (RAG) in Olympiad-level physics problem solving.
Data and code will be released soon.
If you use this work, please cite:
@inproceedings{zheng2025phopile,
title = "Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving",
author = "Zheng, Shunfeng and Zhang, Yudi and Fang, Meng and Zhang, Zihan and Wu, Zhitan and Pechenizkiy, Mykola and Chen, Ling",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
year = "2025",
}