UQ provides resources to assess LLMs on unsolved questions: (1) UQ-Dataset provides curated unsolved questions; (2) UQ-Validators are LLM-based validation strategies to check answer-correctness (3) UQ-Platform is a website to engage with the questions and answers.
You can load the data from 🤗 uq-project/uq via:
# pip install -q datasets
from datasets import load_dataset
dataset = load_dataset("uq-project/uq", split="test")# Please specify your API key in `key.env`
python gen_answer.py --model_name o3UQ Validator is now available as a pip-installable package:
# Install from source
git clone https://github.com/uq-project/UQ.git
cd UQ
pip install -e .
# Or install from PyPI (coming soon)
pip install uq-validatorOnce you have your model predictions, you can use UQ-validators via:
# Each sample in input file should include question_id and answer
python validate.py --input_file your_answers --model o3 --strategy sequential --turns 3 \
--multi_turn_voting majority# Basic validation
uq-validate --input_file your_answers.jsonl --dataset questions.jsonl --strategy relevance
# Sequential validation with multiple strategies
uq-validate --input_file your_answers.jsonl --dataset questions.jsonl --strategy sequential \
--sequential_strategies relevance cycle_consistency factual_error final_answer
# Multi-sample validation with voting
uq-validate --input_file your_answers.jsonl --dataset questions.jsonl --strategy total_correctness \
--samples 3 --resampling_voting majority@misc{nie2025uqassessinglanguagemodels,
title={UQ: Assessing Language Models on Unsolved Questions},
author={Fan Nie and Ken Ziyu Liu and Zihao Wang and Rui Sun and Wei Liu and Weijia Shi and Huaxiu Yao and Linjun Zhang and Andrew Y. Ng and James Zou and Sanmi Koyejo and Yejin Choi and Percy Liang and Niklas Muennighoff},
year={2025},
eprint={2508.17580},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2508.17580},
}