Large Language Models as General Pattern Machines
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Validity Challenges in Machine Learning Benchmarks
GAIA: A Benchmark for General AI Assistants
A critical review of large language models: Sensitivity, bias, and the path toward specialized AI
Measuring General Intelligence with Generated Games
Maia-2: A Unified Model for Human-AI Alignment in Chess
Aligning Superhuman AI with Human Behavior: Chess as a Model System