Inference-time scaling for LLMs-as-a-judge.
Updated Jul 15, 2025 - Jupyter Notebook
Official repository of the spotlight ICML 2025 paper, PokeChamp: an Expert-level Minimax Language Agent.
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Test-Time Memory Framework: Control Hallucinations in Foundation Models
Code for the paper "Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement"
Code for the ICML 2025 paper "How Do Large Language Monkeys Get Their Power (Laws)?"
An experimental project using MCTS to refine LLM responses for better accuracy and decision-making.
A Framework Enabling Web Agents to Master Workflows From Human Demonstration