Code for "Prediction-Powered Ranking of Large Language Models", NeurIPS 2024.
ranking-algorithm llm-eval llm-evaluation llm-evaluation-framework prediction-powered-inference rank-sets
Updated Oct 28, 2024 - Jupyter Notebook