haizelabs / verdict (299 stars). Inference-time scaling for LLMs-as-a-judge. Topics: reward-shaping, llm, llm-as-a-judge, test-time-compute, inference-time-compute, llm-judge, test-time-scaling. Updated Sep 1, 2025. Language: Jupyter Notebook.
Anmolian / Prompt_Eval_LLM_Judge (1 star). Prompt Design & LLM Judge. Topics: prompt-engineering, llms, few-shot-prompting, one-shot-prompting, zero-shot-prompting, contrastive-cot-prompting, cot-prompting, llm-judge, trec-rag-2024, self-consistency-prompting, role-playing-prompting. Updated Feb 10, 2025. Language: Python.
PabloCabaleiro / pondera (1 star). Pondera is a lightweight, YAML-first framework to evaluate AI models and agents with pluggable runners and an LLM-as-a-judge. Topics: python, ai, agents, model-agnostic, ai-evaluation, llms, llm-evaluation, llm-evaluation-framework, llm-judge, agent-evaluation, ai-evaluation-framework, rubric-based-evaluation, yaml-first. Updated Sep 19, 2025. Language: Python.
DJMuRo4ever / Prompt_Eval_LLM_Judge (0 stars). Prompt Design & LLM Judge. Topics: prompt-engineering, llms, few-shot-prompting, one-shot-prompting, zero-shot-prompting, contrastive-cot-prompting, cot-prompting, llm-judge, trec-rag-2024, self-consistency-prompting, role-playing-prompting. Updated Sep 21, 2025.