PhD in Electrical and Computer Engineering in Artificial Intelligence. Loves geek stuff and cooking.
- DCA - FEEC - Unicamp
- Campinas, SP, Brazil
- @wandgibaut
- in/wandgibaut
Stars
LLM Evaluation
5 repositories
A unified evaluation framework for large language models
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
An extensible benchmark for evaluating large language models on planning
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM