Wandemberg Gibaut (wandgibaut)

PhD in Electrical and Computer Engineering, specializing in Artificial Intelligence. Loves geek stuff and cooking.

Stars

5 repositories

A unified evaluation framework for large language models

Python · 2,577 stars · 190 forks · Updated Feb 11, 2025

ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.

Scala · 247 stars · 26 forks · Updated Oct 16, 2024

An extensible benchmark for evaluating large language models on planning

PDDL · 334 stars · 36 forks · Updated Mar 27, 2025

An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]

SAS · 297 stars · 31 forks · Updated May 20, 2024

Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM

Jupyter Notebook · 1,415 stars · 169 forks · Updated Mar 21, 2025