This repository includes the slides and some of the notebooks used in my Evaluation workshops.
Some of the notebooks require an OpenAI API key.
These notebooks are intended to illustrate key points of the talk; please don't take them to production as-is. If you want to dig deeper or run into issues, go to the source for each of these projects.
Updated with my May 2025 ODSC workshop.
Testing Properties of a System: Guidance AI
Langtest tutorials from John Snow Labs: Colab Notebooks
LLM Evaluation Harness from EleutherAI: Github or Colab notebook
Ragas showing Model as an evaluator: Github or Colab notebook (the underlying model-as-judge pattern is sketched after this list)
Ragas using LangFuse: Colab notebook
Evaluate LLMs and RAG, a practical example using LangChain and Hugging Face: Github
MLFlow Automated Evaluation: Blog
LLM Grader on AWS: Video and Notebook
LLM AutoEval for RunPod by Maxime Labonne: Colab
Agno and Langfuse with a Research Agent: Github
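Several of the notebooks above (Ragas as an evaluator, the LLM Grader, MLFlow's automated evaluation) build on the same model-as-evaluator idea: a stronger LLM grades another model's answer against a rubric. Below is a minimal, hedged sketch of that pattern using the OpenAI Python client; the judge model name, rubric wording, and 1-5 scale are illustrative assumptions, not taken from any of the linked notebooks.

```python
# Minimal model-as-evaluator (LLM-as-judge) sketch.
# Assumptions: openai>=1.0 Python client, OPENAI_API_KEY set in the environment,
# and "gpt-4o-mini" as the judge model; swap in whatever judge you prefer.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are grading an answer produced by another model.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Score the candidate from 1 (wrong) to 5 (fully correct and faithful to the reference).
Reply with the score only."""

def grade(question: str, reference: str, candidate: str) -> int:
    """Ask the judge model for a 1-5 score and parse the reply as an integer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice of judge model
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, candidate=candidate)}],
        temperature=0,  # deterministic grading
    )
    return int(response.choices[0].message.content.strip())

if __name__ == "__main__":
    print(grade(
        question="What is the capital of France?",
        reference="Paris",
        candidate="The capital of France is Paris.",
    ))
```

In practice you would run a grader like this over a full evaluation set and aggregate the scores; the Ragas, Langfuse, and MLFlow notebooks wrap the same idea in ready-made metrics, tracing, and reporting.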
Generative AI Summit, Austin (Oct 2023) - Slides
ODSC West, San Francisco (Nov 2023) - Slides
Arize Holiday Conference (Dec 2023) - Slides
Data Innovation Conference (Apr 2024) - Slides
ODSC East, Boston (May 2025) - Slides
Evaluation for Large Language Models and Generative AI - A Deep Dive - YouTube
Constructing an Evaluation Approach for Generative AI Models - YouTube
Large Language Models (LLMs) Can Explain Their Predictions - YouTube & Slides
Practical Lessons in Building Generative AI: RAG and Text to SQL - YouTube
Unit Testing for Natural Language (LLMs) + LMUnit model - YouTube
Josh Tobin's Evaluation talk - YouTube
LLM Evaluation Tooling Review