A collection of hands-on notebooks for LLM practitioners
Intended for familiarization and learning. The notebooks use the LangChain framework, LangSmith for tracing, OpenAI LLM models, and a Pinecone serverless vector database, all written in Python with Jupyter Notebook.
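A minimal sketch of how those pieces typically fit together; the model names, index name, and placeholder API keys below are assumptions for illustration, not values taken from these notebooks.

```python
import os

# LangSmith tracing is switched on through environment variables.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "..."   # placeholder
os.environ["OPENAI_API_KEY"] = "..."      # placeholder
os.environ["PINECONE_API_KEY"] = "..."    # placeholder

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# "llm-notebooks" is a hypothetical serverless index name.
vectorstore = PineconeVectorStore(index_name="llm-notebooks", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Retrieve context from Pinecone and answer with the OpenAI model;
# the whole run is traced in LangSmith.
question = "What does this collection cover?"
docs = retriever.invoke(question)
answer = llm.invoke(f"Answer using this context:\n{docs}\n\nQuestion: {question}")
print(answer.content)
```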
Notebooks for evaluating LLM outputs with a range of metrics, covering scenarios both with and without known ground truth. Criteria include correctness, coherence, relevance, and more, giving a systematic way to assess LLM performance.
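One way to run such criteria-based checks is with LangChain's built-in evaluators, sketched below under the assumption of an OpenAI judge model; the specific criteria and model choice are illustrative, not necessarily what these notebooks use.

```python
from langchain.evaluation import load_evaluator
from langchain_openai import ChatOpenAI

judge = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Reference-free: judge coherence of an answer against the prompt alone.
coherence_eval = load_evaluator("criteria", criteria="coherence", llm=judge)
print(coherence_eval.evaluate_strings(
    prediction="Paris is the capital of France.",
    input="What is the capital of France?",
))

# Reference-based: judge correctness against a known ground-truth answer.
correctness_eval = load_evaluator("labeled_criteria", criteria="correctness", llm=judge)
print(correctness_eval.evaluate_strings(
    prediction="Paris is the capital of France.",
    reference="Paris",
    input="What is the capital of France?",
))
```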
This repo contains my coding notebook for the tutorial series I created for the beginner-level bias bounty challenge hosted by Humane Intelligence, where I am an AI Ethics Fellow.
Scoring LLM chatbot responses from LMSYS Chatbot Arena with SCBN and RQTL metrics, unpacking Chatbot Arena prompts, setting up a quick chatbot in a Jupyter notebook, and more: a repo for all things chatbot.
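For the "quick chatbot setup" part, a minimal sketch with the OpenAI Python client is shown below; the model name and system prompt are assumptions for illustration, not the repo's actual configuration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    """Send one turn to the model and keep the running conversation history."""
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Summarize what the LMSYS Chatbot Arena is in one sentence."))
```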
Code, datasets, and replication notebook for the preprint Anchors in the Machine: Behavioral and Attributional Evidence of Anchoring Bias in LLMs. The project replicates and extends Tversky & Kahneman’s classic anchoring experiments across six open-source LLMs (GPT-2, GPT-Neo-125M, Falcon-RW-1B, Phi-2, Gemma-2B, LLaMA-2-7B).
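A hedged illustration of a behavioral anchoring probe in the spirit of those experiments: sample completions after a low versus a high numeric anchor and compare the extracted estimates. The prompts, parsing, and sampling settings below are assumptions and do not reproduce the preprint's protocol.

```python
import re
from transformers import pipeline

# GPT-2 is one of the six models listed above; used here only as a small example.
generator = pipeline("text-generation", model="gpt2")

def estimate(prompt: str) -> list[int]:
    """Sample a few completions and pull out the first number in each."""
    outputs = generator(prompt, max_new_tokens=10, num_return_sequences=5,
                        do_sample=True, pad_token_id=50256)
    numbers = []
    for out in outputs:
        match = re.search(r"\d+", out["generated_text"][len(prompt):])
        if match:
            numbers.append(int(match.group()))
    return numbers

low_anchor = ("Is the percentage of African countries in the UN higher or lower "
              "than 10%? My best estimate of the percentage is")
high_anchor = ("Is the percentage of African countries in the UN higher or lower "
               "than 65%? My best estimate of the percentage is")

print("low anchor:", estimate(low_anchor))
print("high anchor:", estimate(high_anchor))
```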