ModelMatch exists to make AI model selection simple, transparent, and useful. Instead of endless benchmarks and confusing charts, we provide practical evaluations that show how models perform on the tasks people actually care about.
- Help people find the right model for the right job
- Bridge research and reality by testing models in real-world scenarios
- Save time by showing what works best — and why
ModelMatch is built for:
- Students & researchers looking for the best summarizer or research helper for their projects
- Professionals & teams needing models that won’t hallucinate or mislead
- AI enthusiasts wondering, “Which model should I trust for this task?”
ModelMatch currently offers five evaluation frameworks:
- Summeval – Evaluates models on summarization tasks.
- TherapyEval – Tests how models perform as conversational, empathetic “therapy-like” companions.
- EmailEval – Evaluates model performance on professional and marketing email generation.
- FinanceEval – Measures how models handle financial reasoning, forecasting, and analysis tasks.
- HealthEval – Evaluates clinical and healthcare-related reasoning, medical advice accuracy, and ethical safety.
🔓 All five frameworks are fully open source and can be run either directly on Hugging Face (no code required) or locally from the GitHub source.
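To give a feel for what these frameworks automate, here is a minimal sketch of a single summarization pass of the kind Summeval scores. It uses the Hugging Face `transformers` pipeline; the checkpoint choice is arbitrary, and this is not ModelMatch’s own code.

```python
# Illustrative sketch only, not ModelMatch's actual code: run one candidate
# model on one summarization input, i.e. the raw step an eval framework
# repeats over a whole test set before scoring the outputs.
from transformers import pipeline

# Any summarization-capable checkpoint works; this small one is arbitrary.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "The city council voted on Tuesday to expand the bike-lane network, "
    "citing a 40 percent rise in cycling commuters since 2021. Construction "
    "is expected to begin next spring and finish within two years."
)

result = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```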
- Official Leaderboards – A single hub for scores, rankings, and side-by-side comparisons of models across tasks, so you can see at a glance which models lead on each one. For example:
| Model | Score |
|---|---|
| Phi-3 Mini 4K Instruct | 9.08 |
| Mistral 7B Instruct v0.3 | 8.87 |
| OpenHermes-2.5-Mistral-7B | 8.79 |
Summeval:
| Model | Score |
|---|---|
| OpenHermes-2.5-Mistral-7B | 9.69 |
| Mistral 7B Instruct v0.3 | 9.50 |
| Phi-3 Mini 4K Instruct | 9.20 |
Metrics: Coverage, Intent Alignment, Hallucination Control, Topical Relevance, Bias & Toxicity
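How the per-metric scores roll up into the single leaderboard number is easiest to see with a toy calculation. This sketch assumes each metric is judged on a 0–10 scale and that the overall score is their unweighted mean; ModelMatch’s actual weighting may differ.

```python
# Toy aggregation sketch: assumes 0-10 per-metric judge scores and an
# unweighted mean. ModelMatch's real scoring pipeline may weight differently.
from statistics import mean

metric_scores = {
    "coverage": 9.5,
    "intent_alignment": 9.8,
    "hallucination_control": 9.7,
    "topical_relevance": 9.9,
    "bias_and_toxicity": 9.5,  # higher = cleaner output in this sketch
}

overall = round(mean(metric_scores.values()), 2)
print(f"Leaderboard score: {overall}")  # prints: Leaderboard score: 9.68
```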
TherapyEval:
| Model | Score |
|---|---|
| Llama3-Med42-8B | 8.60 |
| Gemma-3 Medical (Fine-tune i1 GGUF) | 8.55 |
| Josiefied-Health-Qwen3-8B-Abliterated-v1 | 8.15 |
Metrics: Empathy & Rapport, Emotional Relevance, Boundary Awareness, Ethical Safety, Adaptability & Support
EmailEval:
| Model | Score |
|---|---|
| Tulu-2-7B (AI2) | 8.89 |
| StarChat-Beta (Hugging Face H4) | 8.54 |
| LFM2-1.2B (Liquid AI) | 8.44 |
Metrics: Clarity & Ask Framing, Length & Pacing, Spam & Deliverability Risk, Personalization Density, Tone & Hygiene
FinanceEval:
| Model | Score |
|---|---|
| Meta-Llama-3-70B Instruct | 6.26 |
| Meta-Llama-3.3-70B Instruct | 5.87 |
| Nemotron-70B Instruct | 5.78 |
Metrics: Trust & Transparency, Competence & Accuracy, Explainability, Client-Centeredness, Risk Safety, Communication Clarity
HealthEval:
| Model | Score |
|---|---|
| Qwen-UMLS-7B-Instruct | 7.44 |
| Phi-3 Mini 4K Instruct | 7.43 |
| Llama3-Med42-8B | 7.18 |
Metrics: Evidence Transparency, Clinical Safety, Empathy, Clarity, Plan Quality, Trust & Agency
ModelMatch is part of BrainDrive, an open-source movement for user-owned AI.
Join the conversation: community.braindrive.ai