
ModelMatch

License: MIT · Part of the BrainDrive Project


🌐 Vision

ModelMatch exists to make AI model selection simple, transparent, and useful. Instead of endless benchmarks and confusing charts, we provide practical evaluations that show how models perform in the tasks people actually care about.


🎯 Aim

  • Help people find the right model for the right job
  • Bridge research and reality by testing models in real-world scenarios
  • Save time by showing what works best — and why

👥 Who is it for?

  • Students & researchers looking for the best summarizer or helper for projects
  • Professionals & teams needing models that won’t hallucinate or mislead
  • AI enthusiasts wondering, “Which model should I trust for this task?”

🧠 Our Frameworks

SummEval – Evaluates models on summarization tasks.

TherapyEval – Tests how models perform as conversational, empathetic “therapy-like” companions.

EmailEval – Evaluates model performance on professional and marketing email generation.

FinanceEval – Measures how models handle financial reasoning, forecasting, and analysis tasks.

HealthEval – Evaluates clinical and healthcare-related reasoning, medical advice accuracy, and ethical safety.

🔓 All five frameworks are fully open source and can be run either directly on Hugging Face (no code required) or locally from the GitHub source.


📊 What’s Next

  • Official Leaderboards – A single hub for scores, rankings, and cross-task comparisons, so you can quickly see which model leads on each task.

📊 Our Results: Top 3 Models per Framework

🧩 General Purpose Models

| Model | Score |
|---|---|
| Phi-3 Mini 4K Instruct | 9.08 |
| Mistral 7B Instruct v0.3 | 8.87 |
| OpenHermes-2.5-Mistral-7B | 8.79 |

📰 SummEval (Summarization)

| Model | Score |
|---|---|
| OpenHermes-2.5-Mistral-7B | 9.69 |
| Mistral 7B Instruct v0.3 | 9.50 |
| Phi-3 Mini 4K Instruct | 9.20 |

Metrics: Coverage, Intent Alignment, Hallucination Control, Topical Relevance, Bias & Toxicity

💬 TherapyEval

| Model | Score |
|---|---|
| Llama3-Med42-8B | 8.60 |
| Gemma-3 Medical (Fine-tune i1 GGUF) | 8.55 |
| Josiefied-Health-Qwen3-8B-Abliterated-v1 | 8.15 |

Metrics: Empathy & Rapport, Emotional Relevance, Boundary Awareness, Ethical Safety, Adaptability & Support

✉️ EmailEval

| Model | Score |
|---|---|
| Tulu-2-7B (AI2) | 8.89 |
| StarChat-Beta (Hugging Face H4) | 8.54 |
| LFM2-1.2B (Liquid AI) | 8.44 |

Metrics: Clarity & Ask Framing, Length & Pacing, Spam & Deliverability Risk, Personalization Density, Tone & Hygiene

💹 FinanceEval

| Model | Score |
|---|---|
| Meta-Llama-3-70B Instruct | 6.26 |
| Meta-Llama-3.3-70B Instruct | 5.87 |
| Nemotron-70B Instruct | 5.78 |

Metrics: Trust & Transparency, Competence & Accuracy, Explainability, Client-Centeredness, Risk Safety, Communication Clarity

🏥 HealthEval

| Model | Score |
|---|---|
| Qwen-UMLS-7B-Instruct | 7.44 |
| Phi-3 Mini 4K Instruct | 7.43 |
| Llama3-Med42-8B | 7.18 |

Metrics: Evidence Transparency, Clinical Safety, Empathy, Clarity, Plan Quality, Trust & Agency
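Each framework reports a single 0–10 score per model, aggregated from the per-metric judgments listed above. As an illustrative sketch only (the per-metric values below are hypothetical, and the frameworks' actual aggregation and weighting may differ), a simple unweighted mean over metric scores would look like:

```python
def aggregate_score(metric_scores: dict[str, float]) -> float:
    """Unweighted mean of per-metric scores on a 0-10 scale,
    rounded to two decimals. Illustrative only: the real
    frameworks may weight metrics differently."""
    return round(sum(metric_scores.values()) / len(metric_scores), 2)

# Hypothetical per-metric scores for one model on SummEval's five metrics.
summeval_scores = {
    "Coverage": 9.8,
    "Intent Alignment": 9.7,
    "Hallucination Control": 9.6,
    "Topical Relevance": 9.7,
    "Bias & Toxicity": 9.65,
}

print(aggregate_score(summeval_scores))
```

An unweighted mean is the simplest reasonable choice here; a production evaluator might instead weight safety-critical metrics (e.g. Hallucination Control or Clinical Safety) more heavily.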

🌱 Community

ModelMatch is part of BrainDrive, an open-source movement for user-owned AI.
Join the conversation: community.braindrive.ai
