evaluation
Here are 584 public repositories matching this topic...
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Updated Apr 30, 2025 - Python
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Updated May 4, 2025 - Python
Python package for the evaluation of odometry and SLAM
Updated Mar 20, 2025 - Python
The easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets.
Updated May 3, 2025 - Python
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Updated Mar 24, 2023 - Python
A unified evaluation framework for large language models
Updated Apr 29, 2025 - Python
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
Updated May 1, 2025 - Python
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Updated May 3, 2025 - Python
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root-cause analysis on failure cases, and give insights on how to resolve them.
Updated Aug 18, 2024 - Python
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Updated Jan 10, 2025 - Python
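As a rough sketch of how 🤗 Evaluate is typically used, a metric is loaded by name and then computed over predictions and references; the metric name and toy label vectors below are illustrative placeholders:

```python
# Minimal sketch of the 🤗 Evaluate workflow; the metric name and the toy
# prediction/reference vectors are illustrative placeholders.
import evaluate

accuracy = evaluate.load("accuracy")
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # e.g. {'accuracy': 0.75}
```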
Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
Updated Mar 11, 2025 - Python
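A hedged sketch of a minimal Avalanche continual-learning loop follows; the import paths (e.g. avalanche.training.supervised) and hyperparameters are assumptions that have shifted between Avalanche releases, so treat this as an outline rather than the library's exact API:

```python
# Sketch of a continual-learning run with Avalanche; import paths and
# hyperparameters are assumptions that may differ between library versions.
import torch
from avalanche.benchmarks.classic import SplitMNIST
from avalanche.models import SimpleMLP
from avalanche.training.supervised import Naive

benchmark = SplitMNIST(n_experiences=5)    # MNIST split into 5 sequential tasks
model = SimpleMLP(num_classes=10)
strategy = Naive(
    model,
    torch.optim.SGD(model.parameters(), lr=0.001),
    torch.nn.CrossEntropyLoss(),
    train_mb_size=32,
    train_epochs=1,
)

for experience in benchmark.train_stream:  # train on each task in turn
    strategy.train(experience)
    strategy.eval(benchmark.test_stream)   # evaluate on the full test stream
```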
☁️ 🚀 📊 📈 Evaluating state of the art in AI
Updated May 1, 2025 - Python
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Updated Apr 3, 2024 - Python
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Updated May 1, 2025 - Python
Multi-class confusion matrix library in Python
Updated Apr 28, 2025 - Python
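For example, PyCM builds a confusion matrix and its per-class statistics directly from two label vectors; the vectors below are toy data:

```python
# Basic PyCM usage; the actual/predicted label vectors are toy placeholders.
from pycm import ConfusionMatrix

cm = ConfusionMatrix(actual_vector=[0, 1, 2, 2, 1], predict_vector=[0, 2, 2, 2, 1])
print(cm.Overall_ACC)  # overall accuracy
print(cm)              # confusion matrix plus per-class statistics
```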
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Updated Aug 20, 2024 - Python
XAI - An eXplainability toolbox for machine learning
Updated Oct 30, 2021 - Python
FuzzBench - Fuzzer benchmarking as a service.
Updated Feb 6, 2025 - Python
High-fidelity performance metrics for generative models in PyTorch
Updated Jan 25, 2024 - Python
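Assuming this entry refers to torch-fidelity, a typical call computes several metrics between two image folders in one pass; the directory paths below are placeholders:

```python
# Sketch assuming the repository is torch-fidelity; the directory paths are placeholders.
import torch_fidelity

metrics = torch_fidelity.calculate_metrics(
    input1="path/to/generated_images",
    input2="path/to/real_images",
    isc=True,   # Inception Score
    fid=True,   # Fréchet Inception Distance
    kid=True,   # Kernel Inception Distance
)
print(metrics)
```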