An MNIST-like fashion product database. Benchmark 👇
OpenMMLab Pose Estimation Toolbox and Benchmark.
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Benchmarks of approximate nearest neighbor libraries in Python
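The core quantity these ANN benchmarks report is recall against an exact search at a given query speed. A minimal, stdlib-only sketch of that measurement follows; the brute-force search is the ground truth, and the random-subset "index" is a hypothetical stand-in for a real approximate-nearest-neighbor library:

```python
import math
import random

def brute_force_knn(query, points, k):
    """Exact k nearest neighbors by exhaustive search (the ground truth)."""
    return sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate index returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

random.seed(0)
points = [[random.random() for _ in range(8)] for _ in range(200)]
query = [random.random() for _ in range(8)]

exact = brute_force_knn(query, points, k=10)
# Stand-in for an approximate index: search only a random subset of the points.
candidates = random.sample(range(len(points)), 100)
approx = sorted(candidates, key=lambda i: math.dist(query, points[i]))[:10]

print(f"recall@10 = {recall_at_k(approx, exact):.2f}")
```

Real benchmark suites sweep each library's index parameters and plot recall against queries per second; this sketch only shows the metric itself.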
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus, and leaderboard
Python package for the evaluation of odometry and SLAM
A series of large language models developed by Baichuan Intelligent Technology
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
MTEB: Massive Text Embedding Benchmark
A full-stack AI Red Teaming platform securing AI ecosystems via AI Infra scan, MCP scan, Agent skills scan, and LLM jailbreak evaluation.
A 13B large language model developed by Baichuan Intelligent Technology
A unified evaluation framework for large language models
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
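A standard metric reported by such IR benchmarks is nDCG@k, which compares the discounted gain of a system's ranking against the ideal ranking. A short stdlib-only sketch, with hypothetical relevance judgments (qrels) and a hypothetical ranking:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_doc_ids, qrels, k):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if ideal else 0.0

# Hypothetical graded judgments for one query: doc id -> relevance grade.
qrels = {"d1": 3, "d2": 2, "d5": 1}
ranking = ["d2", "d1", "d9", "d5"]  # hypothetical system output

print(f"nDCG@4 = {ndcg_at_k(ranking, qrels, 4):.3f}")
```

Averaging this per-query score over a dataset's query set gives the single number that leaderboards report.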
A machine learning toolkit for log parsing [ICSE'19, DSN'16]