What is OpenCompass?

OpenCompass is a platform focused on understanding AGI, including large language models and multi-modality models.

We aim to:

  • develop high-quality libraries that reduce the difficulty of evaluation
  • provide convincing leaderboards that improve the understanding of large models
  • create powerful toolchains targeting a variety of abilities and tasks
  • build solid benchmarks to support large-model research
  • research inference for large models (analysis, reasoning, prompt engineering)

Toolkit

OpenCompass

VLMEvalKit

Benchmarks and Methods

| Project | Topic | Paper |
| --- | --- | --- |
| DevBench | Automated Software Development | DevBench: Towards LLMs based Automated Software Development |
| CriticBench | Critic Reasoning | CriticBench: Evaluating Large Language Models as Critic |
| ANAH | Hallucination Annotation | ANAH: Analytical Annotation of Hallucinations in Large Language Models |
| MathBench | Mathematical Reasoning | MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark |
| T-Eval | Tool Utilization | T-Eval: Evaluating the Tool Utilization Capability Step by Step |
| MMBench | Multi Modality | MMBench: Is Your Multi-modal Model an All-around Player? |
| BotChat | Subjective Evaluation | BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues |
| LawBench | Domain Evaluation | LawBench: Benchmarking Legal Knowledge of Large Language Models |

Pinned

  1. opencompass Public

    OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

    Python; 5.8k stars; 642 forks

  2. VLMEvalKit Public

    Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.

    Python; 2.9k stars; 468 forks

  3. CompassJudger Public

    The all-in-one judge models introduced by OpenCompass

    109 stars; 5 forks

  4. CompassVerifier Public

    CompassVerifier: A Unified and Robust Verifier for Large Language Models

    Jupyter Notebook; 28 stars

  5. MMBench Public

    Official repo of "MMBench: Is Your Multi-modal Model an All-around Player?"

    238 stars; 12 forks

  6. Creation-MMBench Public

    Assessing Context-Aware Creative Intelligence in MLLMs

    JavaScript; 21 stars

Repositories

Showing 10 of 35 repositories
  • VLMEvalKit Public

    Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.

    Python; 2,860 stars; Apache-2.0; 468 forks; 143 open issues; 12 open pull requests; updated Aug 8, 2025
  • CompassVerifier Public

    CompassVerifier: A Unified and Robust Verifier for Large Language Models

    Jupyter Notebook; 28 stars; 0 forks; 0 open issues; 0 open pull requests; updated Aug 6, 2025
  • GPassK Public

    [ACL 2025] Are Your LLMs Capable of Stable Reasoning?

    Python; 30 stars; 2 forks; 2 open issues; 0 open pull requests; updated Aug 5, 2025
  • opencompass Public

    OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

    Python; 5,818 stars; Apache-2.0; 641 forks; 327 open issues (1 issue needs help); 65 open pull requests; updated Aug 4, 2025
  • MMBench-GUI Public

    Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.

    Python; 68 stars; 3 forks; 4 open issues; 0 open pull requests; updated Jul 28, 2025
  • Creation-MMBench Public

    Assessing Context-Aware Creative Intelligence in MLLMs

    JavaScript; 21 stars; 0 forks; 0 open issues; 0 open pull requests; updated Jul 22, 2025
  • CompassJudger Public

    The all-in-one judge models introduced by OpenCompass

    109 stars; Apache-2.0; 5 forks; 1 open issue; 0 open pull requests; updated Jul 15, 2025
  • SAGA Public
    5 stars; 0 forks; 0 open issues; 0 open pull requests; updated Jul 11, 2025
  • RaML Public

    [Preprint 2025] Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective

    Jupyter Notebook; 6 stars; 2 forks; 0 open issues; 0 open pull requests; updated May 27, 2025
  • BotChat Public

    Evaluating LLMs' multi-round chatting capability by assessing conversations generated by two LLM instances.

    Jupyter Notebook; 157 stars; Apache-2.0; 6 forks; 2 open issues; 0 open pull requests; updated May 22, 2025