How does MCPMark handle MCP server version changes (Notion / GitHub MCP)? #245

@caitianxiao

Hi MCPMark team, thanks for the great benchmark!

While running MCPMark locally, I ran into some issues caused by MCP server version updates, and I’d like to ask about the intended way to handle this.

Observed issues

  • Notion MCP
    Recent updates changed the tools/list JSON schema.
    This causes tool-call format mismatches when running MCPMark tasks that assume an older schema.

  • GitHub MCP
    The current GitHub MCP exposes ~40 tools, while MCPMark documentation mentions 92 tools, suggesting the benchmark was evaluated with an earlier server version.

These differences make it hard to reproduce MCPMark results reliably.
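
To make both mismatches concrete, here is a minimal sketch (not part of MCPMark) of diffing two `tools/list` snapshots. It assumes each snapshot is the `tools` array from a `tools/list` response, with a `name` and an `inputSchema` per tool. The function name `diff_tool_lists` and the tool names and schemas in the example are hypothetical and are not taken from the Notion or GitHub servers.

```python
def diff_tool_lists(old_tools, new_tools):
    """Compare two `tools` arrays taken from tools/list responses."""
    # Index each snapshot by tool name so tools can be matched across versions.
    old = {tool["name"]: tool.get("inputSchema", {}) for tool in old_tools}
    new = {tool["name"]: tool.get("inputSchema", {}) for tool in new_tools}
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        # Tools present in both snapshots whose input schema differs.
        "schema_changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


# Hypothetical snapshots, for illustration only.
old_snapshot = [
    {
        "name": "search_pages",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "delete_page",
        "inputSchema": {
            "type": "object",
            "properties": {"page_id": {"type": "string"}},
        },
    },
]
new_snapshot = [
    {
        "name": "search_pages",
        "inputSchema": {
            "type": "object",
            "properties": {"q": {"type": "string"}},
            "required": ["q"],
        },
    },
    {
        "name": "create_page",
        "inputSchema": {
            "type": "object",
            "properties": {"title": {"type": "string"}},
        },
    },
]

report = diff_tool_lists(old_snapshot, new_snapshot)
print(report)
```

A report like this separates the two kinds of drift described above: tools that appeared or disappeared (the tool-count difference) and tools whose schema changed while the name stayed the same (the tool-call format mismatch).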

Questions

  1. Server version pinning

    • Does MCPMark recommend pinning MCP server versions or commits?
    • Is there a reference to the exact MCP server versions used for the official scores?

  2. Benchmark updates

    • Will MCPMark be updated as MCP servers evolve, or is server behavior intended to be “frozen” for benchmarking?

  3. Official scores

    • Will the published MCPMark model scores be re-evaluated when MCP servers change, or are they tied to a fixed snapshot?

  4. Reproducibility

    • Are there plans to provide Docker images, version manifests, or other guidance to ensure reproducible runs?
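
As an illustration of what a version manifest could record, the sketch below pins a server version string, the tool count, and a hash of the `tools/list` output, then checks a live server against that entry before a run. This is only one possible shape, not an existing MCPMark feature: the manifest fields (`version`, `tool_count`, `tools_sha256`), the function names, and the sample tools are all assumptions made for the example.

```python
import hashlib
import json


def tools_fingerprint(tools):
    """Hash a `tools` array so that list order and key order do not matter."""
    canonical = json.dumps(
        sorted(tools, key=lambda tool: tool["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest_entry(version, tools):
    """Record what the benchmark was evaluated against (hypothetical format)."""
    return {
        "version": version,
        "tool_count": len(tools),
        "tools_sha256": tools_fingerprint(tools),
    }


def check_against_manifest(entry, tools):
    """Return a list of human-readable problems; an empty list means no drift."""
    problems = []
    if len(tools) != entry["tool_count"]:
        problems.append(
            f"expected {entry['tool_count']} tools, server exposes {len(tools)}"
        )
    if tools_fingerprint(tools) != entry["tools_sha256"]:
        problems.append("tools/list output differs from the pinned snapshot")
    return problems


# Hypothetical tools and version string, for illustration only.
pinned_tools = [
    {"name": "get_issue", "inputSchema": {"type": "object"}},
    {"name": "list_repos", "inputSchema": {"type": "object"}},
]
entry = build_manifest_entry("hypothetical-1.0.0", pinned_tools)

# A live server that has since dropped one tool.
live_tools = [{"name": "get_issue", "inputSchema": {"type": "object"}}]
for problem in check_against_manifest(entry, live_tools):
    print(problem)
```

Failing fast on a mismatch like this would at least make it explicit when a local run is not comparable to the published scores.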

Clarification on how MCPMark balances live MCP ecosystem changes vs. benchmark reproducibility would be very helpful.

Thanks!
