Hi MCPMark team, thanks for the great benchmark!
While running MCPMark locally, I ran into some issues caused by MCP server version updates, and I’d like to ask about the intended way to handle this.
### Observed issues
- **Notion MCP**
  Recent updates changed the `tools/list` JSON schema. This causes tool-call format mismatches when running MCPMark tasks that assume an older schema.
- **GitHub MCP**
  The current GitHub MCP server exposes ~40 tools, while the MCPMark documentation mentions 92 tools, suggesting the benchmark was evaluated against an earlier server version.
These differences make it hard to reproduce MCPMark results reliably.
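For what it's worth, here is the kind of quick check I used to see the drift: a minimal sketch with the MCP Python SDK that dumps a server's `tools/list` (names plus input schemas) so the output from two server versions can be diffed. The `npx` package name below is only an example; substitute whichever server and version a task targets.

```python
import asyncio
import json

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch a stdio MCP server via npx; the package name here is
# illustrative -- swap in the server/version under test.
SERVER = StdioServerParameters(
    command="npx",
    args=["-y", "@notionhq/notion-mcp-server"],
)

async def dump_tools() -> None:
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            print(f"{len(result.tools)} tools")
            for tool in result.tools:
                # Print name + input schema so dumps from two server
                # versions can be diffed directly.
                print(tool.name)
                print(json.dumps(tool.inputSchema, indent=2, sort_keys=True))

asyncio.run(dump_tools())
```

Running this against two releases of the same server makes both issues above visible at a glance: the schema change shows up in the `inputSchema` diff, and the tool-count change in the first line.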
### Questions
- **Server version pinning**
  - Does MCPMark recommend pinning MCP server versions or commits?
  - Is there a reference to the exact MCP server versions used for the official scores?
- **Benchmark updates**
  - Will MCPMark be updated as MCP servers evolve, or is server behavior intended to be “frozen” for benchmarking?
- **Official scores**
  - Will the published MCPMark model scores be re-evaluated when MCP servers change, or are they tied to a fixed snapshot?
- **Reproducibility**
  - Are there plans to provide Docker images, version manifests, or other guidance to ensure reproducible runs? (A rough sketch of what I mean by a manifest follows this list.)
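To make that last question concrete, here is a rough sketch of the kind of version manifest I have in mind. The file name, schema, and pinned versions are all hypothetical placeholders, not anything MCPMark currently ships:

```python
"""Sketch of a version-manifest check for reproducible runs.

Everything here is hypothetical: the manifest path, its schema, and the
pinned versions are placeholders, not MCPMark's actual format.
"""
import json
from pathlib import Path

MANIFEST_PATH = Path("mcp-servers.lock.json")  # hypothetical file name

# Example manifest: pin npm-based servers by exact version and
# container-based servers by image digest (placeholder values).
EXAMPLE = {
    "notion": {"npm": "@notionhq/notion-mcp-server", "version": "1.2.3"},
    "github": {"image": "ghcr.io/github/github-mcp-server", "digest": "sha256:<digest>"},
}

def load_manifest() -> dict:
    # Write the example manifest on first run so the script is self-contained.
    if not MANIFEST_PATH.exists():
        MANIFEST_PATH.write_text(json.dumps(EXAMPLE, indent=2))
    return json.loads(MANIFEST_PATH.read_text())

def npx_args(entry: dict) -> list[str]:
    # npx installs an exact release when given `package@version`,
    # which is enough to pin npm-distributed MCP servers.
    return ["npx", "-y", f"{entry['npm']}@{entry['version']}"]

if __name__ == "__main__":
    manifest = load_manifest()
    print("Pinned servers:", ", ".join(manifest))
    print("Launch command for Notion:", " ".join(npx_args(manifest["notion"])))
```

Even just publishing a lock file like this alongside the official scores would go a long way toward reproducibility.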
Any clarification on how MCPMark balances tracking live MCP ecosystem changes against benchmark reproducibility would be very helpful.
Thanks!