How does MCPMark handle MCP server version changes (Notion / GitHub MCP)?

Hi MCPMark team, thanks for the great benchmark!

While running MCPMark locally, I ran into some issues caused by **MCP server version updates**, and I’d like to ask about the intended way to handle this.

### Observed issues

- **Notion MCP**  
  Recent updates changed the `tools/list` JSON schema.  
  This causes tool-call format mismatches when running MCPMark tasks that assume an older schema.

- **GitHub MCP**  
  The current GitHub MCP exposes ~40 tools, while MCPMark documentation mentions **92 tools**, suggesting the benchmark was evaluated with an earlier server version.

These differences make it hard to reproduce MCPMark results reliably.
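
A minimal sketch of how this kind of drift could be detected, assuming two saved `tools/list` result payloads (one from the server version the tasks were written against, one from the current server). The helper names and the tool names in the example are hypothetical and are not taken from MCPMark or from either server; the only thing assumed about the payload is the standard MCP shape, a `tools` array whose entries carry `name` and `inputSchema`.

```python
import json
from typing import Any


def index_tools(tools_list_result: dict[str, Any]) -> dict[str, Any]:
    """Map tool name -> inputSchema for one `tools/list` result payload."""
    return {
        tool["name"]: tool.get("inputSchema", {})
        for tool in tools_list_result.get("tools", [])
    }


def diff_tool_lists(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list[str]]:
    """Report tools that were removed, added, or whose input schema changed."""
    old_idx, new_idx = index_tools(old), index_tools(new)
    shared = old_idx.keys() & new_idx.keys()
    return {
        "removed": sorted(old_idx.keys() - new_idx.keys()),
        "added": sorted(new_idx.keys() - old_idx.keys()),
        "schema_changed": sorted(n for n in shared if old_idx[n] != new_idx[n]),
    }


# Hypothetical snapshots; the tool names and schemas are made up for illustration.
old_snapshot = {"tools": [
    {"name": "search", "inputSchema": {"type": "object",
                                       "properties": {"query": {"type": "string"}}}},
    {"name": "create_page", "inputSchema": {"type": "object",
                                            "properties": {"title": {"type": "string"}}}},
]}
new_snapshot = {"tools": [
    {"name": "search", "inputSchema": {"type": "object",
                                       "properties": {"q": {"type": "string"}}}},
    {"name": "fetch", "inputSchema": {"type": "object",
                                      "properties": {"id": {"type": "string"}}}},
]}

report = diff_tool_lists(old_snapshot, new_snapshot)
print(json.dumps(report, indent=2))
```

A report like this would separate the two symptoms above: a shrinking tool count shows up under `removed`, while a changed call format shows up under `schema_changed`.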

### Questions

1. **Server version pinning**  
   - Does MCPMark recommend pinning MCP server versions or commits?
   - Is there a reference to the exact MCP server versions used for the official scores?

2. **Benchmark updates**  
   - Will MCPMark be updated as MCP servers evolve, or is server behavior intended to be “frozen” for benchmarking?

3. **Official scores**  
   - Will the published MCPMark model scores be re-evaluated when MCP servers change, or are they tied to a fixed snapshot?

4. **Reproducibility**  
   - Are there plans to provide Docker images, version manifests, or other guidance to ensure reproducible runs?
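
To illustrate what I mean by a version manifest, here is a rough sketch, not a proposal for a specific format. It assumes a hypothetical manifest that records, per MCP server, the tool count and a hash of the `tools/list` result used for the official runs; the field names (`tool_count`, `tools_sha256`) and the server key are invented for this example.

```python
import hashlib
import json
from typing import Any


def tools_fingerprint(tools_list_result: dict[str, Any]) -> str:
    """Hash tool names and input schemas in a canonical, order-independent form."""
    tools = sorted(tools_list_result.get("tools", []), key=lambda t: t["name"])
    canonical = json.dumps(
        [{"name": t["name"], "inputSchema": t.get("inputSchema", {})} for t in tools],
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_manifest(manifest: dict[str, Any], server: str,
                     live_result: dict[str, Any]) -> bool:
    """Return True if the live `tools/list` result matches the pinned entry."""
    entry = manifest[server]
    return (
        entry["tool_count"] == len(live_result.get("tools", []))
        and entry["tools_sha256"] == tools_fingerprint(live_result)
    )


# Hypothetical pinned snapshot; the tool names are placeholders.
pinned = {"tools": [
    {"name": "tool_a", "inputSchema": {"type": "object"}},
    {"name": "tool_b", "inputSchema": {"type": "object"}},
]}
manifest = {
    "example-server": {  # hypothetical server key
        "tool_count": len(pinned["tools"]),
        "tools_sha256": tools_fingerprint(pinned),
    }
}

# Same tools in a different order still match; a dropped tool does not.
reordered = {"tools": list(reversed(pinned["tools"]))}
drifted = {"tools": pinned["tools"][:1]}
print(matches_manifest(manifest, "example-server", reordered))
print(matches_manifest(manifest, "example-server", drifted))
```

A check like this could run before a benchmark run starts, so that a mismatched server fails fast instead of producing scores that cannot be compared with the published ones.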

Clarification on how MCPMark balances **live MCP ecosystem changes vs. benchmark reproducibility** would be very helpful.

Thanks!
