
feature request: MCP server #170

@a-h


Problem

Agentic coding tools (Claude Code, Cursor, etc.) need to discover and run project tasks. Today, agents must read the entire README.md to find available tasks, wasting tokens.

An MCP server would let agents discover tasks as tools and run them directly.

A secondary problem: when an agent calls a slow task like xc test, it blocks for the entire duration. The agent cannot continue working until the task completes. Also, if the output is verbose, more tokens are wasted.

Goals

  • Expose xc tasks as MCP tools so agents can discover and run them without reading the README.
  • Allow agents to run tasks asynchronously so they are not blocked by slow tasks.
  • Give agents control over output verbosity to save tokens.
  • Reuse xc's existing parser and runner packages. Do not duplicate execution logic.
  • Do not turn xc into a process manager. Tasks remain run-to-completion.

Non-goals

  • Persistent/background/watch task types in the xc task model.
  • Restart policies, health checks, or readiness probes.
  • Process supervision or service orchestration.
  • HTTP transport. The server uses stdio only.

If someone wants test --watch, they write a task whose script invokes their test runner's watch mode. xc runs it; xc does not manage it.

Tool Registration

The server parses tasks from the project's Markdown file and registers MCP tools dynamically.

Per-task tools

One tool per task, prefixed with xc_ to avoid collisions with other MCP tools. For a task named test, the tool is xc_test.

Utility tools

Three additional tools, always registered:

| Tool | Purpose | Read-only |
| --- | --- | --- |
| xc_list | Returns all tasks with names, descriptions, deps, and inputs | Yes |
| xc_describe | Returns full metadata for a single task (script, dep graph, env, dir) | Yes |
| xc_result | Retrieves the result of an async task invocation by run ID | Yes |

Total tool count: N + 3, where N is the number of tasks.
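
A minimal sketch of the naming scheme and the N + 3 count. The function names are hypothetical and not part of xc. Rejecting a task whose tool name would clash with a utility tool (for example, a task named list) is an assumption of this sketch; the proposal does not say how that case should be handled.

```go
package main

import (
	"fmt"
	"sort"
)

// utilityTools are the three always-registered tools from the proposal.
var utilityTools = []string{"xc_list", "xc_describe", "xc_result"}

// toolName derives the MCP tool name for a task by adding the xc_ prefix.
func toolName(task string) string {
	return "xc_" + task
}

// toolNames returns the N per-task tools (sorted) followed by the 3 utility
// tools. Returning an error on a clash with a utility tool is an assumption.
func toolNames(tasks []string) ([]string, error) {
	reserved := make(map[string]bool, len(utilityTools))
	for _, u := range utilityTools {
		reserved[u] = true
	}
	names := make([]string, 0, len(tasks)+len(utilityTools))
	for _, task := range tasks {
		name := toolName(task)
		if reserved[name] {
			return nil, fmt.Errorf("task %q collides with utility tool %s", task, name)
		}
		names = append(names, name)
	}
	sort.Strings(names)
	return append(names, utilityTools...), nil
}

func main() {
	names, err := toolNames([]string{"test", "build", "lint"})
	fmt.Println(len(names), names, err)
}
```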

Why not 4 tools per task?

I considered registering _run, _status, _stream_logs, and _wait per task, but:

  1. Tool list bloat degrades model selection accuracy and wastes tokens on schema alone. 10 tasks would produce 40 tools.
  2. _status, _stream_logs, and _wait reinvent what MCP provides natively via notifications/progress, notifications/message, and notifications/cancelled.
  3. Naming collisions: a task named run would produce run_run.

Tool Schemas

Per-task tool

Using test as an example, with a task that accepts a VERSION input:

{
  "name": "xc_test",
  "description": "Runs the unit and integration tests.",
  "annotations": {
    "readOnlyHint": false,
    "destructiveHint": false,
    "idempotentHint": true
  },
  "inputSchema": {
    "type": "object",
    "properties": {
      "async": {
        "type": "boolean",
        "default": false,
        "description": "If true, starts the task and returns a run ID immediately. Use xc_result to collect output later."
      },
      "skip_deps": {
        "type": "boolean",
        "default": false,
        "description": "If true, skips dependency tasks."
      },
      "output": {
        "type": "string",
        "enum": ["full", "tail", "stderr", "silent"],
        "default": "tail",
        "description": "Controls how much output is returned. 'tail' returns the last N lines."
      },
      "tail_lines": {
        "type": "integer",
        "default": 50,
        "description": "Number of lines to return when output is 'tail'."
      },
      "VERSION": {
        "type": "string",
        "description": "Input: VERSION"
      }
    }
  }
}

The first four properties (async, skip_deps, output, tail_lines) are injected by the server on every task tool. Task-specific Inputs are appended as additional properties. Inputs without a default are listed in required; Inputs that have defaults in the task's Env attribute are not.
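
The schema assembly described above could look roughly like this. The Input type is a hypothetical, simplified stand-in for whatever xc's task model exposes, and descriptions are trimmed for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Input is a hypothetical, simplified view of a task input.
type Input struct {
	Name       string
	HasDefault bool // true when the task's Env attribute supplies a default
}

// inputSchema builds the JSON Schema for a per-task tool: the four
// server-injected properties plus one string property per task input.
func inputSchema(inputs []Input) map[string]any {
	props := map[string]any{
		"async":     map[string]any{"type": "boolean", "default": false},
		"skip_deps": map[string]any{"type": "boolean", "default": false},
		"output": map[string]any{
			"type":    "string",
			"enum":    []string{"full", "tail", "stderr", "silent"},
			"default": "tail",
		},
		"tail_lines": map[string]any{"type": "integer", "default": 50},
	}
	var required []string
	for _, in := range inputs {
		props[in.Name] = map[string]any{
			"type":        "string",
			"description": "Input: " + in.Name,
		}
		if !in.HasDefault {
			required = append(required, in.Name)
		}
	}
	schema := map[string]any{"type": "object", "properties": props}
	if len(required) > 0 {
		schema["required"] = required
	}
	return schema
}

func main() {
	schema := inputSchema([]Input{{Name: "VERSION", HasDefault: true}})
	out, err := json.MarshalIndent(schema, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```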

xc_list

{
  "name": "xc_list",
  "description": "Lists all available xc tasks with names, descriptions, dependencies, and inputs.",
  "annotations": {
    "readOnlyHint": true,
    "destructiveHint": false,
    "idempotentHint": true
  },
  "inputSchema": {
    "type": "object",
    "properties": {}
  }
}

xc_describe

{
  "name": "xc_describe",
  "description": "Returns full metadata for a task including script source, dependency graph, environment variables, and working directory.",
  "annotations": {
    "readOnlyHint": true,
    "destructiveHint": false,
    "idempotentHint": true
  },
  "inputSchema": {
    "type": "object",
    "properties": {
      "task": {
        "type": "string",
        "description": "The task name to describe."
      }
    },
    "required": ["task"]
  }
}

xc_result

{
  "name": "xc_result",
  "description": "Retrieves the result of a task invocation by run ID. If the task is still running, returns the current status and output so far.",
  "annotations": {
    "readOnlyHint": true,
    "destructiveHint": false,
    "idempotentHint": true
  },
  "inputSchema": {
    "type": "object",
    "properties": {
      "run_id": {
        "type": "string",
        "description": "The run ID returned by an async task invocation."
      },
      "output": {
        "type": "string",
        "enum": ["full", "tail", "stderr", "silent"],
        "default": "tail"
      },
      "tail_lines": {
        "type": "integer",
        "default": 50
      }
    },
    "required": ["run_id"]
  }
}

Execution Model

Synchronous (default)

  1. Resolve inputs from tool arguments, falling back to environment variables, then task-defined defaults, per xc's existing precedence.
  2. Run dependencies (unless skip_deps: true) per the task's RunDeps setting (sync or async).
  3. Execute the script as a subprocess. Capture stdout and stderr separately in ring buffers.
  4. Stream output lines as notifications/message (level info for stdout, warning for stderr), tagged with the task name as logger.
  5. If the client provided a progressToken, send notifications/progress periodically. Since shell tasks have no measurable progress, increment progress without a total.
  6. On completion, return the result shaped by the output parameter.
  7. Respect notifications/cancelled by sending SIGTERM to the subprocess, then SIGKILL after a grace period.
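
The ring buffers in step 3 could be as simple as the following line-oriented sketch. The type and method names are hypothetical; a real implementation might buffer bytes rather than lines.

```go
package main

import (
	"fmt"
	"sync"
)

// lineRing keeps the most recent lines up to a fixed capacity.
type lineRing struct {
	mu    sync.Mutex
	lines []string
	next  int // index where the next line is written
	count int // number of valid lines, at most len(lines)
}

func newLineRing(capacity int) *lineRing {
	if capacity < 1 {
		capacity = 1
	}
	return &lineRing{lines: make([]string, capacity)}
}

// Add stores a line, overwriting the oldest one once the ring is full.
func (r *lineRing) Add(line string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.lines[r.next] = line
	r.next = (r.next + 1) % len(r.lines)
	if r.count < len(r.lines) {
		r.count++
	}
}

// Tail returns up to n of the most recent lines, oldest first.
func (r *lineRing) Tail(n int) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if n > r.count {
		n = r.count
	}
	if n < 0 {
		n = 0
	}
	out := make([]string, 0, n)
	start := (r.next - n + len(r.lines)) % len(r.lines)
	for i := 0; i < n; i++ {
		out = append(out, r.lines[(start+i)%len(r.lines)])
	}
	return out
}

func main() {
	ring := newLineRing(3)
	for _, line := range []string{"a", "b", "c", "d"} {
		ring.Add(line)
	}
	fmt.Println(ring.Tail(2))
}
```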

Async (async: true)

  1. Allocate a run ID (e.g., test-a1b2c3).
  2. Start the task in a background goroutine. Capture output in the same ring buffers.
  3. Return immediately:
    {
      "content": [{ "type": "text", "text": "Task 'test' started. Run ID: test-a1b2c3" }]
    }
  4. The agent continues working. When it wants the result, it calls xc_result.
  5. If the task is still running, xc_result returns current status and output so far:
    {
      "content": [{
        "type": "text",
        "text": "Task 'test' is still running (12s elapsed).\n\n--- stderr (last 50 lines) ---\n..."
      }]
    }
  6. If the task is finished, xc_result returns the final output and exit code.
  7. Completed run results are kept in memory for the server session, subject to a retention limit. Once the number of completed runs exceeds the limit (set with --max-runs), the server evicts the oldest results.

This is not a background task concept in xc. It is purely an MCP server execution concern. xc's task model does not change.
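
A rough sketch of the start-then-poll flow, with the task body replaced by a plain function returning an exit code. All names are hypothetical. The three random bytes mirror the test-a1b2c3 example; a real server would also need to handle ID collisions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

type run struct {
	started  time.Time
	done     chan struct{} // closed when the task finishes
	exitCode int           // written before done is closed
}

type runTable struct {
	mu   sync.Mutex
	runs map[string]*run
}

func newRunTable() *runTable { return &runTable{runs: map[string]*run{}} }

// newRunID returns an ID shaped like "test-a1b2c3".
func newRunID(task string) string {
	b := make([]byte, 3)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return task + "-" + hex.EncodeToString(b)
}

// start launches fn in a goroutine and returns the run ID immediately.
func (rt *runTable) start(task string, fn func() int) string {
	id := newRunID(task)
	r := &run{started: time.Now(), done: make(chan struct{})}
	rt.mu.Lock()
	rt.runs[id] = r
	rt.mu.Unlock()
	go func() {
		r.exitCode = fn()
		close(r.done)
	}()
	return id
}

// result reports the current status without blocking, like xc_result.
func (rt *runTable) result(id string) (string, error) {
	rt.mu.Lock()
	r, ok := rt.runs[id]
	rt.mu.Unlock()
	if !ok {
		return "", fmt.Errorf("unknown run ID %q", id)
	}
	select {
	case <-r.done:
		return fmt.Sprintf("completed, exit code %d", r.exitCode), nil
	default:
		elapsed := time.Since(r.started).Round(time.Second)
		return fmt.Sprintf("still running (%s elapsed)", elapsed), nil
	}
}

// wait blocks until the run finishes; it exists only to make the demo deterministic.
func (rt *runTable) wait(id string) {
	rt.mu.Lock()
	r := rt.runs[id]
	rt.mu.Unlock()
	if r != nil {
		<-r.done
	}
}

func main() {
	rt := newRunTable()
	release := make(chan struct{})
	id := rt.start("test", func() int { <-release; return 0 })
	fmt.Println(rt.result(id))
	close(release)
	rt.wait(id)
	fmt.Println(rt.result(id))
}
```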

Parallel execution

MCP and JSON-RPC allow concurrent tools/call requests. The server handles each in its own goroutine. An agent can call xc_test and xc_lint simultaneously -- both run as independent subprocesses. No special configuration is needed.

However, the server allows only one execution of a given xc task at a time.
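
One way to enforce the per-task limit is a small gate keyed by task name. This sketch rejects a second concurrent call outright; queueing it behind the first would be an equally plausible reading, since the proposal does not say which is intended. The names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// taskGate tracks which tasks currently have an execution in flight.
type taskGate struct {
	mu      sync.Mutex
	running map[string]bool
}

func newTaskGate() *taskGate { return &taskGate{running: map[string]bool{}} }

// tryAcquire reports whether the caller may start the task now.
func (g *taskGate) tryAcquire(task string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running[task] {
		return false
	}
	g.running[task] = true
	return true
}

// release marks the task as no longer running.
func (g *taskGate) release(task string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.running, task)
}

func main() {
	gate := newTaskGate()
	fmt.Println(gate.tryAcquire("test")) // first call starts
	fmt.Println(gate.tryAcquire("test")) // second call is rejected
	fmt.Println(gate.tryAcquire("lint")) // a different task is unaffected
	gate.release("test")
	fmt.Println(gate.tryAcquire("test")) // allowed again after release
}
```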

Cancellation

When the server receives notifications/cancelled for an in-progress tools/call:

  1. Send SIGTERM to the subprocess.
  2. Wait up to 5 seconds for graceful shutdown.
  3. Send SIGKILL if the process has not exited.
  4. Return the partial output captured so far.

For async runs, notifications/cancelled cancels the background goroutine's context, triggering the same SIGTERM/SIGKILL sequence.
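
The SIGTERM-then-SIGKILL sequence maps fairly directly onto os/exec in Go 1.20 and later: Cmd.Cancel replaces the default kill with SIGTERM, and Cmd.WaitDelay sets the grace period after which the process is killed. The sketch below is Unix-only, assumes a sleep binary on PATH for the demo, and signals only the direct child; a script that spawns its own children would likely need process-group handling, which is not shown. The function names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runCancellable runs a command and returns whatever output was captured,
// even if the context is cancelled part-way through.
func runCancellable(ctx context.Context, grace time.Duration, name string, args ...string) ([]byte, error) {
	cmd := exec.CommandContext(ctx, name, args...)
	// On cancellation, ask politely first instead of killing immediately.
	cmd.Cancel = func() error {
		return cmd.Process.Signal(syscall.SIGTERM)
	}
	// If the process is still alive after the grace period, it is killed.
	cmd.WaitDelay = grace
	return cmd.CombinedOutput()
}

// demo cancels a long sleep after 200ms and reports how long the call took.
func demo() (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	start := time.Now()
	_, err := runCancellable(ctx, 5*time.Second, "sleep", "30")
	return time.Since(start), err
}

func main() {
	elapsed, err := demo()
	fmt.Printf("returned after %s with err=%v\n", elapsed.Round(time.Millisecond), err)
}
```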

Output Control

The output parameter controls how much output is included in the tool result:

| Value | Behavior |
| --- | --- |
| full | Return all captured stdout and stderr |
| tail (default) | Return the last N lines of combined output (N = tail_lines, default 50) |
| stderr | Return only stderr (full). Useful when stdout is noisy but errors are what matters |
| silent | Return only the exit code and a pass/fail summary. No output |

On error (isError: true), the server always includes full stderr regardless of the output setting. The agent needs to see what went wrong.
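
As a sketch, the shaping rules above reduce to a switch over the output value plus the always-include-stderr rule on failure. The captured type and the section headers are illustrative assumptions, not a proposed wire format.

```go
package main

import (
	"fmt"
	"strings"
)

// captured is a hypothetical container for one finished run's output.
type captured struct {
	Stdout, Stderr, Combined []string
	ExitCode                 int
}

// lastN returns the final n lines (or fewer if there are not that many).
func lastN(lines []string, n int) []string {
	if n < 0 {
		n = 0
	}
	if n > len(lines) {
		n = len(lines)
	}
	return lines[len(lines)-n:]
}

func writeSection(b *strings.Builder, title string, lines []string) {
	fmt.Fprintf(b, "--- %s ---\n", title)
	for _, line := range lines {
		b.WriteString(line + "\n")
	}
}

// shape renders a result according to the output mode.
func shape(c captured, mode string, tailLines int) string {
	var b strings.Builder
	status := "pass"
	if c.ExitCode != 0 {
		status = "fail"
	}
	fmt.Fprintf(&b, "exit code %d (%s)\n", c.ExitCode, status)
	stderrIncluded := false
	switch mode {
	case "full":
		writeSection(&b, "stdout", c.Stdout)
		writeSection(&b, "stderr", c.Stderr)
		stderrIncluded = true
	case "stderr":
		writeSection(&b, "stderr", c.Stderr)
		stderrIncluded = true
	case "silent":
		// Exit code and summary only.
	default: // "tail", the proposal's default
		writeSection(&b, "output", lastN(c.Combined, tailLines))
	}
	// On failure, full stderr is always included regardless of mode.
	if c.ExitCode != 0 && !stderrIncluded {
		writeSection(&b, "stderr", c.Stderr)
	}
	return b.String()
}

func main() {
	failed := captured{
		Stdout:   []string{"compiling"},
		Stderr:   []string{"boom"},
		Combined: []string{"compiling", "boom"},
		ExitCode: 1,
	}
	fmt.Print(shape(failed, "silent", 50))
}
```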

File Watching

The server watches the task file with fsnotify. On change:

  1. Re-parse tasks.
  2. Diff against the current tool set.
  3. If the set changed, emit notifications/tools/list_changed.

When a developer adds a new task to their README during an active agent session, the agent can discover it immediately after the client re-fetches the tool list.
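
The diff in step 2 can be sketched with the standard library alone; the fsnotify event loop and the MCP notification are omitted. Note that this hypothetical helper compares tool names only. A change to a task's description or inputs would need a comparison of the full tool definitions, which the proposal may or may not intend.

```go
package main

import (
	"fmt"
	"sort"
)

// diffTools returns the tool names added to and removed from the old set.
func diffTools(old, cur []string) (added, removed []string) {
	oldSet := make(map[string]bool, len(old))
	for _, name := range old {
		oldSet[name] = true
	}
	curSet := make(map[string]bool, len(cur))
	for _, name := range cur {
		curSet[name] = true
		if !oldSet[name] {
			added = append(added, name)
		}
	}
	for _, name := range old {
		if !curSet[name] {
			removed = append(removed, name)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	added, removed := diffTools(
		[]string{"xc_build", "xc_test"},
		[]string{"xc_build", "xc_lint"},
	)
	// A non-empty diff is what would trigger notifications/tools/list_changed.
	changed := len(added) > 0 || len(removed) > 0
	fmt.Println(added, removed, changed)
}
```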

Tool Annotations

Annotations are derived from the task definition. Reasonable defaults:

| Condition | Annotation |
| --- | --- |
| Task has no script (deps-only) | readOnlyHint: true |
| Task name contains test, lint, check, fmt, vet | destructiveHint: false, idempotentHint: true |
| Task name contains deploy, push, delete, clean, drop | destructiveHint: true |
| Default | destructiveHint: false, idempotentHint: false |

These are hints, not guarantees. An optional Annotations Markdown attribute could override them in a future xc extension. This is not required for v1.
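
The defaults above could be derived with a few substring checks. The proposal does not say which rule wins when a name matches both lists (for example, clean-test), so this sketch checks the destructive words first as the more cautious choice; that ordering, the lowercasing, and all names here are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotations mirrors the three hints used in the tool schemas.
type Annotations struct {
	ReadOnlyHint    bool
	DestructiveHint bool
	IdempotentHint  bool
}

var (
	safeWords        = []string{"test", "lint", "check", "fmt", "vet"}
	destructiveWords = []string{"deploy", "push", "delete", "clean", "drop"}
)

func containsAny(name string, words []string) bool {
	for _, w := range words {
		if strings.Contains(name, w) {
			return true
		}
	}
	return false
}

// annotate derives default hints from the task name and whether the task
// has a script of its own (deps-only tasks do not).
func annotate(name string, hasScript bool) Annotations {
	a := Annotations{ReadOnlyHint: !hasScript}
	lower := strings.ToLower(name)
	switch {
	case containsAny(lower, destructiveWords): // checked first: an assumption
		a.DestructiveHint = true
	case containsAny(lower, safeWords):
		a.IdempotentHint = true
	}
	return a
}

func main() {
	for _, name := range []string{"test", "deploy-prod", "build"} {
		fmt.Printf("%s: %+v\n", name, annotate(name, true))
	}
}
```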

Architecture

A Run Manager holds a sync.Map of active and completed runs. Each run wraps an xc Runner execution in a goroutine with ring-buffered stdout/stderr capture. Completed runs are kept in memory for result retrieval and evicted after the retention limit.
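
The eviction part of the Run Manager might look like the sketch below. It swaps the sync.Map mentioned above for a mutex-guarded map plus a slice, purely because evicting the oldest entry needs completion order; the names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// runStore retains results for at most maxRuns completed runs.
type runStore struct {
	mu      sync.Mutex
	maxRuns int
	results map[string]string
	order   []string // completed run IDs, oldest first
}

func newRunStore(maxRuns int) *runStore {
	return &runStore{maxRuns: maxRuns, results: map[string]string{}}
}

// complete records a finished run and evicts the oldest results once the
// retention limit is exceeded. Run IDs are assumed to be unique.
func (s *runStore) complete(id, result string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.results[id] = result
	s.order = append(s.order, id)
	for len(s.order) > s.maxRuns {
		oldest := s.order[0]
		s.order = s.order[1:]
		delete(s.results, oldest)
	}
}

// get returns a retained result, if it has not been evicted.
func (s *runStore) get(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	result, ok := s.results[id]
	return result, ok
}

func main() {
	store := newRunStore(2)
	store.complete("build-000001", "exit 0")
	store.complete("test-000002", "exit 1")
	store.complete("lint-000003", "exit 0")
	_, kept := store.get("build-000001")
	fmt.Println("oldest run still retained:", kept)
}
```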

Transport and Configuration

stdio only. The agent's host spawns the server as a subprocess.

{
  "mcpServers": {
    "xc": {
      "command": "xc",
      "args": ["--mcp", "--file", "README.md", "--tail-lines", "50"]
    }
  }
}

Server flags:

| Flag | Default | Description |
| --- | --- | --- |
| --file | README.md | Task file to parse |
| --heading | Tasks | Markdown heading containing tasks |
| --tail-lines | 50 | Default tail length for output |
| --max-runs | 20 | Number of completed runs to retain |
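
For illustration, the flags and defaults in the table could be parsed with the standard flag package as below. This is a standalone sketch, not xc's actual flag handling. The --mcp switch is taken from the configuration example above, and its help text is an assumption.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds the server settings from the flag table.
type config struct {
	MCP       bool
	File      string
	Heading   string
	TailLines int
	MaxRuns   int
}

// parseFlags parses server flags, applying the documented defaults.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("xc", flag.ContinueOnError)
	fs.BoolVar(&c.MCP, "mcp", false, "run as an MCP server over stdio (assumed wording)")
	fs.StringVar(&c.File, "file", "README.md", "task file to parse")
	fs.StringVar(&c.Heading, "heading", "Tasks", "Markdown heading containing tasks")
	fs.IntVar(&c.TailLines, "tail-lines", 50, "default tail length for output")
	fs.IntVar(&c.MaxRuns, "max-runs", 20, "number of completed runs to retain")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseFlags([]string{"--mcp", "--file", "README.md", "--tail-lines", "50"})
	fmt.Printf("%+v err=%v\n", c, err)
}
```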

Implementation

The server reuses xc's existing packages:

  • Parser: parser/parsemd and parser/parseorg for task extraction.
  • Runner: run.Runner for task execution, dependency resolution, input
    handling, and script interpretation.
  • Models: models.Task for the task data structure.

The MCP protocol layer uses the official Go SDK (github.com/modelcontextprotocol/go-sdk).

Example Agent Session

Agent                           MCP Server
  |                                 |
  |-- tools/list ------------------>|
  |<-- [xc_build, xc_test, ...]  --|
  |                                 |
  |-- xc_list() ------------------>|
  |<-- [{name: "build", ...}, ...] |
  |                                 |
  |-- xc_test(async: true) ------->|
  |<-- "Run ID: test-a1b2c3"     --|
  |                                 |
  | (agent continues writing code)  |
  |                                 |
  |-- xc_result("test-a1b2c3") --->|
  |<-- "Still running (8s)" -------|
  |                                 |
  | (agent continues working)       |
  |                                 |
  |-- xc_result("test-a1b2c3") --->|
  |<-- "Completed. Exit 0. ..."  --|
  |                                 |
