feature request: MCP server #170

## Problem

Agentic coding tools (Claude Code, Cursor, etc.) need to discover and run project tasks. Today, agents must read the entire README.md to find available tasks, wasting tokens.

An MCP server would let agents discover tasks as tools and run them directly.

A secondary problem: when an agent calls a slow task like `xc test`, the call blocks until the task completes, so the agent cannot continue working in the meantime. Verbose output also wastes tokens.

## Goals

- Expose xc tasks as MCP tools so agents can discover and run them without reading the README.
- Allow agents to run tasks asynchronously so they are not blocked by slow tasks.
- Give agents control over output verbosity to save tokens.
- Reuse xc's existing parser and runner packages. Do not duplicate execution logic.
- Do not turn xc into a process manager. Tasks remain run-to-completion.

## Non-goals

- Persistent/background/watch task types in the xc task model.
- Restart policies, health checks, or readiness probes.
- Process supervision or service orchestration.
- HTTP transport. The server uses stdio only.

If someone wants `test --watch`, they write a task whose script invokes their test runner's watch mode. xc runs it; xc does not manage it.

## Tool Registration

The server parses tasks from the project's Markdown file and registers MCP tools dynamically.

### Per-task tools

One tool per task, prefixed with `xc_` to avoid collisions with other MCP tools. For a task named `test`, the tool is `xc_test`.

### Utility tools

Three additional tools, always registered:

| Tool | Purpose | Read-only |
|------|---------|-----------|
| `xc_list` | Returns all tasks with names, descriptions, deps, and inputs | Yes |
| `xc_describe` | Returns full metadata for a single task (script, dep graph, env, dir) | Yes |
| `xc_result` | Retrieves the result of an async task invocation by run ID | Yes |

Total tool count: N + 3, where N is the number of tasks.
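
As a rough illustration of the naming scheme, here is a minimal sketch that derives the tool list from a set of task names. The function and variable names (`toolNames`, `utilityTools`) are made up for this example and are not part of xc.

```go
package main

import "fmt"

// utilityTools are the three always-registered tools from the table above.
var utilityTools = []string{"xc_list", "xc_describe", "xc_result"}

// toolNames (hypothetical helper) returns one "xc_"-prefixed tool per task,
// followed by the utility tools, for a total of N + 3 names.
func toolNames(tasks []string) []string {
	names := make([]string, 0, len(tasks)+len(utilityTools))
	for _, t := range tasks {
		names = append(names, "xc_"+t)
	}
	return append(names, utilityTools...)
}

func main() {
	fmt.Println(toolNames([]string{"build", "test"}))
}
```

Under this scheme a task named `list`, `describe`, or `result` would collide with a utility tool; this proposal does not say how that case is handled.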

### Why not 4 tools per task?

I considered registering `_run`, `_status`, `_stream_logs`, and `_wait` per task, but:

1. Tool list bloat degrades model selection accuracy and wastes tokens on schema alone. 10 tasks would produce 40 tools.
2. `_status`, `_stream_logs`, and `_wait` reinvent what MCP provides natively via `notifications/progress`, `notifications/message`, and `notifications/cancelled`.
3. Naming collisions: a task named `run` would produce `run_run`.

## Tool Schemas

### Per-task tool

Using `test` as an example, with a task that accepts a `VERSION` input:

```json
{
  "name": "xc_test",
  "description": "Runs the unit and integration tests.",
  "annotations": {
    "readOnlyHint": false,
    "destructiveHint": false,
    "idempotentHint": true
  },
  "inputSchema": {
    "type": "object",
    "properties": {
      "async": {
        "type": "boolean",
        "default": false,
        "description": "If true, starts the task and returns a run ID immediately. Use xc_result to collect output later."
      },
      "skip_deps": {
        "type": "boolean",
        "default": false,
        "description": "If true, skips dependency tasks."
      },
      "output": {
        "type": "string",
        "enum": ["full", "tail", "stderr", "silent"],
        "default": "tail",
        "description": "Controls how much output is returned. 'tail' returns the last N lines."
      },
      "tail_lines": {
        "type": "integer",
        "default": 50,
        "description": "Number of lines to return when output is 'tail'."
      },
      "VERSION": {
        "type": "string",
        "description": "Input: VERSION"
      }
    }
  }
}
```

The first four properties (`async`, `skip_deps`, `output`, `tail_lines`) are
injected by the server on every task tool. Task-specific `Inputs` are appended
as additional properties. Inputs that have defaults in the task's `Env`
attribute are not listed in `required`.
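
The schema assembly described above could look roughly like the following sketch. The `task` struct is a simplified stand-in for illustration only, not xc's actual `models.Task`, and `inputSchema` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// task is a hypothetical, reduced stand-in for xc's task model.
type task struct {
	Name     string
	Inputs   []string
	Defaults map[string]string // inputs that have a default via Env
}

type property map[string]any

// inputSchema builds the per-task schema: the four server-injected
// properties, then one string property per task input. Inputs without
// a default are listed in "required".
func inputSchema(t task) map[string]any {
	props := map[string]property{
		"async":      {"type": "boolean", "default": false},
		"skip_deps":  {"type": "boolean", "default": false},
		"output":     {"type": "string", "enum": []string{"full", "tail", "stderr", "silent"}, "default": "tail"},
		"tail_lines": {"type": "integer", "default": 50},
	}
	var required []string
	for _, in := range t.Inputs {
		props[in] = property{"type": "string", "description": "Input: " + in}
		if _, hasDefault := t.Defaults[in]; !hasDefault {
			required = append(required, in)
		}
	}
	schema := map[string]any{"type": "object", "properties": props}
	if len(required) > 0 {
		schema["required"] = required
	}
	return schema
}

func main() {
	s := inputSchema(task{Name: "test", Inputs: []string{"VERSION"}})
	b, _ := json.MarshalIndent(s, "", "  ")
	fmt.Println(string(b))
}
```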

### xc_list

```json
{
  "name": "xc_list",
  "description": "Lists all available xc tasks with names, descriptions, dependencies, and inputs.",
  "annotations": {
    "readOnlyHint": true,
    "destructiveHint": false,
    "idempotentHint": true
  },
  "inputSchema": {
    "type": "object",
    "properties": {}
  }
}
```

### xc_describe

```json
{
  "name": "xc_describe",
  "description": "Returns full metadata for a task including script source, dependency graph, environment variables, and working directory.",
  "annotations": {
    "readOnlyHint": true,
    "destructiveHint": false,
    "idempotentHint": true
  },
  "inputSchema": {
    "type": "object",
    "properties": {
      "task": {
        "type": "string",
        "description": "The task name to describe."
      }
    },
    "required": ["task"]
  }
}
```

### xc_result

```json
{
  "name": "xc_result",
  "description": "Retrieves the result of a task invocation by run ID. If the task is still running, returns the current status and output so far.",
  "annotations": {
    "readOnlyHint": true,
    "destructiveHint": false,
    "idempotentHint": true
  },
  "inputSchema": {
    "type": "object",
    "properties": {
      "run_id": {
        "type": "string",
        "description": "The run ID returned by an async task invocation."
      },
      "output": {
        "type": "string",
        "enum": ["full", "tail", "stderr", "silent"],
        "default": "tail"
      },
      "tail_lines": {
        "type": "integer",
        "default": 50
      }
    },
    "required": ["run_id"]
  }
}
```

## Execution Model

### Synchronous (default)

1. Resolve inputs from tool arguments, falling back to environment variables, then task-defined defaults, per xc's existing precedence.
2. Run dependencies (unless `skip_deps: true`) per the task's `RunDeps` setting (sync or async).
3. Execute the script as a subprocess. Capture stdout and stderr separately in ring buffers.
4. Stream output lines as `notifications/message` (level `info` for stdout, `warning` for stderr), tagged with the task name as `logger`.
5. If the client provided a `progressToken`, send `notifications/progress` periodically. Since shell tasks have no measurable progress, increment `progress` without a `total`.
6. On completion, return the result shaped by the `output` parameter.
7. Respect `notifications/cancelled` by sending SIGTERM to the subprocess, then SIGKILL after a grace period.
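
The ring buffers mentioned in step 3 could be as simple as the line-based sketch below. The type and method names are assumptions for illustration, and the capacity must be positive.

```go
package main

import "fmt"

// lineRing (hypothetical) keeps the most recent lines and drops older ones.
type lineRing struct {
	lines []string
	next  int  // index where the next line is written
	full  bool // true once the buffer has wrapped around
}

func newLineRing(capacity int) *lineRing {
	return &lineRing{lines: make([]string, capacity)}
}

// Add stores a line, overwriting the oldest one when the buffer is full.
func (r *lineRing) Add(line string) {
	r.lines[r.next] = line
	r.next = (r.next + 1) % len(r.lines)
	if r.next == 0 {
		r.full = true
	}
}

// Lines returns the retained lines, oldest first.
func (r *lineRing) Lines() []string {
	if !r.full {
		return append([]string(nil), r.lines[:r.next]...)
	}
	out := make([]string, 0, len(r.lines))
	out = append(out, r.lines[r.next:]...)
	return append(out, r.lines[:r.next]...)
}

func main() {
	r := newLineRing(3)
	for i := 1; i <= 5; i++ {
		r.Add(fmt.Sprintf("line %d", i))
	}
	fmt.Println(r.Lines()) // prints [line 3 line 4 line 5]
}
```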

### Async (`async: true`)

1. Allocate a run ID (e.g., `test-a1b2c3`).
2. Start the task in a background goroutine. Capture output in the same ring buffers.
3. Return immediately:
   ```json
   {
     "content": [{ "type": "text", "text": "Task 'test' started. Run ID: test-a1b2c3" }]
   }
   ```
4. The agent continues working. When it wants the result, it calls `xc_result`.
5. If the task is still running, `xc_result` returns current status and output so far:
   ```json
   {
     "content": [{
       "type": "text",
       "text": "Task 'test' is still running (12s elapsed).\n\n--- stderr (last 50 lines) ---\n..."
     }]
   }
   ```
6. If the task is finished, `xc_result` returns the final output and exit code.
7. Completed run results are retained in memory for the lifetime of the server session, up to a configurable retention count (`--max-runs`). Once that count is exceeded, the server evicts the oldest results.

This is not a background task concept in xc. It is purely an MCP server execution concern. xc's task model does not change.
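
A minimal sketch of the async flow follows, assuming run IDs of the form task name plus three random bytes in hex. The `run` type and the `startAsync` and `result` helpers are hypothetical, and the real work would be an xc `Runner` execution rather than a plain function.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// run records the state of one async invocation.
type run struct {
	done     chan struct{} // closed when the task finishes
	exitCode int           // valid only after done is closed
}

var runs sync.Map // run ID -> *run

// newRunID returns an ID such as "test-a1b2c3".
func newRunID(task string) string {
	b := make([]byte, 3)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return task + "-" + hex.EncodeToString(b)
}

// startAsync launches work in a goroutine and returns the run ID immediately.
func startAsync(task string, work func() int) string {
	id := newRunID(task)
	r := &run{done: make(chan struct{})}
	runs.Store(id, r)
	go func() {
		r.exitCode = work()
		close(r.done)
	}()
	return id
}

// result reports whether the run exists, whether it has finished,
// and its exit code (meaningful only when finished is true).
func result(id string) (found, finished bool, exitCode int) {
	v, ok := runs.Load(id)
	if !ok {
		return false, false, 0
	}
	r := v.(*run)
	select {
	case <-r.done:
		return true, true, r.exitCode
	default:
		return true, false, 0
	}
}

func main() {
	release := make(chan struct{})
	id := startAsync("test", func() int { <-release; return 0 })
	_, finished, _ := result(id)
	fmt.Println(id, "finished:", finished)
	close(release)
}
```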

### Parallel execution

MCP and JSON-RPC allow concurrent `tools/call` requests. The server handles each in its own goroutine. An agent can call `xc_test` and `xc_lint` simultaneously -- both run as independent subprocesses. No special configuration is needed.

However, each xc task may have at most one execution in flight at a time.
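
One way to enforce the per-task limit is a small guard keyed by task name, sketched below. This proposal does not say whether a second call is rejected or queued; the sketch assumes rejection, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// taskGuard tracks which tasks currently have an execution in flight.
type taskGuard struct {
	mu      sync.Mutex
	running map[string]bool
}

func newTaskGuard() *taskGuard {
	return &taskGuard{running: map[string]bool{}}
}

// tryAcquire marks the task as running and returns true, or returns
// false if an execution of that task is already in flight.
func (g *taskGuard) tryAcquire(task string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running[task] {
		return false
	}
	g.running[task] = true
	return true
}

// release marks the task as no longer running.
func (g *taskGuard) release(task string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.running, task)
}

func main() {
	g := newTaskGuard()
	fmt.Println(g.tryAcquire("test")) // first call wins
	fmt.Println(g.tryAcquire("test")) // second call is rejected
	fmt.Println(g.tryAcquire("lint")) // other tasks are unaffected
	g.release("test")
}
```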

### Cancellation

When the server receives `notifications/cancelled` for an in-progress `tools/call`:

1. Send SIGTERM to the subprocess.
2. Wait up to 5 seconds for graceful shutdown.
3. Send SIGKILL if the process has not exited.
4. Return the partial output captured so far.

For async runs, `notifications/cancelled` cancels the background goroutine's context, triggering the same SIGTERM/SIGKILL sequence.

## Output Control

The `output` parameter controls how much output is included in the tool result:

| Value | Behavior |
|-------|----------|
| `full` | Return all captured stdout and stderr |
| `tail` (default) | Return the last N lines of combined output (N = `tail_lines`, default 50) |
| `stderr` | Return only stderr (full). Useful when stdout is noisy but errors are what matters |
| `silent` | Return only the exit code and a pass/fail summary. No output |

On error (`isError: true`), the server always includes full stderr regardless of the `output` setting. The agent needs to see what went wrong.
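
The table and the error rule could be implemented along these lines. This is a sketch under two assumptions: a non-zero exit code is what sets `isError`, and the server keeps both an interleaved line list and a separate stderr list. The `captured` type and `shape` function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// captured holds a finished run's output as lines.
type captured struct {
	Combined []string // stdout and stderr interleaved in arrival order
	Stderr   []string
	ExitCode int
}

// shape renders the tool result text for the given output mode.
func shape(c captured, mode string, tailLines int) string {
	status := "pass"
	if c.ExitCode != 0 {
		status = "fail"
	}
	parts := []string{fmt.Sprintf("exit code %d (%s)", c.ExitCode, status)}

	var body []string
	switch mode {
	case "full":
		body = c.Combined
	case "stderr":
		body = c.Stderr
	case "silent":
		// No output, only the summary line.
	default: // "tail"
		body = c.Combined
		if tailLines >= 0 && len(body) > tailLines {
			body = body[len(body)-tailLines:]
		}
	}
	if len(body) > 0 {
		parts = append(parts, strings.Join(body, "\n"))
	}
	// On failure, append full stderr unless the body already contains it.
	if c.ExitCode != 0 && mode != "full" && mode != "stderr" && len(c.Stderr) > 0 {
		parts = append(parts, "--- stderr ---", strings.Join(c.Stderr, "\n"))
	}
	return strings.Join(parts, "\n")
}

func main() {
	c := captured{Combined: []string{"a", "b", "c"}, ExitCode: 0}
	fmt.Println(shape(c, "tail", 2))
}
```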

## File Watching

The server watches the task file with `fsnotify`. On change:

1. Re-parse tasks.
2. Diff against the current tool set.
3. If the set changed, emit `notifications/tools/list_changed`.

When a developer adds a new task to their README during an active agent session, the agent can discover it immediately after the client re-fetches the tool list.
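
Step 2, the diff, might look like the sketch below. It compares tool names only and leaves out the `fsnotify` wiring; `diffTools` is a hypothetical helper.

```go
package main

import "fmt"

// diffTools compares the previous and next tool-name sets and reports
// which names were added and which were removed.
func diffTools(prev, next []string) (added, removed []string) {
	prevSet := make(map[string]bool, len(prev))
	for _, n := range prev {
		prevSet[n] = true
	}
	nextSet := make(map[string]bool, len(next))
	for _, n := range next {
		nextSet[n] = true
		if !prevSet[n] {
			added = append(added, n)
		}
	}
	for _, n := range prev {
		if !nextSet[n] {
			removed = append(removed, n)
		}
	}
	return added, removed
}

func main() {
	added, removed := diffTools(
		[]string{"xc_build", "xc_test"},
		[]string{"xc_test", "xc_lint"},
	)
	// A non-empty diff is the cue to emit notifications/tools/list_changed.
	fmt.Println("added:", added, "removed:", removed)
}
```

A fuller version would probably also compare descriptions and input schemas, since editing a task's inputs changes its tool definition without changing its name.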

## Tool Annotations

Annotations are derived from the task definition. Reasonable defaults:

| Condition | Annotation |
|-----------|------------|
| Task has no script (deps-only) | `readOnlyHint: true` |
| Task name contains `test`, `lint`, `check`, `fmt`, `vet` | `destructiveHint: false`, `idempotentHint: true` |
| Task name contains `deploy`, `push`, `delete`, `clean`, `drop` | `destructiveHint: true` |
| Default | `destructiveHint: false`, `idempotentHint: false` |

These are hints, not guarantees. An optional `Annotations` Markdown attribute could override them in a future xc extension. This is not required for v1.
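
The table could translate into code roughly as follows. The table does not state which row wins when several match, so this sketch assumes the rows are checked in the order listed and that matching is case-insensitive; the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// annotations mirrors the three MCP hint fields used in this proposal.
type annotations struct {
	ReadOnlyHint    bool
	DestructiveHint bool
	IdempotentHint  bool
}

// containsAny reports whether s contains any of the given substrings.
func containsAny(s string, words ...string) bool {
	for _, w := range words {
		if strings.Contains(s, w) {
			return true
		}
	}
	return false
}

// deriveAnnotations applies the table rows in order; the first match wins.
func deriveAnnotations(name string, hasScript bool) annotations {
	lower := strings.ToLower(name)
	switch {
	case !hasScript:
		return annotations{ReadOnlyHint: true}
	case containsAny(lower, "test", "lint", "check", "fmt", "vet"):
		return annotations{IdempotentHint: true}
	case containsAny(lower, "deploy", "push", "delete", "clean", "drop"):
		return annotations{DestructiveHint: true}
	default:
		return annotations{}
	}
}

func main() {
	fmt.Printf("%+v\n", deriveAnnotations("test", true))
	fmt.Printf("%+v\n", deriveAnnotations("deploy-prod", true))
}
```

With first-match-wins, a task named `test-deploy` is treated as non-destructive. Checking the destructive keywords first would be the more conservative ordering.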

## Architecture

A Run Manager holds a `sync.Map` of active and completed runs. Each run wraps an xc `Runner` execution in a goroutine with ring-buffered stdout/stderr capture. Completed runs are kept in memory for result retrieval and evicted after the retention limit.
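
Retention could work like the sketch below. It uses a mutex-protected map and an ordered slice instead of `sync.Map`, because eviction needs to know which completed run is oldest; that substitution and all names here are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// runStore keeps at most maxRuns completed results, evicting the oldest first.
type runStore struct {
	mu      sync.Mutex
	maxRuns int
	order   []string          // completed run IDs, oldest first
	results map[string]string // run ID -> final output
}

func newRunStore(maxRuns int) *runStore {
	return &runStore{maxRuns: maxRuns, results: map[string]string{}}
}

// complete records a finished run and evicts the oldest results
// once the retention limit is exceeded.
func (s *runStore) complete(id, output string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.results[id] = output
	s.order = append(s.order, id)
	for len(s.order) > s.maxRuns {
		oldest := s.order[0]
		s.order = s.order[1:]
		delete(s.results, oldest)
	}
}

// get returns the stored output for a run, if it has not been evicted.
func (s *runStore) get(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	out, ok := s.results[id]
	return out, ok
}

func main() {
	s := newRunStore(2)
	s.complete("build-000001", "ok")
	s.complete("lint-000002", "ok")
	s.complete("test-000003", "ok")
	_, ok := s.get("build-000001")
	fmt.Println("oldest run still retained:", ok)
}
```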

## Transport and Configuration

stdio only. The agent's host spawns the server as a subprocess.

```json
{
  "mcpServers": {
    "xc": {
      "command": "xc",
      "args": ["--mcp", "--file", "README.md", "--tail-lines", "50"]
    }
  }
}
```

Server flags:

| Flag | Default | Description |
|------|---------|-------------|
| `--file` | `README.md` | Task file to parse |
| `--heading` | `Tasks` | Markdown heading containing tasks |
| `--tail-lines` | `50` | Default tail length for output |
| `--max-runs` | `20` | Number of completed runs to retain |
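
For illustration, the flags and defaults in the table could be parsed with Go's standard `flag` package, which accepts both `-name` and `--name`. Whether xc would wire them up this way is an assumption, and the `config` struct and `parseFlags` function are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds the server settings from the flag table, plus --mcp.
type config struct {
	MCP       bool
	File      string
	Heading   string
	TailLines int
	MaxRuns   int
}

// parseFlags parses server flags, applying the documented defaults.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("xc", flag.ContinueOnError)
	fs.BoolVar(&c.MCP, "mcp", false, "run as an MCP server over stdio")
	fs.StringVar(&c.File, "file", "README.md", "task file to parse")
	fs.StringVar(&c.Heading, "heading", "Tasks", "Markdown heading containing tasks")
	fs.IntVar(&c.TailLines, "tail-lines", 50, "default tail length for output")
	fs.IntVar(&c.MaxRuns, "max-runs", 20, "number of completed runs to retain")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseFlags([]string{"--mcp", "--file", "README.md", "--tail-lines", "50"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```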

## Implementation

The server reuses xc's existing packages:

- **Parser**: `parser/parsemd` and `parser/parseorg` for task extraction.
- **Runner**: `run.Runner` for task execution, dependency resolution, input
  handling, and script interpretation.
- **Models**: `models.Task` for the task data structure.

The MCP protocol layer uses the official Go SDK (`github.com/modelcontextprotocol/go-sdk`).

## Example Agent Session

```
Agent                           MCP Server
  |                                 |
  |-- tools/list ------------------>|
  |<-- [xc_build, xc_test, ...]  --|
  |                                 |
  |-- xc_list() ------------------>|
  |<-- [{name: "build", ...}, ...] |
  |                                 |
  |-- xc_test(async: true) ------->|
  |<-- "Run ID: test-a1b2c3"     --|
  |                                 |
  | (agent continues writing code)  |
  |                                 |
  |-- xc_result("test-a1b2c3") --->|
  |<-- "Still running (8s)" -------|
  |                                 |
  | (agent continues working)       |
  |                                 |
  |-- xc_result("test-a1b2c3") --->|
  |<-- "Completed. Exit 0. ..."  --|
  |                                 |
```
