mcp: add system architecture awareness to analysis tools #889

## What

Make the MCP server (`rezolus mcp`) architecture-aware when analyzing recordings. The analysis tools (anomaly detection, correlation, PromQL queries, describe-metrics) currently operate on metric values alone, without factoring in the underlying system topology and configuration.

## Why

Many diagnostic conclusions are valid only in the context of the host's architecture. Without that context, the MCP server can produce technically correct but operationally misleading analysis.

Examples where architecture changes the interpretation:

- **"CPU 12 shows high softirq"** — is CPU 12 listed in `isolcpus`? An SMT sibling of a hot CPU? On the wrong NUMA node for the device whose IRQs land there? The same number means three different things.
- **"Off-CPU time is high in cgroup X"** — is the cgroup hitting `cpu.max` quota, or is it just IO-bound? Throttling vs scheduling pressure looks identical without the cgroup config.
- **"ENA allowance exceeded"** — only meaningful on EC2 Nitro instances; on bare metal those counters never increment.
- **"Steal time is 5%"** — the verdict depends on the platform: alarming on bare metal, often normal on a shared-tenancy VM during a live-migration window.
- **"L3 miss rate is 30%"** — depends on cache hierarchy size, which differs by CPU generation and NUMA layout.
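
To make the first example concrete, here is a minimal sketch of attaching topology caveats to a "high softirq on CPU n" finding. Everything in it is hypothetical: `HostContext`, its field names, and `softirq_caveats` are illustrative and do not reflect the existing rezolus code or the `systeminfo` schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HostContext:
    """Hypothetical topology context; field names are illustrative only."""
    isolated_cpus: set[int] = field(default_factory=set)          # from isolcpus
    smt_siblings: dict[int, int] = field(default_factory=dict)    # cpu -> sibling cpu
    cpu_numa_node: dict[int, int] = field(default_factory=dict)   # cpu -> NUMA node


def softirq_caveats(cpu: int, hot_cpus: set[int],
                    irq_device_node: int | None,
                    ctx: HostContext) -> list[str]:
    """Return context notes to attach to a 'high softirq on CPU n' finding."""
    notes = []
    if cpu in ctx.isolated_cpus:
        notes.append("cpu is in isolcpus")
    sibling = ctx.smt_siblings.get(cpu)
    if sibling is not None and sibling in hot_cpus:
        notes.append(f"SMT sibling cpu {sibling} is also hot")
    node = ctx.cpu_numa_node.get(cpu)
    if irq_device_node is not None and node is not None and node != irq_device_node:
        notes.append(f"cpu is on NUMA node {node}, device is on node {irq_device_node}")
    return notes
```

The point of the sketch is only that the same metric value yields different annotations depending on the host; how such notes would surface in tool output is a design question.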

Many of the patterns documented in `docs/patterns.md` are explicitly architecture-conditional. That doc is written for human operators; the MCP server should be able to apply the same conditioning automatically.

## Categories of awareness that would help

- **CPU topology** — cores, sockets, SMT siblings, NUMA nodes, cache hierarchy, frequency capabilities.
- **Cgroup hierarchy and configuration** — `cpu.max`, `memory.high`/`memory.max`, `cpuset.cpus`, parent/child relationships.
- **Kernel and userspace versioning** — kernel version (which tracepoints exist), libc, key driver versions.
- **Cloud / hypervisor context** — bare metal vs VM, cloud provider, instance type/family, hypervisor steal-time semantics.
- **Block / network device configuration** — IO scheduler, queue depth, IRQ affinity, NVMe poll mode, NIC RSS/RPS state, multipath topology.
- **Boot-time isolation posture** — `isolcpus`, `nohz_full`, `rcu_nocbs`, governor settings.

The agent already captures some of this (`systeminfo` is in parquet metadata per `docs/parquet_metadata.md`); the gap is in the MCP tools using it consistently when interpreting metrics.
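
As an illustration of turning two of the items above into usable values, the sketch below parses a kernel CPU list (the format used by `isolcpus` and `cpuset.cpus`) and a cgroup v2 `cpu.max` string. It assumes the raw strings are available to the tool, which may not match how `systeminfo` stores them; the function names are hypothetical. It does not handle the optional flag prefix that `isolcpus` accepts or the stride syntax of CPU lists.

```python
from __future__ import annotations


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as '2-5,8,10-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. no isolated CPUs
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_cpu_max(text: str) -> float | None:
    """Parse cgroup v2 cpu.max ('<quota> <period>' or 'max <period>').

    Returns the CPU limit in cores, or None when the cgroup is unlimited.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)
```

With a limit in hand, a tool could distinguish a cgroup whose CPU usage sits at its quota from one that is simply waiting on IO.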

## Concrete benefits

- **Anomaly detection** that knows "normal" for the host shape rather than treating every host's metrics as i.i.d.
- **Correlations** that respect topology (don't correlate "all CPUs" when the host has heterogeneous CPU pools, e.g. P-cores vs E-cores).
- **Suggested diagnostic queries** that adapt to the platform — different on AWS Nitro vs bare metal vs Azure.
- **More confident answers** to "is this metric value problematic" — the same value can be benign on one host and a SEV on another.
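
The topology-respecting point can be sketched as aggregating a per-CPU metric within each CPU pool before comparing or correlating, rather than across all CPUs. The pool labels and function names here are assumptions for illustration; how pools would actually be derived from topology data is left open.

```python
from __future__ import annotations

from collections import defaultdict


def group_cpus_by_pool(cpu_pool: dict[int, str]) -> dict[str, list[int]]:
    """Group CPU ids by a pool label (e.g. 'P' and 'E' cores)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for cpu in sorted(cpu_pool):
        groups[cpu_pool[cpu]].append(cpu)
    return dict(groups)


def per_pool_mean(values: dict[int, float],
                  cpu_pool: dict[int, str]) -> dict[str, float]:
    """Average a per-CPU metric within each pool instead of across all CPUs."""
    return {
        pool: sum(values[cpu] for cpu in cpus) / len(cpus)
        for pool, cpus in group_cpus_by_pool(cpu_pool).items()
    }
```

On a host where P-cores run at 80% and 60% busy and E-cores at 10% and 30%, the per-pool means (70% and 20%) tell a different story than the host-wide mean of 45%.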

## Out of scope

This issue is about the *what* and *why*. Specific design choices (where awareness lives, how it's expressed in tool output, schema for the topology context, etc.) should be decided when this is picked up.

## Related

- `docs/parquet_metadata.md` — the agent's existing `systeminfo` capture
- `docs/patterns.md` — diagnostic patterns whose validity is architecture-conditional
- #883 — sampler-development methodology doc, which intersects on the question of "what does rezolus consider important about a system"
