Feature/mcp semantic state #6067

elasticdotventures · 2025-12-03T21:42:12Z

Here are the instructions for the pm2-mcp development agent:

---
PM2-MCP Enhancement Analysis for LLM Observability

Current Assessment (Based on Testing)

Working Tools Tested:
- ✅ pm2_list_processes - Returns basic metrics (status, uptime, cpu, memory)
- ✅ pm2_describe_process - Returns full process config and environment
- ✅ pm2_tail_logs - Returns recent log lines from stdout/stderr

Observations from Real Usage:
1. JSON output format is excellent for LLM parsing
2. describe_process is extremely verbose (full env vars duplicated)
3. No semantic understanding of process state beyond "online/stopped/errored"
4. Logs are raw text without context extraction
5. No way to detect "stuck" processes (online but not progressing)
6. Sensitive data (API keys, tokens) exposed in environment dumps

---
Feature Gaps & Enhancement Priorities (Ranked by Impact)

🔥 CRITICAL - Highest Impact for LLM Agents

1. Semantic Process State (Impact: 10/10)

Problem: "online" status doesn't tell an LLM what's actually happening
Solution: Add intelligent state detection

// Proposed new field in pm2_list_processes and pm2_describe_process
{
  "semantic_state": {
    "status": "downloading",  // online|starting|downloading|processing|idle|stuck|degraded
    "context": "Pulling container image (blob 15/20)",
    "progress": 75,  // percentage when applicable
    "confidence": 0.9,  // how certain we are about this interpretation
    "inferred_from": "log_pattern_match"  // or "health_endpoint", "restart_count", etc.
  }
}

Implementation:
- Parse recent logs for common patterns (downloading, compiling, serving, etc.)
- Track restart frequency to detect crash loops
- Monitor log silence to detect "stuck" processes
- Integrate with health check endpoints if available
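
The inference steps above could be sketched roughly as follows. This is a minimal illustration, not the code in lib/mcp/server.js: the function name, the pattern table, and both thresholds are assumptions made up for this example.

```javascript
// Hypothetical sketch: none of these names come from pm2 or pm2-mcp.
const LOG_PATTERNS = [
  { status: 'downloading', regex: /copying blob|downloading|pulling/i },
  { status: 'processing', regex: /compiling|building|bundling/i },
  { status: 'online', regex: /listening on|server started/i },
];

const STUCK_AFTER_MS = 5 * 60 * 1000; // assumed log-silence threshold
const CRASH_LOOP_RESTARTS = 5;        // assumed restarts-per-hour threshold

function inferSemanticState({ lines, lastLogAt, restartsLastHour, now = Date.now() }) {
  // Frequent restarts take priority: the process is probably crash looping.
  if (restartsLastHour >= CRASH_LOOP_RESTARTS) {
    return {
      status: 'degraded',
      context: `${restartsLastHour} restarts in the last hour`,
      confidence: 0.8,
      inferred_from: 'restart_count',
    };
  }
  // A long gap since the last log line suggests the process is not progressing.
  if (now - lastLogAt > STUCK_AFTER_MS) {
    return {
      status: 'stuck',
      context: `No log output for ${Math.round((now - lastLogAt) / 1000)}s`,
      confidence: 0.5,
      inferred_from: 'log_silence',
    };
  }
  // Otherwise scan from the newest line backwards; the most recent match wins.
  for (let i = lines.length - 1; i >= 0; i--) {
    const match = LOG_PATTERNS.find((p) => p.regex.test(lines[i]));
    if (match) {
      return {
        status: match.status,
        context: lines[i],
        confidence: 0.6,
        inferred_from: 'log_pattern_match',
      };
    }
  }
  return { status: 'online', context: 'No known pattern matched', confidence: 0.3, inferred_from: 'default' };
}
```

The confidence values are placeholders; a real implementation would likely need to calibrate them against observed behaviour.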

Why Critical: LLMs need to know "is this process working or should I intervene?" Currently they only know "is it running?"

---
2. Log Intelligence & Pattern Extraction (Impact: 9/10)

Problem: Raw logs require LLM to parse unstructured text
Solution: Add structured log analysis tool

// New tool: pm2_analyze_logs
{
  "process": "comfyui",
  "timeframe_minutes": 5,
  "analysis": {
    "current_activity": "downloading_container_image",
    "detected_patterns": [
      {
        "pattern": "copying_blob",
        "occurrences": 17,
        "last_seen": "2025-12-03T21:01:09Z",
        "sample": "Copying blob sha256:9fa1d2a8ea5b..."
      }
    ],
    "errors_found": [],
    "warnings_found": [],
    "progress_indicators": [
      {
        "metric": "blobs_downloaded",
        "current": 17,
        "estimated_total": 20,
        "trend": "increasing"
      }
    ],
    "anomalies": [],
    "suggested_action": "wait_for_completion"  // or "investigate", "restart", "none"
  }
}

Implementation:
- Regex patterns for common frameworks (express, fastapi, docker, webpack, etc.)
- Error/warning level detection
- Progress indicator extraction (percentage, X/Y counts, etc.)
- Anomaly detection (sudden log spam, repeated errors)

Why Critical: Enables LLM to understand process behavior without reading hundreds of log lines
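
As a rough illustration of the progress-indicator and level-detection bullets, the sketch below pulls "X/Y" counts or percentages out of recent lines and counts error and warning lines. The function names and regexes are assumptions for this example and are deliberately naive (an "X/Y" match would also fire on a date such as 2025/12).

```javascript
// Hypothetical helpers; not part of pm2 or pm2-mcp.
function extractProgress(lines) {
  // Walk from the newest line backwards so the latest indicator wins.
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i];
    const ratio = line.match(/(\d+)\s*\/\s*(\d+)/); // e.g. "blob 15/20"
    if (ratio && Number(ratio[2]) > 0) {
      const current = Number(ratio[1]);
      const total = Number(ratio[2]);
      return { current, total, percent: Math.round((current / total) * 100), sample: line };
    }
    const pct = line.match(/(\d{1,3}(?:\.\d+)?)\s*%/); // e.g. "42.5%"
    if (pct) {
      return { current: null, total: null, percent: Number(pct[1]), sample: line };
    }
  }
  return null; // no progress indicator found
}

function countLevels(lines) {
  const counts = { error: 0, warning: 0 };
  for (const line of lines) {
    if (/\b(error|fatal|exception)\b/i.test(line)) counts.error++;
    else if (/\bwarn(ing)?\b/i.test(line)) counts.warning++;
  }
  return counts;
}
```

Framework-specific patterns (express, fastapi, docker, webpack) would sit on top of helpers like these rather than replace them.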

---
3. Health Check Integration (Impact: 8/10)

Problem: No way to verify application is actually functioning
Solution: Add configurable health checks

// Extension to ecosystem config
{
  "apps": [{
    "name": "comfyui",
    "health_check": {
      "type": "http",  // or "tcp", "exec"
      "endpoint": "http://localhost:8188/health",
      "interval_seconds": 30,
      "timeout_seconds": 10,
      "retries": 3,
      "expected_response": {
        "status_code": 200,
        "body_contains": "ok"  // optional
      }
    }
  }]
}

// New field in process status
{
  "health": {
    "status": "healthy",  // healthy|unhealthy|unknown|checking
    "last_check": "2025-12-03T21:05:00Z",
    "consecutive_failures": 0,
    "latency_ms": 45,
    "message": "Responded with 200 OK in 45ms"
  }
}

Why Critical: Distinguishes "process running" from "application working"
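
A minimal sketch of how the "http" check type might be evaluated against the proposed config shape. The function name and the injectable fetchImpl parameter are assumptions for illustration; the global fetch and AbortSignal.timeout used here are available in recent Node.js releases, including the Node 22 baseline this PR sets.

```javascript
// Hypothetical sketch; checkHttpHealth is not an existing pm2 API.
async function checkHttpHealth(config, fetchImpl = fetch) {
  const { endpoint, timeout_seconds = 10, retries = 3, expected_response = {} } = config;
  const expectedStatus = expected_response.status_code ?? 200;
  let lastMessage = 'not checked';

  for (let attempt = 1; attempt <= retries; attempt++) {
    const started = Date.now();
    try {
      // Abort the request if it exceeds the configured timeout.
      const res = await fetchImpl(endpoint, { signal: AbortSignal.timeout(timeout_seconds * 1000) });
      const body = await res.text();
      const latency_ms = Date.now() - started;
      const bodyOk =
        expected_response.body_contains === undefined ||
        body.includes(expected_response.body_contains);
      if (res.status === expectedStatus && bodyOk) {
        return {
          status: 'healthy',
          consecutive_failures: attempt - 1,
          latency_ms,
          message: `Responded with ${res.status} in ${latency_ms}ms`,
        };
      }
      lastMessage = `Unexpected response (status ${res.status})`;
    } catch (err) {
      lastMessage = `Request failed: ${err.message}`;
    }
  }
  // Every attempt failed or returned an unexpected response.
  return { status: 'unhealthy', consecutive_failures: retries, latency_ms: null, message: lastMessage };
}
```

Passing the fetch implementation as a parameter is only a convenience for testing without a live endpoint; scheduling by interval_seconds is left out of this sketch.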

---
🚀 HIGH IMPACT - Very Valuable

4. Simplified Describe (Privacy-Aware) (Impact: 8/10)

Problem: Full environment dump is verbose and exposes secrets
Solution: Add filtering and privacy controls

// New tool: pm2_describe_process_safe
{
  "process": "comfyui",
  "include_environment": false,  // default false
  "environment_filter": ["PATH", "NODE_ENV", "PORT"],  // allowlist
  "redact_secrets": true  // redact values matching secret patterns
}

Implementation:
- Filter out environment variables by default
- Redact values containing: API_KEY, TOKEN, PASSWORD, SECRET
- Remove duplicate data (env appears twice currently)
- Option to include specific vars when needed

Why Important: Makes output more readable for LLMs, prevents credential leaks
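
The allowlist-plus-redaction idea could look something like the sketch below. It assumes the secret markers (API_KEY, TOKEN, PASSWORD, SECRET) are matched against variable names, which is one reading of the bullet above; the function name and option names are hypothetical.

```javascript
// Hypothetical sketch; filterEnvironment is not an existing pm2 API.
const SECRET_NAME_PATTERN = /API_KEY|TOKEN|PASSWORD|SECRET/i;

function filterEnvironment(env, { allowlist = [], redactSecrets = true } = {}) {
  const out = {};
  // Only allowlisted variables are ever copied, so the default output is empty.
  for (const name of allowlist) {
    if (!Object.prototype.hasOwnProperty.call(env, name)) continue;
    const looksSecret = SECRET_NAME_PATTERN.test(name);
    out[name] = redactSecrets && looksSecret ? '[REDACTED]' : env[name];
  }
  return out;
}
```

Starting from an empty result and copying only allowlisted names means a newly added secret variable cannot leak by default, even if its name does not match the pattern.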

---
5. Change Detection & Alerts (Impact: 7/10)

Problem: No way to know when process behavior changes
Solution: Track and report state changes

// New tool: pm2_get_recent_events
{
  "process": "comfyui",
  "since_minutes": 60,
  "events": [
    {
      "timestamp": "2025-12-03T20:42:27Z",
      "type": "process_started",
      "details": "Started by user via PM2"
    },
    {
      "timestamp": "2025-12-03T20:50:15Z",
      "type": "state_change",
      "details": "Semantic state changed: null → downloading",
      "context": "Detected container image pull in logs"
    },
    {
      "timestamp": "2025-12-03T21:00:00Z",
      "type": "resource_spike",
      "details": "Memory increased from 5MB → 250MB",
      "context": "Normal for this stage (image extraction)"
    }
  ]
}

Why Important: Helps LLM understand process lifecycle and detect issues
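
One possible way to produce such events is to diff two consecutive snapshots of a process. The snapshot shape (semantic_state, restarts, memory in bytes), the event type "process_restarted", and the spike factor are all assumptions for this sketch.

```javascript
// Hypothetical sketch; the snapshot fields and thresholds are illustrative only.
const MEMORY_SPIKE_FACTOR = 2; // assumed: flag a spike when memory at least doubles

function diffSnapshots(prev, next, timestamp = new Date().toISOString()) {
  const events = [];
  if (prev.semantic_state !== next.semantic_state) {
    events.push({
      timestamp,
      type: 'state_change',
      details: `Semantic state changed: ${prev.semantic_state ?? 'null'} → ${next.semantic_state ?? 'null'}`,
    });
  }
  if (next.restarts > prev.restarts) {
    events.push({
      timestamp,
      type: 'process_restarted',
      details: `Restart count went from ${prev.restarts} to ${next.restarts}`,
    });
  }
  if (prev.memory > 0 && next.memory / prev.memory >= MEMORY_SPIKE_FACTOR) {
    events.push({
      timestamp,
      type: 'resource_spike',
      details: `Memory increased from ${prev.memory} → ${next.memory} bytes`,
    });
  }
  return events;
}
```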

---
6. Resource Trend Analysis (Impact: 6/10)

Problem: Current metrics are point-in-time only
Solution: Add historical tracking

// Extension to pm2_describe_process
{
  "resource_trends": {
    "memory": {
      "current": 5373952,
      "avg_5min": 5200000,
      "avg_1hour": 4800000,
      "trend": "stable",  // increasing|decreasing|stable|volatile
      "anomaly": false
    },
    "cpu": {
      "current": 0,
      "avg_5min": 0.5,
      "trend": "decreasing",
      "anomaly": false
    },
    "restarts": {
      "total": 0,
      "last_hour": 0,
      "rate_per_hour": 0,
      "trend": "stable"
    }
  }
}

Why Important: Distinguishes normal vs abnormal resource usage
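
The "trend" field could be derived from a window of samples along these lines. This is a simple heuristic sketch, not a statistical method the PR commits to; the function name, the 10% tolerance, and the volatility cutoff are assumptions.

```javascript
// Hypothetical sketch; classifyTrend and its thresholds are illustrative only.
const VOLATILE_CV = 0.25; // assumed coefficient-of-variation cutoff

function classifyTrend(samples, tolerance = 0.1) {
  if (samples.length < 2) return 'stable';
  const avg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const mean = avg(samples);
  if (mean === 0) return 'stable';

  // Compare the average of the older half with the average of the newer half.
  const half = Math.floor(samples.length / 2);
  const older = avg(samples.slice(0, half));
  const newer = avg(samples.slice(samples.length - half));
  const change = (newer - older) / mean;
  if (change > tolerance) return 'increasing';
  if (change < -tolerance) return 'decreasing';

  // No clear direction: decide between stable and volatile by relative spread.
  const variance = avg(samples.map((x) => (x - mean) ** 2));
  return Math.sqrt(variance) / mean > VOLATILE_CV ? 'volatile' : 'stable';
}
```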

---
💡 MEDIUM IMPACT - Nice to Have

7. Dependency Detection (Impact: 5/10)

Track which processes depend on others (e.g., app needs database)

8. Cost Attribution (Impact: 4/10)

Estimate resource costs based on CPU/memory usage

9. Log Search (Impact: 6/10)

Search across all process logs with filters

10. Metric Exports (Impact: 5/10)

Export metrics to Prometheus/StatsD format

---
Implementation Priority Recommendation

Phase 1 (MVP):
1. Semantic Process State
2. Log Intelligence
3. Simplified Describe (Privacy-Aware)

Phase 2:
4. Health Check Integration
5. Change Detection & Alerts

Phase 3:
6. Resource Trend Analysis

---
Current Tool Usage Example (Reference)

// What worked well in testing:
await mcp__pm2-process-manager__pm2_list_processes()
// → Returns: {processes: [{name, status, uptime, cpu, memory, ...}]}

await mcp__pm2-process-manager__pm2_describe_process({process: "comfyui"})
// → Returns: Very detailed config (too much data, needs filtering)

await mcp__pm2-process-manager__pm2_tail_logs({
  process: "comfyui",
  lines: 15,
  type: "err"
})
// → Returns: {lines: ["log line 1", "log line 2", ...]}

Add MCP server support to PM2 for process management through MCP-compatible clients.

Features:
- New pm2-mcp binary that exposes PM2 process management via MCP
- 12 MCP tools for process lifecycle, logging, and monitoring:
  - pm2_list_processes, pm2_describe_process
  - pm2_start_process, pm2_restart_process, pm2_reload_process
  - pm2_stop_process, pm2_delete_process
  - pm2_flush_logs, pm2_reload_logs, pm2_tail_logs
  - pm2_dump, pm2_kill_daemon
- 2 MCP resources for real-time process information:
  - pm2://processes (list)
  - pm2://process/{id} (detail)
- Automatic sandbox environment detection and adaptation
- Support for stdio and HTTP (Streamable) transports
- Client notifications for sandbox status and recommendations
- Compatible with Claude Code, Codex, and other MCP clients

Implementation:
- New lib/mcp/server.js with full MCP server implementation
- Uses @modelcontextprotocol/sdk for MCP protocol
- Sandbox detection checks home directory writability and environment
- Auto-selects writable PM2_HOME in sandboxed environments
- No-daemon mode by default for MCP client compatibility
- Comprehensive environment variable configuration

Documentation:
- README with MCP server quickstart and setup commands
- Environment variables table (PM2_MCP_*, PM2_HOME, etc.)
- Sandbox detection explanation
- Tool and resource documentation
- Justfile recipes for easy registration with MCP clients

Related:
- Enables pkgx packaging: pkgxdev/pantry#11219
- Development fork: https://github.com/PromptExecution/pm2-mcp
- MCP Specification: https://modelcontextprotocol.io/

Co-authored-by: Claude <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds Model Context Protocol (MCP) server functionality to PM2, enabling process management through MCP-compatible clients like Claude Code and Codex. The implementation provides both stdio and HTTP transport options with extensive tooling for PM2 process control.

Key Changes:

Adds comprehensive MCP server with 13 tools and 2 resource endpoints for PM2 process management
Implements sandbox detection and adaptive PM2_HOME resolution for restricted environments
Includes semantic state inference with log pattern analysis for enhanced process monitoring

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 7 comments.

File	Description
package.json	Added MCP SDK dependency, increased Node.js requirement to 22.0.0, added pm2-mcp bin entry and npm script
lib/mcp/server.js	New MCP server implementation with tools, resources, sandbox detection, log analysis, and transport abstraction
bin/pm2-mcp	New CLI entry point with argument parsing for transport configuration and PM2 self-management
README.md	Added comprehensive MCP server documentation including setup, environment variables, and feature descriptions
Justfile	Added automation recipes for registering MCP server with clients and debugging sandbox detection

Copilot · 2025-12-03T21:46:24Z

package.json

  "version": "6.0.14",
  "engines": {
-    "node": ">=16.0.0"
+    "node": ">=22.0.0"


Changing the minimum Node.js version from >=16.0.0 to >=22.0.0 is a breaking change that will prevent users on Node.js 16.x-21.x from using this version. This contradicts the PR metadata which indicates "BC breaks? no". Either update the PR metadata to indicate this is a breaking change, or reconsider this version requirement increase.

Suggested change

-    "node": ">=22.0.0"
+    "node": ">=16.0.0"

README.md

Copilot · 2025-12-03T21:46:25Z

Justfile

+    #!/usr/bin/env bash
+    set -euo pipefail
+    echo "Testing normal environment:"
+    node -e "const {detectSandbox} = require('./lib/mcp/server.js'); console.log(detectSandbox ? 'Available' : 'Not exported');" || echo "Normal detection test"


The function detectSandbox() is not exported from the module, but the Justfile at line 59 attempts to use it in a test. Either export this function or update the test to use a different approach.

Copilot · 2025-12-03T21:46:25Z

lib/mcp/server.js

+async function tailFile(filePath, lineCount) {
+  const fh = await fs.promises.open(filePath, 'r');
+  try {
+    const stats = await fh.stat();
+    let position = stats.size;
+    const chunkSize = 8192;
+    let buffer = '';
+
+    while (position > 0 && buffer.split(/\r?\n/).length <= lineCount + 1) {
+      const readSize = Math.min(chunkSize, position);
+      position -= readSize;
+      const result = await fh.read({ buffer: Buffer.alloc(readSize), position });
+      buffer = result.buffer.slice(0, result.bytesRead).toString('utf8') + buffer;
+    }
+
+    const lines = buffer.trimEnd().split(/\r?\n/);
+    return lines.slice(-lineCount);
+  } finally {
+    await fh.close();
+  }
+}


The tailFile function reads arbitrary files without path validation. If the filePath parameter comes from user-controlled process configuration, this could potentially allow reading sensitive files outside the intended log directory. Consider adding path validation to ensure the file path is within expected log directories.
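
A containment check along the lines this comment suggests might look like the sketch below. The helper name and the example log directory are assumptions; note that this compares resolved path strings only and does not follow symlinks, so a stricter version would resolve both paths with fs.promises.realpath first.

```javascript
const path = require('node:path');

// Hypothetical helper; not part of lib/mcp/server.js.
// Returns true only when `candidate` resolves to a location strictly inside `baseDir`.
function isInsideDirectory(baseDir, candidate) {
  const base = path.resolve(baseDir);
  const target = path.resolve(candidate);
  const rel = path.relative(base, target);
  return (
    rel !== '' &&                      // the directory itself is not a log file
    rel !== '..' &&
    !rel.startsWith('..' + path.sep) && // escapes the base directory
    !path.isAbsolute(rel)               // e.g. a different drive on Windows
  );
}
```

tailFile could then refuse any filePath for which this check fails against the expected log directory (for example the logs folder under PM2_HOME) before opening the file.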

lib/mcp/server.js

Copilot · 2025-12-03T21:46:26Z

lib/mcp/server.js

+          console.error('[pm2-mcp][debug] failed to send sandbox notification', err);
+        }
+      });
+    }, 100);


[nitpick] The magic number 100 (milliseconds) for the setTimeout delay is used without explanation. Consider extracting this to a named constant like SANDBOX_NOTIFICATION_DELAY_MS to improve code clarity and maintainability.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

elasticdotventures and others added 2 commits December 3, 2025 12:12

Add semantic state and log analysis tools to MCP server

e5be71b

Copilot AI review requested due to automatic review settings December 3, 2025 21:42

Copilot started reviewing on behalf of elasticdotventures December 3, 2025 21:42

Copilot finished reviewing on behalf of elasticdotventures December 3, 2025 21:44

Copilot AI reviewed Dec 3, 2025

elasticdotventures and others added 3 commits December 4, 2025 10:11

Update README.md

04d43b1

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update lib/mcp/server.js

8bcb96e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update lib/mcp/server.js

0fc5f62

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

elasticdotventures closed this Dec 3, 2025
