@elasticdotventures elasticdotventures commented Dec 3, 2025

Here are the instructions for the pm2-mcp development agent:

---
PM2-MCP Enhancement Analysis for LLM Observability

Current Assessment (Based on Testing)

Working Tools Tested:
- ✅ pm2_list_processes - Returns basic metrics (status, uptime, cpu, memory)
- ✅ pm2_describe_process - Returns full process config and environment
- ✅ pm2_tail_logs - Returns recent log lines from stdout/stderr

Observations from Real Usage:
1. JSON output format is excellent for LLM parsing
2. describe_process is extremely verbose (full env vars duplicated)
3. No semantic understanding of process state beyond "online/stopped/errored"
4. Logs are raw text without context extraction
5. No way to detect "stuck" processes (online but not progressing)
6. Sensitive data (API keys, tokens) exposed in environment dumps

---
Feature Gaps & Enhancement Priorities (Ranked by Impact)

🔥 CRITICAL - Highest Impact for LLM Agents

1. Semantic Process State (Impact: 10/10)

Problem: "online" status doesn't tell an LLM what's actually happening
Solution: Add intelligent state detection

// Proposed new field in pm2_list_processes and pm2_describe_process
{
  "semantic_state": {
    "status": "downloading",  // online|starting|downloading|processing|idle|stuck|degraded
    "context": "Pulling container image (blob 15/20)",
    "progress": 75,  // percentage when applicable
    "confidence": 0.9,  // how certain we are about this interpretation
    "inferred_from": "log_pattern_match"  // or "health_endpoint", "restart_count", etc.
  }
}

Implementation:
- Parse recent logs for common patterns (downloading, compiling, serving, etc.)
- Track restart frequency to detect crash loops
- Monitor log silence to detect "stuck" processes
- Integrate with health check endpoints if available
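The implementation bullets above could be combined roughly as follows. This is a minimal sketch, not code from the PR: the function name `inferSemanticState`, the pattern table, the thresholds, and the `log_silence` and `default` values for `inferred_from` are all assumptions made for illustration.

```javascript
// Hypothetical pattern table: maps a log regex to a semantic status.
const STATE_PATTERNS = [
  { status: 'downloading', regex: /copying blob|downloading|pulling/i },
  { status: 'processing', regex: /compiling|building|processing/i },
  { status: 'idle', regex: /listening on|server started|\bready\b/i },
];

// Illustrative thresholds, not PM2 defaults.
const STUCK_AFTER_MS = 5 * 60 * 1000;
const CRASH_LOOP_RESTARTS_PER_HOUR = 5;

function inferSemanticState({ logLines, msSinceLastLog, restartsLastHour }) {
  // A high restart rate outranks anything the logs say.
  if (restartsLastHour >= CRASH_LOOP_RESTARTS_PER_HOUR) {
    return {
      status: 'degraded',
      context: `${restartsLastHour} restarts in the last hour`,
      confidence: 0.8,
      inferred_from: 'restart_count',
    };
  }
  // Prolonged log silence suggests "online but not progressing".
  if (msSinceLastLog >= STUCK_AFTER_MS) {
    return {
      status: 'stuck',
      context: `No log output for ${Math.round(msSinceLastLog / 1000)}s`,
      confidence: 0.5,
      inferred_from: 'log_silence',
    };
  }
  // Otherwise scan from newest to oldest and use the first matching line.
  for (let i = logLines.length - 1; i >= 0; i--) {
    const line = logLines[i];
    const hit = STATE_PATTERNS.find((p) => p.regex.test(line));
    if (!hit) continue;
    const state = {
      status: hit.status,
      context: line.trim(),
      confidence: 0.7,
      inferred_from: 'log_pattern_match',
    };
    // Derive a percentage from an "X/Y" counter when one is present.
    const count = line.match(/(\d+)\s*\/\s*(\d+)/);
    if (count && Number(count[2]) > 0) {
      state.progress = Math.round((100 * Number(count[1])) / Number(count[2]));
    }
    return state;
  }
  return {
    status: 'online',
    context: 'No known pattern matched',
    confidence: 0.3,
    inferred_from: 'default',
  };
}
```

Checking restarts and silence before log patterns is a design choice here: a crash-looping process can still print plausible "downloading" lines on every start.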

Why Critical: LLMs need to know "is this process working or should I intervene?" Currently they only know "is it running?"

---
2. Log Intelligence & Pattern Extraction (Impact: 9/10)

Problem: Raw logs require LLM to parse unstructured text
Solution: Add structured log analysis tool

// New tool: pm2_analyze_logs
{
  "process": "comfyui",
  "timeframe_minutes": 5,
  "analysis": {
    "current_activity": "downloading_container_image",
    "detected_patterns": [
      {
        "pattern": "copying_blob",
        "occurrences": 17,
        "last_seen": "2025-12-03T21:01:09Z",
        "sample": "Copying blob sha256:9fa1d2a8ea5b..."
      }
    ],
    "errors_found": [],
    "warnings_found": [],
    "progress_indicators": [
      {
        "metric": "blobs_downloaded",
        "current": 17,
        "estimated_total": 20,
        "trend": "increasing"
      }
    ],
    "anomalies": [],
    "suggested_action": "wait_for_completion"  // or "investigate", "restart", "none"
  }
}

Implementation:
- Regex patterns for common frameworks (express, fastapi, docker, webpack, etc.)
- Error/warning level detection
- Progress indicator extraction (percentage, X/Y counts, etc.)
- Anomaly detection (sudden log spam, repeated errors)

Why Critical: Enables LLM to understand process behavior without reading hundreds of log lines
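A minimal sketch of the level detection, progress extraction, and repeated-error heuristic described above. The function name `analyzeLogLines`, the regexes, the repeat threshold of 3, and the mapping to `suggested_action` are assumptions; real logs (for example lines containing dates such as 2025/12/03) would need stricter patterns.

```javascript
function analyzeLogLines(lines) {
  const errors = [];
  const warnings = [];
  const progress = [];

  for (const line of lines) {
    // Error/warning level detection by keyword.
    if (/\b(error|fatal|exception)\b/i.test(line)) errors.push(line);
    else if (/\bwarn(ing)?\b/i.test(line)) warnings.push(line);

    // Progress indicators: "45%" or "17/20" / "17 of 20".
    const pct = line.match(/(\d{1,3}(?:\.\d+)?)\s*%/);
    const count = line.match(/\b(\d+)\s*(?:\/|of)\s*(\d+)\b/);
    if (pct) {
      progress.push({ kind: 'percent', value: Number(pct[1]) });
    } else if (count) {
      progress.push({
        kind: 'count',
        current: Number(count[1]),
        total: Number(count[2]),
      });
    }
  }

  // Anomaly heuristic: the same error line repeated at least 3 times.
  const seen = new Map();
  for (const e of errors) seen.set(e, (seen.get(e) || 0) + 1);
  const anomalies = [...seen]
    .filter(([, n]) => n >= 3)
    .map(([line, n]) => ({ type: 'repeated_error', line, occurrences: n }));

  // Assumed mapping from findings to a suggested action.
  let suggested_action = 'none';
  if (errors.length > 0) suggested_action = 'investigate';
  else if (progress.length > 0) suggested_action = 'wait_for_completion';

  return {
    errors_found: errors,
    warnings_found: warnings,
    progress_indicators: progress,
    anomalies,
    suggested_action,
  };
}
```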

---
3. Health Check Integration (Impact: 8/10)

Problem: No way to verify application is actually functioning
Solution: Add configurable health checks

// Extension to ecosystem config
{
  "apps": [{
    "name": "comfyui",
    "health_check": {
      "type": "http",  // or "tcp", "exec"
      "endpoint": "http://localhost:8188/health",
      "interval_seconds": 30,
      "timeout_seconds": 10,
      "retries": 3,
      "expected_response": {
        "status_code": 200,
        "body_contains": "ok"  // optional
      }
    }
  }]
}

// New field in process status
{
  "health": {
    "status": "healthy",  // healthy|unhealthy|unknown|checking
    "last_check": "2025-12-03T21:05:00Z",
    "consecutive_failures": 0,
    "latency_ms": 45,
    "message": "Responded with 200 OK in 45ms"
  }
}

Why Critical: Distinguishes "process running" from "application working"
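How a probe result might be turned into the `health` field can be sketched as two pure functions. The names `evaluateResponse` and `updateHealth` are hypothetical, and treating `retries` as the number of consecutive failures before reporting `unhealthy` is an assumption; the proposal above does not define it. The network probe itself (HTTP, TCP, or exec) is left out.

```javascript
// Compare a probe response against the configured expected_response.
function evaluateResponse(expected, response) {
  if (response.statusCode !== expected.status_code) {
    return {
      ok: false,
      message: `Expected status ${expected.status_code}, got ${response.statusCode}`,
    };
  }
  const body = String(response.body ?? '');
  if (expected.body_contains && !body.includes(expected.body_contains)) {
    return { ok: false, message: `Body does not contain "${expected.body_contains}"` };
  }
  return {
    ok: true,
    message: `Responded with ${response.statusCode} in ${response.latencyMs}ms`,
  };
}

// Fold one check result into the previous health record.
function updateHealth(previous, result, retries, now = new Date()) {
  const consecutive_failures = result.ok ? 0 : previous.consecutive_failures + 1;
  let status = 'checking'; // failing, but not yet past the retry budget
  if (result.ok) status = 'healthy';
  else if (consecutive_failures >= retries) status = 'unhealthy';
  return {
    status,
    last_check: now.toISOString(),
    consecutive_failures,
    message: result.message,
  };
}
```

Keeping the evaluation separate from the probe means the same logic can serve all three check types listed in the config.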

---
🚀 HIGH IMPACT - Very Valuable

4. Simplified Describe (Privacy-Aware) (Impact: 8/10)

Problem: Full environment dump is verbose and exposes secrets
Solution: Add filtering and privacy controls

// New tool: pm2_describe_process_safe
{
  "process": "comfyui",
  "include_environment": false,  // default false
  "environment_filter": ["PATH", "NODE_ENV", "PORT"],  // allowlist
  "redact_secrets": true  // redact values matching secret patterns
}

Implementation:
- Filter out environment variables by default
- Redact values containing: API_KEY, TOKEN, PASSWORD, SECRET
- Remove duplicate data (env appears twice currently)
- Option to include specific vars when needed

Why Important: Makes output more readable for LLMs, prevents credential leaks
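The filtering and redaction rules above might look like the sketch below. The function name `describeEnvSafely` is hypothetical, and two readings are assumed: the secret patterns are matched against variable names (not values), and an empty `environment_filter` means "no allowlist" once `include_environment` is true.

```javascript
// Variable names containing any of these are treated as secrets.
const SECRET_NAME_PATTERN = /API_KEY|TOKEN|PASSWORD|SECRET/i;

function describeEnvSafely(
  env,
  { include_environment = false, environment_filter = [], redact_secrets = true } = {}
) {
  // Environment is omitted entirely unless explicitly requested.
  if (!include_environment) return {};

  const out = {};
  for (const [name, value] of Object.entries(env)) {
    // Apply the allowlist when one was given.
    if (environment_filter.length > 0 && !environment_filter.includes(name)) continue;
    const isSecret = redact_secrets && SECRET_NAME_PATTERN.test(name);
    out[name] = isSecret ? '[REDACTED]' : value;
  }
  return out;
}
```

Name-based matching misses secrets stored under innocuous names, so the allowlist remains the safer default.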

---
5. Change Detection & Alerts (Impact: 7/10)

Problem: No way to know when process behavior changes
Solution: Track and report state changes

// New tool: pm2_get_recent_events
{
  "process": "comfyui",
  "since_minutes": 60,
  "events": [
    {
      "timestamp": "2025-12-03T20:42:27Z",
      "type": "process_started",
      "details": "Started by user via PM2"
    },
    {
      "timestamp": "2025-12-03T20:50:15Z",
      "type": "state_change",
      "details": "Semantic state changed: null → downloading",
      "context": "Detected container image pull in logs"
    },
    {
      "timestamp": "2025-12-03T21:00:00Z",
      "type": "resource_spike",
      "details": "Memory increased from 5MB → 250MB",
      "context": "Normal for this stage (image extraction)"
    }
  ]
}

Why Important: Helps LLM understand process lifecycle and detect issues
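One way to produce such events is to diff consecutive process snapshots. This is a sketch under assumptions: the snapshot shape (`timestamp`, `semantic_state`, `restarts`, `memory`), the function name `diffSnapshots`, the `process_restarted` event type, and the 2x memory threshold are all invented for illustration.

```javascript
function diffSnapshots(prev, next, { memorySpikeRatio = 2 } = {}) {
  const events = [];
  const timestamp = next.timestamp;

  // Semantic state transition, e.g. null -> downloading.
  if (prev.semantic_state !== next.semantic_state) {
    events.push({
      timestamp,
      type: 'state_change',
      details: `Semantic state changed: ${prev.semantic_state} → ${next.semantic_state}`,
    });
  }

  // Restart counter went up since the last snapshot.
  if (next.restarts > prev.restarts) {
    events.push({
      timestamp,
      type: 'process_restarted',
      details: `Restart count ${prev.restarts} → ${next.restarts}`,
    });
  }

  // Memory grew by at least the configured ratio.
  if (prev.memory > 0 && next.memory / prev.memory >= memorySpikeRatio) {
    events.push({
      timestamp,
      type: 'resource_spike',
      details: `Memory increased from ${prev.memory} → ${next.memory} bytes`,
    });
  }

  return events;
}
```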

---
6. Resource Trend Analysis (Impact: 6/10)

Problem: Current metrics are point-in-time only
Solution: Add historical tracking

// Extension to pm2_describe_process
{
  "resource_trends": {
    "memory": {
      "current": 5373952,
      "avg_5min": 5200000,
      "avg_1hour": 4800000,
      "trend": "stable",  // increasing|decreasing|stable|volatile
      "anomaly": false
    },
    "cpu": {
      "current": 0,
      "avg_5min": 0.5,
      "trend": "decreasing",
      "anomaly": false
    },
    "restarts": {
      "total": 0,
      "last_hour": 0,
      "rate_per_hour": 0,
      "trend": "stable"
    }
  }
}

Why Important: Distinguishes normal vs abnormal resource usage
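A trend label can be derived from a window of samples by comparing the mean of the older half with the mean of the newer half. This is a minimal sketch: `classifyTrend`, `summarizeMetric`, and the 10% tolerance are assumptions, and the `volatile` label and anomaly flag from the proposal are not covered.

```javascript
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Samples are ordered oldest to newest.
function classifyTrend(samples, tolerance = 0.1) {
  if (samples.length < 2) return 'stable';
  const mid = Math.floor(samples.length / 2);
  const older = mean(samples.slice(0, mid));
  const newer = mean(samples.slice(mid));
  // Avoid dividing by zero when the older half is all zeros.
  if (older === 0) return newer === 0 ? 'stable' : 'increasing';
  const change = (newer - older) / older;
  if (change > tolerance) return 'increasing';
  if (change < -tolerance) return 'decreasing';
  return 'stable';
}

// Assumes a non-empty sample window.
function summarizeMetric(samples) {
  return {
    current: samples[samples.length - 1],
    avg: mean(samples),
    trend: classifyTrend(samples),
  };
}
```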

---
💡 MEDIUM IMPACT - Nice to Have

7. Dependency Detection (Impact: 5/10)

Track which processes depend on others (e.g., app needs database)

8. Cost Attribution (Impact: 4/10)

Estimate resource costs based on CPU/memory usage

9. Log Search (Impact: 6/10)

Search across all process logs with filters

10. Metric Exports (Impact: 5/10)

Export metrics to Prometheus/StatsD format

---
Implementation Priority Recommendation

Phase 1 (MVP):
1. Semantic Process State
2. Log Intelligence
3. Simplified Describe (Privacy-Aware)

Phase 2:
4. Health Check Integration
5. Change Detection & Alerts

Phase 3:
6. Resource Trend Analysis

---
Current Tool Usage Example (Reference)

// What worked well in testing:
await mcp__pm2-process-manager__pm2_list_processes()
// → Returns: {processes: [{name, status, uptime, cpu, memory, ...}]}

await mcp__pm2-process-manager__pm2_describe_process({process: "comfyui"})
// → Returns: Very detailed config (too much data, needs filtering)

await mcp__pm2-process-manager__pm2_tail_logs({
  process: "comfyui",
  lines: 15,
  type: "err"
})
// → Returns: {lines: ["log line 1", "log line 2", ...]}

elasticdotventures and others added 2 commits December 3, 2025 12:12
Add MCP server support to PM2 for process management through MCP-compatible clients.

Features:
- New pm2-mcp binary that exposes PM2 process management via MCP
- 12 MCP tools for process lifecycle, logging, and monitoring:
  - pm2_list_processes, pm2_describe_process
  - pm2_start_process, pm2_restart_process, pm2_reload_process
  - pm2_stop_process, pm2_delete_process
  - pm2_flush_logs, pm2_reload_logs, pm2_tail_logs
  - pm2_dump, pm2_kill_daemon
- 2 MCP resources for real-time process information:
  - pm2://processes (list)
  - pm2://process/{id} (detail)
- Automatic sandbox environment detection and adaptation
- Support for stdio and HTTP (Streamable) transports
- Client notifications for sandbox status and recommendations
- Compatible with Claude Code, Codex, and other MCP clients

Implementation:
- New lib/mcp/server.js with full MCP server implementation
- Uses @modelcontextprotocol/sdk for MCP protocol
- Sandbox detection checks home directory writability and environment
- Auto-selects writable PM2_HOME in sandboxed environments
- No-daemon mode by default for MCP client compatibility
- Comprehensive environment variable configuration
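The writability check and PM2_HOME fallback described above could work roughly like this. It is a sketch only, not the code in lib/mcp/server.js: the helper names `isWritableDir` and `resolvePm2Home`, the temp-directory fallback path, and the return shape are assumptions.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// True when dir exists, is a directory, and is writable by this process.
function isWritableDir(dir) {
  try {
    fs.accessSync(dir, fs.constants.W_OK);
    return fs.statSync(dir).isDirectory();
  } catch {
    return false;
  }
}

function resolvePm2Home(env = process.env, home = os.homedir()) {
  // An explicit PM2_HOME always wins.
  if (env.PM2_HOME) return { pm2Home: env.PM2_HOME, sandboxed: false };
  // Normal case: PM2's default location under the home directory.
  if (isWritableDir(home)) {
    return { pm2Home: path.join(home, '.pm2'), sandboxed: false };
  }
  // Assumed fallback for sandboxes: a directory under the OS temp dir.
  return { pm2Home: path.join(os.tmpdir(), 'pm2-mcp'), sandboxed: true };
}
```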

Documentation:
- README with MCP server quickstart and setup commands
- Environment variables table (PM2_MCP_*, PM2_HOME, etc.)
- Sandbox detection explanation
- Tool and resource documentation
- Justfile recipes for easy registration with MCP clients

Related:
- Enables pkgx packaging: pkgxdev/pantry#11219
- Development fork: https://github.com/PromptExecution/pm2-mcp
- MCP Specification: https://modelcontextprotocol.io/

Co-authored-by: Claude <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

This PR adds Model Context Protocol (MCP) server functionality to PM2, enabling process management through MCP-compatible clients like Claude Code and Codex. The implementation provides both stdio and HTTP transport options with extensive tooling for PM2 process control.

Key Changes:

  • Adds comprehensive MCP server with 13 tools and 2 resource endpoints for PM2 process management
  • Implements sandbox detection and adaptive PM2_HOME resolution for restricted environments
  • Includes semantic state inference with log pattern analysis for enhanced process monitoring

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| package.json | Added MCP SDK dependency, increased Node.js requirement to 22.0.0, added pm2-mcp bin entry and npm script |
| lib/mcp/server.js | New MCP server implementation with tools, resources, sandbox detection, log analysis, and transport abstraction |
| bin/pm2-mcp | New CLI entry point with argument parsing for transport configuration and PM2 self-management |
| README.md | Added comprehensive MCP server documentation including setup, environment variables, and feature descriptions |
| Justfile | Added automation recipes for registering MCP server with clients and debugging sandbox detection |

  "version": "6.0.14",
  "engines": {
-   "node": ">=16.0.0"
+   "node": ">=22.0.0"

Copilot AI Dec 3, 2025

Changing the minimum Node.js version from >=16.0.0 to >=22.0.0 is a breaking change that will prevent users on Node.js 16.x-21.x from using this version. This contradicts the PR metadata which indicates "BC breaks? no". Either update the PR metadata to indicate this is a breaking change, or reconsider this version requirement increase.

Suggested change
- "node": ">=22.0.0"
+ "node": ">=16.0.0"

Copilot uses AI. Check for mistakes.
#!/usr/bin/env bash
set -euo pipefail
echo "Testing normal environment:"
node -e "const {detectSandbox} = require('./lib/mcp/server.js'); console.log(detectSandbox ? 'Available' : 'Not exported');" || echo "Normal detection test"

Copilot AI Dec 3, 2025

The function detectSandbox() is not exported from the module, but the Justfile at line 59 attempts to use it in a test. Either export this function or update the test to use a different approach.

Comment on lines +349 to +369
async function tailFile(filePath, lineCount) {
  const fh = await fs.promises.open(filePath, 'r');
  try {
    const stats = await fh.stat();
    let position = stats.size;
    const chunkSize = 8192;
    let buffer = '';

    while (position > 0 && buffer.split(/\r?\n/).length <= lineCount + 1) {
      const readSize = Math.min(chunkSize, position);
      position -= readSize;
      const result = await fh.read({ buffer: Buffer.alloc(readSize), position });
      buffer = result.buffer.slice(0, result.bytesRead).toString('utf8') + buffer;
    }

    const lines = buffer.trimEnd().split(/\r?\n/);
    return lines.slice(-lineCount);
  } finally {
    await fh.close();
  }
}

Copilot AI Dec 3, 2025

The tailFile function reads arbitrary files without path validation. If the filePath parameter comes from user-controlled process configuration, this could potentially allow reading sensitive files outside the intended log directory. Consider adding path validation to ensure the file path is within expected log directories.
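A path check of the kind suggested here could be sketched as below. The helper name `isInsideDir` is hypothetical and nothing in the PR defines an "expected log directory", so the base directory is a caller-supplied assumption. The check is purely lexical: it does not resolve symlinks, which would need `fs.realpath` on both paths first.

```javascript
const path = require('node:path');

// True when filePath resolves to a location strictly inside baseDir.
function isInsideDir(baseDir, filePath) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, filePath);
  const rel = path.relative(base, target);
  return (
    rel !== '' &&                        // the directory itself is not a file
    rel !== '..' &&
    !rel.startsWith('..' + path.sep) &&  // escapes via parent traversal
    !path.isAbsolute(rel)                // different root or drive
  );
}
```

A caller would run this before opening the file, for example rejecting the request when `isInsideDir(logDir, filePath)` is false.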

console.error('[pm2-mcp][debug] failed to send sandbox notification', err);
}
});
}, 100);

Copilot AI Dec 3, 2025

[nitpick] The magic number 100 (milliseconds) for the setTimeout delay is used without explanation. Consider extracting this to a named constant like SANDBOX_NOTIFICATION_DELAY_MS to improve code clarity and maintainability.

elasticdotventures and others added 3 commits December 4, 2025 10:11
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>