Problem
The MCP Inspector workflow is failing at an 80% rate (8 of the 10 most recent runs failed) and has been non-operational since 2026-01-05 (18 days ago).
Error Details
Failed step: Step 24 - "Start MCP gateway"
Last successful run: 2026-01-05
Recent failures:
- Run #37 - 2026-01-19T18:54:44Z
- Run #36 - 2026-01-16T19:24:44Z
- Run #35 - 2026-01-16T19:20:45Z
- Run #34 - 2026-01-12T18:55:15Z
Example run: §21148514645
Impact
- MCP tooling inspection capabilities are offline
- Cannot validate MCP server configurations
- Affects workflow development and debugging
Root Cause Analysis
The failure occurs at the "Start MCP gateway" step, which suggests one of the following:
- Tavily MCP server configuration issue - The workflow uses the Tavily MCP server (as does Daily News, which recently recovered)
- MCP Gateway connectivity problem - The server may be unreachable or misconfigured
- Missing or invalid secrets - The workflow may need TAVILY_API_KEY or other credentials
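To rule the missing-secret hypothesis in or out early, a fail-fast check could run before the gateway starts. The following is a minimal sketch, not part of the existing workflow: the helper name `missing_secrets` and the `REQUIRED_SECRETS` list are hypothetical, and only TAVILY_API_KEY is actually named in this issue. It assumes the secret is mapped into the step's environment; GitHub Actions passes an undefined secret as an empty string, so blank values are treated as missing.

```python
import os

# Hypothetical list for illustration; only TAVILY_API_KEY is named in this issue.
REQUIRED_SECRETS = ["TAVILY_API_KEY"]


def missing_secrets(env=None, required=REQUIRED_SECRETS):
    """Return the names of required secrets that are unset or blank.

    `env` defaults to the process environment; pass a dict to test
    the check without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Example use inside a workflow step (assumed, not taken from the workflow):
#   missing = missing_secrets()
#   if missing:
#       raise SystemExit("Missing or empty secrets: " + ", ".join(missing))
```

The check only reports secret names, never values, so its output is safe to leave in workflow logs.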
Related Context
The Daily News workflow had a similar TAVILY_API_KEY issue and recovered after the secret was added (2026-01-22). The Research workflow is also failing, likely with a similar root cause.
Recommended Investigation Steps
- Check Tavily configuration:
  - Verify that the TAVILY_API_KEY secret is accessible
  - Review the .github/workflows/shared/mcp/tavily.md configuration
  - Compare with the Daily News configuration (now working)
- Review MCP Gateway logs:
  - Download artifacts from the failed run
  - Check /tmp/gh-aw/mcp-logs/ for error messages
  - Look for connection timeouts or authentication errors
- Test MCP Gateway startup:
  - Run the workflow manually with increased logging
  - Check whether the issue is specific to scheduled runs or affects all triggers
  - Verify that Docker container images are accessible
- Compare with working workflows:
  - Smoke Claude and Smoke Codex are healthy (90% success rate)
  - They don't use Tavily - check which MCP servers they use
  - Identify differences in configuration
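The log review step above can be partly automated. The sketch below walks a log directory and flags lines that look like timeouts, authentication failures, or refused connections. The regular expressions are assumptions about typical error wording, not the gateway's actual log format, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed patterns; the real gateway log wording may differ.
PATTERNS = {
    "timeout": re.compile(r"timed? ?out|ETIMEDOUT", re.IGNORECASE),
    "auth": re.compile(
        r"\b(401|403)\b|unauthorized|forbidden|invalid api key", re.IGNORECASE
    ),
    "connection": re.compile(r"ECONNREFUSED|connection refused", re.IGNORECASE),
}


def classify_line(line):
    """Return the sorted category names whose pattern matches the line."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(line))


def scan_logs(log_dir):
    """Yield (path, line_number, categories, text) for each suspicious line."""
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has stray bytes.
        with path.open(errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                categories = classify_line(line)
                if categories:
                    yield path, number, categories, line.rstrip()
```

On the runner this could be pointed at `/tmp/gh-aw/mcp-logs/`; for a downloaded artifact, pass the directory it was extracted to, for example `for hit in scan_logs("/tmp/gh-aw/mcp-logs/"): print(*hit)`.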
Success Criteria
- MCP Inspector workflow runs successfully
- "Start MCP gateway" step completes without errors
- Success rate returns to >80% over next 5 runs
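The success-rate criterion can be checked mechanically. The sketch below assumes run outcomes are available as a list of conclusion strings, oldest first, using GitHub Actions' `"success"` value; the function names are hypothetical. Note that a rate strictly above 80% over five runs requires all five to succeed, because 4 of 5 is exactly 80%.

```python
def success_rate(conclusions):
    """Fraction of runs whose conclusion is "success" (0.0 for no runs)."""
    if not conclusions:
        return 0.0
    return sum(c == "success" for c in conclusions) / len(conclusions)


def meets_criterion(conclusions, window=5, threshold=0.80):
    """True if the last `window` runs exist and succeed at a rate
    strictly above `threshold`."""
    recent = conclusions[-window:]
    return len(recent) == window and success_rate(recent) > threshold
```

For the situation described in this issue, `success_rate(["failure"] * 8 + ["success"] * 2)` gives 0.2, which corresponds to the reported 80% failure rate.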
Priority: P1 (High)
This workflow provides critical tooling inspection capabilities. Fix within 24 hours to restore MCP debugging functionality.
AI generated by Workflow Health Manager - Meta-Orchestrator
- expires on Jan 24, 2026, 2:59 AM UTC