Skip to content

Add mcp-malt integration tests workflow#154

Merged
tgrunnagle merged 28 commits intomainfrom
malt_cli_workflow_2025-12-10
Dec 16, 2025
Merged

Add mcp-malt integration tests workflow#154
tgrunnagle merged 28 commits intomainfrom
malt_cli_workflow_2025-12-10

Conversation

@tgrunnagle
Copy link
Collaborator

@tgrunnagle tgrunnagle commented Dec 10, 2025

Add a new GitHub Actions workflow that uses mcp-tef CLI (mtef) to run query alignment tests against mcp-optimizer.

Changes

  • Add .github/workflows/mcp-tef-integration-tests.yml workflow that:
    • Triggers on PR creation and updates
    • Installs mcp-tef CLI via uv tool install from the mcp-tef git repository
    • Deploys time and fetch MCP servers via ToolHive
    • Deploys mcp-optimizer configured to use the backend servers
    • Deploys mcp-tef server for LLM-based tool alignment testing
    • Creates a test case with prompt "Tell me the current time in Tokyo" expecting find_tool to be called
    • Executes the test run and polls for completion
    • Uploads test results as workflow artifacts (30 day retention)
    • Reports test status with warnings (non-blocking)

Configuration

  • Uses continue-on-error: true so test failures warn but don't block PR merges
  • Requires OPENROUTER_API_KEY secret for LLM access
  • Uses anthropic/claude-sonnet-4.5 model via OpenRouter

@claude
Copy link

claude bot commented Dec 10, 2025

Review Summary

Overall this is a well-structured integration test workflow. The code is clear and follows good GitHub Actions practices. Below are specific items to address:

Issues to Fix

1. Security: Hardcoded Docker host IP (172.17.0.1)

  • Line 65: Uses 172.17.0.1 which is Docker's default bridge but not guaranteed
  • Better: Use host.docker.internal or Docker networks for cross-container communication
  • Inconsistency: Line 170 uses host.docker.internal, line 65 should too

2. Error handling: set +e too early

  • Line 165: Disables error exit for entire test execution block
  • Problem: Silent failures between commands won't be caught
  • Better: Use set +e only around specific commands that should not fail the step

3. Redundant operation

  • Lines 147-150: Retrieves mcp-optimizer URL twice (also done in step at line 146)
  • Remove duplication by using steps.get-url.outputs.MCP_OPTIMIZER_URL

4. Hardcoded timeout values lack context

  • Line 72: 180 second timeout for readiness
  • Line 97: 60 second timeout for ingestion
  • Line 160: 60 second health timeout
  • Line 238: 120 second test timeout
  • Consider making these configurable via env vars at workflow level

Minor Improvements

5. Python script should use project dependencies

  • Lines 102-126: Inline Python uses MCP SDK but doesn't declare it
  • Could fail if uv run doesn't have access to those packages
  • Consider: Pre-install dependencies or use a separate script file

6. Shell verbosity

  • Many echo statements for debugging (good for CI logs)
  • Could reduce some repetitive logging once stable

7. Magic number in log tails

  • Line 312, 316, 320, 323: Different tail values (100, 50)
  • Consider consistent value or make configurable

Best Practices Followed ✓

  • Pinned action versions with commit SHAs
  • Non-blocking tests with continue-on-error: true
  • Artifact uploads for debugging
  • Proper secrets handling
  • Clear step names and structure
  • Good timeout handling overall

Performance Note

Workflow appears appropriate for its purpose. The polling intervals (2s) and timeouts are reasonable for integration tests.

Verdict

Request changes - Address items 1-3 before merge. Items 4-7 are optional improvements.

@tgrunnagle tgrunnagle force-pushed the malt_cli_workflow_2025-12-10 branch from 35f83df to 9fe83cd Compare December 15, 2025 15:42
@tgrunnagle
Copy link
Collaborator Author

@claude review these changes now that the new github action is working

@claude
Copy link

claude bot commented Dec 16, 2025

Claude finished @tgrunnagle's task —— View job


Code Review Complete ✅

I've reviewed the working MCP-TEF integration workflow. Overall this is a well-structured test workflow with good error handling and clear logging. Here's my analysis:


Strengths ✨

  1. Excellent error handling - Proper use of set +e/set -e around specific operations (lines 182, 212, 256)
  2. Good timeout strategy - Reasonable timeouts with clear max attempts (180s for readiness, 60s for ingestion, 120s for test completion)
  3. Comprehensive logging - Captures all container logs at the end for debugging (lines 325-338)
  4. Security best practices - Pinned action versions with commit SHAs, proper secrets handling
  5. Non-blocking design - Uses continue-on-error: true and warnings rather than hard failures
  6. Clean JSON output handling - Saves test results and creates structured summaries

Issues to Address 🔧

1. Docker networking inconsistency

Lines 70 & 177 - Uses hardcoded 172.17.0.1 for Docker bridge IP

--env TOOLHIVE_HOST=172.17.0.1  # line 70
MCP_SERVER_URL=$(echo "${MCP_SERVER_URL}" | sed 's/127\.0\.0\.1/172\.17\.0\.1/g')  # line 177

Issue: Docker's default bridge IP isn't guaranteed across environments
Recommendation: Consider using Docker networks or host.docker.internal for more portable container communication

2. Artifact upload version inconsistency

Line 300 - Uses actions/upload-artifact@v4 without SHA pinning

uses: actions/upload-artifact@v4

Issue: All other actions are pinned to specific commit SHAs for security, but this one only uses the version tag
Recommendation: Pin to a specific commit SHA like the other actions for consistency and security

3. Large log capture

Lines 329 & 332 - Captures 1000 lines of logs for main containers

docker logs mcp-tef 2>&1 | tail -1000 || true
docker logs mcp-optimizer 2>&1 | tail -1000 || true

Issue: May create very verbose output in CI logs
Consideration: This is acceptable for debugging, but if logs become unwieldy, consider reducing to 200-300 lines or conditionally capturing full logs only on failure


Minor Observations 📝

4. Python script inline vs file

Lines 112-134 - Embeds Python script using heredoc

  • Pro: Keeps everything in one file
  • Con: Could be harder to test/maintain if it grows
  • Current verdict: Fine as-is for this simple check

5. Magic values scattered throughout

Various hardcoded values: ports (9090, 8080, 8088), timeouts (180, 60, 120), poll intervals (2)

  • Current verdict: Acceptable for integration tests. Could be extracted to env vars at workflow level if reusability becomes a priority

6. Graceful exit on errors

Lines 204, 233, 250 use exit 0 after setting error output variables

  • Pro: Allows workflow to continue and report partial results
  • Con: Might mask certain failure modes
  • Current verdict: Appropriate given the continue-on-error: true design

Comparison with Previous Review

Looking at the earlier review, I notice the workflow has addressed some concerns:

  • Error handling improved - Now uses set +e around specific commands rather than globally
  • ⚠️ Docker networking - Still uses 172.17.0.1 (previously flagged)
  • ⚠️ URL retrieval - Lines 154-159 and 176-177 still retrieve/transform the URL separately

Verdict

Approve with minor suggestions - The workflow is functional and well-designed. The issues identified are minor and don't block merge:

The workflow demonstrates good CI/CD practices and should work reliably in the GitHub Actions environment.


[Job Run](https://github.com/StacklokLabs/mcp-optimizer/actions/runs/${{ github.run_id }}) | Branch: malt_cli_workflow_2025-12-10

@tgrunnagle
Copy link
Collaborator Author

@aponcedeleonch @therealnb The action is working now, so this is ready for review. Thanks!

aponcedeleonch
aponcedeleonch previously approved these changes Dec 16, 2025
Copy link
Member

@aponcedeleonch aponcedeleonch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, some minor comments

aponcedeleonch
aponcedeleonch previously approved these changes Dec 16, 2025
@tgrunnagle tgrunnagle merged commit 3bbe9e3 into main Dec 16, 2025
6 checks passed
@tgrunnagle tgrunnagle deleted the malt_cli_workflow_2025-12-10 branch December 16, 2025 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants