-
Notifications
You must be signed in to change notification settings - Fork 5
Fix MCP Client Initialization Error Handling and Import Issues #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…hecks - Implement proper exception handling for MCP client TaskGroup errors - Add BaseException catching to handle ExceptionGroup from async TaskGroup - Create initialize_mcp_client helper function for safe client initialization - Add defensive null checks for MCP clients before calling list_tools_sync() - Ensure graceful degradation when MCP clients fail to initialize - Add comprehensive error logging for TaskGroup exceptions - Include test files for MCP error handling verification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Summary
This PR addresses critical MCP client initialization issues with improved error handling for TaskGroup exceptions and graceful degradation. The implementation shows good understanding of the problem and provides comprehensive test coverage.
Critical Issues Requiring Fix
-
Logic Error in Exception Flow: The RuntimeError on line 133 in
cloud_engineer_agent.pywill always crash the application when both MCP clients fail, contradicting the graceful degradation goal. -
Unsafe Exception Handling: Bare
except:clauses in cleanup code (lines 162-169) can mask critical errors and hinder debugging. -
Security Risk: Hardcoded AWS credentials in test files establish dangerous patterns that could lead to credential leaks.
Positive Aspects
- Proper use of
BaseExceptionto catchExceptionGroupexceptions from async TaskGroup operations - Comprehensive test coverage with both unit and integration tests
- Good separation of concerns with the
initialize_mcp_clienthelper function - Detailed error logging and troubleshooting guidance
- Maintains backward compatibility
Recommendation
Request Changes - The logic error preventing graceful degradation must be fixed before merge. The security and exception handling issues should also be addressed to maintain code quality and safety standards.
You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.
| if not mcp_initialized: | ||
| raise RuntimeError("Failed to initialize any MCP clients") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛑 Logic Error: The RuntimeError on line 133 will always be raised when both MCP clients fail to initialize, causing the application to crash despite the graceful degradation logic in the exception handler. This contradicts the PR's goal of allowing the application to continue with limited functionality.
| if not mcp_initialized: | |
| raise RuntimeError("Failed to initialize any MCP clients") | |
| # Only raise error if this is unexpected - graceful degradation should handle MCP failures | |
| # if not mcp_initialized: | |
| # raise RuntimeError("Failed to initialize any MCP clients") |
| aws_docs_mcp_client.stop() | ||
| except: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛑 Exception Handling Risk: Using bare except: clauses can mask critical errors and make debugging difficult. This could hide important failure information during client cleanup operations.
| aws_docs_mcp_client.stop() | |
| except: | |
| pass | |
| try: | |
| aws_docs_mcp_client.stop() | |
| except Exception as e: | |
| print(f"Warning: Error stopping AWS docs MCP client: {e}") | |
| pass |
| try: | ||
| aws_diagram_mcp_client.stop() | ||
| except: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛑 Exception Handling Risk: Using bare except: clauses can mask critical errors and make debugging difficult. This could hide important failure information during client cleanup operations.
| try: | |
| aws_diagram_mcp_client.stop() | |
| except: | |
| pass | |
| try: | |
| aws_diagram_mcp_client.stop() | |
| except Exception as e: | |
| print(f"Warning: Error stopping AWS diagram MCP client: {e}") | |
| pass |
| def initialize_mcp_client(client_name, client_factory): | ||
| """ | ||
| Copy of the initialize_mcp_client function from cloud_engineer_agent.py | ||
| to test the exception handling logic independently. | ||
| """ | ||
| try: | ||
| print(f"Initializing {client_name}...") | ||
| client = client_factory() | ||
| print(f"Starting {client_name}...") | ||
| # Simulate the start call | ||
| if hasattr(client, 'start'): | ||
| client.start() | ||
| print(f"{client_name} started successfully.") | ||
| return client, True | ||
| except BaseException as e: | ||
| # Catch BaseException to handle both Exception and ExceptionGroup (from TaskGroup) | ||
| error_type = type(e).__name__ | ||
| error_message = str(e) | ||
|
|
||
| # Handle ExceptionGroup specially (Python 3.11+ async TaskGroup errors) | ||
| if error_type == "ExceptionGroup": | ||
| print(f"Error initializing {client_name}: TaskGroup exception occurred") | ||
| print(f" Exception type: {error_type}") | ||
| # Extract sub-exceptions from ExceptionGroup | ||
| if hasattr(e, 'exceptions'): | ||
| print(f" Sub-exceptions ({len(e.exceptions)}):") | ||
| for i, sub_exc in enumerate(e.exceptions, 1): | ||
| print(f" {i}. {type(sub_exc).__name__}: {sub_exc}") | ||
| else: | ||
| print(f" Details: {error_message}") | ||
| else: | ||
| print(f"Error initializing {client_name}: {error_message}") | ||
|
|
||
| return None, False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicating the initialize_mcp_client function creates maintenance overhead and potential inconsistencies. Consider importing the actual function from the main module or extracting it to a shared utility module.
| # Mock the environment variables to prevent early exit | ||
| os.environ['AWS_REGION'] = 'us-east-1' | ||
| os.environ['AWS_ACCESS_KEY_ID'] = 'test-key-id' | ||
| os.environ['AWS_SECRET_ACCESS_KEY'] = 'test-secret-key' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛑 Security Risk: Hardcoding AWS credentials in test files, even with placeholder values, can establish dangerous patterns and potentially leak into version control. Use environment variable mocking or test fixtures instead.
| # Mock the environment variables to prevent early exit | |
| os.environ['AWS_REGION'] = 'us-east-1' | |
| os.environ['AWS_ACCESS_KEY_ID'] = 'test-key-id' | |
| os.environ['AWS_SECRET_ACCESS_KEY'] = 'test-secret-key' | |
| # Mock the environment variables to prevent early exit | |
| import unittest.mock | |
| with unittest.mock.patch.dict(os.environ, { | |
| 'AWS_REGION': 'us-east-1', | |
| 'AWS_ACCESS_KEY_ID': 'test-key-id', | |
| 'AWS_SECRET_ACCESS_KEY': 'test-secret-key' | |
| }): |
Summary
This PR addresses critical MCP client initialization issues, including TaskGroup exception handling and ImportError fixes, ensuring the application can start reliably even when MCP services are unavailable.
Requirements Implemented
1. Fix ImportError for
mcp_initializedVariablemcp_initializedis properly defined at the module level incloud_engineer_agent.pyapp.pyand other modulesFalseinitially and updated based on MCP client initialization successImportError: cannot import name 'mcp_initialized' from 'cloud_engineer_agent'2. MCP Client Initialization Error Handling (GitHub Issue #1)
ExceptionGrouperrors from asyncTaskGroupoperationsBaseExceptioninstead ofExceptionto catch both regular exceptions andExceptionGroupinitialize_mcp_client()helper for safe, consistent client initializationclient failed to initialize unhandled errors in a TaskGroup (1 sub-exception)3. Defensive Null Checks for MCP Clients
mcp_initialized and aws_docs_mcp_client is not Nonebefore callinglist_tools_sync()Noneobjectscleanup()function as well4. Environment Variable Validation
env_validator.py(already merged in previous PRs)5. Health Check Endpoint
health_check.py(already merged in previous PRs)6. Code Simplification (Amazon Q Developer Suggestions)
mcp_initialized and aws_docs_mcp_client is not Noneaws_docs_mcp_clientshould beNonewhenmcp_initializedisFalse, defensive programming requires explicit null checks to prevent potentialAttributeErrorwhen callinglist_tools_sync()Technical Changes
Modified Files
cloud_engineer_agent.py- Core MCP client initialization improvementsinitialize_mcp_client()helper function for safe client initializationBaseExceptioncatching to handleExceptionGrouperrorsmcp_initialized,aws_docs_mcp_client, andaws_diagram_mcp_clientare module-level variableslist_tools_sync()test_exception_handling.py(New) - Unit tests for exception handling logictest_mcp_error_handling.py(New) - Integration tests for MCP error handlingError Handling Strategy
Before (Problematic)
After (Robust)
Benefits
Testing
Unit Tests
test_exception_handling.py- All 4 test cases passIntegration Tests
test_mcp_error_handling.py- Validates real-world scenariosDocker Validation
cloudeng-strands-agent/DockerfileRelated Issues
Deployment Notes
Validation Checklist