feat: implement fallback LLMs for agent execution #3033
Conversation
- Add `fallback_llms` field to the `Agent` class to support multiple LLM fallbacks
- Modify `get_llm_response` in `agent_utils.py` to try fallback LLMs when the primary fails
- Update `CrewAgentExecutor` and `LiteAgent` to pass fallback LLMs to `get_llm_response`
- Add smart error handling that skips fallbacks for auth errors but tries them for other failures
- Add comprehensive tests covering all fallback scenarios
- Maintain full backward compatibility for agents without fallback LLMs

Addresses GitHub Issue #3032: Support Fallback LLMs for Agent Execution

Co-Authored-By: João <joao@crewai.com>
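Concretely, the fallback flow described in the commit summary can be sketched as follows. This is a simplified stand-in for the real `get_llm_response`, not the PR's actual code; the `AuthenticationError` class and the stub models are assumptions for illustration:

```python
class AuthenticationError(Exception):
    """Stand-in for a provider auth failure (e.g. a bad API key)."""

def get_llm_response(llm, messages, fallback_llms=None):
    """Try the primary LLM, then each fallback in order.

    Auth errors are re-raised immediately, since a bad credential will
    fail on every retry; any other failure moves on to the next model.
    """
    last_exception = None
    for candidate in [llm, *(fallback_llms or [])]:
        try:
            return candidate.call(messages)
        except AuthenticationError:
            raise  # skip fallbacks: retrying cannot fix credentials
        except Exception as exc:
            last_exception = exc  # remember why this candidate failed
    if last_exception is not None:
        raise last_exception

# Illustrative stub models.
class FlakyLLM:
    def call(self, messages):
        raise TimeoutError("primary timed out")

class BackupLLM:
    def call(self, messages):
        return "Fallback response"

print(get_llm_response(FlakyLLM(), [], fallback_llms=[BackupLLM()]))
# → Fallback response
```

Note that the loop re-raises the *last* failure when every candidate has failed, which matches the None check on `last_exception` mentioned in the follow-up commit below.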
Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment: Fallback LLMs Implementation

Overview

The implementation of fallback LLM support for agent execution is a significant enhancement, improving resilience and fault tolerance when the primary LLM fails. Overall, the PR demonstrates a well-structured approach with comprehensive test coverage.

File-by-File Analysis

src/crewai/agent.py

Positive Aspects:
Suggestions for Improvement:
```python
@validator('fallback_llms')
def validate_fallback_llms(cls, v):
    if v is not None:
        seen = set()
        unique_llms = []
        for llm in v:
            llm_id = str(llm)
            if llm_id not in seen:
                seen.add(llm_id)
                unique_llms.append(llm)
        return unique_llms
    return v
```

src/crewai/utilities/agent_utils.py

Positive Aspects:
Suggestions for Improvement:
```python
import asyncio

async def get_llm_response_with_timeout(
    llm, messages, callbacks, timeout=30
):
    try:
        async with asyncio.timeout(timeout):
            return await llm.acall(messages, callbacks=callbacks)
    except asyncio.TimeoutError:
        raise TimeoutError(f"LLM call timed out after {timeout} seconds")
```

tests/test_agent_fallback_llms.py

Positive Aspects:
Suggestions for Improvement:
```python
@pytest.mark.parametrize("error,should_fallback", [
    (AuthenticationError("Invalid key"), False),
    (ContextWindowExceededError("Too long"), True),
    (RateLimitError("Rate limit"), True),
    (TimeoutError("Timeout"), True),
])
def test_fallback_error_handling(error, should_fallback):
    primary_llm = MagicMock()
    fallback_llm = MagicMock()
    primary_llm.call.side_effect = error
    fallback_llm.call.return_value = "Fallback response"
    if not should_fallback:
        with pytest.raises(type(error)):
            get_llm_response(
                llm=primary_llm,
                messages=[{"role": "user", "content": "test"}],
                callbacks=[],
                printer=Printer(),
                fallback_llms=[fallback_llm],
            )
    else:
        result = get_llm_response(...)
        assert result == "Fallback response"
```

General Recommendations
```python
from typing import Optional

class FallbackConfig(BaseModel):
    max_retries: int = 3
    timeout_seconds: int = 30
    parallel_attempts: bool = False
    cost_threshold: Optional[float] = None


class FallbackMetrics:
    def __init__(self):
        self.fallback_attempts = 0
        self.successful_fallbacks = 0
        self.failed_fallbacks = 0
        self.response_times = []

    def record_attempt(self, success: bool, response_time: float):
        self.fallback_attempts += 1
        if success:
            self.successful_fallbacks += 1
        else:
            self.failed_fallbacks += 1
        self.response_times.append(response_time)


async def get_llm_response_parallel(...)
```

Security Considerations

Performance Impact

The implementation is expected to have minimal performance impact when the primary LLM succeeds, with additional latency only during fallback attempts. Exploring parallel execution could significantly improve efficiency for time-sensitive applications.

The proposed enhancements align with best practices and aim to refine the design while preserving the original architecture's integrity. This review acknowledges the robust foundation laid by the current implementation and encourages ongoing improvements in error handling, performance, and testing.
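The parallel-execution idea raised above can be sketched with `asyncio`: race every candidate model and keep the first successful answer. The function name echoes the review's `get_llm_response_parallel` suggestion, and the stub clients are illustrative assumptions, not the PR's API:

```python
import asyncio

async def get_llm_response_parallel(llms, messages, timeout=30.0):
    """Race all candidate LLMs and return the first successful answer.

    Each call runs as its own task; the first task that completes
    without raising wins, and the remaining tasks are cancelled.
    """
    tasks = [asyncio.create_task(llm.acall(messages)) for llm in llms]
    try:
        for finished in asyncio.as_completed(tasks, timeout=timeout):
            try:
                return await finished
            except Exception:
                continue  # this candidate failed; wait for the others
        raise RuntimeError("all LLM candidates failed")
    finally:
        for task in tasks:
            task.cancel()  # stop any still-running calls

# Illustrative stubs standing in for real async LLM clients.
class SlowLLM:
    async def acall(self, messages):
        await asyncio.sleep(0.2)
        return "slow answer"

class FastLLM:
    async def acall(self, messages):
        await asyncio.sleep(0.01)
        return "fast answer"

print(asyncio.run(get_llm_response_parallel([SlowLLM(), FastLLM()], [])))
# → fast answer
```

The trade-off, as the review notes, is cost: racing N models issues N billable calls even when the primary would have succeeded, so this pattern suits only latency-critical paths.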
Disclaimer: This review was made by a crew of AI Agents.

Code Review for PR #3033: Implement Fallback LLMs for Agent Execution in crewAIInc/crewAI

Summary of Key Changes and Findings

This PR introduces a robust fallback mechanism for the language models (LLMs) used by agents, allowing multiple fallback LLMs to be specified and tried sequentially when the primary LLM call fails. The main updates include:

The patch is well-constructed, with backward compatibility and clean integration into existing workflows.

Detailed Review with Improvement Suggestions
| Concern Area | Severity | Recommendation |
| --- | --- | --- |
| Fallback LLM instantiation | Low | Add type checking before `create_llm` call. |
| Exception handling robustness | Medium | Use `isinstance` checks for auth errors; import exceptions explicitly. |
| Logging clarity | Low | Enhance fallback attempt messages with indices. |
| Exception raising | Low | Wrap final exceptions for better debugging context. |
| Test coverage | Low | Add tests with many fallbacks and capture printer output. |
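The `isinstance`-based check recommended in the table could look like the sketch below. The exception classes here are local stand-ins for the provider-specific ones (e.g. litellm's), not imports the PR actually makes:

```python
class AuthenticationError(Exception):
    """Local stand-in for a provider auth exception (e.g. litellm's)."""

class RateLimitError(Exception):
    """Local stand-in for a provider rate-limit exception."""

# Failures no fallback model can recover from: retrying with a
# different model will not fix bad credentials.
NON_RECOVERABLE = (AuthenticationError,)

def should_try_fallback(exc: Exception) -> bool:
    """Return True when a fallback LLM is worth attempting.

    isinstance also covers subclasses, unlike matching on the class
    name or on substrings of the error message.
    """
    return not isinstance(exc, NON_RECOVERABLE)
```

Checking types rather than message strings keeps the logic stable across providers that word their auth errors differently.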
Overall Assessment
This PR is a substantial and well-engineered enhancement, significantly improving the reliability of agent LLM calls through fallback mechanisms. It is backward compatible and is supported by a robust test suite. The recommended improvements mainly concern defensive coding and improving error detection clarity, but none block merging.
Conclusion: LGTM with minor improvements as noted above. This adds valuable robustness to the crewAI agent ecosystem.
Please let me know if you need assistance implementing any suggested improvements or generating code-ready patches for these recommendations.
Thank you for the great contribution!
- Fix type checker error by adding a None check before raising `last_exception`
- Fix `ContextWindowExceededError` constructor with the correct signature (message, model, llm_provider)
- Update auth error test assertion to match the new print message format

Co-Authored-By: João <joao@crewai.com>
Implement Fallback LLMs for Agent Execution
Overview
This PR implements fallback LLM support for CrewAI agents, addressing GitHub Issue #3032. The feature allows agents to automatically try alternative language models when the primary LLM fails, improving reliability and resilience of agent execution.
Changes Made
Core Implementation
- `fallback_llms` field added to the Agent class - optional list of fallback LLMs that are tried when the primary LLM fails
- `get_llm_response` function - modified to support fallback logic with smart error handling
- `CrewAgentExecutor` and `LiteAgent` - now pass fallback LLMs to the response function

Key Features
Usage Example
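The original example did not survive extraction, but the pattern the PR describes can be illustrated with self-contained stand-in classes. The real code would use crewAI's `Agent` and `LLM`; everything below, including the `execute` helper and model names, is an assumption for illustration, not crewAI's actual API:

```python
from typing import List, Optional

class LLM:
    """Stand-in for an LLM client; `fail` simulates a provider outage."""
    def __init__(self, model: str, fail: bool = False):
        self.model = model
        self.fail = fail

    def call(self, messages) -> str:
        if self.fail:
            raise RuntimeError(f"{self.model} is unavailable")
        return f"answer from {self.model}"

class Agent:
    """Stand-in showing the new field: a primary LLM plus ordered fallbacks."""
    def __init__(self, llm: LLM, fallback_llms: Optional[List[LLM]] = None):
        self.llm = llm
        self.fallback_llms = fallback_llms  # defaults to None: old behavior

    def execute(self, messages) -> str:
        for candidate in [self.llm, *(self.fallback_llms or [])]:
            try:
                return candidate.call(messages)
            except RuntimeError:
                continue  # try the next fallback in order
        raise RuntimeError("primary LLM and all fallbacks failed")

agent = Agent(
    llm=LLM("gpt-4o", fail=True),  # primary is simulated as down
    fallback_llms=[LLM("claude-3-5-sonnet"), LLM("gpt-4o-mini")],
)
print(agent.execute([{"role": "user", "content": "hi"}]))
# → answer from claude-3-5-sonnet
```

Because fallbacks are tried in list order, putting the cheapest acceptable model last gives a cost-aware degradation path.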
Testing
New test suite `tests/test_agent_fallback_llms.py` with 12 test cases covering:

Files Modified
- `src/crewai/agent.py` - Added fallback_llms field and initialization logic
- `src/crewai/utilities/agent_utils.py` - Enhanced get_llm_response with fallback support
- `src/crewai/agents/crew_agent_executor.py` - Updated to pass fallback LLMs
- `src/crewai/lite_agent.py` - Updated to pass fallback LLMs
- `tests/test_agent_fallback_llms.py` - Comprehensive test suite for fallback functionality

Error Handling Strategy
Backward Compatibility
✅ Agents without `fallback_llms` work exactly as before
✅ No changes to existing API surface
✅ All existing functionality preserved
✅ Default value of `fallback_llms` is `None`
Link to Devin run
https://app.devin.ai/sessions/1c295a5d9b8848a097afb5d082d5768f
Requested by
João (joao@crewai.com)
Fixes #3032