
feat: implement fallback LLMs for agent execution #3033

Open · wants to merge 2 commits into base: main
Conversation

devin-ai-integration[bot] (Contributor)

Implement Fallback LLMs for Agent Execution

Overview

This PR implements fallback LLM support for CrewAI agents, addressing GitHub Issue #3032. The feature allows agents to automatically try alternative language models when the primary LLM fails, improving reliability and resilience of agent execution.

Changes Made

Core Implementation

  • Added fallback_llms field to Agent class - Optional list of fallback LLMs that are tried when the primary LLM fails
  • Enhanced get_llm_response function - Modified to support fallback logic with smart error handling
  • Updated agent executors - Both CrewAgentExecutor and LiteAgent now pass fallback LLMs to the response function
  • Smart error handling - Authentication errors skip fallbacks (since they won't help), while other errors trigger fallback attempts

Key Features

  • Backward compatibility - Agents without fallback LLMs work exactly as before
  • Flexible configuration - Supports string model names or LLM instances for fallbacks
  • Intelligent error handling - Different error types are handled appropriately
  • Clear user feedback - Printer messages inform users when fallbacks are being tried
  • Ordered fallback attempts - Fallbacks are tried in the specified order

Usage Example

from crewai import Agent, Task
from crewai.llm import LLM

# Create agent with fallback LLMs
agent = Agent(
    role="Research Analyst",
    goal="Analyze market trends",
    backstory="Expert in market analysis",
    llm=LLM("gpt-4"),  # Primary LLM
    fallback_llms=[    # Fallback LLMs tried in order
        LLM("gpt-3.5-turbo"),
        LLM("claude-3-sonnet-20240229")
    ]
)

task = Task(
    description="Analyze the current tech market trends",
    expected_output="A comprehensive market analysis report",
    agent=agent
)

# If gpt-4 fails, it will automatically try gpt-3.5-turbo, then claude-3-sonnet
result = agent.execute_task(task)

Testing

  • Comprehensive test suite - Added tests/test_agent_fallback_llms.py with 12 test cases covering:
    • Basic fallback functionality when primary LLM fails
    • Multiple fallback LLMs in sequence
    • Authentication errors that skip fallbacks
    • Context window errors that try fallbacks
    • All LLMs failing scenario
    • Backward compatibility (no fallback LLMs specified)
    • String initialization of fallback LLMs
    • Empty response handling
  • Manual verification - Created and ran basic functionality tests to verify implementation
  • Backward compatibility testing - Ensured existing agents work unchanged

Files Modified

  • src/crewai/agent.py - Added fallback_llms field and initialization logic
  • src/crewai/utilities/agent_utils.py - Enhanced get_llm_response with fallback support
  • src/crewai/agents/crew_agent_executor.py - Updated to pass fallback LLMs
  • src/crewai/lite_agent.py - Updated to pass fallback LLMs
  • tests/test_agent_fallback_llms.py - Comprehensive test suite for fallback functionality

Error Handling Strategy

  • Authentication errors - Skip remaining fallbacks (won't help)
  • Context window errors - Try fallbacks (different models may have different limits)
  • General LLM errors - Try fallbacks (may be model-specific issues)
  • Empty responses - Try fallbacks (may be temporary model issues)
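Under this strategy, the core loop of the fallback logic can be sketched as follows. This is an illustrative stand-in, not the actual CrewAI code: `is_auth_error` and `call_with_fallbacks` are hypothetical names, and a real implementation should classify errors via litellm's exception classes rather than class-name matching.

```python
def is_auth_error(exc: Exception) -> bool:
    # Heuristic stand-in: treat exceptions whose class name mentions
    # authentication as non-recoverable (fallbacks won't help).
    return "authentication" in type(exc).__name__.lower()


def call_with_fallbacks(llms, messages):
    """Try each LLM in order; return the first non-empty response."""
    last_exc = None
    for llm in llms:
        try:
            response = llm(messages)
            if response:  # empty responses also trigger a fallback
                return response
            last_exc = ValueError("empty response")
        except Exception as exc:
            if is_auth_error(exc):
                raise  # auth errors skip the remaining fallbacks
            last_exc = exc  # context-window / general errors: try the next LLM
    raise RuntimeError("All LLMs failed") from last_exc
```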

Backward Compatibility

✅ Agents without fallback_llms work exactly as before
✅ No changes to existing API surface
✅ All existing functionality preserved
✅ Default value of fallback_llms is None

Link to Devin run

https://app.devin.ai/sessions/1c295a5d9b8848a097afb5d082d5768f

Requested by

João (joao@crewai.com)

Fixes #3032

- Add fallback_llms field to Agent class to support multiple LLM fallbacks
- Modify get_llm_response in agent_utils.py to try fallback LLMs when primary fails
- Update CrewAgentExecutor and LiteAgent to pass fallback LLMs to get_llm_response
- Add smart error handling that skips fallbacks for auth errors but tries them for other failures
- Add comprehensive tests covering all fallback scenarios
- Maintain full backward compatibility for agents without fallback LLMs

Addresses GitHub Issue #3032: Support Fallback LLMs for Agent Execution

Co-Authored-By: João <joao@crewai.com>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@joaomdmoura (Collaborator)

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment: Fallback LLMs Implementation

Overview

The implementation of fallback LLM support for agent execution is a significant enhancement that improves resilience and fault tolerance when primary LLMs encounter failures. Overall, the PR demonstrates a well-structured approach with comprehensive test coverage.

File-by-File Analysis

src/crewai/agent.py

Positive Aspects:

  • The introduction of the fallback_llms field is clean and well-structured.
  • Type hints and field descriptions provide clarity for future maintainers.
  • The fallback_llm initialization in the post_init_setup method is effectively implemented.

Suggestions for Improvement:

  • Validation for Duplicates: Implement a validator to ensure fallback_llms contains unique entries (shown with Pydantic v2's field_validator, since the v1 validator is deprecated):
@field_validator('fallback_llms')
@classmethod
def validate_fallback_llms(cls, v):
    if v is None:
        return v
    seen = set()
    unique_llms = []
    for llm in v:
        llm_id = str(llm)
        if llm_id not in seen:
            seen.add(llm_id)
            unique_llms.append(llm)
    return unique_llms

src/crewai/utilities/agent_utils.py

Positive Aspects:

  • The handling of authentication errors is well thought out, with clear logging of fallback attempts.
  • The implementation maintains backward compatibility, which is crucial for existing users.

Suggestions for Improvement:

  • Extract Error Classification Logic: Consider creating a separate function for classifying errors, enhancing reusability.
  • Timeout Handling: Add a timeout feature to LLM calls to prevent indefinite waiting.
import asyncio  # asyncio.timeout requires Python 3.11+

async def get_llm_response_with_timeout(
    llm, messages, callbacks, timeout=30
):
    try:
        async with asyncio.timeout(timeout):
            return await llm.acall(messages, callbacks=callbacks)
    except asyncio.TimeoutError:
        raise TimeoutError(f"LLM call timed out after {timeout} seconds")
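The error-classification extraction suggested above might look like the sketch below. The `ErrorAction` enum and `classify_llm_error` function are illustrative names, not part of CrewAI; a real classifier should check litellm's concrete exception types instead of class names.

```python
from enum import Enum, auto


class ErrorAction(Enum):
    RAISE = auto()         # fatal: fallbacks won't help
    TRY_FALLBACK = auto()  # recoverable: worth retrying on another model


def classify_llm_error(exc: Exception) -> ErrorAction:
    # Hypothetical classifier: authentication errors are fatal,
    # everything else is worth retrying on a fallback model.
    if "Authentication" in type(exc).__name__:
        return ErrorAction.RAISE
    return ErrorAction.TRY_FALLBACK
```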

tests/test_agent_fallback_llms.py

Positive Aspects:

  • The test coverage is comprehensive, including edge cases and error conditions.
  • Clear organization improves readability.

Suggestions for Improvement:

  • Parametrized Tests: Incorporate parametrized tests for varying error types to enhance testing robustness.
  • Performance Benchmarks: Include performance benchmarks to assess the efficiency of the fallback mechanism.
@pytest.mark.parametrize("error,should_fallback", [
    (AuthenticationError("Invalid key"), False),
    (ContextWindowExceededError("Too long"), True),
    (RateLimitError("Rate limit"), True),
    (TimeoutError("Timeout"), True),
])
def test_fallback_error_handling(error, should_fallback):
    primary_llm = MagicMock()
    fallback_llm = MagicMock()
    primary_llm.call.side_effect = error
    fallback_llm.call.return_value = "Fallback response"

    kwargs = dict(
        llm=primary_llm,
        messages=[{"role": "user", "content": "test"}],
        callbacks=[],
        printer=Printer(),
        fallback_llms=[fallback_llm],
    )
    if not should_fallback:
        with pytest.raises(type(error)):
            get_llm_response(**kwargs)
    else:
        assert get_llm_response(**kwargs) == "Fallback response"

General Recommendations

  • Configuration Options: A configuration class for Fallback settings can enhance flexibility.
class FallbackConfig(BaseModel):
    max_retries: int = 3
    timeout_seconds: int = 30
    parallel_attempts: bool = False
    cost_threshold: Optional[float] = None
  • Metrics Collection: Implement a metrics collection system for monitoring fallback attempts and successes.
class FallbackMetrics:
    def __init__(self):
        self.fallback_attempts = 0
        self.successful_fallbacks = 0
        self.failed_fallbacks = 0
        self.response_times = []

    def record_attempt(self, success: bool, response_time: float):
        self.fallback_attempts += 1
        if success:
            self.successful_fallbacks += 1
        else:
            self.failed_fallbacks += 1
        self.response_times.append(response_time)
  • Parallel Fallback Execution: Consider implementing an asynchronous approach to handle multiple fallback requests simultaneously.
async def get_llm_response_parallel(...)

Security Considerations

  • Ensure API keys for fallback LLMs are well-managed and rotated regularly.
  • Implement rate limiting for fallback LLMs to manage costs and prevent abuse.
  • Maintain error logging practices that secure sensitive data.

Performance Impact

The implementation is expected to have minimal performance impact when the primary LLM succeeds, with additional latency only during fallback attempts. Exploring parallel execution could significantly enhance process efficiency for time-sensitive applications.

The enhancements proposed align with best practices and aim to refine the design further while preserving the original architecture's integrity. This review acknowledges the robust foundation laid by the current implementation and encourages ongoing improvements in error handling, performance, and testing standards.

@mplachta (Contributor)

Disclaimer: This review was made by a crew of AI Agents.

Code Review for PR #3033: Implement Fallback LLMs for Agent Execution in crewAIInc/crewAI


Summary of Key Changes and Findings

This PR introduces a robust fallback mechanism for language models (LLMs) used by agents, allowing multiple fallback LLMs to be specified and tried sequentially when the primary LLM call fails. The main updates include:

  • Adding an optional fallback_llms field to the Agent class to hold multiple fallback LLMs.
  • Adjusting the agent executors (CrewAgentExecutor and LiteAgent) to pass fallback LLMs to the LLM response utility.
  • Refactoring get_llm_response to implement retry logic over the primary and fallback LLMs, with special handling for authentication errors to halt fallback attempts.
  • Comprehensive new test suite covering fallback scenarios including authentication failures, context errors, multiple fallback attempts, and backward compatibility.

The patch is well-constructed with backward compatibility and clean integration into existing workflows.


Detailed Review with Improvement Suggestions

1. src/crewai/agent.py

  • Positive:

    • Clear addition of fallback_llms with proper type hints and descriptive field.
    • Fallback LLMs are instantiated similarly to the primary LLM in post_init_setup.
  • Areas for Improvement:

    • The fallback LLMs are recreated with create_llm unconditionally, which risks double-instantiation if fallback LLMs are already instances.
    • Suggest adding a type check before calling create_llm, for example:
      if self.fallback_llms:
          self.fallback_llms = [
              llm if isinstance(llm, BaseLLM) else create_llm(llm)
              for llm in self.fallback_llms
          ]
    • Optionally, consider defining fallback_llms with default_factory=list instead of default=None to avoid mutable default issues and improve type stability if an empty list is a valid default.
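The default_factory pattern mentioned above avoids every instance sharing one mutable list. A minimal sketch using the standard-library dataclasses module (Pydantic's Field(default_factory=list) behaves the same way for the fallback_llms field):

```python
from dataclasses import dataclass, field


@dataclass
class AgentSketch:
    # default_factory gives each instance its own fresh list, avoiding
    # the shared-mutable-default pitfall that default=[] would cause.
    fallback_llms: list = field(default_factory=list)
```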

2. src/crewai/agents/crew_agent_executor.py & src/crewai/lite_agent.py

  • Passing fallback LLMs to get_llm_response uses getattr(self.agent, 'fallback_llms', None) which is robust and backward-compatible.
  • Minor note: consider defining a formal interface or property on the Agent class to expose fallback LLMs for future-proofing, but not critical at this stage.
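Such a property might be sketched as follows. The class and attribute names are illustrative, not the actual Agent implementation:

```python
class AgentWithFallbacks:
    def __init__(self, llm, fallback_llms=None):
        self.llm = llm
        self._fallback_llms = fallback_llms

    @property
    def fallback_llms(self) -> list:
        # Always returns a list, so executors can drop their
        # getattr(self.agent, 'fallback_llms', None) guards.
        return self._fallback_llms or []
```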

3. src/crewai/utilities/agent_utils.py

  • Strengths:

    • The refactored logic elegantly iterates over primary and fallback LLMs.
    • Skips fallbacks on authentication errors, which is a valuable optimization.
    • Printer logs provide detailed insight into the fallback process.
  • Enhancement Suggestions:

    • The detection of authentication errors relies on checking that e.__class__.__module__.startswith("litellm") plus string matching in the error message. This is brittle and could fail if errors are wrapped or originate from other sources.
    • Instead, import the specific exception classes (e.g., AuthenticationError) and use isinstance checks for robust error detection:
      from litellm.exceptions import AuthenticationError
      
      if isinstance(e, AuthenticationError):
          printer.print(...skip fallbacks...)
          raise  # bare raise preserves the original traceback
    • Improve fallback logging for clarity by showing which fallback number is being tried relative to total fallbacks, e.g.:
      printer.print(content=f"Trying fallback LLM {i} of {len(llms_to_try)-1}...", color="yellow")
    • When all LLMs fail, raise a wrapped exception with enhanced context:
      raise RuntimeError("All LLMs failed. See above for details.") from last_exception

4. tests/test_agent_fallback_llms.py

  • Excellent coverage: Tests cover a wide range of scenarios including primary failure, multiple fallbacks, authentication errors, context window issues, all failures, backward compatibility, string initialization, empty responses, and primary success cases where fallbacks are skipped.

  • Minor suggestions:

    • Consider adding a test with a larger number of fallback LLMs (e.g., 5 or more) to verify scalability of the fallback loop.
    • Use MagicMock(spec=LLM) to tighten mock typing and catch any interface mismatches early.
    • Optionally, capture printer output (e.g., using pytest's capfd) to assert that error and fallback messages are output as expected, which will enhance test robustness.
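A self-contained sketch of the printer-output idea, using redirect_stdout in place of pytest's capfd and a hypothetical call_with_fallbacks as a stand-in (a real test would import get_llm_response and Printer from crewai and assume Printer writes to stdout):

```python
import io
from contextlib import redirect_stdout
from unittest.mock import MagicMock


def call_with_fallbacks(llms, messages):
    # Stand-in for get_llm_response: prints a notice before each fallback try.
    for i, llm in enumerate(llms):
        try:
            return llm.call(messages)
        except Exception:
            if i + 1 < len(llms):
                print(f"Trying fallback LLM {i + 1} of {len(llms) - 1}...")
    raise RuntimeError("All LLMs failed")


def test_fallback_message_is_printed():
    primary, fallback = MagicMock(), MagicMock()
    primary.call.side_effect = RuntimeError("boom")
    fallback.call.return_value = "ok"

    buf = io.StringIO()
    with redirect_stdout(buf):
        result = call_with_fallbacks([primary, fallback], [])

    assert result == "ok"
    assert "fallback" in buf.getvalue().lower()
```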

Historical Context and Related Work

This PR addresses GitHub Issue #3032, proposing improved resiliency of agents by adding fallback LLMs. Similar patterns have been introduced in past PRs focused on error handling and LLM abstraction, but this is the first comprehensive multi-LLM fallback implementation with thoughtful error class handling.

Tests reflect a matured testing strategy consistent with prior PRs emphasizing thorough mocks and error simulation.


Summary Table of Recommendations

Concern Area | Severity | Recommendation
Fallback LLM instantiation | Low | Add type checking before create_llm call.
Exception handling robustness | Medium | Use isinstance checks for auth errors; import exceptions explicitly.
Logging clarity | Low | Enhance fallback attempt messages with indices.
Exception raising | Low | Wrap final exceptions for better debugging context.
Test coverage | Low | Add tests with many fallbacks & capture printer output.

Overall Assessment

This PR is a substantial and well-engineered enhancement, significantly improving the reliability of agent LLM calls through fallback mechanisms. It is backward compatible and is supported by a robust test suite. The recommended improvements mainly concern defensive coding and improving error detection clarity, but none block merging.

Conclusion: LGTM with minor improvements as noted above. This adds valuable robustness to the crewAI agent ecosystem.


Please let me know if you need assistance implementing any suggested improvements or generating code-ready patches for these recommendations.

Thank you for the great contribution!

- Fix type checker error by adding None check before raising last_exception
- Fix ContextWindowExceededError constructor with correct signature (message, model, llm_provider)
- Update auth error test assertion to match new print message format

Co-Authored-By: João <joao@crewai.com>