
Model Fallbacks

adham90 edited this page Feb 20, 2026 · 5 revisions


Automatically try alternative models when your primary model fails.

Basic Configuration

class MyAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
end

How Fallbacks Work

When the primary model fails (after any retries):

1. Primary: gpt-4o
   └─ Fails after retries

2. Fallback 1: gpt-4o-mini
   └─ Succeeds! Return result

# If fallback 1 also fails:

3. Fallback 2: claude-3-5-sonnet
   └─ Succeeds! Return result

# If all fail:
   └─ Raise error
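Conceptually, the flow above is a loop over an ordered list of models: try each one, return the first success, and raise only when every model has failed. The sketch below is a simplified illustration of that loop, not the gem's actual implementation; `call_with_fallbacks` and the plain `RuntimeError` are placeholders.

```ruby
# Simplified sketch of the fallback loop: try each model in order,
# return the first successful result, raise once all are exhausted.
def call_with_fallbacks(models)
  errors = []
  models.each do |model|
    begin
      return yield(model) # attempt the call with this model
    rescue StandardError => e
      errors << "#{model}: #{e.class}"
    end
  end
  raise "All models exhausted: #{errors.join(', ')}"
end

# Example: the primary fails, the first fallback succeeds.
result = call_with_fallbacks(["gpt-4o", "gpt-4o-mini"]) do |model|
  raise "rate limited" if model == "gpt-4o"
  "response from #{model}"
end
# result == "response from gpt-4o-mini"
```

Note that errors are collected rather than re-raised immediately, so the final exception can report every model that was tried.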

With Retries

Each model gets its own retry attempts:

class MyAgent < ApplicationAgent
  model "gpt-4o"
  retries max: 2
  fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
end

# Total possible attempts:
# gpt-4o: 3 attempts (1 + 2 retries)
# gpt-4o-mini: 3 attempts
# claude-3-5-sonnet: 3 attempts
# = Up to 9 attempts total
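The arithmetic generalizes: total attempts = (1 + number of fallbacks) × (1 + retries per model). A quick sketch, assuming each model gets the same retry budget as described above (`max_attempts` is an illustrative helper, not part of the gem's API):

```ruby
# Upper bound on LLM calls: each model (primary plus fallbacks) is
# attempted once, plus `max_retries` additional times.
def max_attempts(fallback_count:, max_retries:)
  (1 + fallback_count) * (1 + max_retries)
end

puts max_attempts(fallback_count: 2, max_retries: 2) # => 9, matching the example above
```

This worst case only occurs when every model fails every attempt; in the common case the primary succeeds on the first call.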

Tracking Fallback Usage

result = MyAgent.call(query: "test")

# Check which model succeeded
result.model_id         # Original model requested
result.chosen_model_id  # Model that actually succeeded
result.used_fallback?   # true if not the primary model

# Example
result.model_id         # => "gpt-4o"
result.chosen_model_id  # => "claude-3-5-sonnet"
result.used_fallback?   # => true

Execution Record Details

execution = RubyLLM::Agents::Execution.last

execution.model_id        # => "gpt-4o"
execution.chosen_model_id # => "claude-3-5-sonnet"

execution.attempts.each do |attempt|
  puts "Model: #{attempt['model_id']}"
  puts "Success: #{attempt['success']}"
  puts "Error: #{attempt['error_class']}" unless attempt['success']
end

Fallback Strategies

Cost Optimization

Start expensive, fall back to cheaper:

class CostOptimizedAgent < ApplicationAgent
  model "gpt-4o"               # Best quality
  fallback_models "gpt-4o-mini" # Cheaper fallback
end

Provider Diversity

Spread across providers for outage resilience:

class MultiProviderAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "claude-3-5-sonnet", "gemini-2.0-flash"
  # OpenAI → Anthropic → Google
end

Quality Tiers

Progressively lower quality:

class TieredAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "gpt-4o-mini", "gpt-3.5-turbo"
end

Speed Priority

Fastest models first:

class SpeedFirstAgent < ApplicationAgent
  model "gemini-2.0-flash"
  fallback_models "gpt-4o-mini", "claude-3-haiku"
end

Global Fallback Configuration

Set fallbacks for all agents:

# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  config.default_fallback_models = ["gpt-4o-mini", "claude-3-haiku"]
end

Per-agent configuration overrides global:

class MyAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "claude-3-5-sonnet"  # Overrides global
end

Model Compatibility Notes

When using fallbacks across providers, ensure your prompts work with all models:

Schema Support

All fallback models should support your schema:

class MyAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "claude-3-5-sonnet", "gemini-2.0-flash"
  # All three support JSON mode/structured output

  def schema
    @schema ||= RubyLLM::Schema.create do
      string :result
    end
  end
end

Prompt Compatibility

Avoid provider-specific prompt features:

# Good: Universal prompt
system "You are a helpful assistant."

# Potentially problematic: Provider-specific syntax
system "<|im_start|>system..."  # OpenAI-specific

Feature Differences

Be aware of capability differences:

Feature            GPT-4o    Claude 3.5    Gemini 2.0
JSON mode          Yes       Yes           Yes
Vision             Yes       Yes           Yes
Function calling   Yes       Yes           Yes
Context window     128K      200K          2M

Monitoring Fallback Usage

Track how often fallbacks are used:

# Fallback rate this week
total = RubyLLM::Agents::Execution.this_week.count
fallbacks = RubyLLM::Agents::Execution
  .this_week
  .where("chosen_model_id != model_id")
  .count

fallback_rate = total.zero? ? 0.0 : fallbacks.to_f / total
puts "Fallback rate: #{(fallback_rate * 100).round(1)}%"

# Breakdown by model
RubyLLM::Agents::Execution
  .this_week
  .where("chosen_model_id != model_id")
  .group(:model_id, :chosen_model_id)
  .count
# => { ["gpt-4o", "claude-3-5-sonnet"] => 45, ... }

Alerting on High Fallback Usage

Monitor fallback patterns using ActiveSupport::Notifications:

# config/initializers/ruby_llm_agents.rb

# Option 1: Track via the dedicated fallback event
ActiveSupport::Notifications.subscribe("ruby_llm_agents.reliability.fallback_used") do |*, payload|
  StatsD.increment("llm.fallback", tags: [
    "agent:#{payload[:agent_type]}",
    "primary:#{payload[:primary_model]}",
    "used:#{payload[:used_model]}"
  ])
end

# Option 2: Alert when all models are exhausted
ActiveSupport::Notifications.subscribe("ruby_llm_agents.reliability.all_models_exhausted") do |*, payload|
  Slack::Notifier.new(ENV['SLACK_WEBHOOK']).ping(
    ":rotating_light: All models exhausted for #{payload[:agent_type]}: #{payload[:models_tried].join(', ')}"
  )
end

# Option 3: Track fallback ratio from execution events
fallback_count = 0
total_count = 0

ActiveSupport::Notifications.subscribe("ruby_llm_agents.execution.complete") do |*, payload|
  total_count += 1
  fallback_count += 1 if payload[:attempts_made].to_i > 1

  if total_count >= 100 && (fallback_count.to_f / total_count) > 0.1
    Slack::Notifier.new(ENV['SLACK_WEBHOOK']).ping(
      "High fallback rate: #{(fallback_count.to_f / total_count * 100).round}%"
    )
    fallback_count = 0
    total_count = 0
  end
end

Best Practices

Order by Priority

# First fallback should be the best alternative
fallback_models "best_alternative", "second_choice", "last_resort"

Consider Cost

# Know the cost implications
model "gpt-4o"           # $0.005/1K input
fallback_models "claude-3-opus"  # $0.015/1K input (more expensive!)

# Better: Fall back to cheaper
fallback_models "gpt-4o-mini"  # $0.00015/1K input

Test All Fallbacks

# In tests, verify each model works
["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"].each do |model|
  result = MyAgent.call(query: "test", model: model)
  expect(result.success?).to be true
end

Don't Over-Fallback

# Good: 2-3 fallbacks
fallback_models "alternative1", "alternative2"

# Excessive: Too many
fallback_models "a", "b", "c", "d", "e", "f"
# Wastes time trying failed providers
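The cost of a long fallback chain is latency: in the worst case the caller waits for every attempt on every model to time out before seeing an error. A rough sketch of that worst-case bound (`worst_case_seconds` and the 30-second timeout are illustrative assumptions, not gem defaults):

```ruby
# Worst-case wall-clock time before the caller sees an error:
# every model times out on every attempt.
def worst_case_seconds(models:, attempts_per_model:, timeout_seconds:)
  models * attempts_per_model * timeout_seconds
end

# Six fallbacks (7 models), 3 attempts each, 30s timeout:
puts worst_case_seconds(models: 7, attempts_per_model: 3, timeout_seconds: 30) # => 630

# Two fallbacks (3 models) keeps the bound far more reasonable:
puts worst_case_seconds(models: 3, attempts_per_model: 3, timeout_seconds: 30) # => 270
```

If a provider is down, each extra fallback pointing at that same provider only adds dead time, which is another reason to keep the chain short and provider-diverse.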
