
Model Fallbacks

adham90 edited this page Feb 20, 2026 · 5 revisions


Automatically try alternative models when your primary model fails.

Basic Configuration

class MyAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
end

How Fallbacks Work

When the primary model fails (after any retries):

1. Primary: gpt-4o
   └─ Fails after retries

2. Fallback 1: gpt-4o-mini
   └─ Succeeds! Return result

# If fallback 1 also fails:

3. Fallback 2: claude-3-5-sonnet
   └─ Succeeds! Return result

# If all fail:
   └─ Raise error
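Conceptually, the flow above is a loop over an ordered list of models: try each one, return the first success, and raise only when every model has failed. The sketch below is a simplified illustration of that loop, not the gem's actual implementation; `call_with_fallbacks` and the plain `RuntimeError` are placeholders.

```ruby
# Simplified sketch of the fallback loop: try each model in order,
# return the first successful result, raise once all are exhausted.
def call_with_fallbacks(models)
  errors = []
  models.each do |model|
    begin
      return yield(model) # attempt the call with this model
    rescue StandardError => e
      errors << "#{model}: #{e.class}"
    end
  end
  raise "All models exhausted: #{errors.join(', ')}"
end

# Example: the primary fails, the first fallback succeeds.
result = call_with_fallbacks(["gpt-4o", "gpt-4o-mini"]) do |model|
  raise "rate limited" if model == "gpt-4o"
  "response from #{model}"
end
# result == "response from gpt-4o-mini"
```

Note that errors are collected rather than re-raised immediately, so the final exception can report every model that was tried.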

With Retries

Each model gets its own retry attempts:

class MyAgent < ApplicationAgent
  model "gpt-4o"
  retries max: 2
  fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
end

# Total possible attempts:
# gpt-4o: 3 attempts (1 + 2 retries)
# gpt-4o-mini: 3 attempts
# claude-3-5-sonnet: 3 attempts
# = Up to 9 attempts total
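The arithmetic generalizes: total attempts = (1 + number of fallbacks) × (1 + retries per model). A quick sketch, assuming each model gets the same retry budget as described above (`max_attempts` is an illustrative helper, not part of the gem's API):

```ruby
# Upper bound on LLM calls: each model (primary plus fallbacks) is
# attempted once, plus `max_retries` additional times.
def max_attempts(fallback_count:, max_retries:)
  (1 + fallback_count) * (1 + max_retries)
end

puts max_attempts(fallback_count: 2, max_retries: 2) # => 9, matching the example above
```

This worst case only occurs when every model fails every attempt; in the common case the primary succeeds on the first call.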

Tracking Fallback Usage

result = MyAgent.call(query: "test")

# Check which model succeeded
result.model_id         # Original model requested
result.chosen_model_id  # Model that actually succeeded
result.used_fallback?   # true if not the primary model

# Example
result.model_id         # => "gpt-4o"
result.chosen_model_id  # => "claude-3-5-sonnet"
result.used_fallback?   # => true

Execution Record Details

execution = RubyLLM::Agents::Execution.last

execution.model_id        # => "gpt-4o"
execution.chosen_model_id # => "claude-3-5-sonnet"

execution.attempts.each do |attempt|
  puts "Model: #{attempt['model_id']}"
  puts "Success: #{attempt['success']}"
  puts "Error: #{attempt['error_class']}" unless attempt['success']
end

Fallback Strategies

Cost Optimization

Start expensive, fall back to cheaper:

class CostOptimizedAgent < ApplicationAgent
  model "gpt-4o"               # Best quality
  fallback_models "gpt-4o-mini" # Cheaper fallback
end

Provider Diversity

Spread across providers for outage resilience:

class MultiProviderAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "claude-3-5-sonnet", "gemini-2.0-flash"
  # OpenAI → Anthropic → Google
end

Quality Tiers

Progressively lower quality:

class TieredAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "gpt-4o-mini", "gpt-3.5-turbo"
end

Speed Priority

Fastest models first:

class SpeedFirstAgent < ApplicationAgent
  model "gemini-2.0-flash"
  fallback_models "gpt-4o-mini", "claude-3-haiku"
end

Global Fallback Configuration

Set fallbacks for all agents:

# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  config.default_fallback_models = ["gpt-4o-mini", "claude-3-haiku"]
end

Per-agent configuration overrides global:

class MyAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "claude-3-5-sonnet"  # Overrides global
end

Model Compatibility Notes

When using fallbacks across providers, ensure your prompts work with all models:

Schema Support

All fallback models should support your schema:

class MyAgent < ApplicationAgent
  model "gpt-4o"
  fallback_models "claude-3-5-sonnet", "gemini-2.0-flash"
  # All three support JSON mode/structured output

  def schema
    @schema ||= RubyLLM::Schema.create do
      string :result
    end
  end
end

Prompt Compatibility

Avoid provider-specific prompt features:

# Good: Universal prompt
system "You are a helpful assistant."

# Potentially problematic: Provider-specific syntax
system "<|im_start|>system..."  # OpenAI-specific

Feature Differences

Be aware of capability differences:

Feature            GPT-4o    Claude 3.5    Gemini 2.0
JSON mode          Yes       Yes           Yes
Vision             Yes       Yes           Yes
Function calling   Yes       Yes           Yes
Context window     128K      200K          2M

Monitoring Fallback Usage

Track how often fallbacks are used:

# Fallback rate this week
total = RubyLLM::Agents::Execution.this_week.count
fallbacks = RubyLLM::Agents::Execution
  .this_week
  .where("chosen_model_id != model_id")
  .count

fallback_rate = total.zero? ? 0.0 : fallbacks.to_f / total
puts "Fallback rate: #{(fallback_rate * 100).round(1)}%"

# Breakdown by model
RubyLLM::Agents::Execution
  .this_week
  .where("chosen_model_id != model_id")
  .group(:model_id, :chosen_model_id)
  .count
# => { ["gpt-4o", "claude-3-5-sonnet"] => 45, ... }

Alerting on High Fallback Usage

Monitor fallback patterns using ActiveSupport::Notifications:

# config/initializers/ruby_llm_agents.rb

# Option 1: Track via the dedicated fallback event
ActiveSupport::Notifications.subscribe("ruby_llm_agents.reliability.fallback_used") do |*, payload|
  StatsD.increment("llm.fallback", tags: [
    "agent:#{payload[:agent_type]}",
    "primary:#{payload[:primary_model]}",
    "used:#{payload[:used_model]}"
  ])
end

# Option 2: Alert when all models are exhausted
ActiveSupport::Notifications.subscribe("ruby_llm_agents.reliability.all_models_exhausted") do |*, payload|
  Slack::Notifier.new(ENV['SLACK_WEBHOOK']).ping(
    ":rotating_light: All models exhausted for #{payload[:agent_type]}: #{payload[:models_tried].join(', ')}"
  )
end

# Option 3: Track fallback ratio from execution events
fallback_count = 0
total_count = 0

ActiveSupport::Notifications.subscribe("ruby_llm_agents.execution.complete") do |*, payload|
  total_count += 1
  fallback_count += 1 if payload[:attempts_made].to_i > 1

  if total_count >= 100 && (fallback_count.to_f / total_count) > 0.1
    Slack::Notifier.new(ENV['SLACK_WEBHOOK']).ping(
      "High fallback rate: #{(fallback_count.to_f / total_count * 100).round}%"
    )
    fallback_count = 0
    total_count = 0
  end
end

Best Practices

Order by Priority

# First fallback should be the best alternative
fallback_models "best_alternative", "second_choice", "last_resort"

Consider Cost

# Know the cost implications
model "gpt-4o"           # $0.005/1K input
fallback_models "claude-3-opus"  # $0.015/1K input (more expensive!)

# Better: Fall back to cheaper
fallback_models "gpt-4o-mini"  # $0.00015/1K input

Test All Fallbacks

# In tests, verify each model works
["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"].each do |model|
  result = MyAgent.call(query: "test", model: model)
  expect(result.success?).to be true
end

Don't Over-Fallback

# Good: 2-3 fallbacks
fallback_models "alternative1", "alternative2"

# Excessive: Too many
fallback_models "a", "b", "c", "d", "e", "f"
# Wastes time trying failed providers
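The cost of a long fallback chain is latency: in the worst case the caller waits for every attempt on every model to time out before seeing an error. A rough sketch of that worst-case bound (`worst_case_seconds` and the 30-second timeout are illustrative assumptions, not gem defaults):

```ruby
# Worst-case wall-clock time before the caller sees an error:
# every model times out on every attempt.
def worst_case_seconds(models:, attempts_per_model:, timeout_seconds:)
  models * attempts_per_model * timeout_seconds
end

# Six fallbacks (7 models), 3 attempts each, 30s timeout:
puts worst_case_seconds(models: 7, attempts_per_model: 3, timeout_seconds: 30) # => 630

# Two fallbacks (3 models) keeps the bound far more reasonable:
puts worst_case_seconds(models: 3, attempts_per_model: 3, timeout_seconds: 30) # => 270
```

If a provider is down, each extra fallback pointing at that same provider only adds dead time, which is another reason to keep the chain short and provider-diverse.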
