Best Practices

Guidelines for building production-ready LLM agents with RubyLLM::Agents.

Agent Design

1. Use ApplicationAgent as Base Class

Centralize shared configuration:

# app/agents/application_agent.rb
class ApplicationAgent < RubyLLM::Agents::Base
  # Shared defaults
  temperature 0.0

  reliability do
    retries max: 3, backoff: :exponential
    total_timeout 60
  end

  # Common metadata
  def metadata
    {
      request_id: Current.request_id,
      user_id: Current.user&.id
    }
  end
end

2. Type Your Parameters

Catch type errors early:

class MyAgent < ApplicationAgent
  param :query, type: String, required: true
  param :limit, type: Integer, default: 10
  param :filters, type: Hash, default: {}
end
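
To make the benefit concrete, here is a minimal plain-Ruby sketch of the kind of check a typed `param` declaration implies. `TypedParams` and `validate!` are hypothetical names written for this illustration; they are not part of RubyLLM::Agents, and the gem's own validation may behave differently.

```ruby
# Hypothetical stand-in for typed param validation; not the gem's implementation.
class TypedParams
  Spec = Struct.new(:type, :required, :default)

  def initialize
    @specs = {}
  end

  def param(name, type:, required: false, default: nil)
    @specs[name] = Spec.new(type, required, default)
  end

  # Returns the params with defaults applied, or raises ArgumentError.
  def validate!(input)
    @specs.each_with_object({}) do |(name, spec), out|
      value = input.fetch(name, spec.default)
      raise ArgumentError, "#{name} is required" if value.nil? && spec.required
      unless value.nil? || value.is_a?(spec.type)
        raise ArgumentError, "#{name} must be #{spec.type}, got #{value.class}"
      end
      out[name] = value
    end
  end
end

params = TypedParams.new
params.param :query, type: String, required: true
params.param :limit, type: Integer, default: 10

validated = params.validate!(query: "ruby agents")
```

Passing `limit: "10"` to this sketch raises before any request is built, which is the point of declaring types: the mistake surfaces at the call site instead of inside a prompt.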

3. Use Structured Output

Ensure predictable responses:

def schema
  @schema ||= RubyLLM::Schema.create do
    string :result, description: "The processed result"
    array :items, of: :string
    boolean :success
  end
end

Reliability

4. Enable Reliability for Production

Don't depend on a single request to a single model; add retries, fallbacks, and timeouts:

class ProductionAgent < ApplicationAgent
  model "gpt-4o"

  reliability do
    retries max: 3, backoff: :exponential
    fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
    circuit_breaker errors: 10, within: 60, cooldown: 300
    total_timeout 30
  end
end
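
For intuition about what retries plus fallback models do, the following plain-Ruby sketch retries each model with exponential backoff and then moves to the next one. `call_with_reliability` and its `sleeper` argument are hypothetical helpers for this example only; the gem's `reliability` block handles this internally (along with the circuit breaker and total timeout, which this sketch omits).

```ruby
# Sketch of retry-then-fallback; not the gem's implementation.
def call_with_reliability(models, max_retries: 3, base_delay: 0.5, sleeper: ->(s) { sleep(s) })
  models.each do |model|
    attempt = 0
    begin
      return yield(model)
    rescue StandardError
      attempt += 1
      if attempt <= max_retries
        sleeper.call(base_delay * (2**(attempt - 1))) # 0.5s, 1s, 2s, ...
        retry
      end
      # Retries exhausted for this model; fall through to the next fallback.
    end
  end
  raise "all models failed: #{models.join(', ')}"
end

delays = []
calls = []
result = call_with_reliability(["gpt-4o", "gpt-4o-mini"], sleeper: ->(s) { delays << s }) do |model|
  calls << model
  raise "provider error" if model == "gpt-4o" # simulate a failing primary model
  "answer from #{model}"
end
```

Here the primary model is tried four times (one attempt plus three retries) before the fallback answers. Injecting `sleeper` keeps the example fast to run; a real caller would let it sleep.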

5. Use the reliability Block

Group related config together:

# Good - organized
reliability do
  retries max: 3, backoff: :exponential
  fallback_models "gpt-4o-mini"
  total_timeout 30
end

# Less clear - scattered
retries max: 3, backoff: :exponential
fallback_models "gpt-4o-mini"
total_timeout 30

Cost Management

6. Set Budgets

Prevent runaway costs:

RubyLLM::Agents.configure do |config|
  config.budgets = {
    global_daily: 100.0,
    global_monthly: 2000.0,
    per_agent_daily: { "ExpensiveAgent" => 50.0 },
    enforcement: :hard
  }
end
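
The sketch below illustrates the difference a hard cap makes, assuming that hard enforcement blocks further calls once the limit is reached while a softer mode only records a warning. `BudgetGuard` and the `:soft` mode are hypothetical and exist only for this example; the gem tracks spend and enforces budgets itself.

```ruby
# Hypothetical budget guard illustrating hard vs. soft enforcement.
class BudgetGuard
  class Exceeded < StandardError; end

  attr_reader :spent, :warnings

  def initialize(daily_limit:, enforcement: :hard)
    @daily_limit = daily_limit
    @enforcement = enforcement
    @spent = 0.0
    @warnings = []
  end

  # Call before each request: raises under :hard, records a warning otherwise.
  def check!
    return if @spent < @daily_limit

    message = format("daily budget of %.2f reached (spent %.2f)", @daily_limit, @spent)
    raise Exceeded, message if @enforcement == :hard

    @warnings << message
  end

  # Call after each request with its cost.
  def record(cost)
    @spent += cost
  end
end

guard = BudgetGuard.new(daily_limit: 1.0)
guard.check!        # under budget, nothing happens
guard.record(1.25)  # this request pushed spend past the limit
```

The next `guard.check!` raises `BudgetGuard::Exceeded`, so no further spend occurs until the budget resets.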

7. Cache Expensive Operations

Reduce API calls:

class ExpensiveAgent < ApplicationAgent
  cache_for 2.hours

  # Custom cache key to maximize hits
  def cache_key_data
    { query: query.downcase.strip }
  end
end
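
To see why normalizing the query raises the hit rate, consider this sketch. It assumes a cache key is derived by hashing the key data; `cache_key_for` and `normalized` are hypothetical helpers, and the gem's actual key format may differ.

```ruby
require "digest"
require "json"

# Hypothetical: derive a cache key by hashing the key data.
def cache_key_for(agent_name, data)
  "#{agent_name}/#{Digest::SHA256.hexdigest(JSON.generate(data))}"
end

# Mirrors the cache_key_data method above.
def normalized(query)
  { query: query.downcase.strip }
end

a = cache_key_for("ExpensiveAgent", normalized("  Ruby Agents "))
b = cache_key_for("ExpensiveAgent", normalized("ruby agents"))
raw = cache_key_for("ExpensiveAgent", { query: "  Ruby Agents " })
```

In this sketch `a` and `b` are identical, so both queries share one cached response, while `raw` (the un-normalized input) produces a different key and would miss.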

8. Use cache_for over cache

Clearer intent, no deprecation warning:

# Good
cache_for 1.hour

# Deprecated
cache 1.hour

Observability

9. Monitor via Dashboard

Track costs, errors, and latency:

# Mount dashboard (in config/routes.rb)
mount RubyLLM::Agents::Engine => "/agents"

# Set up authentication (inside the RubyLLM::Agents.configure block)
config.dashboard_auth = ->(c) { c.current_user&.admin? }

10. Add Meaningful Metadata

Enable filtering and debugging:

def metadata
  {
    user_id: user_id,
    feature: "search",
    source: source,
    experiment_variant: experiment_variant
  }
end

11. Set Up Alerts

Get notified of issues:

config.on_alert = ->(event, payload) {
  case event
  when :budget_hard_cap
    PagerDuty.trigger(summary: "Budget exceeded")
  when :breaker_open
    Slack::Notifier.new(ENV['SLACK_WEBHOOK']).ping("Circuit breaker opened")
  end
}
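
An alert handler is just a callable, so it can be exercised locally before wiring in real services. In this sketch the `pages` and `messages` arrays stand in for PagerDuty and Slack, and the payload keys (`scope`, `model`) are made up for illustration; only the event names come from the example above.

```ruby
# Stand-ins for PagerDuty and Slack so the handler can run without network calls.
pages = []
messages = []

on_alert = ->(event, payload) {
  case event
  when :budget_hard_cap
    pages << "Budget exceeded: #{payload.inspect}"
  when :breaker_open
    messages << "Circuit breaker opened: #{payload.inspect}"
  else
    # Surface events the handler does not know about instead of dropping them.
    messages << "Unhandled alert: #{event}"
  end
}

on_alert.call(:budget_hard_cap, { scope: "global_daily" })
on_alert.call(:breaker_open, { model: "gpt-4o" })
```

The `else` branch is a design choice worth keeping in a real handler: an unrecognized event still reaches a channel someone reads.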

Development

12. Test with dry_run

Debug prompts without API calls:

result = MyAgent.call(query: "test", dry_run: true)
puts result.content[:user_prompt]
puts result.content[:system_prompt]

13. Use Generators

Scaffold quickly:

rails generate ruby_llm_agents:agent search query:required limit:10
rails generate ruby_llm_agents:embedder document --dimensions 512

14. Write Agent Tests

Mock LLM responses:

RSpec.describe SearchAgent do
  let(:mock_response) do
    double(content: { results: [] }, input_tokens: 10, output_tokens: 5)
  end

  before do
    allow_any_instance_of(RubyLLM::Chat).to receive(:ask).and_return(mock_response)
  end

  it "returns results" do
    result = described_class.call(query: "test")
    expect(result.content[:results]).to eq([])
  end
end

Security

15. Control Prompt Persistence

Disable persistence when prompts or responses may contain sensitive data:

config.persist_prompts = false
config.persist_responses = false

16. Use before_call for Content Safety

Implement custom content moderation:

class SafeAgent < ApplicationAgent
  before_call :check_content_safety

  private

  def check_content_safety(context)
    # Use your preferred moderation service
    result = ModerationService.check(context.params[:query])
    raise "Content blocked" if result.flagged?
  end
end
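
`ModerationService` above is a placeholder. For local testing, a stand-in with the same `check` / `flagged?` shape might look like the sketch below. The blocklist patterns and the `ContentBlockedError` class are hypothetical; raising a dedicated error class rather than a bare string lets callers rescue blocked content specifically.

```ruby
# Hypothetical local stand-in for a moderation service.
class ContentBlockedError < StandardError; end

module ModerationService
  Result = Struct.new(:flagged, :reasons) do
    def flagged?
      flagged
    end
  end

  # Illustrative patterns only; a real service would be far more thorough.
  BLOCKLIST = [/\bssn\b/i, /\bcredit card number\b/i].freeze

  def self.check(text)
    reasons = BLOCKLIST.select { |pattern| text.to_s.match?(pattern) }.map(&:source)
    Result.new(!reasons.empty?, reasons)
  end
end

# Same logic as the callback above, taking the query directly.
def check_content_safety(query)
  result = ModerationService.check(query)
  raise ContentBlockedError, "Content blocked" if result.flagged?
end

check_content_safety("summarize this article") # passes silently
```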

Performance

17. Use Streaming for Long Responses

Better UX for chat interfaces:

class ChatAgent < ApplicationAgent
  streaming true
end

ChatAgent.call(message: msg) do |chunk|
  stream << chunk.content
end
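
The block-based pattern can be tried without an API call. In this sketch `fake_stream` and `Chunk` are hypothetical stand-ins for a streaming agent call; the `nil` guard reflects an assumption that some chunks may carry no text.

```ruby
Chunk = Struct.new(:content)

# Stand-in for an agent call that yields chunks as they arrive.
def fake_stream(text, size: 5)
  text.scan(/.{1,#{size}}/m).each { |piece| yield Chunk.new(piece) }
end

buffer = +""
fake_stream("Streaming keeps the UI responsive.") do |chunk|
  next if chunk.content.nil? # skip chunks without text
  buffer << chunk.content    # in a controller this would write to the response stream
end
```

Accumulating into a buffer as well as forwarding each chunk is useful when the full response must also be stored or logged after the stream ends.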

18. Use Appropriate Models

Match model to task:

# Classification - fast, cheap, deterministic
class ClassifierAgent < ApplicationAgent
  model "gpt-4o-mini"
  temperature 0.0
end

# Creative writing - more capable
class WriterAgent < ApplicationAgent
  model "gpt-4o"
  temperature 0.8
end

# Simple extraction - fastest
class ExtractorAgent < ApplicationAgent
  model "gemini-2.0-flash"
  temperature 0.0
end

Multi-Tenancy

19. Isolate Tenant Data

Set up proper tenant resolution:

config.multi_tenancy_enabled = true
config.tenant_resolver = -> { Current.tenant&.id }

20. Set Per-Tenant Budgets

Prevent tenant cost overruns:

RubyLLM::Agents::Tenant.create!(
  tenant_id: "acme_corp",
  name: "Acme Corp",
  daily_limit: 50.0,
  monthly_limit: 500.0,
  enforcement: "hard"
)

Deprecation Handling

21. Address Deprecation Warnings

Update deprecated methods:

# Deprecated
cache 1.hour
result[:key]
result.dig(:a, :b)

# Preferred
cache_for 1.hour
result.content[:key]
result.content.dig(:a, :b)

Silence warnings temporarily during migration, and re-enable them afterwards so new deprecations are not missed:

RubyLLM::Agents::Deprecations.silenced = true
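
To keep silencing scoped, one option is a small helper that restores the previous value even if the block raises. `with_silenced` and `DemoDeprecations` are hypothetical; the demo module only mimics the `silenced` accessor shown above so the sketch runs on its own.

```ruby
# Stand-in exposing the same `silenced` accessor as the setting above.
module DemoDeprecations
  class << self
    attr_accessor :silenced
  end
  self.silenced = false
end

# Silences warnings for the duration of the block, then restores the old value.
def with_silenced(target)
  previous = target.silenced
  target.silenced = true
  yield
ensure
  target.silenced = previous
end

seen_inside = nil
with_silenced(DemoDeprecations) { seen_inside = DemoDeprecations.silenced }
```

In an application, the same helper could be called with `RubyLLM::Agents::Deprecations` as the target around legacy code paths that have not been migrated yet.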

Related Pages

Testing Agents - Testing patterns
Production Deployment - Deployment guide
Error Handling - Error recovery
Configuration - All settings
