Best Practices

adham90 edited this page Feb 14, 2026 · 3 revisions

Guidelines for building production-ready LLM agents with RubyLLM::Agents.

Agent Design

1. Use ApplicationAgent as Base Class

Centralize shared configuration:

# app/agents/application_agent.rb
class ApplicationAgent < RubyLLM::Agents::Base
  # Shared defaults
  temperature 0.0

  reliability do
    retries max: 3, backoff: :exponential
    total_timeout 60
  end

  # Common metadata
  def metadata
    {
      request_id: Current.request_id,
      user_id: Current.user&.id
    }
  end
end

2. Type Your Parameters

Catch type errors early:

class MyAgent < ApplicationAgent
  param :query, type: String, required: true
  param :limit, type: Integer, default: 10
  param :filters, type: Hash, default: {}
end
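
To make concrete what typed parameters protect against, here is a minimal plain-Ruby sketch of the kind of check a param declaration implies. It does not use the gem: ParamSpec, SPECS, and validate_params are hypothetical names for this illustration, and the gem's own validation and error class may differ.

```ruby
# Hypothetical stand-in for typed parameter declarations (not the gem's API).
ParamSpec = Struct.new(:name, :type, :required, :default, keyword_init: true)

SPECS = [
  ParamSpec.new(name: :query,   type: String,  required: true,  default: nil),
  ParamSpec.new(name: :limit,   type: Integer, required: false, default: 10),
  ParamSpec.new(name: :filters, type: Hash,    required: false, default: {})
].freeze

# Applies defaults, then raises ArgumentError on a missing or mistyped value.
def validate_params(specs, given)
  specs.each_with_object({}) do |spec, resolved|
    value = given.fetch(spec.name, spec.default)
    if value.nil?
      raise ArgumentError, "missing required param: #{spec.name}" if spec.required
    elsif !value.is_a?(spec.type)
      raise ArgumentError, "#{spec.name} must be #{spec.type}, got #{value.class}"
    end
    resolved[spec.name] = value
  end
end
```

With this sketch, passing limit: "10" fails immediately with a message naming the parameter, instead of surfacing later as a confusing prompt or API error.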

3. Use Structured Output

Ensure predictable responses:

def schema
  @schema ||= RubyLLM::Schema.create do
    string :result, description: "The processed result"
    array :items, of: :string
    boolean :success
  end
end

Reliability

4. Enable Reliability for Production

A single request can fail on rate limits, timeouts, or provider outages. Configure retries, fallbacks, and a circuit breaker:

class ProductionAgent < ApplicationAgent
  model "gpt-4o"

  reliability do
    retries max: 3, backoff: :exponential
    fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
    circuit_breaker errors: 10, within: 60, cooldown: 300
    total_timeout 30
  end
end
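
For intuition about the circuit_breaker line, the sketch below shows one common reading of its three numbers: open after N errors inside a sliding window, then refuse calls until a cooldown has passed. This is an assumption about the semantics, not the gem's implementation; SimpleBreaker and its injectable clock are hypothetical.

```ruby
# Illustrative circuit breaker (not the gem's implementation).
class SimpleBreaker
  def initialize(errors:, within:, cooldown:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @errors = errors      # failures needed to open the breaker
    @within = within      # sliding window, in seconds
    @cooldown = cooldown  # seconds the breaker stays open
    @clock = clock        # injectable so tests can control time
    @failures = []
    @opened_at = nil
  end

  def open?
    return false if @opened_at.nil?
    return true if @clock.call - @opened_at < @cooldown

    # Cooldown elapsed: close the breaker and start counting afresh.
    @opened_at = nil
    @failures.clear
    false
  end

  def record_failure
    now = @clock.call
    @failures << now
    @failures.reject! { |t| now - t > @within } # drop failures outside the window
    @opened_at = now if @failures.size >= @errors
  end
end
```

A caller would check open? before each request and skip straight to a fallback while it returns true, which keeps a failing provider from absorbing the whole timeout budget.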

5. Use the reliability Block

Group related config together:

# Good - organized
reliability do
  retries max: 3, backoff: :exponential
  fallback_models "gpt-4o-mini"
  total_timeout 30
end

# Less clear - scattered
retries max: 3, backoff: :exponential
fallback_models "gpt-4o-mini"
total_timeout 30
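
As a rough picture of what retries max: 3, backoff: :exponential asks for, here is a plain-Ruby retry helper. It is a sketch, not the gem's code: with_retries, base_delay, and the injectable sleeper are assumptions made for this example, and the gem's real delays and jitter may differ.

```ruby
# Illustrative retry helper (not the gem's implementation).
# `max` counts retries, so the block runs at most max + 1 times.
def with_retries(max:, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt > max                         # retries exhausted: re-raise
    sleeper.call(base_delay * (2**(attempt - 1)))  # delay doubles on each retry
    retry
  end
end
```

Passing a sleeper that records delays instead of sleeping makes the helper easy to test without slowing the suite down.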

Cost Management

6. Set Budgets

Prevent runaway costs:

RubyLLM::Agents.configure do |config|
  config.budgets = {
    global_daily: 100.0,
    global_monthly: 2000.0,
    per_agent_daily: { "ExpensiveAgent" => 50.0 },
    enforcement: :hard
  }
end
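
The enforcement setting decides what happens once a limit is reached. The sketch below assumes that :hard blocks the call and that a :soft mode only warns; both the :soft value and the names check_budget! and BudgetExceeded are assumptions for illustration, not the gem's API.

```ruby
# Illustrative budget check (not the gem's implementation).
class BudgetExceeded < StandardError; end

# Returns :ok while under the limit, :warn for a soft cap that has been
# reached, and raises BudgetExceeded for a hard cap that has been reached.
def check_budget!(spent:, limit:, enforcement:)
  return :ok if spent < limit

  case enforcement
  when :hard then raise BudgetExceeded, format("spent $%.2f of $%.2f", spent, limit)
  when :soft then :warn
  else raise ArgumentError, "unknown enforcement: #{enforcement.inspect}"
  end
end
```

Running a check like this before the request, rather than after, is what makes a hard cap prevent spend instead of merely reporting it.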

7. Cache Expensive Operations

Reduce API calls:

class ExpensiveAgent < ApplicationAgent
  cache_for 2.hours

  # Custom cache key to maximize hits
  def cache_key_data
    { query: query.downcase.strip }
  end
end
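
To see why normalizing the key data raises the hit rate, consider this plain-Ruby sketch. It hashes the same normalized data that cache_key_data returns above; cache_key_for and the SHA-256 digest are assumptions for illustration, since this page does not show the gem's real key format.

```ruby
require "digest"
require "json"

# Illustrative only: derives a digest from normalized key data, the way a
# cache layer might. Case and surrounding whitespace no longer affect the key.
def cache_key_for(query)
  data = { query: query.downcase.strip }
  Digest::SHA256.hexdigest(JSON.generate(data))
end
```

Here "  Ruby Agents " and "ruby agents" map to the same key, so the second request can be served from cache. Only normalize differences that cannot change the answer.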

8. Use cache_for over cache

Clearer intent, no deprecation warning:

# Good
cache_for 1.hour

# Deprecated
cache 1.hour

Observability

9. Monitor via Dashboard

Track costs, errors, and latency:

# In config/routes.rb: mount the dashboard
mount RubyLLM::Agents::Engine => "/agents"

# Inside the RubyLLM::Agents.configure block: restrict access to admins
config.dashboard_auth = ->(c) { c.current_user&.admin? }

10. Add Meaningful Metadata

Enable filtering and debugging:

def metadata
  {
    user_id: user_id,
    feature: "search",
    source: source,
    experiment_variant: experiment_variant
  }
end

11. Set Up Alerts

Get notified of issues:

config.on_alert = ->(event, payload) {
  case event
  when :budget_hard_cap
    PagerDuty.trigger(summary: "Budget exceeded")
  when :breaker_open
    Slack::Notifier.new(ENV["SLACK_WEBHOOK"]).ping("Circuit breaker opened")
  end
}

Development

12. Test with dry_run

Debug prompts without API calls:

result = MyAgent.call(query: "test", dry_run: true)
puts result.content[:user_prompt]
puts result.content[:system_prompt]

13. Use Generators

Scaffold quickly:

rails generate ruby_llm_agents:agent search query:required limit:10
rails generate ruby_llm_agents:embedder document --dimensions 512

14. Write Agent Tests

Mock LLM responses:

RSpec.describe SearchAgent do
  let(:mock_response) do
    double(content: { results: [] }, input_tokens: 10, output_tokens: 5)
  end

  before do
    allow_any_instance_of(RubyLLM::Chat).to receive(:ask).and_return(mock_response)
  end

  it "returns results" do
    result = described_class.call(query: "test")
    expect(result.content[:results]).to eq([])
  end
end

Security

15. Control Prompt Persistence

Disable prompt and response persistence when the application handles sensitive data:

config.persist_prompts = false
config.persist_responses = false

16. Use before_call for Content Safety

Implement custom content moderation:

class SafeAgent < ApplicationAgent
  before_call :check_content_safety

  private

  def check_content_safety(context)
    # Use your preferred moderation service
    result = ModerationService.check(context.params[:query])
    raise "Content blocked" if result.flagged?
  end
end
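
ModerationService above is a placeholder for whatever service you use. For local development and tests, a stand-in with the same interface (a check method returning an object that responds to flagged?) can be enough. The blocklist below is hypothetical and far too crude for production; it only shows the shape the hook expects.

```ruby
# Hypothetical stand-in for the ModerationService placeholder.
# A real application would call a moderation API here instead.
class ModerationService
  Result = Struct.new(:flagged, :matches) do
    def flagged?
      flagged
    end
  end

  BLOCKLIST = %w[exploit malware].freeze

  # Flags the text if any whole word appears in the blocklist.
  def self.check(text)
    words = text.to_s.downcase.scan(/[a-z0-9]+/)
    matches = words & BLOCKLIST
    Result.new(!matches.empty?, matches)
  end
end
```

Because the hook raises before the LLM is called, a flagged request costs no tokens. Raising a dedicated error class instead of a bare string would let callers rescue moderation failures separately from other errors.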

Performance

17. Use Streaming for Long Responses

Better UX for chat interfaces:

class ChatAgent < ApplicationAgent
  streaming true
end

# stream is any object that accepts <<, such as an IO or a String buffer
ChatAgent.call(message: msg) do |chunk|
  stream << chunk.content
end

18. Use Appropriate Models

Match model to task:

# Classification - fast, cheap, deterministic
class ClassifierAgent < ApplicationAgent
  model "gpt-4o-mini"
  temperature 0.0
end

# Creative writing - more capable
class WriterAgent < ApplicationAgent
  model "gpt-4o"
  temperature 0.8
end

# Simple extraction - fastest
class ExtractorAgent < ApplicationAgent
  model "gemini-2.0-flash"
  temperature 0.0
end

Multi-Tenancy

19. Isolate Tenant Data

Set up proper tenant resolution:

config.multi_tenancy_enabled = true
config.tenant_resolver = -> { Current.tenant&.id }

20. Set Per-Tenant Budgets

Prevent tenant cost overruns:

RubyLLM::Agents::Tenant.create!(
  tenant_id: "acme_corp",
  name: "Acme Corp",
  daily_limit: 50.0,
  monthly_limit: 500.0,
  enforcement: "hard"
)

Deprecation Handling

21. Address Deprecation Warnings

Update deprecated methods:

# Deprecated
cache 1.hour
result[:key]
result.dig(:a, :b)

# Preferred
cache_for 1.hour
result.content[:key]
result.content.dig(:a, :b)

Silence warnings temporarily while you migrate, then re-enable them:

RubyLLM::Agents::Deprecations.silenced = true
