Caching

adham90 edited this page Feb 16, 2026 · 5 revisions

Cache LLM responses to reduce costs and latency for repeated requests.

Enabling Caching

Per-Agent

class CachedAgent < ApplicationAgent
  model "gpt-4o"
  cache 1.hour  # Cache responses for 1 hour

  user "{query}"
end

Cache Duration Options

cache 30.minutes
cache 1.hour
cache 6.hours
cache 1.day
cache 1.week

How Cache Invalidation Works

Cache keys are content-based: they are generated automatically from a hash of your prompts and parameters. This means:

  • Automatic invalidation: When you change your system prompt, user prompt, or parameters, the cache key changes automatically
  • No manual version bumping: You don't need to remember to update a version number when changing prompts
  • Reliable: The cache key reflects the actual content being sent to the LLM
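The key derivation can be pictured as a digest over everything that reaches the model. A minimal sketch, for illustration only (the gem's real key format and hashing details may differ):

```ruby
require "digest"
require "json"

# Hypothetical illustration: hash the agent name, prompts, and params
# into a stable cache key. Changing any input changes the key.
def cache_key_for(agent:, system_prompt:, user_prompt:, params:)
  payload = JSON.generate(
    agent: agent,
    system: system_prompt,
    user: user_prompt,
    params: params.sort.to_h  # sort keys so param order never matters
  )
  "#{agent}/#{Digest::SHA256.hexdigest(payload)}"
end

key_a = cache_key_for(agent: "SearchAgent",
                      system_prompt: "You are a search assistant.",
                      user_prompt: "{query}",
                      params: { query: "test", limit: 10 })
key_b = cache_key_for(agent: "SearchAgent",
                      system_prompt: "You are a search assistant.",
                      user_prompt: "{query}",
                      params: { query: "test", limit: 20 })
key_a == key_b  # => false: different params, different keys
```

Editing the system prompt, the user prompt, or any parameter changes the payload, so the digest (and therefore the key) changes with it.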

To manually clear caches, use Rails cache clearing:

Rails.cache.clear  # Clears the entire Rails cache, not just agent responses

Or use a cache namespace in your configuration for more granular control.
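For example, a dedicated, namespaced store keeps agent entries separate from the rest of your application cache. A sketch, assuming the gem follows the usual configure-block pattern (the llm_agents namespace name is illustrative):

```ruby
# config/initializers/ruby_llm_agents.rb (illustrative)
RubyLLM::Agents.configure do |config|
  # A separate namespaced store: clearing it leaves the
  # application's own Rails.cache entries untouched.
  config.cache_store = ActiveSupport::Cache::RedisCacheStore.new(
    url: ENV["REDIS_URL"],
    namespace: "llm_agents"
  )
end
```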

How Caching Works

  1. Cache key is generated from:

    • Agent class name
    • All parameters
    • System prompt
    • User prompt
  2. Before making an API call, the cache is checked

  3. If found, cached response is returned immediately

  4. If not found, API call is made and response is cached
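Steps 2-4 amount to a read-through cache. A tiny self-contained sketch of the flow, using a plain Hash as the store and a stand-in for the API call (the gem itself delegates to the configured cache store):

```ruby
# Read-through caching in miniature
cache = {}
api_calls = 0

fetch_response = lambda do |key, prompt|
  if cache.key?(key)                        # step 2: check the cache
    cache[key]                              # step 3: hit - return immediately
  else
    api_calls += 1                          # step 4: miss - make the "API call"...
    cache[key] = "response to #{prompt}"    # ...and cache the response
  end
end

first  = fetch_response.call("MyAgent/abc123", "hello")
second = fetch_response.call("MyAgent/abc123", "hello")
# first == second, and api_calls is still 1: the second call was a cache hit
```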

Cache Key Generation

Default Behavior

All parameters are included in the cache key:

class SearchAgent < ApplicationAgent
  cache 1.hour
  param :query, required: true
  param :limit, default: 10
end

# These produce DIFFERENT cache keys
SearchAgent.call(query: "test", limit: 10)
SearchAgent.call(query: "test", limit: 20)

Custom Cache Keys

Override cache_key_data to control what affects caching:

class SearchAgent < ApplicationAgent
  cache 1.hour
  param :query, required: true
  param :limit, default: 10
  param :request_id  # Should NOT affect caching

  def cache_key_data
    # Only query and limit affect the cache key
    { query: query, limit: limit }
    # request_id is excluded
  end
end

# These now use the SAME cache (request_id ignored)
SearchAgent.call(query: "test", limit: 10, request_id: "abc")
SearchAgent.call(query: "test", limit: 10, request_id: "xyz")

Bypassing Cache

Skip Cache for Specific Call

# Force a fresh API call
result = MyAgent.call(query: "test", skip_cache: true)

Check if Result Was Cached

result = MyAgent.call(query: "test")
result.cached?  # => true/false (if available)

Cache Store Configuration

Default (Rails.cache)

# Uses whatever Rails.cache is configured to
config.cache_store = Rails.cache

Memory Store (Development)

config.cache_store = ActiveSupport::Cache::MemoryStore.new(
  size: 64.megabytes
)

Redis (Production)

config.cache_store = ActiveSupport::Cache::RedisCacheStore.new(
  url: ENV['REDIS_URL'],
  namespace: 'llm_agents',
  expires_in: 1.day
)

File Store

config.cache_store = ActiveSupport::Cache::FileStore.new(
  Rails.root.join('tmp', 'llm_cache'),
  expires_in: 1.day
)

Caching Strategies

Static Content

High TTL for stable, factual responses:

class FactAgent < ApplicationAgent
  cache 1.week  # Facts don't change often

  user "Explain: {topic}"
end

User-Specific Content

Include user context in cache key:

class PersonalizedAgent < ApplicationAgent
  cache 1.hour
  param :query, required: true
  param :user_id, required: true

  def cache_key_data
    { query: query, user_id: user_id }
  end
end

Time-Sensitive Content

Short TTL or no caching:

class NewsAgent < ApplicationAgent
  # No caching - always fetch fresh
  param :topic, required: true
end

# Or very short cache
class WeatherAgent < ApplicationAgent
  cache 15.minutes
end

Caching and Streaming

Important: Streaming responses are never cached.

class StreamingAgent < ApplicationAgent
  streaming true
  cache 1.hour  # Ignored when streaming
end

# This will always make an API call
StreamingAgent.call(user: "test") do |chunk|
  print chunk
end

Cache Metrics

Track cache performance:

# In your monitoring/metrics
cache_hits = 0
cache_misses = 0

# Wrap agent calls
result = MyAgent.call(query: query)
if result.cached?
  cache_hits += 1
else
  cache_misses += 1
end

hit_rate = cache_hits.to_f / (cache_hits + cache_misses)

Clearing Cache

Clear All Agent Cache

Rails.cache.delete_matched("ruby_llm_agents/*")

Clear Specific Agent Cache

# Clear all SearchAgent caches
Rails.cache.delete_matched("ruby_llm_agents/SearchAgent/*")

Note that delete_matched is not supported by every cache store: MemCacheStore raises NotImplementedError, while RedisCacheStore, FileStore, and MemoryStore support it.

Clear in Development

rails tmp:cache:clear

Best Practices

Cache Deterministic Responses

class ClassifierAgent < ApplicationAgent
  temperature 0.0  # Near-deterministic output
  cache 1.day      # Safe to cache
end

Be Careful with High Temperature

class CreativeAgent < ApplicationAgent
  temperature 1.0  # Non-deterministic
  cache 30.minutes # Short cache or no cache
end

Include Relevant Context in Cache Key

def cache_key_data
  {
    query: query,
    user_locale: locale,      # Different locales = different responses
    model_version: version    # Track model updates
  }
end

Monitor Cache Size

# Redis
redis = Redis.new(url: ENV['REDIS_URL'])
redis.info('memory')['used_memory_human']

# Memory store (reads private internals; for rough debugging only)
Rails.cache.instance_variable_get(:@data).size

Troubleshooting

Cache Not Working

  1. Verify cache is enabled:

    cache 1.hour  # Must be set
  2. Check cache store is configured:

    RubyLLM::Agents.configuration.cache_store
  3. Verify cache key is consistent:

    result = MyAgent.call(query: "test", dry_run: true)
    # Check parameters in output

Stale Responses

Clear cache manually:

Rails.cache.clear
