# Caching
Cache LLM responses to reduce costs and latency for repeated requests.
```ruby
class CachedAgent < ApplicationAgent
  model "gpt-4o"
  cache 1.hour # Cache responses for 1 hour

  user "{query}"
end
```

Any duration works as the TTL:

```ruby
cache 30.minutes
cache 1.hour
cache 6.hours
cache 1.day
cache 1.week
```

Cache keys are content-based: they are automatically generated from a hash of your prompts and parameters. This means:
- Automatic invalidation: When you change your system prompt, user prompt, or parameters, the cache key changes automatically
- No manual version bumping: You don't need to remember to update a version number when changing prompts
- Reliable: The cache key reflects the actual content being sent to the LLM
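Conceptually, a content-based key is a digest over everything that reaches the LLM. Here is a minimal sketch of the idea (the gem's real key format and internals may differ; `build_cache_key` is a hypothetical helper for illustration, not the library's API):

```ruby
require "digest"
require "json"

# Hypothetical illustration of a content-based cache key: digest the
# agent class name, parameters, and prompts. Changing any input
# changes the key, which is what makes invalidation automatic.
def build_cache_key(agent_class, params, system_prompt, user_prompt)
  payload = JSON.generate({
    agent: agent_class,
    params: params.sort.to_h, # stable ordering => stable digest
    system: system_prompt,
    user: user_prompt
  })
  "ruby_llm_agents/#{agent_class}/#{Digest::SHA256.hexdigest(payload)}"
end

key_a = build_cache_key("SearchAgent", { query: "test", limit: 10 }, "You are helpful.", "{query}")
key_b = build_cache_key("SearchAgent", { query: "test", limit: 20 }, "You are helpful.", "{query}")
key_a == key_b # => false: a changed parameter produces a new key
```

Because the digest covers the prompts as well, editing a system or user prompt invalidates old entries without any manual version bump.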
To manually clear caches, use Rails cache clearing:

```ruby
Rails.cache.clear # Clear all caches
```

Or use a cache namespace in your configuration for more granular control.
How caching works:

- A cache key is generated from:
  - Agent class name
  - All parameters
  - System prompt
  - User prompt
- Before making an API call, the cache is checked
- If found, the cached response is returned immediately
- If not found, the API call is made and the response is cached
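The steps above amount to a read-through cache. A minimal stand-alone sketch of the flow, using a plain Hash in place of `Rails.cache` and a stubbed API call (both are stand-ins for illustration, not the gem's internals):

```ruby
CACHE = {}      # stand-in for Rails.cache
$api_calls = 0  # counts simulated LLM requests

def fake_llm_call(prompt)
  $api_calls += 1 # stand-in for the real API request
  "response to #{prompt}"
end

def cached_call(key, ttl:)
  entry = CACHE[key]
  if entry && entry[:expires_at] > Time.now
    entry[:value] # hit: return the cached response immediately
  else
    value = yield # miss: make the API call...
    CACHE[key] = { value: value, expires_at: Time.now + ttl }
    value         # ...and cache the response for next time
  end
end

first  = cached_call("demo", ttl: 3600) { fake_llm_call("hello") }
second = cached_call("demo", ttl: 3600) { fake_llm_call("hello") }
first == second # => true, and $api_calls is still 1: only one real call was made
```

The real implementation gets the expiry and hit/miss handling from the configured cache store; the point here is only the check-then-call-then-store shape of the flow.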
All parameters are included in the cache key:

```ruby
class SearchAgent < ApplicationAgent
  cache 1.hour

  param :query, required: true
  param :limit, default: 10
end

# These produce DIFFERENT cache keys
SearchAgent.call(query: "test", limit: 10)
SearchAgent.call(query: "test", limit: 20)
```

Override `cache_key_data` to control what affects caching:
```ruby
class SearchAgent < ApplicationAgent
  cache 1.hour

  param :query, required: true
  param :limit, default: 10
  param :request_id # Should NOT affect caching

  def cache_key_data
    # Only query and limit affect the cache key;
    # request_id is excluded
    { query: query, limit: limit }
  end
end

# These now use the SAME cache entry (request_id ignored)
SearchAgent.call(query: "test", limit: 10, request_id: "abc")
SearchAgent.call(query: "test", limit: 10, request_id: "xyz")
```

To bypass the cache for a single call:

```ruby
# Force a fresh API call
result = MyAgent.call(query: "test", skip_cache: true)
```

To check whether a result came from the cache:

```ruby
result = MyAgent.call(query: "test")
result.cached? # => true/false (if available)
```

The cache store is configurable:

```ruby
# Uses whatever Rails.cache is configured to
config.cache_store = Rails.cache

# Or a dedicated in-memory store
config.cache_store = ActiveSupport::Cache::MemoryStore.new(
  size: 64.megabytes
)

# Or Redis
config.cache_store = ActiveSupport::Cache::RedisCacheStore.new(
  url: ENV['REDIS_URL'],
  namespace: 'llm_agents',
  expires_in: 1.day
)

# Or a file store
config.cache_store = ActiveSupport::Cache::FileStore.new(
  Rails.root.join('tmp', 'llm_cache'),
  expires_in: 1.day
)
```

Use a high TTL for stable, factual responses:
```ruby
class FactAgent < ApplicationAgent
  cache 1.week # Facts don't change often

  user "Explain: {topic}"
end
```

Include user context in the cache key for personalized responses:
```ruby
class PersonalizedAgent < ApplicationAgent
  cache 1.hour

  param :query, required: true
  param :user_id, required: true

  def cache_key_data
    { query: query, user_id: user_id }
  end
end
```

Use a short TTL, or no caching, for time-sensitive data:
```ruby
class NewsAgent < ApplicationAgent
  # No caching - always fetch fresh
  param :topic, required: true
end

# Or a very short cache
class WeatherAgent < ApplicationAgent
  cache 15.minutes
end
```

Important: Streaming responses are never cached.
```ruby
class StreamingAgent < ApplicationAgent
  streaming true
  cache 1.hour # Ignored when streaming
end

# This will always make an API call
StreamingAgent.call(user: "test") do |chunk|
  print chunk
end
```

Track cache performance:
```ruby
# In your monitoring/metrics
cache_hits = 0
cache_misses = 0

# Wrap agent calls
result = MyAgent.call(query: query)
if result.cached?
  cache_hits += 1
else
  cache_misses += 1
end

hit_rate = cache_hits.to_f / (cache_hits + cache_misses)
```

To clear caches selectively:

```ruby
Rails.cache.delete_matched("ruby_llm_agents/*")

# Clear all SearchAgent caches
Rails.cache.delete_matched("ruby_llm_agents/SearchAgent/*")
```

Or from the command line:

```shell
rails tmp:cache:clear
```

Match the cache TTL to how deterministic the agent is:

```ruby
class ClassifierAgent < ApplicationAgent
  temperature 0.0 # Deterministic
  cache 1.day     # Safe to cache
end

class CreativeAgent < ApplicationAgent
  temperature 1.0  # Non-deterministic
  cache 30.minutes # Short cache or no cache
end
```

Include anything that changes the response in `cache_key_data`:

```ruby
def cache_key_data
  {
    query: query,
    user_locale: locale,   # Different locales = different responses
    model_version: version # Track model updates
  }
end
```

To check cache memory usage:

```ruby
# Redis
redis = Redis.new(url: ENV['REDIS_URL'])
redis.info('memory')['used_memory_human']

# Memory store
Rails.cache.instance_variable_get(:@data).size
```
If caching doesn't seem to be working:

- Verify that caching is enabled:

  ```ruby
  cache 1.hour # Must be set
  ```

- Check that a cache store is configured:

  ```ruby
  RubyLLM::Agents.configuration.cache_store
  ```

- Verify that the cache key is consistent:

  ```ruby
  result = MyAgent.call(query: "test", dry_run: true) # Check parameters in output
  ```

- Clear the cache manually:

  ```ruby
  Rails.cache.clear
  ```

See also:

- Agent DSL - Cache configuration
- Configuration - Cache store setup
- Production Deployment - Production caching