# Async Fiber
RubyLLM::Agents supports concurrent execution using Ruby's Fiber scheduler. When used inside an Async block, LLM operations automatically become non-blocking, allowing you to handle many concurrent requests with minimal resources.
Traditional thread-based approaches have limitations:
- LLM operations take 5-60 seconds, spending 99% of time waiting
- Each thread consumes ~1MB of memory
- Thread pool exhaustion under load
Async/Fiber approach:
- Lightweight fibers (~10KB each) instead of threads
- Single database connection shared across fibers
- ~100x more concurrent operations with the same resources
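The lightweight-fiber claim can be illustrated with core Ruby alone (no gems): fibers are plain objects, cheap to create by the thousand, and each one runs only when resumed. This is a minimal sketch, not the gem's code:

```ruby
# Fibers are plain Ruby objects; creating thousands is fast and cheap,
# whereas the same number of OS threads would cost roughly 1MB of stack each.
fibers = 10_000.times.map do |i|
  Fiber.new { i * 2 } # the body runs only when the fiber is resumed
end

# Resume a few; Fiber#resume returns the block's value on completion.
results = fibers.first(3).map(&:resume)
puts results.inspect # => [0, 2, 4]
```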
Add the async gem to your Gemfile:
```ruby
gem 'async', '~> 2.0'
```

Then bundle install:

```shell
bundle install
```

Wrap agent calls in an `Async` block to run them concurrently:

```ruby
require 'async'

Async do
  # These run concurrently, not sequentially
  results = [
    Async { SentimentAgent.call(input: "I love this!") },
    Async { SummaryAgent.call(input: "Long text...") },
    Async { CategoryAgent.call(input: "Product review") }
  ].map(&:wait)
end
```

The `RubyLLM::Agents::Async.batch` method provides rate-limited concurrent execution:
```ruby
require 'async'

Async do
  results = RubyLLM::Agents::Async.batch([
    [SentimentAgent, { input: "Text 1" }],
    [SentimentAgent, { input: "Text 2" }],
    [SentimentAgent, { input: "Text 3" }]
  ], max_concurrent: 5)
end
```

`RubyLLM::Agents::Async.each` maps a block over a collection with bounded concurrency:

```ruby
Async do
  reviews = ["Great!", "Terrible!", "Okay"]
  results = RubyLLM::Agents::Async.each(reviews, max_concurrent: 10) do |review|
    SentimentAgent.call(input: review)
  end
end
```

`RubyLLM::Agents::Async.stream` yields each result as soon as it completes:

```ruby
Async do
  RubyLLM::Agents::Async.stream(agents_with_params) do |result, agent_class, index|
    puts "#{agent_class.name} finished: #{result.content}"
  end
end
```

Set the default concurrency limit in the configuration:

```ruby
RubyLLM::Agents.configure do |config|
  # Maximum concurrent operations for batch processing
  config.async_max_concurrency = 20
end
```

RubyLLM uses Net::HTTP, which automatically cooperates with Ruby's fiber scheduler. When you wrap agent calls in an `Async` block:
- Automatic Detection: The gem detects it's running in an async context
- Non-blocking I/O: HTTP requests yield to other fibers while waiting
- Shared Resources: Database connections and HTTP pools are shared efficiently
- Transparent Fallback: Outside async context, everything works synchronously
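The detection step can be approximated in plain Ruby: the async gem exposes `Async::Task.current?`, which returns the running task or nil. This is a hedged sketch of the idea, not the gem's actual implementation:

```ruby
# Sketch: true only when the async gem is loaded AND the current fiber
# is running inside an Async task. Not RubyLLM::Agents' real code.
def in_async_context?
  !!(defined?(::Async::Task) && ::Async::Task.current?)
end

puts in_async_context? # false when no Async block (or gem) is present
```

Outside an `Async` block this returns false, which is what allows the transparent synchronous fallback.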
```
BEFORE (Threads):             AFTER (Fibers):

┌─────────────┐               ┌─────────────┐
│  Thread 1   │ (1MB RAM)     │ Event Loop  │
│  blocking   │               │ (1 thread)  │
└─────────────┘               └──────┬──────┘
┌─────────────┐                 ┌────┴────┐
│  Thread 2   │ (1MB RAM)      ▼         ▼
│  blocking   │             ┌─────┐   ┌─────┐
└─────────────┘             │Fiber│   │Fiber│  (~10KB each)
                            └─────┘   └─────┘
```
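The single-threaded event-loop column can be sketched with core Ruby fibers: control switches in userspace via `Fiber.yield`/`Fiber#resume`, with no kernel scheduling involved. A toy illustration (the async gem's real event loop additionally waits on I/O readiness):

```ruby
# Two cooperating fibers on one thread, driven by a trivial "event loop".
log = []

fiber_a = Fiber.new do
  log << "A: start"
  Fiber.yield            # hand control back to the loop
  log << "A: resume"
end

fiber_b = Fiber.new do
  log << "B: start"
  Fiber.yield
  log << "B: resume"
end

# The "event loop": resume each fiber until both finish.
[fiber_a, fiber_b].each(&:resume)  # run each until its first yield
[fiber_a, fiber_b].each(&:resume)  # run each to completion

puts log.inspect
# => ["A: start", "B: start", "A: resume", "B: resume"]
```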
Run multiple agents concurrently using fibers:
```ruby
Async do
  # These run concurrently as fibers, not threads
  sentiment_task = Async { SentimentAgent.call(text: "Great product!") }
  summary_task   = Async { SummaryAgent.call(text: "Great product!") }
  category_task  = Async { CategoryAgent.call(text: "Great product!") }

  sentiment  = sentiment_task.wait
  summary    = summary_task.wait
  categories = category_task.wait
end
```

Retry delays are automatically non-blocking in an async context:

```ruby
class ReliableAgent < ApplicationAgent
  retries max: 3, backoff: :exponential
  # In async context, sleep doesn't block other fibers
end
```

Falcon is an async-native web server:
```ruby
# Gemfile
gem 'falcon'
```

```shell
falcon serve
```

Puma works, but requires wrapping requests in `Async` blocks:

```ruby
# app/controllers/api/agents_controller.rb
def analyze
  Async do
    result = RubyLLM::Agents::Async.batch(analysis_agents)
    render json: result
  end.wait
end
```

Cap concurrency to stay within API rate limits:

```ruby
# Prevent overwhelming API rate limits
RubyLLM::Agents::Async.batch(agents, max_concurrent: 10)
```

```ruby
# Good: Rate-limited batch processing
Async do
  RubyLLM::Agents::Async.batch(
    items.map { |item| [ProcessorAgent, { input: item }] },
    max_concurrent: 20
  )
end

# Avoid: Spawning unlimited concurrent tasks
Async do
  items.map { |item| Async { ProcessorAgent.call(input: item) } }.map(&:wait)
end
```

Handle per-agent failures in the batch callback:

```ruby
Async do
  results = RubyLLM::Agents::Async.batch(agents) do |result, index|
    if result.error?
      Rails.logger.error "Agent #{index} failed: #{result.error}"
    end
  end
end
```

```ruby
# Check if async gem is loaded
RubyLLM::Agents::Async.available?
# => true

# Check if currently in async context
RubyLLM::Agents::Async.async_context?
# => true (inside Async block)
```

| Feature | Threads (ThreadPool) | Fibers (Async) |
|---|---|---|
| Memory per worker | ~1MB | ~10KB |
| Context switching | Kernel (expensive) | Userspace (cheap) |
| Max concurrent | ~100 practical | ~10,000+ |
| DB connections | 1 per thread | Shared |
| Code changes | None | Wrap in `Async do` block |
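One consequence behind the "Shared" row: all fibers run on the same underlying thread, which is why a single database connection can serve many of them. A plain-Ruby check of that property:

```ruby
# Every fiber resumed from this thread reports the same Thread.current,
# unlike a thread pool where each worker is a distinct OS thread.
host_threads = 3.times.map do
  Fiber.new { Thread.current }.resume
end

puts host_threads.map { |t| t.equal?(Thread.current) }.inspect
# => [true, true, true]
```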
Existing code works unchanged:
```ruby
# Still works exactly the same (synchronous)
result = MyAgent.call(input: "Hello")

# Opt in to async
Async do
  result = MyAgent.call(input: "Hello")  # Now non-blocking
end
```