Image Generation
Generate, analyze, edit, and transform images with templates and cost tracking.
This guide covers all image-related capabilities:
- Image Generation - Create images from text prompts
- Image Analysis - Extract captions, tags, objects, and colors from images
- Background Removal - Extract subjects from images
- Image Pipelines - Chain multiple operations into automated workflows
- Image Variations - Create variations of existing images
- Image Editing - Modify images with text instructions
- Image Transformation - Apply style transfers and transformations
- Image Upscaling - Enhance image resolution
Generate images from text prompts with templates and cost tracking.
The ImageGenerator base class provides a DSL for creating image generators with:
- Built-in execution tracking and cost monitoring
- Budget controls (image generation counts toward limits)
- Multi-tenancy support
- Prompt templates for consistent styling
- Caching for repeated prompts
rails generate ruby_llm_agents:image_generator Logo
This creates app/agents/images/logo_generator.rb:
class LogoGenerator < ApplicationImageGenerator
model "gpt-image-1"
size "1024x1024"
quality "standard"
style "vivid"
end
# Generate a single image
result = Images::LogoGenerator.call(prompt: "Minimalist tech company logo")
result.url # "https://..."
result.total_cost # 0.04
result.success? # true
# Save to file
result.save("logo.png")
# Generate multiple images
result = Images::LogoGenerator.call(prompt: "App icon variations", count: 4)
result.urls # ["https://...", ...]
result.count # 4
result.save_all("./icons")
class ProductImageGenerator < ApplicationImageGenerator
model "gpt-image-1" # OpenAI GPT Image 1
# or
model "dall-e-3" # OpenAI DALL-E 3
# or
model "flux-pro" # Flux Pro
end
class HeroImageGenerator < ApplicationImageGenerator
model "gpt-image-1"
size "1792x1024" # Wide format
quality "hd" # High definition
style "vivid" # Vivid or natural
end
Available sizes depend on the model:
- "1024x1024" - Square (default)
- "1792x1024" - Wide landscape
- "1024x1792" - Tall portrait
- "512x512" - Small
The same prompt with the same settings produces similar results, so caching can be effective:
class CachedGenerator < ApplicationImageGenerator
model "gpt-image-1"
cache_for 1.day
end
Add a description for documentation and dashboard display:
class ProductPhotoGenerator < ApplicationImageGenerator
model "gpt-image-1"
description "Generates professional product photography"
end
The result object provides access to images and metadata:
result = Images::MyGenerator.call(prompt: "A sunset over mountains")
# Images
result.image # First image object
result.images # Array of all images
result.url # First image URL
result.urls # All image URLs
result.data # First image base64 data (if available)
result.datas # All base64 data
# Status
result.success? # true if generation succeeded
result.error? # true if failed
result.single? # true if single image
result.batch? # true if multiple images
result.count # Number of images
# Metadata
result.model_id # Model used
result.size # Image size
result.quality # Quality setting
result.style # Style setting
result.revised_prompt # Model-modified prompt (if any)
result.revised_prompts # All revised prompts
# Cost and Timing
result.total_cost # Cost in USD
result.duration_ms # Generation time
result.started_at # Start time
result.completed_at # End time
result.input_tokens # Approximate prompt tokens
# Errors
result.error_class # Error class name (if failed)
result.error_message # Error message (if failed)
# Save single image
result.save("output.png")
# Save all images to directory
result.save_all("./outputs", prefix: "generated")
# Creates: generated_1.png, generated_2.png, etc.
# Get binary data
blob = result.to_blob
blobs = result.blobs
Define reusable templates for consistent styling.
class ProductPhotoGenerator < ApplicationImageGenerator
model "gpt-image-1"
template "Professional product photography of {prompt}, " \
"white background, studio lighting, 8k resolution"
end
result = Images::ProductPhotoGenerator.call(prompt: "a red sneaker")
# Actual prompt: "Professional product photography of a red sneaker, ..."
# Get a preset template
template = ImageGenerator::Templates.preset(:product)
# => "Professional product photography of {prompt}, white background..."
# Apply a preset
prompt = ImageGenerator::Templates.apply_preset(:portrait, "a CEO")
# => "Professional portrait of a CEO, soft lighting..."
# List all presets
ImageGenerator::Templates.preset_names
# => [:product, :portrait, :landscape, :watercolor, :oil_painting, ...]
Photography:
- :product - Professional product photography
- :portrait - Studio portrait with soft lighting
- :landscape - Dramatic landscape photography
Artistic:
- :watercolor - Watercolor painting style
- :oil_painting - Classical oil painting
- :digital_art - Modern digital art
- :anime - Anime/Studio Ghibli style
Technical:
- :isometric - 3D isometric render
- :blueprint - Technical blueprint
- :wireframe - 3D wireframe visualization
Design:
- :icon - App icon design
- :logo - Minimalist logo design
- :ui_mockup - Modern UI mockup
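
A template's `{prompt}` placeholder is replaced with the caller's prompt before the request is sent. The following is a minimal sketch of that substitution, assuming plain string replacement. `PRESETS`, `apply_template`, and this `apply_preset` are hypothetical stand-ins written for illustration, with preset wording abbreviated from the examples above; they are not the library's internals.

```ruby
# Hypothetical preset table; wording abbreviated from the examples above.
PRESETS = {
  product: "Professional product photography of {prompt}, white background",
  portrait: "Professional portrait of {prompt}, soft lighting"
}.freeze

# Replace every {prompt} placeholder with the user's prompt.
# The block form of gsub avoids backslash interpretation in the prompt text.
# A template without a placeholder is returned unchanged.
def apply_template(template, prompt)
  template.gsub("{prompt}") { prompt.to_s.strip }
end

# Look up a preset by name and apply it; unknown names raise ArgumentError.
def apply_preset(name, prompt)
  template = PRESETS.fetch(name) { raise ArgumentError, "unknown preset: #{name}" }
  apply_template(template, prompt)
end

puts apply_preset(:product, "a red sneaker")
```

Keeping the styling in the template and the subject in the prompt means callers only vary the subject, which keeps output style consistent across calls.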
Generate multiple images in a single call:
result = Images::LogoGenerator.call(prompt: "Tech startup logo", count: 4)
result.count # => 4
result.urls # => ["https://...", "https://...", ...]
# Iterate over images
result.images.each_with_index do |image, idx|
image.save("logo_#{idx + 1}.png")
end
# Save all at once
result.save_all("./logos")
Some options are specific to certain providers:
class FluxGenerator < ApplicationImageGenerator
model "flux-pro"
# Provider-specific options
negative_prompt "blurry, low quality, distorted"
seed 12345 # Reproducible generation
guidance_scale 7.5 # CFG scale (1.0-20.0)
steps 50 # Inference steps
end
Attach generated images directly to Rails models:
class Product < ApplicationRecord
has_one_attached :hero_image
has_many_attached :gallery_images
end
class ProductImageGenerator < ApplicationImageGenerator
include RubyLLM::Agents::ImageGenerator::ActiveStorageSupport
model "gpt-image-1"
size "1024x1024"
end
# Generate and attach a single image
product = Product.find(1)
result = Images::ProductImageGenerator.generate_and_attach(
prompt: "Professional product photo of a red sneaker",
record: product,
attachment_name: :hero_image
)
# Generate and attach multiple images
result = Images::ProductImageGenerator.generate_and_attach_multiple(
prompt: "Product gallery shots",
record: product,
attachment_name: :gallery_images,
count: 4
)
Override class settings at call time:
# Override model
result = Images::LogoGenerator.call(
prompt: "A logo",
model: "dall-e-3"
)
# Override size and quality
result = Images::LogoGenerator.call(
prompt: "A logo",
size: "1792x1024",
quality: "hd"
)
# With tenant for multi-tenancy
result = Images::LogoGenerator.call(
prompt: "A logo",
tenant: current_organization
)
# Skip cache
result = Images::LogoGenerator.call(
prompt: "A logo",
skip_cache: true
)
Image generation executions are tracked in the ruby_llm_agents_executions table:
# View image generation executions
RubyLLM::Agents::Execution
.where(execution_type: 'image_generation')
.sum(:total_cost)
# Per-generator stats
RubyLLM::Agents::Execution
.where(agent_type: 'LogoGenerator')
.group(:model_id)
.count
Image generation costs count toward tenant and global budgets:
RubyLLM::Agents.configure do |config|
config.budgets = {
global_daily: 25.0, # Includes image generation
global_monthly: 500.0,
enforcement: :hard
}
end
Full multi-tenancy support:
# Using resolver
result = Images::LogoGenerator.call(prompt: "A logo")
# Automatically uses Current.tenant if configured
# Explicit tenant
result = Images::LogoGenerator.call(
prompt: "A logo",
tenant: "acme_corp"
)
# Tenant with budget limits
result = Images::LogoGenerator.call(
prompt: "A logo",
tenant: {
id: "acme_corp",
daily_limit: 50.0,
enforcement: :hard
}
)
RubyLLM::Agents.configure do |config|
# Default image model
config.default_image_model = "gpt-image-1"
# Default settings
config.default_image_size = "1024x1024"
config.default_image_quality = "standard"
config.default_image_style = "vivid"
# Maximum prompt length
config.max_image_prompt_length = 4000
# Enable/disable tracking
config.track_image_generation = true
# Default cost for unknown models
config.default_image_cost = 0.04
# Model aliases
config.image_model_aliases = {
dalle: "dall-e-3",
gpt_image: "gpt-image-1"
}
# Custom pricing overrides
config.image_model_pricing = {
"custom-model" => 0.05,
"another-model" => {
standard: 0.03,
hd: 0.06
}
}
end
Pricing is dynamically fetched from LiteLLM and falls back to these estimates:
| Provider | Model | Price per Image | Notes |
|---|---|---|---|
| OpenAI | gpt-image-1 | $0.04-0.12 | Varies by size/quality |
| OpenAI | dall-e-3 | $0.04-0.12 | Varies by size/quality |
| OpenAI | dall-e-2 | $0.016-0.02 | Legacy, smaller sizes |
| Black Forest | flux-pro | $0.05 | High quality |
| Black Forest | flux-dev | $0.025 | Development model |
| Black Forest | flux-schnell | $0.003 | Fast, budget option |
| Stability | sdxl | $0.04 | Stable Diffusion XL |
| Stability | stable-diffusion-3.5 | $0.03 | Latest SD version |
| Google | imagen-3 | $0.02 | Google's image model |
| Ideogram | ideogram-2 | $0.04 | Text-in-image specialist |
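
The pricing overrides in the configuration accept either a flat per-image price or a hash keyed by quality. The sketch below shows one way such a table could be resolved to a per-image price. The fallback order (requested quality, then `:standard`, then the default cost) is an assumption made for illustration, and `price_per_image` and `estimate_cost` are hypothetical helpers, not library methods.

```ruby
# Hypothetical pricing table in the same shape as config.image_model_pricing:
# either a flat per-image price or a hash keyed by quality.
PRICING = {
  "custom-model" => 0.05,
  "another-model" => { standard: 0.03, hd: 0.06 }
}.freeze

# Mirrors config.default_image_cost from the configuration example.
DEFAULT_IMAGE_COST = 0.04

# Resolve the per-image price for a model and quality.
# Assumed fallback order: requested quality, :standard, default cost.
def price_per_image(model, quality: :standard, pricing: PRICING)
  entry = pricing[model]
  case entry
  when Numeric then entry
  when Hash then entry.fetch(quality.to_sym) { entry.fetch(:standard, DEFAULT_IMAGE_COST) }
  else DEFAULT_IMAGE_COST
  end
end

# Estimated cost in USD for a batch of images, rounded to 4 decimals.
def estimate_cost(model, count: 1, quality: :standard)
  (price_per_image(model, quality: quality) * count).round(4)
end

puts estimate_cost("another-model", count: 4, quality: :hd)
puts estimate_cost("unknown-model", count: 2)
```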
Image generation can be expensive at scale:
| Model | Standard | HD | Large HD |
|---|---|---|---|
| gpt-image-1 / dall-e-3 | $0.04 | $0.08 | $0.12 |
| flux-pro | $0.05 | - | - |
| flux-schnell | $0.003 | - | - |
Tips:
- Use caching for repeated prompts
- Consider flux-schnell for drafts/previews
- Use smaller sizes during development
- Monitor costs in the dashboard
- Set budget limits to prevent overruns
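
To see how these tips affect spend, a rough projection helps. The sketch below uses the standard-quality prices from the table above and assumes that cache hits cost nothing. `projected_cost` and the request volumes are hypothetical values chosen for illustration.

```ruby
# Standard-quality per-image prices taken from the table above.
PRICE_PER_IMAGE = {
  "gpt-image-1" => 0.04,
  "flux-pro" => 0.05,
  "flux-schnell" => 0.003
}.freeze

# Estimate spend in USD over a period.
# cache_hit_rate is the fraction of requests served from cache,
# which this sketch assumes are free.
def projected_cost(model, images_per_day:, days: 30, cache_hit_rate: 0.0)
  price = PRICE_PER_IMAGE.fetch(model)
  billable = images_per_day * days * (1.0 - cache_hit_rate)
  (billable * price).round(2)
end

puts projected_cost("gpt-image-1", images_per_day: 500)                      # prints 600.0
puts projected_cost("gpt-image-1", images_per_day: 500, cache_hit_rate: 0.4) # prints 360.0
puts projected_cost("flux-schnell", images_per_day: 500)                     # prints 45.0
```

At 500 images per day, a 40% cache hit rate removes 40% of the bill, while switching drafts to flux-schnell cuts the per-image price by more than an order of magnitude.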
class ProductImageGenerator < ApplicationImageGenerator
model "gpt-image-1"
size "1024x1024"
quality "hd"
template "Professional product photography of {prompt}, " \
"white background, soft studio lighting, commercial quality, " \
"high resolution, clean and minimalist"
end
result = Images::ProductImageGenerator.call(prompt: "a wireless mouse")
result.save("product_photo.png")
class LogoGenerator < ApplicationImageGenerator
model "gpt-image-1"
size "1024x1024"
quality "hd"
style "vivid"
template "Minimalist logo design for {prompt}, " \
"clean lines, professional, vector style, " \
"suitable for business use, modern aesthetic"
end
# Generate 4 variations
result = Images::LogoGenerator.call(
prompt: "a tech startup called 'Nexus AI'",
count: 4
)
result.save_all("./logo_options")
puts "Generated #{result.count} logos for $#{result.total_cost}"
class AvatarGenerator < ApplicationImageGenerator
include RubyLLM::Agents::ImageGenerator::ActiveStorageSupport
model "flux-schnell" # Fast and cheap for avatars
size "512x512"
template "Digital avatar portrait of {prompt}, " \
"friendly expression, vibrant colors, " \
"suitable for profile picture"
end
# In a controller
def generate_avatar
result = Images::AvatarGenerator.generate_and_attach(
prompt: params[:description],
record: current_user,
attachment_name: :avatar
)
if result.success?
redirect_to profile_path, notice: "Avatar generated!"
else
redirect_to profile_path, alert: result.error_message
end
end
Extract captions, tags, objects, colors, and text from images using vision models.
The ImageAnalyzer base class provides a DSL for creating image analyzers with:
- Caption generation and detailed descriptions
- Tag extraction for image categorization
- Object detection with confidence levels
- Color extraction with percentages
- OCR text extraction
- Built-in execution tracking and cost monitoring
- Multi-tenancy support
- Caching for repeated analyses
rails generate ruby_llm_agents:image_analyzer Product
This creates app/agents/images/product_analyzer.rb:
class ProductAnalyzer < ApplicationImageAnalyzer
model "gpt-4o"
analysis_type :detailed
end
# Analyze an image
result = Images::ProductAnalyzer.call(image: "product.jpg")
result.caption # "A red sneaker on white background"
result.description # Detailed description
result.tags # ["sneaker", "red", "footwear", "product"]
result.success? # true
# Analyze from URL
result = Images::ProductAnalyzer.call(image: "https://example.com/image.jpg")
# With specific analysis types
result = Images::ProductAnalyzer.call(image: "photo.jpg", analysis_type: :all)
result.objects # [{ name: "shoe", confidence: "high" }]
result.colors # [{ hex: "#FF0000", percentage: 30 }]
class ProductAnalyzer < ApplicationImageAnalyzer
model "gpt-4o" # OpenAI GPT-4 Vision
# or
model "claude-3-opus" # Anthropic Claude Vision
# or
model "gemini-pro" # Google Gemini Vision
end
class DetailedAnalyzer < ApplicationImageAnalyzer
model "gpt-4o"
analysis_type :detailed # Caption + detailed description
end
class TaggingAnalyzer < ApplicationImageAnalyzer
model "gpt-4o"
analysis_type :tags # Tags only
max_tags 20 # Maximum number of tags
end
class FullAnalyzer < ApplicationImageAnalyzer
model "gpt-4o"
analysis_type :all # Everything: caption, description, tags, objects, colors
extract_colors true # Extract dominant colors
detect_objects true # Detect objects with locations
extract_text true # OCR text extraction
end
Available analysis types:
- :caption - Short caption only
- :detailed - Caption + description (default)
- :tags - Tags/keywords only
- :objects - Object detection with confidence
- :colors - Color palette extraction
- :all - All analysis types
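
The object and color results are arrays of hashes, so they can be filtered with ordinary Ruby. The sketch below works on sample data in the shape shown in the example output. The "medium" and "low" confidence levels are assumed here, and the two helper functions are hypothetical illustrations rather than the result object's own methods.

```ruby
# Sample data in the shape shown in the example output.
# The "medium" and "low" confidence levels are assumptions.
objects = [
  { name: "shoe", confidence: "high" },
  { name: "shoelace", confidence: "medium" },
  { name: "box", confidence: "low" }
]
colors = [
  { hex: "#FF0000", percentage: 30 },
  { hex: "#FFFFFF", percentage: 55 },
  { hex: "#222222", percentage: 15 }
]

# Keep only the objects detected at the given confidence level.
def objects_with_confidence(objects, level)
  objects.select { |obj| obj[:confidence] == level.to_s }
end

# The dominant color is the entry covering the largest share of the image.
def dominant_color(colors)
  colors.max_by { |color| color[:percentage] }
end

puts objects_with_confidence(objects, :high).map { |obj| obj[:name] }.inspect
puts dominant_color(colors)[:hex]
```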
class EcommerceAnalyzer < ApplicationImageAnalyzer
model "gpt-4o"
custom_prompt "Describe this product for an e-commerce listing. " \
"Include material, color, style, and key features."
end
class CachedAnalyzer < ApplicationImageAnalyzer
model "gpt-4o"
cache_for 7.days # Cache results for repeated images
end
The result object provides access to analysis data:
result = Images::MyAnalyzer.call(image: "photo.jpg")
# Content
result.caption # Short caption
result.description # Detailed description
result.tags # Array of tags
result.tag_symbols # Tags as symbols [:sunset, :mountains]
result.objects # Detected objects with confidence
result.colors # Extracted colors
result.text # OCR extracted text
# Queries
result.caption? # true if caption present
result.tags? # true if tags present
result.objects? # true if objects detected
result.colors? # true if colors extracted
result.has_tag?("car") # Check for specific tag
result.has_object?("person") # Check for object
# Colors
result.dominant_color # { hex: "#FF0000", percentage: 30 }
# Objects with filtering
result.high_confidence_objects
result.objects_with_confidence("high")
# Status
result.success? # true if analysis succeeded
result.error? # true if failed
# Metadata
result.model_id # Model used
result.analysis_type # Type of analysis
result.duration_ms # Processing time
result.total_cost # Cost in USD
class ProductCatalogAnalyzer < ApplicationImageAnalyzer
model "gpt-4o"
analysis_type :all
extract_colors true
detect_objects true
max_tags 15
custom_prompt "Analyze this product image for e-commerce. " \
"Identify the product type, brand if visible, " \
"colors, materials, and key features."
end
result = Images::ProductCatalogAnalyzer.call(image: "product.jpg")
# Use for categorization
category = determine_category(result.tags)
# Extract primary color for filtering
primary_color = result.dominant_color[:hex]
# Build search keywords
keywords = result.tags.join(", ")
class ContentModerationAnalyzer < ApplicationImageAnalyzer
model "gpt-4o"
analysis_type :detailed
detect_objects true
custom_prompt "Analyze this image for content moderation. " \
"Identify any inappropriate content, violence, " \
"nudity, or concerning elements. Be specific."
end
result = Images::ContentModerationAnalyzer.call(image: uploaded_file)
if result.has_object?("weapon") || result.has_tag?("violence")
flag_for_review(result)
end
Extract subjects from images by removing backgrounds.
The BackgroundRemover base class provides a DSL for creating background removers with:
- Subject extraction with alpha transparency
- Optional alpha matting for fine edges
- Edge refinement for clean cutouts
- Mask output for compositing
- Built-in execution tracking and cost monitoring
- Multi-tenancy support
- Caching for repeated operations
rails generate ruby_llm_agents:background_remover Photo
This creates app/agents/images/photo_background_remover.rb:
class PhotoBackgroundRemover < ApplicationBackgroundRemover
model "rembg"
output_format :png
end
# Remove background
result = Images::PhotoBackgroundRemover.call(image: "photo.jpg")
result.url # URL of foreground image
result.has_alpha? # true (PNG with transparency)
result.success? # true
# Save the result
result.save("foreground.png")
# Get mask if available
if result.mask?
result.save_mask("mask.png")
end
class ProductRemover < ApplicationBackgroundRemover
model "rembg" # Fast, good for general use
# or
model "segment-anything" # Better quality, slower
end
class TransparentRemover < ApplicationBackgroundRemover
model "rembg"
output_format :png # PNG with alpha (default)
end
class WebOptimizedRemover < ApplicationBackgroundRemover
model "rembg"
output_format :webp # WebP with alpha
end
class HighQualityRemover < ApplicationBackgroundRemover
model "segment-anything"
output_format :png
refine_edges true # Smooth edge transitions
alpha_matting true # Fine edge detection
foreground_threshold 0.6 # Foreground sensitivity (0.0-1.0)
background_threshold 0.4 # Background sensitivity (0.0-1.0)
erode_size 2 # Edge erosion for cleaner cuts
end
class MaskRemover < ApplicationBackgroundRemover
model "rembg"
return_mask true # Also return the mask image
end
result = Images::MaskRemover.call(image: "photo.jpg")
result.mask? # true
result.mask_url # URL of mask image
result.save_mask("mask.png") # Save mask separately
class CachedRemover < ApplicationBackgroundRemover
model "rembg"
cache_for 30.days # Cache results
end
The result object provides access to extracted images:
result = Images::MyRemover.call(image: "photo.jpg")
# Foreground (subject)
result.foreground # Foreground image object
result.url # Foreground URL
result.data # Base64 data (if available)
result.base64? # true if base64 encoded
# Mask (optional)
result.mask? # true if mask available
result.mask # Mask image object
result.mask_url # Mask URL
result.mask_data # Mask base64 data
# Properties
result.has_alpha? # true for PNG/WebP with transparency
result.output_format # :png or :webp
# File operations
result.save("foreground.png")
result.save_mask("mask.png")
result.to_blob # Binary foreground data
result.mask_blob # Binary mask data
# Status
result.success? # true if removal succeeded
result.error? # true if failed
# Metadata
result.model_id # Model used
result.alpha_matting # Whether alpha matting was used
result.refine_edges # Whether edge refinement was used
result.duration_ms # Processing time
result.total_cost # Cost in USD
class ProductPhotoRemover < ApplicationBackgroundRemover
model "segment-anything"
output_format :png
refine_edges true
alpha_matting true
foreground_threshold 0.55
description "Removes backgrounds from product photos"
end
# In a controller
def remove_background
result = Images::ProductPhotoRemover.call(image: params[:image])
if result.success?
# Attach to product using ActiveStorage
@product.processed_image.attach(
io: StringIO.new(result.to_blob),
filename: "product_transparent.png",
content_type: "image/png"
)
render json: { url: result.url }
else
render json: { error: result.error_message }, status: :unprocessable_entity
end
end
class PortraitRemover < ApplicationBackgroundRemover
model "segment-anything"
output_format :png
alpha_matting true
refine_edges true
return_mask true
description "Extracts portraits for compositing"
end
# Get subject and mask for compositing
result = Images::PortraitRemover.call(image: "portrait.jpg")
if result.success?
# Save foreground with transparency
result.save("subject.png")
# Save mask for further editing
result.save_mask("subject_mask.png") if result.mask?
# Use with image editing software or composite in Ruby
composite_with_background(result.to_blob, "new_background.jpg")
end
Generate variations of existing images while maintaining composition and style.
The ImageVariator base class provides a DSL for creating image variators with:
- Variation generation from source images
- Controllable variation strength
- Multiple variation generation in a single call
- Built-in execution tracking and cost monitoring
- Multi-tenancy support
- Caching for repeated operations
rails generate ruby_llm_agents:image_variator Logo
This creates app/agents/images/logo_variator.rb:
class LogoVariator < ApplicationImageVariator
model "gpt-image-1"
size "1024x1024"
variation_strength 0.5
end
# Generate variations
result = Images::LogoVariator.call(image: "logo.png", count: 4)
result.urls # ["https://...", "https://...", ...]
result.count # 4
result.success? # true
# Save all variations
result.save_all("./logo_variations")
class ProductVariator < ApplicationImageVariator
model "gpt-image-1"
size "1024x1024"
end
Control how different variations should be from the original:
class SubtleVariator < ApplicationImageVariator
model "gpt-image-1"
variation_strength 0.2 # Subtle changes
end
class BoldVariator < ApplicationImageVariator
model "gpt-image-1"
variation_strength 0.8 # More dramatic changes
end
class CachedVariator < ApplicationImageVariator
model "gpt-image-1"
cache_for 1.day
end
result = Images::MyVariator.call(image: "source.png", count: 4)
# Images
result.images # All variation image objects
result.urls # All variation URLs
result.count # Number of variations
# Status
result.success? # true if generation succeeded
result.single? # true if single variation
result.batch? # true if multiple variations
# File operations
result.save("variation.png") # Save first variation
result.save_all("./variations") # Save all variations
# Metadata
result.model_id # Model used
result.total_cost # Cost in USD
result.duration_ms # Processing time
Edit specific regions of images using masks (inpainting/outpainting).
The ImageEditor base class provides a DSL for creating image editors with:
- Mask-based region editing (inpainting)
- Prompt-guided content generation
- Multiple edit generation
- Built-in execution tracking and cost monitoring
- Multi-tenancy support
rails generate ruby_llm_agents:image_editor Product
This creates app/agents/images/product_editor.rb:
class ProductEditor < ApplicationImageEditor
model "gpt-image-1"
size "1024x1024"
end
# Edit an image region
result = Images::ProductEditor.call(
image: "product.png",
mask: "mask.png", # White areas will be edited
prompt: "Replace background with beach scene"
)
result.url # Edited image URL
result.success? # true
# Generate multiple edit options
result = Images::ProductEditor.call(
image: "product.png",
mask: "mask.png",
prompt: "Add sunset sky",
count: 3
)
result.urls # ["https://...", ...]
class BackgroundEditor < ApplicationImageEditor
model "gpt-image-1"
size "1024x1024"
end
class CachedEditor < ApplicationImageEditor
model "gpt-image-1"
cache_for 1.hour
end
Masks should be:
- Same dimensions as the source image
- PNG format with alpha channel
- White (255) areas indicate regions to edit
- Black (0) areas indicate regions to preserve
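
Two of these requirements, matching dimensions and an alpha channel, can be checked before sending a request by reading the PNG header. The sketch below parses the IHDR chunk with the Ruby standard library only. `png_info` and `mask_problems` are hypothetical helpers written for this example, and the sketch does not inspect pixel values, so it cannot confirm the white/black convention.

```ruby
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b.freeze

# Read width, height, and alpha presence from a PNG's IHDR chunk.
# Layout: 8-byte signature, 4-byte chunk length, "IHDR", then
# width (4 bytes), height (4 bytes), bit depth (1), color type (1).
def png_info(bytes)
  bytes = bytes.b
  raise ArgumentError, "not a PNG" unless bytes.byteslice(0, 8) == PNG_SIGNATURE
  raise ArgumentError, "missing IHDR" unless bytes.byteslice(12, 4) == "IHDR".b

  width, height = bytes.byteslice(16, 8).unpack("NN")
  _bit_depth, color_type = bytes.byteslice(24, 2).unpack("CC")
  # Color types 4 (gray + alpha) and 6 (RGBA) carry an alpha channel.
  { width: width, height: height, alpha: [4, 6].include?(color_type) }
end

# Return a list of problems; an empty list means the header checks passed.
def mask_problems(image_bytes, mask_bytes)
  image = png_info(image_bytes)
  mask = png_info(mask_bytes)
  problems = []
  unless [image[:width], image[:height]] == [mask[:width], mask[:height]]
    problems << "mask is #{mask[:width]}x#{mask[:height]}, " \
                "image is #{image[:width]}x#{image[:height]}"
  end
  problems << "mask has no alpha channel" unless mask[:alpha]
  problems
end

# Usage with files on disk (paths are placeholders):
# problems = mask_problems(File.binread("product.png"), File.binread("mask.png"))
# warn problems.join("; ") unless problems.empty?
```

Checking locally avoids paying for a request that the provider would reject because of a mismatched mask.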
result = Images::MyEditor.call(image: "photo.png", mask: "mask.png", prompt: "...")
# Images
result.image # Edited image object
result.images # All edited images (if count > 1)
result.url # First edited image URL
result.urls # All edited image URLs
# Status
result.success? # true if edit succeeded
result.error? # true if failed
# File operations
result.save("edited.png")
result.save_all("./edits")
# Metadata
result.model_id # Model used
result.total_cost # Cost in USD
result.duration_ms # Processing time
Apply style transfers and image-to-image transformations.
The ImageTransformer base class provides a DSL for creating transformers with:
- Style transfer from images
- Prompt-guided transformations
- Controllable transformation strength
- Composition preservation
- Built-in execution tracking and cost monitoring
- Multi-tenancy support
rails generate ruby_llm_agents:image_transformer Anime
This creates app/agents/images/anime_transformer.rb:
class AnimeTransformer < ApplicationImageTransformer
model "sdxl"
strength 0.75
end
# Transform an image
result = Images::AnimeTransformer.call(
image: "photo.jpg",
prompt: "anime style portrait"
)
result.url # Transformed image URL
result.success? # true
# Override strength at runtime
result = Images::AnimeTransformer.call(
image: "photo.jpg",
prompt: "anime style portrait",
strength: 0.9 # More dramatic transformation
)
class WatercolorTransformer < ApplicationImageTransformer
model "sdxl"
size "1024x1024"
end
Control how much the image changes:
class SubtleTransformer < ApplicationImageTransformer
model "sdxl"
strength 0.3 # Subtle style transfer
preserve_composition true
end
class DramaticTransformer < ApplicationImageTransformer
model "sdxl"
strength 0.9 # Dramatic transformation
end
class OilPaintingTransformer < ApplicationImageTransformer
model "sdxl"
strength 0.8
template "oil painting, classical style, museum quality, {prompt}"
end
class PreciseTransformer < ApplicationImageTransformer
model "sdxl"
strength 0.75
negative_prompt "blurry, low quality, distorted"
guidance_scale 7.5 # CFG scale (1.0-20.0)
steps 50 # Inference steps
end
class CachedTransformer < ApplicationImageTransformer
model "sdxl"
cache_for 1.day
end
result = Images::MyTransformer.call(image: "photo.jpg", prompt: "watercolor")
# Images
result.image # Transformed image object
result.images # All transformed images (if count > 1)
result.url # First transformed image URL
result.urls # All transformed image URLs
# Status
result.success? # true if transformation succeeded
result.error? # true if failed
# File operations
result.save("transformed.png")
result.save_all("./transforms")
# Metadata
result.model_id # Model used
result.strength # Transformation strength used
result.total_cost # Cost in USD
result.duration_ms # Processing time
class ArtTransformer < ApplicationImageTransformer
model "sdxl"
strength 0.85
template "masterpiece painting, {prompt}, detailed brushwork"
negative_prompt "photo, realistic, modern"
end
result = Images::ArtTransformer.call(
image: "landscape.jpg",
prompt: "impressionist landscape at sunset"
)
Enhance image resolution using AI upscaling models.
The ImageUpscaler base class provides a DSL for creating upscalers with:
- Resolution enhancement (2x, 4x, 8x)
- Optional face enhancement
- Noise reduction
- Built-in execution tracking and cost monitoring
- Multi-tenancy support
- Caching for repeated operations
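
A scale factor multiplies both dimensions, so the pixel count grows with the square of the factor. The sketch below computes the resulting size string for the 2x, 4x, and 8x factors listed above. `upscaled_size` and `ALLOWED_SCALES` are hypothetical names used for illustration, not part of the library.

```ruby
# Scale factors listed in the feature summary above.
ALLOWED_SCALES = [2, 4, 8].freeze

# Compute the output size string for a "WIDTHxHEIGHT" input and a scale factor.
def upscaled_size(input_size, scale)
  unless ALLOWED_SCALES.include?(scale)
    raise ArgumentError, "scale must be one of #{ALLOWED_SCALES.join(', ')}"
  end

  width, height = input_size.split("x").map { |n| Integer(n) }
  "#{width * scale}x#{height * scale}"
end

puts upscaled_size("1024x1024", 4) # prints 4096x4096
puts upscaled_size("512x768", 2)   # prints 1024x1536
```

A 4x upscale of a 1024x1024 image yields 16 times as many pixels, which is worth keeping in mind for storage and transfer.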
rails generate ruby_llm_agents:image_upscaler Photo
This creates app/agents/images/photo_upscaler.rb:
class PhotoUpscaler < ApplicationImageUpscaler
model "real-esrgan"
scale 4
end
# Upscale an image
result = Images::PhotoUpscaler.call(image: "low_res.jpg")
result.url # High resolution image URL
result.output_size # "4096x4096" (if input was 1024x1024)
result.success? # true
# Save the result
result.save("high_res.png")
class PhotoUpscaler < ApplicationImageUpscaler
model "real-esrgan" # General purpose, good quality
# or
model "swinir" # Better for natural images
end
class SmallUpscaler < ApplicationImageUpscaler
model "real-esrgan"
scale 2 # 2x upscale
end
class LargeUpscaler < ApplicationImageUpscaler
model "real-esrgan"
scale 8 # 8x upscale (maximum)
end
class PortraitUpscaler < ApplicationImageUpscaler
model "real-esrgan"
scale 4
face_enhance true # Improve facial details
end
class DenoisingUpscaler < ApplicationImageUpscaler
model "real-esrgan"
scale 4
denoise_strength 0.5 # Reduce noise (0.0-1.0)
end
class CachedUpscaler < ApplicationImageUpscaler
model "real-esrgan"
cache_for 7.days
end
result = Images::MyUpscaler.call(image: "photo.jpg")
# Image
result.image # Upscaled image object
result.url # Upscaled image URL
result.data # Base64 data (if available)
# Dimensions
result.input_size # Original size "1024x1024"
result.output_size # Upscaled size "4096x4096"
result.scale_factor # 4
# Status
result.success? # true if upscaling succeeded
result.error? # true if failed
# File operations
result.save("upscaled.png")
result.to_blob # Binary image data
# Metadata
result.model_id # Model used
result.face_enhance # Whether face enhancement was used
result.total_cost # Cost in USD
result.duration_ms # Processing time
class ProductUpscaler < ApplicationImageUpscaler
model "real-esrgan"
scale 4
description "Upscales product photos for e-commerce"
end
# In a controller
def upscale_image
result = Images::ProductUpscaler.call(image: params[:image])
if result.success?
@product.high_res_image.attach(
io: StringIO.new(result.to_blob),
filename: "product_hd.png",
content_type: "image/png"
)
redirect_to @product, notice: "Image upscaled!"
else
redirect_to @product, alert: result.error_message
end
end
class PortraitUpscaler < ApplicationImageUpscaler
model "real-esrgan"
scale 4
face_enhance true
denoise_strength 0.3
description "Upscales portraits with face enhancement"
end
result = Images::PortraitUpscaler.call(image: "headshot.jpg")
result.save("headshot_hd.png")
Chain multiple image operations into automated workflows.
The ImagePipeline base class provides a DSL for creating multi-step image workflows with:
- Sequential execution of image operations
- Conditional step execution
- Aggregated cost tracking
- Unified result access
- Before/after callbacks
- Caching for deterministic pipelines
- Multi-tenancy support
rails generate ruby_llm_agents:image_pipeline Product --steps generate,upscale,analyze
This creates app/agents/images/product_pipeline.rb:
class ProductPipeline < ApplicationImagePipeline
step :generate, generator: ProductGenerator
step :upscale, upscaler: ProductUpscaler
step :analyze, analyzer: ProductAnalyzer
description "Product image processing pipeline"
end
# Run the pipeline
result = Images::ProductPipeline.call(prompt: "Professional laptop photo")
result.success? # true if all steps succeeded
result.final_image # Final processed image URL
result.total_cost # Combined cost of all steps
# Access individual steps
result.step(:generate) # ImageGenerationResult
result.step(:upscale) # ImageUpscaleResult
result.analysis # Shortcut to analyzer result
# Save the final image
result.save("output.png")
class MyPipeline < ApplicationImagePipeline
# Generation step (text-to-image)
step :generate, generator: ProductGenerator
# Upscaling step
step :upscale, upscaler: PhotoUpscaler, scale: 2
# Transformation step (img2img)
step :transform, transformer: StyleTransformer, strength: 0.7
# Editing step (inpainting)
step :edit, editor: PhotoEditor
# Variation step
step :vary, variator: ProductVariator
# Analysis step (non-image output)
step :analyze, analyzer: ContentAnalyzer
# Background removal step
step :remove_bg, remover: BackgroundRemover
end
| Type | Option Key | Input | Output |
|---|---|---|---|
| Generator | :generator | Prompt | Image |
| Upscaler | :upscaler | Image | Image |
| Transformer | :transformer | Image + Prompt | Image |
| Editor | :editor | Image + Mask + Prompt | Image |
| Variator | :variator | Image | Image |
| Analyzer | :analyzer | Image | Analysis |
| Remover | :remover | Image | Image |
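
Conceptually, a pipeline runs its steps in order, hands each step's output to the next, and sums the step costs. The sketch below models that flow with plain lambdas. `StepResult`, `run_pipeline`, and the per-step costs are hypothetical and simplified for illustration (for instance, every step here passes its output forward, including steps that would produce an analysis rather than an image); it is not the library's implementation.

```ruby
# Hypothetical step result: an output value plus the cost of producing it.
StepResult = Struct.new(:name, :output, :cost, :error) do
  def success?
    error.nil?
  end
end

# Run steps in order, feeding each step's output into the next.
# Each step is a [name, callable] pair; the callable returns [output, cost].
# stop_on_error mirrors the pipeline option of the same name.
def run_pipeline(steps, input, stop_on_error: true)
  results = []
  current = input
  steps.each do |name, callable|
    begin
      output, cost = callable.call(current)
      results << StepResult.new(name, output, cost, nil)
      current = output
    rescue StandardError => e
      results << StepResult.new(name, nil, 0.0, e.message)
      break if stop_on_error
    end
  end
  { results: results, final: current, total_cost: results.sum(&:cost).round(4) }
end

# Placeholder steps with made-up costs.
steps = [
  [:generate, ->(prompt) { ["image(#{prompt})", 0.04] }],
  [:upscale,  ->(image)  { ["upscaled(#{image})", 0.01] }]
]
outcome = run_pipeline(steps, "laptop photo")
puts outcome[:final]
puts outcome[:total_cost]
```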
Execute steps based on context:
class SmartPipeline < ApplicationImagePipeline
step :generate, generator: ProductGenerator
# Only upscale if high_quality option is passed
step :upscale, upscaler: PhotoUpscaler, if: ->(ctx) { ctx[:high_quality] }
# Skip background removal if keep_background is true
step :remove_bg, remover: BackgroundRemover, unless: ->(ctx) { ctx[:keep_background] }
step :analyze, analyzer: ProductAnalyzer
end
# Usage with conditions
result = Images::SmartPipeline.call(
prompt: "Product photo",
high_quality: true, # Triggers upscale step
keep_background: false # Triggers remove_bg step
)
Pass options to individual steps:
class CustomPipeline < ApplicationImagePipeline
step :generate, generator: ProductGenerator, size: "1792x1024"
step :upscale, upscaler: PhotoUpscaler, scale: 4
step :transform, transformer: StyleTransformer, strength: 0.8
end
Run code before or after the pipeline:
class CallbackPipeline < ApplicationImagePipeline
step :generate, generator: ProductGenerator
step :upscale, upscaler: PhotoUpscaler
# Before callbacks
before_pipeline :validate_inputs
before_pipeline { |ctx| ctx[:started_at] = Time.current }
# After callbacks
after_pipeline :log_completion
after_pipeline { |result| notify_webhook(result) }
private
def validate_inputs
raise ArgumentError, "Prompt required" unless context[:prompt]
end
def log_completion(result)
Rails.logger.info("Pipeline #{self.class.name}: #{result.success?}")
end
def notify_webhook(result)
WebhookService.notify(result.to_h)
end
end

class ResilientPipeline < ApplicationImagePipeline
step :generate, generator: ProductGenerator
step :upscale, upscaler: PhotoUpscaler
step :analyze, analyzer: ProductAnalyzer
# Stop pipeline on first error (default)
stop_on_error true
# Or continue despite errors
# stop_on_error false
end
result = Images::ResilientPipeline.call(prompt: "Test")
if result.partial?
# Some steps succeeded, some failed
puts "Completed #{result.successful_step_count}/#{result.step_count} steps"
end

class CachedPipeline < ApplicationImagePipeline
step :generate, generator: ProductGenerator
step :upscale, upscaler: PhotoUpscaler
cache_for 1.hour
end

class DocumentedPipeline < ApplicationImagePipeline
step :generate, generator: ProductGenerator
description "Generates professional product images"
end

The result object provides access to all step results:
result = Images::MyPipeline.call(prompt: "Test")
# Status
result.success? # true if all steps succeeded
result.error? # true if any step failed
result.partial? # true if some succeeded, some failed
result.completed? # true if pipeline finished
# Steps
result.steps # Array of all step results
result.step(:generate) # Get specific step result
result[:upscale] # Alias for step()
result.step_names # [:generate, :upscale, ...]
result.step_count # Total step count
result.successful_step_count # Steps that succeeded
result.failed_step_count # Steps that failed
# Images
result.final_image # URL/data of last image-producing step
result.url # Final image URL
result.data # Final image base64 data
result.to_blob # Final image binary data
# Shortcut accessors
result.generation # Generator step result
result.upscale # Upscaler step result
result.transform # Transformer step result
result.analysis # Analyzer step result
result.background_removal # Remover step result
# Cost and timing
result.total_cost # Combined cost of all steps
result.duration_ms # Total pipeline duration
result.primary_model_id # Model from first step
# File operations
result.save("output.png") # Save final image
result.save_all("./dir", prefix: "step") # Save all intermediate images
# Serialization
result.to_h # Hash representation
result.to_cache # Cacheable format

class EcommercePipeline < ApplicationImagePipeline
# Generate professional product photo
step :generate, generator: ProductPhotoGenerator
# Upscale for high resolution
step :upscale, upscaler: PhotoUpscaler, scale: 2
# Remove background for transparent cutout
step :remove_bg, remover: ProductBackgroundRemover
# Analyze for auto-tagging
step :analyze, analyzer: ProductAnalyzer
description "Complete e-commerce product image workflow"
end
result = Images::EcommercePipeline.call(
prompt: "Professional photo of wireless headphones",
tenant: current_store
)
if result.success?
product.hero_image.attach(
io: StringIO.new(result.to_blob),
filename: "product.png",
content_type: "image/png"
)
product.update!(
tags: result.analysis.tags,
description: result.analysis.description
)
end

class ModerationPipeline < ApplicationImagePipeline
# Analyze uploaded content
step :analyze, analyzer: ContentModerationAnalyzer
description "Content safety analysis"
after_pipeline :log_moderation_result
private
def log_moderation_result(result)
if result.analysis&.success?
Rails.logger.info(
"[Moderation] safe=#{result.analysis.safe?}, " \
"tags=#{result.analysis.tags.join(', ')}"
)
end
end
end
result = Images::ModerationPipeline.call(image: uploaded_file.path)
if result.analysis&.safe?
save_to_storage(uploaded_file)
else
queue_for_review(uploaded_file, result.analysis)
end

class MarketingPipeline < ApplicationImagePipeline
step :generate, generator: MarketingImageGenerator, size: "1792x1024"
step :upscale, upscaler: PhotoUpscaler, scale: 2
cache_for 1.day
description "High-quality marketing asset generation"
before_pipeline :validate_prompt
private
def validate_prompt
prompt = context[:prompt]
raise ArgumentError, "Prompt required" if prompt.blank?
raise ArgumentError, "Prompt too short" if prompt.length < 10
end
end
# Generate hero images for campaigns
result = Images::MarketingPipeline.call(
prompt: "Modern tech startup team collaborating in bright office",
tenant: current_organization
)
campaign.hero_image.attach(
io: StringIO.new(result.to_blob),
filename: "hero.png"
)

class QualityPipeline < ApplicationImagePipeline
step :generate, generator: ProductGenerator
# Premium and enterprise tiers get upscaling
step :upscale, upscaler: PhotoUpscaler, scale: 4,
if: ->(ctx) { %i[premium enterprise].include?(ctx[:tier]) }
# Enterprise tier gets background removal
step :remove_bg, remover: BackgroundRemover,
if: ->(ctx) { ctx[:tier] == :enterprise }
# Everyone gets analysis
step :analyze, analyzer: ProductAnalyzer
end
# Basic tier - just generate + analyze
result = Images::QualityPipeline.call(prompt: "Product", tier: :basic)
result.step_count # 2
# Premium tier - generate + upscale + analyze
result = Images::QualityPipeline.call(prompt: "Product", tier: :premium)
result.step_count # 3
# Enterprise tier - all steps
result = Images::QualityPipeline.call(prompt: "Product", tier: :enterprise)
result.step_count # 4

RubyLLM::Agents.configure do |config|
# Image Generation
config.default_image_model = "gpt-image-1"
config.default_image_size = "1024x1024"
config.track_image_generation = true
# Image Analysis
config.default_analyzer_model = "gpt-4o"
config.default_analysis_type = :detailed
config.default_analyzer_max_tags = 10
# Background Removal
config.default_background_remover_model = "rembg"
config.default_background_output_format = :png
end
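For readers unfamiliar with the configure-block pattern used above, here is a minimal, self-contained sketch of how such a block typically works in Ruby. It is not the gem's implementation: `ImageDefaults`, `Configuration`, and `resolve_size` are hypothetical names, the starting values are placeholders, and the fallback rule (a class-level setting wins over the global default) is an assumption.

```ruby
# Hypothetical stand-in for a gem-level configuration object. The attribute
# names mirror a few of the settings shown above; the class is illustrative.
module ImageDefaults
  class Configuration
    attr_accessor :default_image_model, :default_image_size, :track_image_generation

    def initialize
      # Placeholder starting values, not the gem's documented defaults.
      @default_image_model = "gpt-image-1"
      @default_image_size = "1024x1024"
      @track_image_generation = true
    end
  end

  # Lazily builds a single shared configuration object.
  def self.configuration
    @configuration ||= Configuration.new
  end

  # Yields the shared object so callers can assign settings in a block.
  def self.configure
    yield configuration
  end

  # Assumed fallback rule: a per-class size wins, otherwise the global default.
  def self.resolve_size(class_level_size = nil)
    class_level_size || configuration.default_image_size
  end
end

ImageDefaults.configure do |config|
  config.default_image_size = "1792x1024"
end
```

With this in place, `ImageDefaults.resolve_size` falls back to the configured default, while `ImageDefaults.resolve_size("512x512")` keeps the explicit value.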