🎯 Agent Opt

Automated Workflow Optimization with State-of-the-Art Algorithms
Built by Future AGI | Docs | Platform


🚀 Try it Now

Open In Colab


🚀 Overview

agent-opt is a comprehensive Python SDK for optimizing prompts through iterative refinement. Powered by state-of-the-art optimization algorithms and flexible evaluation strategies from our ai-evaluation library, agent-opt helps you discover the best prompts for your LLM workflows automatically.

  • 🧬 Smart Optimization: 6 proven algorithms from random search to genetic evolution
  • 📊 Flexible Evaluation: Heuristic metrics, LLM-as-a-judge, and platform integration
  • ⚡ Easy Integration: Works with any LLM through LiteLLM
  • 🔧 Extensible Design: Clean abstractions for custom optimizers and evaluators

🎨 Features

🧬 Multiple Optimization Algorithms

Choose from 6 battle-tested optimization strategies:

Algorithm       | Best For                    | Key Feature
--------------- | --------------------------- | ---------------------------------------------
Random Search   | Quick baselines             | Simple random variations
Bayesian Search | Few-shot optimization       | Intelligent hyperparameter tuning with Optuna
ProTeGi         | Gradient-based refinement   | Textual gradients for iterative improvement
Meta-Prompt     | Teacher-driven optimization | Uses powerful models to analyze and rewrite
PromptWizard    | Multi-stage refinement      | Mutation, critique, and refinement pipeline
GEPA            | Complex solution spaces     | Genetic Pareto evolutionary optimization

📊 Flexible Evaluation

All evaluation backends are powered by FutureAGI's ai-evaluation library:

  • ✅ Heuristic Metrics: BLEU, ROUGE, embedding similarity, and more
  • 🧠 LLM-as-a-Judge: Custom criteria with any LLM provider
  • 🎯 FutureAGI Platform: 50+ pre-built evaluation templates
  • 🔌 Custom Metrics: Build your own evaluation logic

🔧 Easy Integration

  • Works with any LLM through LiteLLM (OpenAI, Anthropic, Google, etc.)
  • Simple Python API with sensible defaults
  • Comprehensive logging and progress tracking
  • Clean separation of concerns

📦 Installation

pip install agent-opt

Requirements:

  • Python >= 3.10
  • ai-evaluation >= 0.1.9
  • gepa >= 0.0.17
  • litellm >= 1.35.2
  • optuna >= 3.6.1

🧑‍💻 Quick Start

from fi.opt.generators import LiteLLMGenerator
from fi.opt.optimizers import BayesianSearchOptimizer
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator
from fi.evals.metrics import BLEUScore

# 1. Set up your dataset
dataset = [
    {
        "context": "Paris is the capital of France",
        "question": "What is the capital of France?",
        "answer": "Paris"
    },
    # ... more examples
]

# 2. Configure the evaluator
metric = BLEUScore()
evaluator = Evaluator(metric)

# 3. Set up data mapping
data_mapper = BasicDataMapper(
    key_map={
        "response": "generated_output",
        "expected_response": "answer"
    }
)

# 4. Choose and configure an optimizer
optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    teacher_model_name="gpt-4o",
    n_trials=10
)

# 5. Run optimization
initial_prompt = "Given the context: {context}, answer the question: {question}"
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=[initial_prompt]
)

# 6. Get the best prompt
print(f"Best Score: {result.final_score:.4f}")
print(f"Best Prompt: {result.best_generator.get_prompt_template()}")

πŸ—οΈ Core Components

🤖 Generators

Generators execute prompts and return responses. Use LiteLLMGenerator for seamless integration with any LLM provider.

from fi.opt.generators import LiteLLMGenerator

generator = LiteLLMGenerator(
    model="gpt-4o-mini",
    prompt_template="Summarize this text: {text}"
)
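
Conceptually, the generator fills the template and sends the result through LiteLLM. The call reduces to roughly this sketch using litellm directly (an illustration, not the SDK's internals):

import litellm

prompt = "Summarize this text: {text}".format(text="The quick brown fox jumps over the lazy dog.")
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)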

📊 Evaluators

Evaluators score generated outputs using various strategies:

Heuristic Metrics

from fi.opt.base.evaluator import Evaluator
from fi.evals.metrics import BLEUScore

evaluator = Evaluator(metric=BLEUScore())

LLM-as-a-Judge

from fi.evals.llm import LiteLLMProvider
from fi.evals.metrics import CustomLLMJudge

# LLM provider used by the judge
provider = LiteLLMProvider()

# Create custom LLM judge metric
correctness_judge_config = {
    "name": "correctness_judge",
    "grading_criteria": '''You are evaluating an AI's answer to a question.
    The score must be 1.0 if the 'response' is semantically equivalent to the
    'expected_response' (the ground truth). The score should be 0.0 if incorrect.
    Partial credit is acceptable.'''
}

# Instantiate the judge and pass to evaluator
correctness_judge = CustomLLMJudge(
    provider=provider,
    config=correctness_judge_config,
    model="gemini/gemini-2.5-flash",
    temperature=0.4
)
evaluator = Evaluator(metric=correctness_judge)

FutureAGI Platform

Access 50+ pre-built evaluation templates:

evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

πŸ—ΊοΈ Data Mappers

Data mappers transform your data into the format expected by evaluators:

from fi.opt.datamappers import BasicDataMapper

mapper = BasicDataMapper(
    key_map={
        "output": "generated_output",  # Maps generator output
        "input": "question",            # Maps from dataset
        "ground_truth": "answer"        # Maps from dataset
    }
)
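
Each key_map entry reads as evaluator_field: source_field, where the special source "generated_output" refers to the generator's response. For the mapper above, a dataset record plus a model output is flattened into roughly this payload (a hypothetical illustration of the mapping, not the SDK's internals):

record = {"question": "What is the capital of France?", "answer": "Paris"}
generated_output = "Paris"

# What the evaluator effectively receives:
payload = {
    "output": generated_output,        # from the generator
    "input": record["question"],       # from the dataset
    "ground_truth": record["answer"],  # from the dataset
}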

βš™οΈ Optimization Algorithms

πŸ” Bayesian Search

Uses Optuna for intelligent hyperparameter optimization of few-shot example selection.

from fi.opt.optimizers import BayesianSearchOptimizer

optimizer = BayesianSearchOptimizer(
    min_examples=2,
    max_examples=8,
    n_trials=20,
    inference_model_name="gpt-4o-mini",
    teacher_model_name="gpt-4o"
)

Best for: Few-shot prompt optimization with automatic example selection
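
For intuition, the search resembles a plain Optuna study over the few-shot configuration. A self-contained toy version, with a dummy score standing in for "build the prompt, run it, score it with the evaluator":

import optuna

def objective(trial):
    # Choose how many few-shot examples to include, within [min_examples, max_examples]
    n_examples = trial.suggest_int("n_examples", 2, 8)
    # Dummy score standing in for the real evaluator run
    return -abs(n_examples - 5)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)  # e.g. {'n_examples': 5}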


🎯 ProTeGi

Gradient-based prompt optimization that iteratively refines prompts through error analysis.

from fi.opt.optimizers import ProTeGi
from fi.opt.generators import LiteLLMGenerator

teacher = LiteLLMGenerator(
    model="gpt-4o",
    prompt_template="{prompt}"
)
optimizer = ProTeGi(
    teacher_generator=teacher,
    num_gradients=4,
    beam_size=4
)

Best for: Iterative refinement with textual gradients
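
The loop is easy to picture: the teacher turns observed failures into a "textual gradient", rewrites the prompt against it, and a beam of the highest-scoring candidates survives each round. A toy, self-contained sketch with stand-in functions (not the SDK's internals):

def critique(prompt, failures):
    # Stand-in for the teacher's "textual gradient": a description of what went wrong
    return "Avoid these mistakes: " + "; ".join(failures)

def rewrite(prompt, gradient):
    # Stand-in for the teacher rewriting the prompt to address the gradient
    return prompt + "\n" + gradient

def score(prompt):
    # Stand-in for the evaluator
    return len(set(prompt.split()))

beam = ["Answer the question: {question}"]
for _ in range(3):
    candidates = [rewrite(p, critique(p, ["answered in the wrong language"])) for p in beam]
    beam = sorted(beam + candidates, key=score, reverse=True)[:4]  # beam_size=4
print(beam[0])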


🧠 Meta-Prompt

Uses a powerful teacher model to analyze performance and rewrite prompts.

from fi.opt.optimizers import MetaPromptOptimizer

optimizer = MetaPromptOptimizer(
    teacher_generator=teacher,
    num_rounds=5
)

Best for: Leveraging powerful models for prompt refinement


🧬 GEPA (Genetic Pareto)

Evolutionary optimization using the GEPA library for complex solution spaces.

from fi.opt.optimizers import GEPAOptimizer

optimizer = GEPAOptimizer(
    reflection_model="gpt-5",
    generator_model="gpt-4o-mini"
)

Best for: Multi-objective optimization with genetic algorithms


🪄 PromptWizard

Multi-stage optimization with mutation, critique, and refinement.

from fi.opt.optimizers import PromptWizardOptimizer

optimizer = PromptWizardOptimizer(
    teacher_generator=teacher,
    mutate_rounds=3,
    refine_iterations=2
)

Best for: Comprehensive multi-phase optimization pipeline


🎲 Random Search

Simple baseline that tries random prompt variations.

from fi.opt.optimizers import RandomSearchOptimizer

optimizer = RandomSearchOptimizer(
    generator=generator,
    teacher_model="gpt-4o",
    num_variations=5
)

Best for: Quick baselines and sanity checks


🔧 Advanced Usage

🎨 Custom Evaluation Metrics

Create custom heuristic metrics by extending BaseMetric:

from fi.evals.metrics.base_metric import BaseMetric

class CustomMetric(BaseMetric):
    @property
    def metric_name(self):
        return "your_custom_metric"

    def compute_one(self, inputs):
        # Example scoring logic: exact match between the mapped fields.
        # Replace with your own; `inputs` holds the fields produced by your data mapper.
        return 1.0 if inputs.get("response") == inputs.get("expected_response") else 0.0
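
Then pass it to an Evaluator like any built-in metric:

from fi.opt.base.evaluator import Evaluator

evaluator = Evaluator(metric=CustomMetric())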

πŸ“ Logging Configuration

from fi.opt.utils import setup_logging
import logging

setup_logging(
    level=logging.INFO,
    log_to_console=True,
    log_to_file=True,
    log_file="optimization.log"
)

πŸ—οΈ Custom Prompt Builders

For complex prompt construction:

from typing import List

def custom_prompt_builder(base_prompt: str, few_shot_examples: List[str]) -> str:
    examples = "\n\n".join(few_shot_examples)
    return f"{base_prompt}\n\nExamples:\n{examples}"

optimizer = BayesianSearchOptimizer(
    prompt_builder=custom_prompt_builder
)

🔑 Environment Setup

API Keys

Set up your API keys for LLM providers and FutureAGI:

export OPENAI_API_KEY="your_openai_key"
export GEMINI_API_KEY="your_gemini_key"  # If using Gemini
export FI_API_KEY="your_futureagi_key"
export FI_SECRET_KEY="your_futureagi_secret"

Or use a .env file:

OPENAI_API_KEY=your_openai_key
FI_API_KEY=your_futureagi_key
FI_SECRET_KEY=your_futureagi_secret
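
If you use a .env file, load it into the environment before creating any generators or evaluators, for example with python-dotenv (assumption: the SDK reads these keys from environment variables, as the exports above imply):

from dotenv import load_dotenv

load_dotenv()  # loads OPENAI_API_KEY, FI_API_KEY, FI_SECRET_KEY, ... into os.environ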

📚 Examples & Tutorials

🎯 Complete Example: Check out examples/FutureAGI_Agent_Optimizer.ipynb for a comprehensive walkthrough!


πŸ“ Project Structure

src/fi/opt/
├── base/              # Abstract base classes
├── datamappers/       # Data transformation utilities
├── generators/        # LLM generator implementations
├── optimizers/        # Optimization algorithms
├── utils/             # Helper utilities
└── types.py           # Type definitions

🔌 Related Projects

  • 🧪 ai-evaluation: Comprehensive LLM evaluation framework with 50+ metrics
  • 🚦 traceAI: Add tracing & observability to your optimized workflows

πŸ—ΊοΈ Roadmap

  • Core Optimization Algorithms
  • ai-evaluation Integration
  • LiteLLM Support
  • Bayesian Optimization
  • ProTeGi & Meta-Prompt
  • GEPA Integration

🤝 Contributing

We welcome contributions! To report issues, suggest features, or contribute improvements:

  1. Open a GitHub issue
  2. Submit a pull request
  3. Join our community discussions

💬 Support

For questions and support:

📧 Email: support@futureagi.com
📚 Documentation: docs.futureagi.com
🌐 Platform: app.futureagi.com


Built with ❤️ by Future AGI
