🚀 Stop struggling with prompt engineering. Let AI optimize your prompts automatically.
Generate accurate extraction prompts directly from your labeled data. Improve your existing prompts and eliminate manual fine-tuning.
Prompt Optimizer uses a mentor-agent architecture to automatically generate, refine, and optimize prompts for your specific use case. Simply provide your labeled examples, define your output schema, and let the system discover the optimal prompt through iterative learning.
| Feature | Description |
|---|---|
| 🎯 Automatic Prompt Discovery | Don't know where to start? The system generates an initial prompt based on your data |
| 📈 Continuous Improvement | Each iteration learns from mistakes and produces better prompts |
| ⚡ Token Efficiency | When accuracy is tied, the shortest (cheapest) prompt wins |
| 🌍 Works for Any Domain | From NER to calculations, entity extraction to data transformation |
```
┌───────────────────────────────────────────────────────────────────┐
│                         OPTIMIZATION LOOP                         │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│   ┌─────────┐     ┌─────────┐     ┌───────────┐     ┌─────────┐   │
│   │  Data   │────▶│  Agent  │────▶│ Evaluator │────▶│ Mentor  │   │
│   │ Samples │     │  Model  │     │           │     │  Model  │   │
│   └─────────┘     └─────────┘     └───────────┘     └────┬────┘   │
│                        ▲                                 │        │
│                        │         Improved Prompt         │        │
│                        └─────────────────────────────────┘        │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
```
- Agent Model: Processes input data with the current prompt and extracts structured information
- Evaluator: Compares extractions against ground truth, calculating accuracy and identifying errors
- Mentor Model: Analyzes failed predictions and generates an improved prompt
- Loop: Repeats for a configurable number of iterations until optimal accuracy is achieved
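In pseudocode terms, the loop looks roughly like this. This is a runnable toy sketch: the helper names and the string "prompts" are illustrative stand-ins, not the project's real API (the actual loop lives in `src/prompt_optimizer/core`):

```python
# Toy mentor-agent loop: agent extracts, evaluator scores, mentor revises.
samples = [
    {"source_text": "hello", "target_result": "HELLO"},
    {"source_text": "world", "target_result": "WORLD"},
]

def run_agent(prompt, samples):
    """Agent model: apply the current prompt to every sample."""
    # Toy stand-in: the prompt selects a transformation instead of steering an LLM.
    transform = str.upper if prompt == "uppercase the text" else str.title
    return [transform(s["source_text"]) for s in samples]

def evaluate(predictions, samples):
    """Evaluator: score accuracy against ground truth and collect failures."""
    failures = [(p, s) for p, s in zip(predictions, samples)
                if p != s["target_result"]]
    return 1 - len(failures) / len(samples), failures

def mentor(prompt, failures):
    """Mentor model: inspect the failures and propose an improved prompt."""
    # Toy stand-in: a real mentor is an LLM reasoning over the error cases.
    return "uppercase the text" if failures else prompt

prompt, best = "title-case the text", (-1.0, "")
for iteration in range(3):
    accuracy, failures = evaluate(run_agent(prompt, samples), samples)
    if accuracy > best[0]:
        best = (accuracy, prompt)
    if not failures:
        break
    prompt = mentor(prompt, failures)  # the improved prompt feeds the next pass

print(best)  # best (accuracy, prompt) pair found
```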
```bash
# Clone and install
git clone https://github.com/ademakdogan/prompt-optimizer.git
cd prompt-optimizer
uv sync --all-extras

# Configure
echo "OPENROUTER_API_KEY=your-api-key" > .env

# Run
uv run python -m prompt_optimizer --data resources/test_mapping.json --samples 5
```

```bash
# Clone and configure
git clone https://github.com/ademakdogan/prompt-optimizer.git
cd prompt-optimizer
echo "OPENROUTER_API_KEY=your-api-key" > .env

# Build and run
make build
make optimize DATA=resources/test_mapping.json SAMPLES=5 LOOPS=3
```

| Command | Description |
|---|---|
| `make build` | Build Docker image |
| `make run` | Run container (shows help) |
| `make optimize` | Run optimization with defaults |
| `make test` | Run tests in Docker |
| `make shell` | Open shell in container |
| `make clean` | Remove container and image |
| `make help` | Show all available commands |
Custom optimization:

```bash
make optimize DATA=resources/my_data.json SAMPLES=10 LOOPS=5
```

The project includes two example datasets demonstrating different problem types:
This dataset demonstrates entity extraction from unstructured text. The AI must identify and extract specific pieces of information (names, emails, coordinates, etc.) from natural language.
```json
{
  "source_text": "Dear Mr. Vandervort, congratulations on turning 68! Your unique ID is 0Zr2bcG1X9Ub. Visit us at 609 Gorczany Pass.",
  "target_result": {
    "prefix": "Mr.",
    "lastname": "Vandervort",
    "age": "68",
    "username": "0Zr2bcG1X9Ub",
    "street": "609 Gorczany Pass"
  }
}
```

Characteristics:
- Text is unstructured natural language
- Fields must be recognized and extracted from context
- Values are copied verbatim from source text
- Typical for: NER, PII detection, document parsing, CV/resume extraction
This dataset demonstrates data transformation where the AI must not only extract but also calculate derived values from input fields.
```json
{
  "source_text": "\"name\": 'TechSolutions Inc', \"gross\": 1000, \"commission_rate\": 0.1, \"vat\": 180",
  "target_result": {
    "client_name": "TechSolutions Inc",
    "total_gross": 1180,
    "total_mid_gross": 1280
  }
}
```

Characteristics:
- Input is semi-structured (key-value pairs)
- Fields require mathematical calculations:
  - `total_gross = gross + vat`
  - `total_mid_gross = total_gross + (commission_rate × gross)`
- The AI must learn the formulas from examples
- Typical for: Financial calculations, data transformation, ETL pipelines
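As a sanity check, the two formulas can be applied to the example record in plain Python. The dict below is hand-built from the sample above, not produced by the project's parser:

```python
# Apply the derived-value formulas to the example record above.
record = {"name": "TechSolutions Inc", "gross": 1000,
          "commission_rate": 0.1, "vat": 180}

total_gross = record["gross"] + record["vat"]                 # 1000 + 180 = 1180
total_mid_gross = total_gross + record["commission_rate"] * record["gross"]  # 1180 + 100

result = {
    "client_name": record["name"],
    "total_gross": total_gross,
    "total_mid_gross": total_mid_gross,
}
print(result)
```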
Create a JSON file with your labeled examples in the following format:
```json
[
  {
    "source_text": "Your input text here...",
    "target_result": {
      "field1": "expected_value1",
      "field2": "expected_value2"
    }
  },
  // ... more examples
]
```

Save it to `resources/your_dataset.json`.
Open `src/prompt_optimizer/models/agent_model.py` and update the `ExtractionSchema` class to match your `target_result` fields:
```python
from typing import Optional

from pydantic import BaseModel, Field


class ExtractionSchema(BaseModel):
    # ───────────────────────────────────────────────────────────────
    # 🔧 USER CUSTOMIZATION SECTION - MODIFY THIS FOR YOUR DATASET
    # ───────────────────────────────────────────────────────────────
    # Define fields matching your target_result keys:
    field1: str                       # Required field
    field2: Optional[str] = None      # Optional field
    field3: Optional[float] = None    # Optional numeric field
```

Tips:

- Use `str` for required fields, `Optional[str]` for optional ones
- Use `float` or `int` for numeric values
- Add `Field(description="...")` to provide hints to the AI
```bash
uv run python -m prompt_optimizer \
  --data resources/your_dataset.json \
  --samples 10 \
  --loops 5
```

After optimization completes, find your prompt in:

- `final_prompt.txt`: The best-performing prompt
- `mentor_prompts.txt`: Full history of all iterations
| Variable | Default | Description |
|---|---|---|
| `OPENROUTER_API_KEY` | Required | Your OpenRouter API key |
| `AGENT_MODEL` | `openai/gpt-4.1-nano` | Model for data extraction |
| `MENTOR_MODEL` | `openai/gpt-4.1-nano` | Model for prompt improvement |
| `WINDOW_SIZE` | `2` | History iterations shown to mentor |
| `LOOP_COUNT` | `3` | Maximum optimization iterations |
| `LOG_LEVEL` | `INFO` | Logging verbosity |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `--data` | string | `resources/test_data.json` | Path to your dataset |
| `--samples` | int | `5` | Number of samples to use |
| `--prompt` | string | None | Initial prompt (auto-generated if not provided) |
| `--loops` | int | From settings | Number of optimization iterations |
| `--window-size` | int | From settings | History window for mentor |
| `--output` | string | None | Save results to JSON file |
| `--log-level` | string | From settings | DEBUG/INFO/WARNING/ERROR |
```python
from prompt_optimizer.core import PromptOptimizer
from prompt_optimizer.data import load_test_data

# Load your labeled data
data = load_test_data("resources/your_dataset.json", limit=10)

# Create optimizer
optimizer = PromptOptimizer(window_size=2, loop_count=5)

# Option 1: Let the system generate initial prompt
results = optimizer.optimize(data=data)

# Option 2: Improve an existing prompt
my_prompt = """
Extract client information and calculate totals.
Return as JSON with: client_name, total_gross, total_mid_gross
"""
results = optimizer.optimize(data=data, initial_prompt=my_prompt)

# The best prompt is automatically saved to final_prompt.txt
```

After each run, you'll see a detailed metrics table:
```
╔══════════════════════════════════════════════════════════╗
║                   OPTIMIZATION METRICS                   ║
╠══════════════════════════════════════════════════════════╣
║ Total iterations:      4                                 ║
║ Best accuracy:         93.33% (iter 3, ~199 tokens)      ║
║ Final accuracy:        56.67%                            ║
║ Accuracy improvement:  +18.33%                           ║
║ Average accuracy:      56.67%                            ║
╠══════════════════════════════════════════════════════════╣
║                    ITERATION DETAILS                     ║
╠═══════════╦═══════════════╦══════════════════════════════╣
║ Iteration ║   Accuracy    ║   Progress                   ║
╠═══════════╬═══════════════╬══════════════════════════════╣
║     1     ║    38.3%      ║   ██████░░░░░░░░░            ║
║     2     ║    38.3%      ║   ██████░░░░░░░░░            ║
║     3     ║    93.3%      ║   ██████████████░            ║
║     4     ║    56.7%      ║   █████████░░░░░░            ║
╚═══════════╩═══════════════╩══════════════════════════════╝
```
When multiple iterations achieve the same accuracy, the optimizer automatically selects the prompt with the fewest tokens. This ensures:
- Lower API costs
- Faster inference
- Reduced context window usage
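The tie-breaking rule amounts to ranking candidates by accuracy (descending), then token count (ascending). A sketch with hypothetical `(accuracy, tokens, prompt)` tuples, not the project's internal representation:

```python
# Tie-break sketch: highest accuracy first; fewest tokens wins a tie.
candidates = [
    (0.933, 250, "verbose prompt ..."),
    (0.933, 199, "tighter prompt ..."),
    (0.567, 120, "short but weak prompt ..."),
]

# max() over (accuracy, -tokens): ties on accuracy fall to the cheaper prompt.
best = max(candidates, key=lambda c: (c[0], -c[1]))
print(best)  # (0.933, 199, 'tighter prompt ...')
```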
```
prompt-optimizer/
├── src/prompt_optimizer/
│   ├── api/               # OpenRouter client, agent, mentor
│   ├── config/            # Settings management
│   ├── core/              # Optimizer loop and evaluator
│   ├── data/              # Data loading utilities
│   ├── models/            # Pydantic schemas (edit agent_model.py!)
│   └── utils/             # Logging, metrics, persistence
├── resources/             # Example datasets
│   ├── test_pii.json      # NER/Entity extraction example
│   └── test_mapping.json  # Calculation/mapping example
├── tests/                 # Unit, integration, e2e tests
├── final_prompt.txt       # Best prompt from last run
└── mentor_prompts.txt     # Full optimization history
```
```bash
# Run all tests
uv run pytest tests/ -v

# Run with coverage
uv run pytest tests/ --cov=prompt_optimizer --cov-report=html
```

Any model on OpenRouter can be used:
| Model | Speed | Quality | Cost |
|---|---|---|---|
| `openai/gpt-4.1-nano` | Fast | Good | Low |
| `openai/gpt-4.1-mini` | Medium | Better | Medium |
| `google/gemini-2.5-flash-lite` | Fast | Good | Low |
| `anthropic/claude-3-haiku` | Fast | Good | Low |
This project is licensed under the MIT License - see the LICENSE file for details.