Prompt Engineering Lab

A comprehensive Python toolkit for prompt engineering techniques — template management, patterns (CoT, few-shot, role-play), optimization, A/B testing, token counting, and cost estimation.


What This Solves

  • Prompt iteration loops — Create, test, and compare prompts without manual copy/paste
  • Cost uncertainty — Estimate tokens and compare provider costs before deployment
  • Quality drift — A/B testing and optimization to validate prompt changes

Demo

Live demo: https://ct-prompt-lab.streamlit.app

Features

  • Template Management: Create, store, and chain prompt templates with variable substitution
  • Prompt Patterns: Apply proven patterns like Chain-of-Thought, Few-Shot, Role-Play, and Self-Refine
  • Optimization: Improve prompts through random search and mutation strategies
  • A/B Testing: Compare template effectiveness with statistical significance testing
  • Token Counting: Estimate token usage across Claude, OpenAI, and Gemini
  • Cost Calculation: Calculate and compare costs across different AI providers and models
  • CLI Tool: Command-line interface for quick prompt engineering workflows

Architecture

flowchart LR
    PT[Prompt Templates] --> OPT[Optimizer]
    OPT --> AB[A/B Testing Engine]
    AB --> TC[Token Counter]
    TC --> SC[Safety Checker]
    SC --> VS[Versioning System]
    VS --> SD[Streamlit Demo]

    PT -->|Variables & Chains| OPT
    OPT -->|Mutation & Search| AB
    AB -->|Z-test p<0.05| TC
    TC -->|Claude/OpenAI/Gemini| SC
    SC -->|Injection & PII| VS
    VS -->|Hash + Metadata| SD

Key Metrics

Metric | Value
Tests | 190+ passing
Prompt Optimization | Random search + mutation strategies
A/B Testing | Z-test significance at p < 0.05, Cohen's d effect size
Token Counting | Multi-provider (Claude, OpenAI, Gemini)
Safety Checking | Injection detection, PII masking, content policy
Version Control | Git-inspired hash + metadata, rollback, changelog

Installation

# Clone the repository
git clone https://github.com/ChunkyTortoise/prompt-engineering-lab.git
cd prompt-engineering-lab

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e ".[dev]"

Quick Start

Python API

from prompt_engineering_lab import PromptTemplate, ChainOfThought, PromptOptimizer

# Create a template
template = PromptTemplate(
    name="summarize",
    template="Summarize the following text in {word_count} words:\n\n{text}"
)

# Format with variables
result = template.format(word_count="50", text="Long article...")

# Apply Chain-of-Thought pattern
cot = ChainOfThought()
enhanced = cot.apply("Explain quantum computing")
# Output: "Let's think step by step.\n\nExplain quantum computing"

# Optimize prompts
optimizer = PromptOptimizer(scorer=lambda x: len(x))
result = optimizer.optimize("Write a blog post", n_iterations=20)
print(f"Best template: {result.best_template}")
print(f"Improvement: {result.improvement_pct}%")

CLI Usage

# List available templates
pel list

# Test a template
pel test summarize "Your text here" -v word_count=50

# Enhance a prompt with patterns
pel enhance "Explain AI" -p cot
pel enhance "Write code" -p role --role "senior developer" --expertise "Python"

# Compare two templates
pel compare "Template A" "Template B" "input text"

# Count tokens
pel count "Your prompt text" -p claude

Module Overview

Template Management (template.py)

  • PromptTemplate: Define reusable templates with variable placeholders
  • PromptChain: Chain multiple templates for sequential execution
  • TemplateRegistry: Store and retrieve templates by name; ships with four built-in templates (registry sketch below)
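
A minimal sketch of registry usage. The register/get method names are assumptions for illustration and may not match the actual TemplateRegistry API:

from prompt_engineering_lab import PromptTemplate, TemplateRegistry

# Hypothetical registry workflow: store a template, then look it up by name.
# Method names are assumptions, not confirmed API.
registry = TemplateRegistry()
registry.register(PromptTemplate(
    name="summarize",
    template="Summarize the following text in {word_count} words:\n\n{text}",
))

template = registry.get("summarize")
prompt = template.format(word_count="50", text="Long article...")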

Patterns (patterns.py)

  • ChainOfThought: Add step-by-step reasoning guidance
  • FewShotPattern: Include examples in prompts
  • RolePlayPattern: Add persona and expertise context (usage sketched after this list)
  • SelfRefinePattern: Generate → critique → refine workflow
  • MetaPromptPattern: Create prompts that generate prompts
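
The few-shot and role-play patterns can be applied much like ChainOfThought in the Quick Start. The constructor arguments below are assumptions (the role/expertise keywords mirror the CLI's --role/--expertise flags); check patterns.py for the real signatures:

from prompt_engineering_lab import FewShotPattern, RolePlayPattern

# Assumed constructor arguments, shown for illustration only.
few_shot = FewShotPattern(examples=[
    ("Translate 'bonjour' to English", "hello"),
    ("Translate 'gracias' to English", "thank you"),
])
prompt = few_shot.apply("Translate 'danke' to English")

role = RolePlayPattern(role="senior developer", expertise="Python")
prompt = role.apply("Review this function for edge cases.")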

Optimization (optimizer.py)

  • PromptOptimizer: Improve prompts through search and mutation
  • Random search over template candidates
  • Mutation strategies: word swapping, emphasis, reordering (illustrated below)
  • Track optimization history and improvement metrics
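
As a rough illustration of what random search over word-swap mutations looks like (plain Python, not the package's actual implementation):

import random

def mutate_swap_words(prompt: str) -> str:
    """Swap two adjacent words at a random position (illustration only)."""
    words = prompt.split()
    if len(words) < 2:
        return prompt
    i = random.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def random_search(prompt, scorer, n_iterations=20):
    """Keep the best-scoring mutated candidate seen so far."""
    best, best_score = prompt, scorer(prompt)
    for _ in range(n_iterations):
        candidate = mutate_swap_words(best)
        score = scorer(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best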

A/B Testing (ab_tester.py)

  • ABTestRunner: Compare two templates statistically
  • Z-test for significance (p < 0.05 threshold)
  • Effect size calculation (Cohen's d); see the statistics sketch below
  • Winner determination with confidence metrics
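
For reference, the reported statistics follow the standard two-sample formulas. A generic sketch of the math, not the module's source:

import math
from statistics import NormalDist, mean, stdev

def z_test_and_cohens_d(scores_a, scores_b):
    """Two-tailed two-sample z-test p-value and Cohen's d (standard formulas)."""
    ma, mb = mean(scores_a), mean(scores_b)
    sa, sb = stdev(scores_a), stdev(scores_b)
    na, nb = len(scores_a), len(scores_b)
    z = (ma - mb) / math.sqrt(sa**2 / na + sb**2 / nb)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    pooled_sd = math.sqrt(((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2))
    cohens_d = (ma - mb) / pooled_sd
    return p_value, cohens_d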

Token Counting (token_counter.py)

  • TokenCounter: Estimate tokens for Claude, OpenAI, Gemini
  • Message-level counting with overhead calculation
  • Provider-specific character-per-token ratios (estimation heuristic sketched below)
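
The ratios and per-message overhead below are placeholders that show the estimation approach; the library's actual values may differ:

# Placeholder character-per-token ratios (assumptions, not the library's data).
CHARS_PER_TOKEN = {"claude": 3.5, "openai": 4.0, "gemini": 4.0}

def estimate_tokens(text: str, provider: str = "claude") -> int:
    """Heuristic: token count is roughly character count / provider ratio."""
    return max(1, round(len(text) / CHARS_PER_TOKEN[provider]))

def estimate_message_tokens(messages, provider="claude", overhead_per_message=4):
    """Message-level estimate: content tokens plus a small per-message overhead."""
    return sum(estimate_tokens(m["content"], provider) + overhead_per_message
               for m in messages)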

Cost Calculation (cost_calculator.py)

  • CostCalculator: Estimate API costs across providers
  • Pricing data for Claude (Opus/Sonnet/Haiku), OpenAI (GPT-4/3.5), Gemini (Pro/Ultra)
  • Compare providers to find the most cost-effective option (cost formula sketched below)
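
Cost estimation reduces to token counts times per-million-token prices. The prices below are placeholders for illustration, not current list prices:

# Placeholder per-million-token prices; real pricing lives in the library's
# pricing data and changes over time.
PRICING = {
    ("claude", "haiku"): {"input": 0.25, "output": 1.25},
    ("openai", "gpt-3.5"): {"input": 0.50, "output": 1.50},
}

def estimate_cost(input_tokens, output_tokens, provider, model):
    """total = input_tokens/1M * input_price + output_tokens/1M * output_price"""
    price = PRICING[(provider, model)]
    return (input_tokens / 1_000_000) * price["input"] \
        + (output_tokens / 1_000_000) * price["output"]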

Architecture Decisions

ADR | Title | Status
ADR-0001 | Prompt Versioning Strategy | Accepted
ADR-0002 | A/B Testing Framework | Accepted
ADR-0003 | Safety Checker Design | Accepted

Benchmarks

See BENCHMARKS.md for methodology, evaluation metrics, and reproduction steps.

Development

# Run tests
make test

# Run tests with coverage
make coverage

# Lint code
make lint

# Format code
make format

# Clean build artifacts
make clean

Testing

The project includes 190+ tests covering all modules, including:

  • Template tests (12): Template formatting, chaining, registry
  • Pattern tests (14): CoT, few-shot, role-play, self-refine
  • Optimizer tests (10): Random search, mutation, optimization
  • A/B testing tests (10): Statistical testing, winner determination
  • Token counter tests (8): Multi-provider counting, messages
  • Cost calculator tests (6): Cost estimation, provider comparison
  • CLI tests (6): Command-line interface functionality

Run tests:

pytest tests/ -v
pytest tests/ --cov=prompt_engineering_lab --cov-report=term-missing

Project Structure

prompt-engineering-lab/
├── prompt_engineering_lab/
│   ├── __init__.py           # Package exports
│   ├── template.py           # Template management
│   ├── patterns.py           # Prompt patterns
│   ├── optimizer.py          # Optimization strategies
│   ├── ab_tester.py          # A/B testing framework
│   ├── token_counter.py      # Token counting
│   ├── cost_calculator.py    # Cost estimation
│   └── cli.py                # Command-line interface
├── tests/
│   ├── conftest.py           # Pytest fixtures
│   ├── test_template.py
│   ├── test_patterns.py
│   ├── test_optimizer.py
│   ├── test_ab_tester.py
│   ├── test_token_counter.py
│   ├── test_cost_calculator.py
│   └── test_cli.py
├── pyproject.toml            # Project configuration
├── Makefile                  # Development commands
├── requirements.txt          # Production dependencies
├── requirements-dev.txt      # Development dependencies
└── README.md                 # This file

Service Mapping

  • Service 5: Prompt Engineering and System Optimization
  • Service 6: AI-Powered Personal and Business Automation

Certification Mapping

  • Vanderbilt Prompt Engineering for ChatGPT
  • Vanderbilt ChatGPT Personal Automation
  • IBM Generative AI Engineering with PyTorch, LangChain & Hugging Face
  • Google Cloud Generative AI Leader Certificate

Examples

Template Chaining

from prompt_engineering_lab import PromptTemplate, PromptChain

# Define chain steps
step1 = PromptTemplate(name="outline", template="Create an outline for: {topic}")
step2 = PromptTemplate(name="expand", template="Expand on this outline: {previous_output}")

# Execute chain
chain = PromptChain(templates=[step1, step2])
results = chain.run({"topic": "Machine Learning"})

A/B Testing

from prompt_engineering_lab import ABTestRunner

# Define scorer (e.g., response quality)
def scorer(template):
    # Your scoring logic here
    return len(template)  # Simplified example

# Run A/B test
runner = ABTestRunner(scorer=scorer)
result = runner.run(
    template_a="Explain {topic} concisely.",
    template_b="Provide a detailed explanation of {topic}.",
    inputs=["AI", "blockchain", "quantum"]
)

print(f"Winner: {result.winner}")
print(f"P-value: {result.p_value}")
print(f"Effect size: {result.effect_size}")

Cost Optimization

from prompt_engineering_lab import TokenCounter, CostCalculator

# Count tokens
counter = TokenCounter()
input_tokens = counter.count("Your prompt here", provider="claude")
output_tokens = 500  # Expected output

# Compare providers
calc = CostCalculator()
estimates = calc.compare_providers(input_tokens, output_tokens)

for est in estimates[:3]:  # Top 3 cheapest
    print(f"{est.provider} {est.model}: ${est.total_cost:.4f}")

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Changelog

See CHANGELOG.md for release history.

Related Projects

  • EnterpriseHub -- Real estate AI platform with BI dashboards and CRM integration
  • docqa-engine -- RAG document Q&A with hybrid retrieval and prompt engineering lab
  • ai-orchestrator -- AgentForge: unified async LLM interface (Claude, Gemini, OpenAI, Perplexity)
  • Portfolio -- Project showcase and services

License

This project is licensed under the MIT License. See LICENSE file for details.

Author

ChunkyTortoise

Acknowledgments

  • Inspired by modern prompt engineering research and best practices
  • Built with Python 3.11+ and Click for CLI functionality
  • Follows test-driven development with 190+ comprehensive tests
