A comprehensive Python toolkit for prompt engineering techniques — template management, patterns (CoT, few-shot, role-play), optimization, A/B testing, token counting, and cost estimation.
- Prompt iteration loops — Create, test, and compare prompts without manual copy/paste
- Cost uncertainty — Estimate tokens and compare provider costs before deployment
- Quality drift — A/B testing and optimization to validate prompt changes
Live demo: https://ct-prompt-lab.streamlit.app
- Template Management: Create, store, and chain prompt templates with variable substitution
- Prompt Patterns: Apply proven patterns like Chain-of-Thought, Few-Shot, Role-Play, and Self-Refine
- Optimization: Improve prompts through random search and mutation strategies
- A/B Testing: Compare template effectiveness with statistical significance testing
- Token Counting: Estimate token usage across Claude, OpenAI, and Gemini
- Cost Calculation: Calculate and compare costs across different AI providers and models
- CLI Tool: Command-line interface for quick prompt engineering workflows
flowchart LR
PT[Prompt Templates] -->|Variables & Chains| OPT[Optimizer]
OPT -->|Mutation & Search| AB[A/B Testing Engine]
AB -->|Z-test p<0.05| TC[Token Counter]
TC -->|Claude/OpenAI/Gemini| SC[Safety Checker]
SC -->|Injection & PII| VS[Versioning System]
VS -->|Hash + Metadata| SD[Streamlit Demo]
| Metric | Value |
|---|---|
| Tests | 190+ passing |
| Prompt Optimization | Random search + mutation strategies |
| A/B Testing | Z-test significance at p<0.05, Cohen's d effect size |
| Token Counting | Multi-provider (Claude, OpenAI, Gemini) |
| Safety Checking | Injection detection, PII masking, content policy |
| Version Control | Git-inspired hash + metadata, rollback, changelog |
# Clone the repository
git clone https://github.com/ChunkyTortoise/prompt-engineering-lab.git
cd prompt-engineering-lab
# Install dependencies
pip install -r requirements.txt
# Install in development mode
pip install -e ".[dev]"
from prompt_engineering_lab import PromptTemplate, ChainOfThought, PromptOptimizer
# Create a template
template = PromptTemplate(
    name="summarize",
    template="Summarize the following text in {word_count} words:\n\n{text}"
)
# Format with variables
result = template.format(word_count="50", text="Long article...")
# Apply Chain-of-Thought pattern
cot = ChainOfThought()
enhanced = cot.apply("Explain quantum computing")
# Output: "Let's think step by step.\n\nExplain quantum computing"
# Optimize prompts
optimizer = PromptOptimizer(scorer=lambda x: len(x))
result = optimizer.optimize("Write a blog post", n_iterations=20)
print(f"Best template: {result.best_template}")
print(f"Improvement: {result.improvement_pct}%")# List available templates
pel list
# Test a template
pel test summarize "Your text here" -v word_count=50
# Enhance a prompt with patterns
pel enhance "Explain AI" -p cot
pel enhance "Write code" -p role --role "senior developer" --expertise "Python"
# Compare two templates
pel compare "Template A" "Template B" "input text"
# Count tokens
pel count "Your prompt text" -p claude
PromptTemplate: Define reusable templates with variable placeholders
PromptChain: Chain multiple templates for sequential execution
TemplateRegistry: Store and retrieve templates with 4 built-in templates
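Templates and chains are shown in the quick start; the registry is the piece not covered there. A minimal sketch of storing and retrieving a template, assuming register()/get() accessors (hypothetical names; check the actual TemplateRegistry API):

```python
from prompt_engineering_lab import PromptTemplate, TemplateRegistry

registry = TemplateRegistry()  # ships with 4 built-in templates

# Store a custom template, then retrieve and format it by name (method names assumed)
registry.register(PromptTemplate(name="translate", template="Translate to {language}:\n\n{text}"))
prompt = registry.get("translate").format(language="French", text="Hello, world")
print(prompt)
```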
ChainOfThought: Add step-by-step reasoning guidance
FewShotPattern: Include examples in prompts
RolePlayPattern: Add persona and expertise context
SelfRefinePattern: Generate → critique → refine workflow
MetaPromptPattern: Create prompts that generate prompts
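Only ChainOfThought appears in the quick start. Here is a hedged sketch of the few-shot and role-play patterns, assuming their constructors mirror the CLI flags (--role/--expertise) and the apply() method used by ChainOfThought; exact signatures and the example format may differ:

```python
from prompt_engineering_lab import FewShotPattern, RolePlayPattern

# Persona context, mirroring: pel enhance "Write code" -p role --role "senior developer" --expertise "Python"
role = RolePlayPattern(role="senior developer", expertise="Python")
print(role.apply("Write code to parse a CSV file"))

# Few-shot examples prepended to the task (example structure is an assumption)
few_shot = FewShotPattern(examples=[
    {"input": "2 + 2", "output": "4"},
    {"input": "3 * 5", "output": "15"},
])
print(few_shot.apply("7 - 4"))
```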
PromptOptimizer: Improve prompts through search and mutation
- Random search over template candidates
- Mutation strategies: word swapping, emphasis, reordering
- Track optimization history and improvement metrics
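To make the mutation strategies concrete, here is a standalone sketch in plain Python (illustrative only, not the optimizer's internals) of the kinds of edits explored during search:

```python
import random

def mutate(prompt: str) -> str:
    """Apply one random, lightweight edit of the kind the optimizer explores."""
    words = prompt.split()
    strategy = random.choice(["swap", "emphasis", "reorder"])
    if strategy == "swap" and len(words) > 1:
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]  # word swapping
    elif strategy == "emphasis":
        words.insert(0, random.choice(["Please", "Carefully", "Step by step,"]))  # emphasis
    else:
        mid = len(words) // 2
        words = words[mid:] + words[:mid]  # reordering
    return " ".join(words)

print(mutate("Write a blog post about prompt engineering"))
```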
ABTestRunner: Compare two templates statistically
- Z-test for significance (p < 0.05 threshold)
- Effect size calculation (Cohen's d)
- Winner determination with confidence metrics
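The statistics behind the winner call are standard: a two-sample z-test over per-input scores and Cohen's d for effect size. A standalone sketch of the math (not the ABTestRunner internals):

```python
import math
from statistics import mean, stdev

def z_test_and_effect(scores_a: list[float], scores_b: list[float]) -> tuple[float, float]:
    """Return (two-sided p-value, Cohen's d) for two independent score samples."""
    ma, mb = mean(scores_a), mean(scores_b)
    sa, sb = stdev(scores_a), stdev(scores_b)
    na, nb = len(scores_a), len(scores_b)
    z = (ma - mb) / math.sqrt(sa**2 / na + sb**2 / nb)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided, normal approximation
    pooled_sd = math.sqrt(((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2))
    cohens_d = (ma - mb) / pooled_sd
    return p_value, cohens_d

p, d = z_test_and_effect([0.82, 0.79, 0.88, 0.91, 0.85], [0.70, 0.74, 0.68, 0.72, 0.71])
print(f"p={p:.4f}, d={d:.2f}")  # p < 0.05 plus a large effect size favours sample A
```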
TokenCounter: Estimate tokens for Claude, OpenAI, Gemini
- Message-level counting with overhead calculation
- Provider-specific character-per-token ratios
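Without provider tokenizers, token counts are heuristic: characters divided by a provider-specific characters-per-token ratio, plus a fixed per-message overhead. A rough standalone illustration (the ratios and overhead below are assumptions, not the library's calibrated values):

```python
CHARS_PER_TOKEN = {"claude": 3.8, "openai": 4.0, "gemini": 4.0}  # assumed ratios
PER_MESSAGE_OVERHEAD = 4  # assumed fixed tokens added per message

def estimate_tokens(messages: list[str], provider: str = "claude") -> int:
    ratio = CHARS_PER_TOKEN[provider]
    return sum(int(len(m) / ratio) + PER_MESSAGE_OVERHEAD for m in messages)

print(estimate_tokens(["You are a helpful assistant.", "Summarize this article in 50 words."]))
```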
CostCalculator: Estimate API costs across providers
- Pricing data for Claude (Opus/Sonnet/Haiku), OpenAI (GPT-4/3.5), Gemini (Pro/Ultra)
- Compare providers to find the most cost-effective option
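Cost comparison reduces to simple arithmetic over per-million-token rates: total = input_tokens × input_rate + output_tokens × output_rate, computed per model and sorted. A standalone sketch with placeholder prices (check each provider's current pricing):

```python
# Illustrative $ per 1M tokens: placeholder numbers, not live pricing data
PRICING = {
    ("claude", "haiku"): (0.25, 1.25),
    ("openai", "gpt-3.5"): (0.50, 1.50),
    ("gemini", "pro"): (0.50, 1.50),
}

def compare_costs(input_tokens: int, output_tokens: int) -> list[tuple[str, str, float]]:
    rows = []
    for (provider, model), (in_rate, out_rate) in PRICING.items():
        total = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
        rows.append((provider, model, total))
    return sorted(rows, key=lambda r: r[2])  # cheapest first, mirroring compare_providers

for provider, model, cost in compare_costs(1_200, 500):
    print(f"{provider} {model}: ${cost:.4f}")
```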
| ADR | Title | Status |
|---|---|---|
| ADR-0001 | Prompt Versioning Strategy | Accepted |
| ADR-0002 | A/B Testing Framework | Accepted |
| ADR-0003 | Safety Checker Design | Accepted |
See BENCHMARKS.md for methodology, evaluation metrics, and reproduction steps.
# Run tests
make test
# Run tests with coverage
make coverage
# Lint code
make lint
# Format code
make format
# Clean build artifacts
make clean
The project includes 190+ comprehensive tests covering all modules, including:
- Template tests (12): Template formatting, chaining, registry
- Pattern tests (14): CoT, few-shot, role-play, self-refine
- Optimizer tests (10): Random search, mutation, optimization
- A/B testing tests (10): Statistical testing, winner determination
- Token counter tests (8): Multi-provider counting, messages
- Cost calculator tests (6): Cost estimation, provider comparison
- CLI tests (6): Command-line interface functionality
Run tests:
pytest tests/ -v
pytest tests/ --cov=prompt_engineering_lab --cov-report=term-missing
prompt-engineering-lab/
├── prompt_engineering_lab/
│ ├── __init__.py # Package exports
│ ├── template.py # Template management
│ ├── patterns.py # Prompt patterns
│ ├── optimizer.py # Optimization strategies
│ ├── ab_tester.py # A/B testing framework
│ ├── token_counter.py # Token counting
│ ├── cost_calculator.py # Cost estimation
│ └── cli.py # Command-line interface
├── tests/
│ ├── conftest.py # Pytest fixtures
│ ├── test_template.py
│ ├── test_patterns.py
│ ├── test_optimizer.py
│ ├── test_ab_tester.py
│ ├── test_token_counter.py
│ ├── test_cost_calculator.py
│ └── test_cli.py
├── pyproject.toml # Project configuration
├── Makefile # Development commands
├── requirements.txt # Production dependencies
├── requirements-dev.txt # Development dependencies
└── README.md # This file
- Service 5: Prompt Engineering and System Optimization
- Service 6: AI-Powered Personal and Business Automation
- Vanderbilt Prompt Engineering for ChatGPT
- Vanderbilt ChatGPT Personal Automation
- IBM Generative AI Engineering with PyTorch, LangChain & Hugging Face
- Google Cloud Generative AI Leader Certificate
from prompt_engineering_lab import PromptTemplate, PromptChain
# Define chain steps
step1 = PromptTemplate(name="outline", template="Create an outline for: {topic}")
step2 = PromptTemplate(name="expand", template="Expand on this outline: {previous_output}")
# Execute chain
chain = PromptChain(templates=[step1, step2])
results = chain.run({"topic": "Machine Learning"})
from prompt_engineering_lab import ABTestRunner
# Define scorer (e.g., response quality)
def scorer(template):
    # Your scoring logic here
    return len(template)  # Simplified example
# Run A/B test
runner = ABTestRunner(scorer=scorer)
result = runner.run(
    template_a="Explain {topic} concisely.",
    template_b="Provide a detailed explanation of {topic}.",
    inputs=["AI", "blockchain", "quantum"]
)
print(f"Winner: {result.winner}")
print(f"P-value: {result.p_value}")
print(f"Effect size: {result.effect_size}")from prompt_engineering_lab import TokenCounter, CostCalculator
# Count tokens
counter = TokenCounter()
input_tokens = counter.count("Your prompt here", provider="claude")
output_tokens = 500 # Expected output
# Compare providers
calc = CostCalculator()
estimates = calc.compare_providers(input_tokens, output_tokens)
for est in estimates[:3]: # Top 3 cheapest
print(f"{est.provider} {est.model}: ${est.total_cost:.4f}")Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
See CHANGELOG.md for release history.
- EnterpriseHub -- Real estate AI platform with BI dashboards and CRM integration
- docqa-engine -- RAG document Q&A with hybrid retrieval and prompt engineering lab
- ai-orchestrator -- AgentForge: unified async LLM interface (Claude, Gemini, OpenAI, Perplexity)
- Portfolio -- Project showcase and services
This project is licensed under the MIT License. See LICENSE file for details.
ChunkyTortoise
- Inspired by modern prompt engineering research and best practices
- Built with Python 3.11+ and Click for CLI functionality
- Follows test-driven development with 190+ comprehensive tests