Automatically generate pytest-benchmark tests from real-world usage patterns extracted from your codebase.
- Automatic API Discovery: Finds public APIs in your Python package using LibCST
- Multi-Source Pattern Extraction: Extracts usage patterns from:
  - Test files (test_*.py, *_test.py)
  - Example scripts (examples/ directory)
  - Execution traces (planned)
  - Dependent projects (planned)
- Smart Pattern Aggregation: Ranks and deduplicates patterns based on frequency, source, and complexity (see the scoring sketch below)
- Benchmark Generation: Creates pytest-benchmark compatible tests with:
  - Performance metrics (time and memory)
  - Correctness checks
  - Well-formatted, readable code
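As a rough illustration of how pattern aggregation could work, the sketch below scores each extracted pattern by its source, frequency, and complexity, deduplicates, and sorts best-first. The data model and weights here are assumptions for illustration only, not the tool's actual implementation:

from dataclasses import dataclass

@dataclass(frozen=True)
class UsagePattern:
    api: str           # e.g. "mypackage.add"
    snippet: str       # extracted call expression
    source: str        # "test", "example", "trace", "dependent"
    frequency: int     # how often the pattern was observed
    complexity: float  # rough size/nesting score of the snippet

# Hypothetical source weights; the real aggregator may use different values.
SOURCE_WEIGHT = {"test": 1.0, "example": 0.8, "trace": 1.2, "dependent": 1.1}

def score(p: UsagePattern) -> float:
    return SOURCE_WEIGHT.get(p.source, 0.5) * p.frequency * (1.0 + p.complexity)

def rank(patterns: list[UsagePattern]) -> list[UsagePattern]:
    # Keep the best-scoring instance of each (api, snippet) pair, then sort best-first.
    best: dict[tuple[str, str], UsagePattern] = {}
    for p in patterns:
        key = (p.api, p.snippet)
        if key not in best or score(p) > score(best[key]):
            best[key] = p
    return sorted(best.values(), key=score, reverse=True)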
# Clone the repository
git clone <repo-url>
cd benchmark-generator
# Create virtual environment and install
uv venv
uv pip install -e .

# Generate benchmarks for a package
benchmark-gen generate --package mypackage --package-path ./mypackage
# Specify output directory
benchmark-gen generate --package mypackage --package-path ./mypackage --output ./benchmarks
# Run the generated benchmarks
pytest benchmarks --benchmark-only

# See what APIs the tool discovers
benchmark-gen list-apis --package mypackage --package-path ./mypackage

Create a .benchmark-gen.toml file:
[package]
name = "mypackage"
path = "./src/mypackage"
[output]
directory = "./benchmarks"
[sources]
analyze_tests = true
analyze_examples = true
analyze_traces = false
analyze_dependents = false
[generation]
performance_metrics = ["time", "memory"]
correctness_check = true
warmup_rounds = 3
test_iterations = 100

Then run:
benchmark-gen generate --config .benchmark-gen.toml

Given a simple package:
# mypackage/__init__.py
def add(a: int, b: int) -> int:
"""Add two numbers."""
return a + bAnd a test:
# tests/test_mypackage.py
import mypackage
def test_add():
    result = mypackage.add(2, 3)
    assert result == 5

The tool generates:
# benchmarks/test_benchmark_mypackage.py
import tracemalloc
import pytest
import mypackage
def test_benchmark_add_simple(benchmark):
"""Benchmark for mypackage.add
Source: test (frequency: 1)
Extracted from: test_mypackage.py
Complexity: 0.10
"""
def run_benchmark():
tracemalloc.start()
result = mypackage.add(2, 3)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
return result, peak
result, peak_mem = benchmark(run_benchmark)
benchmark.extra_info['peak_memory_mb'] = peak_mem / 1024 / 1024
assert result is not None or result is None # Smoke test- API discovery with LibCST
- Configuration management with Pydantic
- CLI interface with Typer
- Data models
- Test analyzer
- Example analyzer
- Pattern aggregation
- Benchmark generation
- Trace analyzer
- Dependent project analyzer (GitHub API)
- Enhanced clustering
- Correctness baseline capture
- Data abstraction (files, URLs, etc.)
- Fixture generation
- LLM integration for cold start
- Comprehensive test suite
- Documentation
- Examples
Configuration Layer (YAML/TOML)
↓
API Discovery Module (LibCST parsing)
↓
Usage Pattern Extraction (parallel extractors)
  - Test Analyzer
  - Example Analyzer
  - Trace Analyzer (planned)
  - Dependent Project Analyzer (planned)
↓
Pattern Aggregation & Ranking
↓
Benchmark Generation Engine (Jinja2 templates)
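To give a feel for the first stage, here is a minimal LibCST visitor that collects public function names from a module's source. It is a sketch of the technique only, not the tool's actual api_discovery.py:

from pathlib import Path

import libcst as cst

class PublicFunctionCollector(cst.CSTVisitor):
    """Collect names of function definitions that do not start with an underscore."""

    def __init__(self) -> None:
        self.functions: list[str] = []

    def visit_FunctionDef(self, node: cst.FunctionDef) -> None:
        if not node.name.value.startswith("_"):
            self.functions.append(node.name.value)

source = Path("mypackage/__init__.py").read_text()
collector = PublicFunctionCollector()
cst.parse_module(source).visit(collector)
print(collector.functions)  # e.g. ['add'] for the example package above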
benchmark_generator/
├── __init__.py
├── __main__.py # CLI entry point
├── cli.py # Typer CLI interface
├── config.py # Configuration models
├── models.py # Core data models
├── api_discovery.py # API discovery via LibCST
├── aggregator.py # Pattern ranking & deduplication
├── generator.py # Benchmark code generation
├── extractors/ # Usage pattern extractors
│ ├── base.py # Abstract base class
│ ├── test_analyzer.py # Extract from test files
│ └── example_analyzer.py # Extract from examples
├── templates/ # Jinja2 templates
│ └── benchmark_test.py.j2
└── utils/ # Utilities (planned)
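As a rough sketch of the final stage, generated test code could be produced by rendering the bundled Jinja2 template along these lines (the template variables shown are assumptions, not the template's real context):

from pathlib import Path

from jinja2 import Environment, PackageLoader

env = Environment(loader=PackageLoader("benchmark_generator", "templates"))
template = env.get_template("benchmark_test.py.j2")

# "package" and "patterns" are illustrative names; the real context is defined by generator.py.
code = template.render(package="mypackage", patterns=[])

out_dir = Path("benchmarks")
out_dir.mkdir(exist_ok=True)
(out_dir / "test_benchmark_mypackage.py").write_text(code)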
This is a work in progress. Contributions welcome!
MIT