Automatically generate pytest-benchmark tests from real-world usage patterns extracted from your codebase.
- Automatic API Discovery: Finds public APIs in your Python package using LibCST
- Multi-Source Pattern Extraction: Extracts usage patterns from:
  - Test files (test_*.py, *_test.py)
  - Example scripts (examples/ directory)
  - Execution traces (planned)
  - Dependent projects (planned)
- Smart Pattern Aggregation: Ranks and deduplicates patterns based on frequency, source, and complexity (see the scoring sketch below)
- Benchmark Generation: Creates pytest-benchmark compatible tests with:
  - Performance metrics (time and memory)
  - Correctness checks
  - Well-formatted, readable code
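As a rough illustration of how pattern aggregation could work, the sketch below scores each extracted pattern by its source, frequency, and complexity, deduplicates, and sorts best-first. The data model and weights here are assumptions for illustration only, not the tool's actual implementation:

from dataclasses import dataclass

@dataclass(frozen=True)
class UsagePattern:
    api: str           # e.g. "mypackage.add"
    snippet: str       # extracted call expression
    source: str        # "test", "example", "trace", "dependent"
    frequency: int     # how often the pattern was observed
    complexity: float  # rough size/nesting score of the snippet

# Hypothetical source weights; the real aggregator may use different values.
SOURCE_WEIGHT = {"test": 1.0, "example": 0.8, "trace": 1.2, "dependent": 1.1}

def score(p: UsagePattern) -> float:
    return SOURCE_WEIGHT.get(p.source, 0.5) * p.frequency * (1.0 + p.complexity)

def rank(patterns: list[UsagePattern]) -> list[UsagePattern]:
    # Keep the best-scoring instance of each (api, snippet) pair, then sort best-first.
    best: dict[tuple[str, str], UsagePattern] = {}
    for p in patterns:
        key = (p.api, p.snippet)
        if key not in best or score(p) > score(best[key]):
            best[key] = p
    return sorted(best.values(), key=score, reverse=True)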
# Clone the repository
git clone <repo-url>
cd benchmark-generator
# Create virtual environment and install
uv venv
uv pip install -e .

# Generate benchmarks for a package
benchmark-gen generate --package mypackage --package-path ./mypackage
# Specify output directory
benchmark-gen generate --package mypackage --package-path ./mypackage --output ./benchmarks
# Run the generated benchmarks
pytest benchmarks --benchmark-only

# See what APIs the tool discovers
benchmark-gen list-apis --package mypackage --package-path ./mypackage

Create a .benchmark-gen.toml file:
[package]
name = "mypackage"
path = "./src/mypackage"
[output]
directory = "./benchmarks"
[sources]
analyze_tests = true
analyze_examples = true
analyze_traces = false
analyze_dependents = false
[generation]
performance_metrics = ["time", "memory"]
correctness_check = true
warmup_rounds = 3
test_iterations = 100

Then run:
benchmark-gen generate --config .benchmark-gen.toml

Given a simple package:
# mypackage/__init__.py
def add(a: int, b: int) -> int:
"""Add two numbers."""
return a + bAnd a test:
# tests/test_mypackage.py
import mypackage
def test_add():
    result = mypackage.add(2, 3)
    assert result == 5

The tool generates:
# benchmarks/test_benchmark_mypackage.py
import tracemalloc
import pytest
import mypackage
def test_benchmark_add_simple(benchmark):
"""Benchmark for mypackage.add
Source: test (frequency: 1)
Extracted from: test_mypackage.py
Complexity: 0.10
"""
def run_benchmark():
tracemalloc.start()
result = mypackage.add(2, 3)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
return result, peak
result, peak_mem = benchmark(run_benchmark)
benchmark.extra_info['peak_memory_mb'] = peak_mem / 1024 / 1024
assert result is not None or result is None # Smoke test- API discovery with LibCST
- Configuration management with Pydantic
- CLI interface with Typer
- Data models
- Test analyzer
- Example analyzer
- Pattern aggregation
- Benchmark generation
- Trace analyzer
- Dependent project analyzer (GitHub API)
- Enhanced clustering
- Correctness baseline capture
- Data abstraction (files, URLs, etc.)
- Fixture generation
- LLM integration for cold start
- Comprehensive test suite
- Documentation
- Examples
Configuration Layer (YAML/TOML)
↓
API Discovery Module (LibCST parsing)
↓
Usage Pattern Extraction (parallel extractors)
  - Test Analyzer
  - Example Analyzer
  - Trace Analyzer (planned)
  - Dependent Project Analyzer (planned)
↓
Pattern Aggregation & Ranking
↓
Benchmark Generation Engine (Jinja2 templates)
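To give a feel for the first stage, here is a minimal LibCST visitor that collects public function names from a module's source. It is a sketch of the technique only, not the tool's actual api_discovery.py:

from pathlib import Path

import libcst as cst

class PublicFunctionCollector(cst.CSTVisitor):
    """Collect names of function definitions that do not start with an underscore."""

    def __init__(self) -> None:
        self.functions: list[str] = []

    def visit_FunctionDef(self, node: cst.FunctionDef) -> None:
        if not node.name.value.startswith("_"):
            self.functions.append(node.name.value)

source = Path("mypackage/__init__.py").read_text()
collector = PublicFunctionCollector()
cst.parse_module(source).visit(collector)
print(collector.functions)  # e.g. ['add'] for the example package above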
benchmark_generator/
├── __init__.py
├── __main__.py # CLI entry point
├── cli.py # Typer CLI interface
├── config.py # Configuration models
├── models.py # Core data models
├── api_discovery.py # API discovery via LibCST
├── aggregator.py # Pattern ranking & deduplication
├── generator.py # Benchmark code generation
├── extractors/ # Usage pattern extractors
│ ├── base.py # Abstract base class
│ ├── test_analyzer.py # Extract from test files
│ └── example_analyzer.py # Extract from examples
├── templates/ # Jinja2 templates
│ └── benchmark_test.py.j2
└── utils/ # Utilities (planned)
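As a rough sketch of the final stage, generated test code could be produced by rendering the bundled Jinja2 template along these lines (the template variables shown are assumptions, not the template's real context):

from pathlib import Path

from jinja2 import Environment, PackageLoader

env = Environment(loader=PackageLoader("benchmark_generator", "templates"))
template = env.get_template("benchmark_test.py.j2")

# "package" and "patterns" are illustrative names; the real context is defined by generator.py.
code = template.render(package="mypackage", patterns=[])

out_dir = Path("benchmarks")
out_dir.mkdir(exist_ok=True)
(out_dir / "test_benchmark_mypackage.py").write_text(code)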
This is a work in progress. Contributions welcome!
MIT