# CI Benchmark Tool

A flexible command-line tool for benchmarking CI commands with statistical analysis and multiple output formats.

## Features
- Run any shell command multiple times and measure execution time
- Handles failures gracefully and continues benchmarking
- Calculates comprehensive statistics: mean, median, standard deviation, min, max, P90, P95
- Outputs results in multiple formats: console, JSON, CSV, and Markdown
- Tracks success rate and provides detailed error reporting
## Installation

```sh
go build -o ci-benchmark
```

Or install directly:

```sh
go install
```

## Usage

```sh
./ci-benchmark --runs 10 --command "cargo clean && cargo build"
```

Shorthand flags are also supported:

```sh
./ci-benchmark -n 10 -c "cargo clean && cargo build"
./ci-benchmark -n 5 -c "npm test" --output-dir ./results --name npm-test-benchmark
```

### Flags

| Flag | Shorthand | Required | Description |
|---|---|---|---|
| `--runs` | `-n` | Yes | Number of times to run the benchmark |
| `--command` | `-c` | Yes | Command to benchmark (supports shell features like `&&` and `\|`) |
| `--output-dir` | | No | Directory to save output files (default: current directory) |
| `--name` | | No | Benchmark name for reports (default: timestamp) |
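Both long and shorthand forms can be wired up with Go's standard `flag` package by registering two flags against the same variable. A minimal sketch of that pattern; the variable names and usage strings here are illustrative, not the tool's actual source:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	// Register long and short forms against the same variables
	// (illustrative; the real tool may differ).
	var runs int
	var command, outputDir, name string

	flag.IntVar(&runs, "runs", 0, "number of times to run the benchmark")
	flag.IntVar(&runs, "n", 0, "shorthand for --runs")
	flag.StringVar(&command, "command", "", "command to benchmark")
	flag.StringVar(&command, "c", "", "shorthand for --command")
	flag.StringVar(&outputDir, "output-dir", ".", "directory for output files")
	flag.StringVar(&name, "name", "", "benchmark name (default: timestamp)")
	flag.Parse()

	// --runs and --command are required.
	if runs <= 0 || command == "" {
		flag.Usage()
		os.Exit(1)
	}
	fmt.Printf("benchmarking %q, %d runs\n", command, runs)
}
```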
## Output Formats

The tool generates four types of output:
- Console Output: Real-time progress and formatted summary table
- JSON (`{name}.json`): Machine-readable results with full metadata
- CSV (`{name}.csv`): Spreadsheet-compatible format with individual runs and statistics
- Markdown (`{name}.md`): Human-readable report with tables
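If you consume the JSON report from another program, its schema (shown in full under Sample Output below) maps directly onto Go structs. A minimal unmarshalling sketch, assuming the fields in that example are the complete set; the `runs` array is elided in the example, so it is omitted here, and the filename is the example's timestamp-based default:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Result mirrors the JSON report shown later in this README.
type Result struct {
	Config struct {
		Command   string `json:"command"`
		Runs      int    `json:"runs"`
		Name      string `json:"name"`
		OutputDir string `json:"outputDir"`
	} `json:"config"`
	Summary struct {
		TotalRuns     int     `json:"totalRuns"`
		Successful    int     `json:"successful"`
		Failed        int     `json:"failed"`
		SuccessRate   float64 `json:"successRate"`
		StartTime     string  `json:"startTime"`
		EndTime       string  `json:"endTime"`
		TotalDuration float64 `json:"totalDuration"`
	} `json:"summary"`
	Statistics struct {
		N      int     `json:"n"`
		Mean   float64 `json:"mean"`
		Median float64 `json:"median"`
		StdDev float64 `json:"stdDev"`
		Min    float64 `json:"min"`
		Max    float64 `json:"max"`
		P90    float64 `json:"p90"`
		P95    float64 `json:"p95"`
	} `json:"statistics"`
}

func main() {
	data, err := os.ReadFile("benchmark_20250113_120000.json")
	if err != nil {
		panic(err)
	}
	var r Result
	if err := json.Unmarshal(data, &r); err != nil {
		panic(err)
	}
	fmt.Printf("mean: %.3fs over %d runs\n", r.Statistics.Mean, r.Statistics.N)
}
```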
## Examples

Benchmark a clean Cargo build:

```sh
./ci-benchmark -n 10 -c "cargo clean && cargo build"
```

Release build with a custom report name:

```sh
./ci-benchmark -n 5 -c "cargo clean && cargo build --release" --name cargo-release
```

Test suite with a dedicated results directory:

```sh
./ci-benchmark -n 20 -c "npm run test" --name npm-tests --output-dir ./benchmark-results
```

Full integration pipeline:

```sh
./ci-benchmark -n 3 -c "docker-compose down && docker-compose up -d && npm test && docker-compose down"
```

## Sample Output

### Console

```
CI Benchmark Tool
=================
Command: cargo clean && cargo build
Runs: 10
Output Directory: .
Starting benchmark...
Run 1/10: ✓ Completed in 45.2s
Run 2/10: ✓ Completed in 43.8s
...
Benchmark Results
=================
Command: cargo clean && cargo build
Total Runs: 10
Successful: 10
Failed: 0
Success Rate: 100.0%
Total Duration: 7m30s
Statistics (successful runs only)
---------------------------------
Metric     Value
------     -----
N          10
Mean       45s (45.123s)
Median     44s (44.892s)
Std Dev    2s (1.845s)
Min        43s (43.123s)
Max        48s (48.456s)
P90        47s (47.234s)
P95        48s (48.012s)
```

### JSON

```json
{
"config": {
"command": "cargo clean && cargo build",
"runs": 10,
"name": "benchmark_20250113_120000",
"outputDir": "."
},
"summary": {
"totalRuns": 10,
"successful": 10,
"failed": 0,
"successRate": 100,
"startTime": "2025-01-13T12:00:00Z",
"endTime": "2025-01-13T12:07:30Z",
"totalDuration": 450.123
},
"statistics": {
"n": 10,
"mean": 45.123,
"median": 44.892,
"stdDev": 1.845,
"min": 43.123,
"max": 48.456,
"p90": 47.234,
"p95": 48.012
},
"runs": [...]
}
```

## Exit Codes

- `0`: All runs completed successfully (100% success rate)
- `1`: One or more runs failed (< 100% success rate) or an error occurred
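Since the exit code distinguishes only perfect runs from everything else, the mapping is simple enough to sketch; this is illustrative, not the tool's actual source:

```go
package main

import "os"

// exitCode maps benchmark outcomes onto the documented exit codes:
// 0 for a 100% success rate, 1 for any failed run or error.
func exitCode(failed int, err error) int {
	if err != nil || failed > 0 {
		return 1
	}
	return 0
}

func main() {
	// e.g. 2 of 10 runs failed -> the process exits with status 1
	os.Exit(exitCode(2, nil))
}
```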
## Error Handling

The tool continues running even if individual benchmark iterations fail. Failed runs:
- Are excluded from statistical calculations
- Are reported in the summary
- Include error messages in the output files
- Affect the success rate metric
This allows you to benchmark flaky commands and understand their reliability.
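A minimal sketch of the exclusion rule, assuming each iteration records a success flag and a duration (the `Run` type here is hypothetical, not the tool's API):

```go
package main

import "fmt"

// Run is a hypothetical per-iteration record.
type Run struct {
	Success  bool
	Duration float64 // seconds
	Err      string
}

// successfulDurations keeps only successful runs: the set the
// statistics are computed over. Failed runs still count toward
// the success-rate metric via the total run count.
func successfulDurations(runs []Run) []float64 {
	var out []float64
	for _, r := range runs {
		if r.Success {
			out = append(out, r.Duration)
		}
	}
	return out
}

func main() {
	runs := []Run{{true, 45.2, ""}, {false, 0, "exit status 1"}, {true, 43.8, ""}}
	ok := successfulDurations(runs)
	fmt.Printf("success rate: %.1f%% (stats over %d runs)\n",
		100*float64(len(ok))/float64(len(runs)), len(ok))
}
```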
## Best Practices

- Use `cargo clean` or equivalent cleanup commands as part of your benchmark command for consistent results
- Run multiple iterations (`-n 10` or more) for reliable statistics
- Store results in a dedicated directory for easier tracking: `--output-dir ./benchmark-results`
- Use meaningful names for easier identification: `--name cargo-clean-build-release`
- The tool uses `bash -c` to execute commands, so all shell features are supported (see the sketch below)
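Because commands go through `bash -c`, a single timed invocation looks roughly like this in Go (a sketch with error handling trimmed; `runOnce` is an illustrative name, not the tool's API):

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// runOnce executes the benchmark command through `bash -c`,
// so &&, |, redirects, and other shell features all work.
func runOnce(command string) (time.Duration, error) {
	start := time.Now()
	cmd := exec.Command("bash", "-c", command)
	err := cmd.Run()
	return time.Since(start), err
}

func main() {
	elapsed, err := runOnce("sleep 1 && echo done")
	fmt.Printf("took %s, err=%v\n", elapsed, err)
}
```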
## Statistics Explained

- N: Number of successful runs (used for statistics)
- Mean: Average execution time
- Median: Middle value when times are sorted (less affected by outliers)
- Std Dev: Standard deviation; measures run-to-run variability
- Min/Max: Fastest and slowest execution times
- P90: 90th percentile; 90% of runs completed in this time or less (see the sketch below)
- P95: 95th percentile; 95% of runs completed in this time or less
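The percentiles are order statistics over the sorted successful durations. The tool's exact interpolation method isn't documented, so the sketch below assumes linear interpolation between ranks, one common choice:

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the value at or below which roughly p percent
// of the samples fall, interpolating linearly between adjacent ranks.
// (The tool's exact method isn't documented; this is an assumption.)
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := p / 100 * float64(len(sorted)-1)
	lo := int(rank)
	if lo >= len(sorted)-1 {
		return sorted[len(sorted)-1]
	}
	frac := rank - float64(lo)
	return sorted[lo]*(1-frac) + sorted[lo+1]*frac
}

func main() {
	times := []float64{43.1, 43.8, 44.2, 44.9, 45.1, 45.5, 46.0, 47.0, 47.2, 48.5}
	sort.Float64s(times)
	fmt.Printf("P90=%.3f P95=%.3f\n", percentile(times, 90), percentile(times, 95))
}
```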
## License

MIT