# CI Benchmark Tool

A flexible command-line tool for benchmarking CI commands with statistical analysis and multiple output formats.

## Features
- Run any shell command multiple times and measure execution time
- Handles failures gracefully and continues benchmarking
- Calculates comprehensive statistics: mean, median, standard deviation, min, max, P90, P95
- Outputs results in multiple formats: console, JSON, CSV, and Markdown
- Tracks success rate and provides detailed error reporting
## Installation

```sh
go build -o ci-benchmark
```

Or install directly:

```sh
go install
```

## Usage

```sh
./ci-benchmark --runs 10 --command "cargo clean && cargo build"
```

Shorthand flags are also supported:

```sh
./ci-benchmark -n 10 -c "cargo clean && cargo build"
./ci-benchmark -n 5 -c "npm test" --output-dir ./results --name npm-test-benchmark
```

### Flags

| Flag | Shorthand | Required | Description |
|---|---|---|---|
| `--runs` | `-n` | Yes | Number of times to run the benchmark |
| `--command` | `-c` | Yes | Command to benchmark (supports shell features like `&&` and `\|`) |
| `--output-dir` | | No | Directory to save output files (default: current directory) |
| `--name` | | No | Benchmark name for reports (default: timestamp) |
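Both long and shorthand forms can be wired up with Go's standard `flag` package by registering two flags against the same variable. A minimal sketch of that pattern; the variable names and usage strings here are illustrative, not the tool's actual source:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	// Register long and short forms against the same variables
	// (illustrative; the real tool may differ).
	var runs int
	var command, outputDir, name string

	flag.IntVar(&runs, "runs", 0, "number of times to run the benchmark")
	flag.IntVar(&runs, "n", 0, "shorthand for --runs")
	flag.StringVar(&command, "command", "", "command to benchmark")
	flag.StringVar(&command, "c", "", "shorthand for --command")
	flag.StringVar(&outputDir, "output-dir", ".", "directory for output files")
	flag.StringVar(&name, "name", "", "benchmark name (default: timestamp)")
	flag.Parse()

	// --runs and --command are required.
	if runs <= 0 || command == "" {
		flag.Usage()
		os.Exit(1)
	}
	fmt.Printf("benchmarking %q, %d runs\n", command, runs)
}
```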
## Output Formats

The tool generates four types of output:
- Console Output: Real-time progress and formatted summary table
- JSON (`{name}.json`): Machine-readable results with full metadata
- CSV (`{name}.csv`): Spreadsheet-compatible format with individual runs and statistics
- Markdown (`{name}.md`): Human-readable report with tables
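If you consume the JSON report from another program, its schema (shown in full under Sample Output below) maps directly onto Go structs. A minimal unmarshalling sketch, assuming the fields in that example are the complete set; the `runs` array is elided in the example, so it is omitted here, and the filename is the example's timestamp-based default:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Result mirrors the JSON report shown later in this README.
type Result struct {
	Config struct {
		Command   string `json:"command"`
		Runs      int    `json:"runs"`
		Name      string `json:"name"`
		OutputDir string `json:"outputDir"`
	} `json:"config"`
	Summary struct {
		TotalRuns     int     `json:"totalRuns"`
		Successful    int     `json:"successful"`
		Failed        int     `json:"failed"`
		SuccessRate   float64 `json:"successRate"`
		StartTime     string  `json:"startTime"`
		EndTime       string  `json:"endTime"`
		TotalDuration float64 `json:"totalDuration"`
	} `json:"summary"`
	Statistics struct {
		N      int     `json:"n"`
		Mean   float64 `json:"mean"`
		Median float64 `json:"median"`
		StdDev float64 `json:"stdDev"`
		Min    float64 `json:"min"`
		Max    float64 `json:"max"`
		P90    float64 `json:"p90"`
		P95    float64 `json:"p95"`
	} `json:"statistics"`
}

func main() {
	data, err := os.ReadFile("benchmark_20250113_120000.json")
	if err != nil {
		panic(err)
	}
	var r Result
	if err := json.Unmarshal(data, &r); err != nil {
		panic(err)
	}
	fmt.Printf("mean: %.3fs over %d runs\n", r.Statistics.Mean, r.Statistics.N)
}
```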
## Examples

Benchmark a clean Cargo build:

```sh
./ci-benchmark -n 10 -c "cargo clean && cargo build"
```

Release build with a custom report name:

```sh
./ci-benchmark -n 5 -c "cargo clean && cargo build --release" --name cargo-release
```

Test suite with a dedicated results directory:

```sh
./ci-benchmark -n 20 -c "npm run test" --name npm-tests --output-dir ./benchmark-results
```

Full integration pipeline:

```sh
./ci-benchmark -n 3 -c "docker-compose down && docker-compose up -d && npm test && docker-compose down"
```

## Sample Output

### Console

```
CI Benchmark Tool
=================
Command: cargo clean && cargo build
Runs: 10
Output Directory: .
Starting benchmark...
Run 1/10: ✓ Completed in 45.2s
Run 2/10: ✓ Completed in 43.8s
...
Benchmark Results
=================
Command: cargo clean && cargo build
Total Runs: 10
Successful: 10
Failed: 0
Success Rate: 100.0%
Total Duration: 7m30s
Statistics (successful runs only)
---------------------------------
Metric     Value
------     -----
N          10
Mean       45s (45.123s)
Median     44s (44.892s)
Std Dev    2s (1.845s)
Min        43s (43.123s)
Max        48s (48.456s)
P90        47s (47.234s)
P95        48s (48.012s)
```

### JSON

```json
{
"config": {
"command": "cargo clean && cargo build",
"runs": 10,
"name": "benchmark_20250113_120000",
"outputDir": "."
},
"summary": {
"totalRuns": 10,
"successful": 10,
"failed": 0,
"successRate": 100,
"startTime": "2025-01-13T12:00:00Z",
"endTime": "2025-01-13T12:07:30Z",
"totalDuration": 450.123
},
"statistics": {
"n": 10,
"mean": 45.123,
"median": 44.892,
"stdDev": 1.845,
"min": 43.123,
"max": 48.456,
"p90": 47.234,
"p95": 48.012
},
"runs": [...]
}
```

## Exit Codes

- `0`: All runs completed successfully (100% success rate)
- `1`: One or more runs failed (< 100% success rate) or an error occurred
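Since the exit code distinguishes only perfect runs from everything else, the mapping is simple enough to sketch; this is illustrative, not the tool's actual source:

```go
package main

import "os"

// exitCode maps benchmark outcomes onto the documented exit codes:
// 0 for a 100% success rate, 1 for any failed run or error.
func exitCode(failed int, err error) int {
	if err != nil || failed > 0 {
		return 1
	}
	return 0
}

func main() {
	// e.g. 2 of 10 runs failed -> the process exits with status 1
	os.Exit(exitCode(2, nil))
}
```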
## Error Handling

The tool continues running even if individual benchmark iterations fail. Failed runs:
- Are excluded from statistical calculations
- Are reported in the summary
- Include error messages in the output files
- Affect the success rate metric
This allows you to benchmark flaky commands and understand their reliability.
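A minimal sketch of the exclusion rule, assuming each iteration records a success flag and a duration (the `Run` type here is hypothetical, not the tool's API):

```go
package main

import "fmt"

// Run is a hypothetical per-iteration record.
type Run struct {
	Success  bool
	Duration float64 // seconds
	Err      string
}

// successfulDurations keeps only successful runs: the set the
// statistics are computed over. Failed runs still count toward
// the success-rate metric via the total run count.
func successfulDurations(runs []Run) []float64 {
	var out []float64
	for _, r := range runs {
		if r.Success {
			out = append(out, r.Duration)
		}
	}
	return out
}

func main() {
	runs := []Run{{true, 45.2, ""}, {false, 0, "exit status 1"}, {true, 43.8, ""}}
	ok := successfulDurations(runs)
	fmt.Printf("success rate: %.1f%% (stats over %d runs)\n",
		100*float64(len(ok))/float64(len(runs)), len(ok))
}
```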
## Best Practices

- Use `cargo clean` or equivalent cleanup commands as part of your benchmark command for consistent results
- Run multiple iterations (`-n 10` or more) for reliable statistics
- Store results in a dedicated directory for easier tracking: `--output-dir ./benchmark-results`
- Use meaningful names for easier identification: `--name cargo-clean-build-release`
- The tool uses `bash -c` to execute commands, so all shell features are supported (see the sketch below)
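Because commands go through `bash -c`, a single timed invocation looks roughly like this in Go (a sketch with error handling trimmed; `runOnce` is an illustrative name, not the tool's API):

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// runOnce executes the benchmark command through `bash -c`,
// so &&, |, redirects, and other shell features all work.
func runOnce(command string) (time.Duration, error) {
	start := time.Now()
	cmd := exec.Command("bash", "-c", command)
	err := cmd.Run()
	return time.Since(start), err
}

func main() {
	elapsed, err := runOnce("sleep 1 && echo done")
	fmt.Printf("took %s, err=%v\n", elapsed, err)
}
```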
## Statistics Explained

- N: Number of successful runs (used for statistics)
- Mean: Average execution time
- Median: Middle value when times are sorted (less affected by outliers)
- Std Dev: Standard deviation; measures run-to-run variability
- Min/Max: Fastest and slowest execution times
- P90: 90th percentile; 90% of runs completed in this time or less (see the sketch below)
- P95: 95th percentile; 95% of runs completed in this time or less
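The percentiles are order statistics over the sorted successful durations. The tool's exact interpolation method isn't documented, so the sketch below assumes linear interpolation between ranks, one common choice:

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the value at or below which roughly p percent
// of the samples fall, interpolating linearly between adjacent ranks.
// (The tool's exact method isn't documented; this is an assumption.)
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := p / 100 * float64(len(sorted)-1)
	lo := int(rank)
	if lo >= len(sorted)-1 {
		return sorted[len(sorted)-1]
	}
	frac := rank - float64(lo)
	return sorted[lo]*(1-frac) + sorted[lo+1]*frac
}

func main() {
	times := []float64{43.1, 43.8, 44.2, 44.9, 45.1, 45.5, 46.0, 47.0, 47.2, 48.5}
	sort.Float64s(times)
	fmt.Printf("P90=%.3f P95=%.3f\n", percentile(times, 90), percentile(times, 95))
}
```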
## License

MIT