Scientific benchmarking to disprove the "pitfall theory"
This comprehensive benchmark suite scientifically tests the hypothesis that optimized Rust techniques provide negligible real-world performance benefits over naive implementations (the "pitfall theory").
- Null Hypothesis (H₀): Optimized Rust techniques provide <20% performance improvement
- Alternative Hypothesis (H₁): Optimized Rust techniques provide ≥2x improvement in ≥3 categories
- Rust 1.70+ with cargo
- Linux system with performance governor support
- 16GB+ RAM recommended
- sudo access for environment configuration
```bash
# 1. Set up the benchmark environment (requires sudo)
./scripts/setup_environment.sh

# 2. Generate test datasets (may take several minutes)
cargo run --bin generate_data

# 3. Run the complete benchmark suite (may take 1-2 hours)
./scripts/run_benchmarks.sh

# 4. View results
open target/criterion/index.html   # Criterion HTML reports
```
```
rust-benchmark-suite/
├── Cargo.toml                     # Dependencies and benchmark configuration
├── README.md                      # This file
├── docs/prd.md                    # Complete PRD specification
├── benches/
│   ├── baseline/                  # Naive implementations (debug builds)
│   │   ├── io_workloads.rs        # File I/O benchmarks
│   │   ├── parsing_workloads.rs   # Text/JSON parsing benchmarks
│   │   ├── compute_workloads.rs   # CPU-intensive benchmarks
│   │   ├── parallel_workloads.rs  # Parallelism benchmarks
│   │   └── memory_workloads.rs    # Memory allocation benchmarks
│   └── optimized/                 # Best-practice implementations (release builds)
│       ├── io_workloads.rs        # Buffered I/O, streaming
│       ├── parsing_workloads.rs   # Zero-copy parsing, efficient hashing
│       ├── compute_workloads.rs   # SIMD + optimized algorithms
│       ├── parallel_workloads.rs  # Rayon parallelism
│       └── memory_workloads.rs    # Pre-allocation, streaming iterators
├── data/                          # Test datasets
│   ├── generate_data.rs           # Data generation binary
│   └── samples/                   # Small datasets for testing
├── scripts/
│   ├── setup_environment.sh       # System configuration for benchmarking
│   ├── run_benchmarks.sh          # Automated benchmark execution
│   └── restore_environment.sh     # Cleanup script
└── results/                       # Benchmark outputs
    ├── criterion_reports/         # Criterion HTML reports
    ├── flamegraphs/               # Performance profiles
    └── memory_profiles/           # Allocation analysis
```
- Large File Processing: 1GB text file line-by-line processing
- CSV Transformation: 100MB CSV with 1M records, transform and write
Baseline vs Optimized:
- Unbuffered vs buffered I/O (`BufReader`/`BufWriter`); see the sketch below
- String concatenation vs pre-allocated buffers
- Expected improvement: 3-5x
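The gap here comes almost entirely from syscall overhead and buffer reuse. A minimal sketch of the kind of baseline/optimized pair involved (function names are illustrative, not the suite's actual benchmark code):

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Read};

// Baseline: unbuffered reads issue one tiny read() call per byte of the file.
fn count_lines_unbuffered(path: &str) -> std::io::Result<usize> {
    let mut file = File::open(path)?;
    let mut byte = [0u8; 1];
    let mut count = 0;
    while file.read(&mut byte)? == 1 {
        if byte[0] == b'\n' {
            count += 1;
        }
    }
    Ok(count)
}

// Optimized: stream through a BufReader and reuse a single line buffer.
fn count_lines_buffered(path: &str) -> std::io::Result<usize> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut line = String::new();
    let mut count = 0;
    while reader.read_line(&mut line)? > 0 {
        count += 1;
        line.clear(); // reuse the allocation instead of growing a new String per line
    }
    Ok(count)
}
```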
- Text Tokenization: 50MB corpus, word frequency analysis
- JSON Processing: 10M JSON records parsing and filtering
Baseline vs Optimized:
- `.collect()`-heavy pipelines vs streaming iterators
- `HashMap` vs `AHashMap` (faster hashing)
- String allocations vs zero-copy `&str` (see the sketch below)
- Expected improvement: 2-4x
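A minimal sketch of the allocation and hashing contrast, assuming the `ahash` crate is available as a dependency (names are illustrative):

```rust
use std::collections::HashMap;
use ahash::AHashMap; // assumes `ahash` is listed in Cargo.toml

// Baseline: allocate an owned String per token and use the default SipHash-based HashMap.
fn word_counts_owned(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

// Optimized: borrow &str slices from the input (zero-copy) and hash with AHash.
fn word_counts_zero_copy<'a>(text: &'a str) -> AHashMap<&'a str, usize> {
    let mut counts = AHashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```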
- String Similarity: 1M Jaro-Winkler comparisons
- Numeric Aggregation: 100M f64 values (sum, mean, percentiles)
Baseline vs Optimized:
- Naive loops vs SIMD operations (`wide` crate); see the sketch below
- Character-by-character vs byte-level processing
- Expected improvement: 4-8x
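The suite's optimized compute path uses the `wide` crate; as a dependency-free illustration of the same idea, the sketch below accumulates into independent lanes so the hot loop can be vectorized (the lane count and names are illustrative):

```rust
// Baseline: straightforward sequential reduction over the input values.
fn sum_naive(values: &[f64]) -> f64 {
    let mut total = 0.0;
    for &v in values {
        total += v;
    }
    total
}

// Optimized (sketch): four independent accumulator lanes keep several additions
// in flight and let the compiler vectorize; the real suite would use explicit
// SIMD types such as `wide::f64x4` to the same effect.
fn sum_lanes(values: &[f64]) -> f64 {
    let mut lanes = [0.0f64; 4];
    let chunks = values.chunks_exact(4);
    let remainder = chunks.remainder();
    for chunk in chunks {
        for i in 0..4 {
            lanes[i] += chunk[i];
        }
    }
    lanes.iter().sum::<f64>() + remainder.iter().sum::<f64>()
}
```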
- Map-Reduce Word Count: Multi-file processing
- Matrix Operations: 1000x1000 matrix multiplication
Baseline vs Optimized:
- Sequential vs Rayon parallel processing
- Cache-unfriendly vs blocked algorithms
- Expected improvement: 4-8x (multi-core scaling)
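For the map-reduce style workloads, the baseline/optimized pair is often a one-line change once the work is expressed as an iterator. A minimal sketch assuming `rayon` as a dependency (names are illustrative):

```rust
use rayon::prelude::*; // assumes `rayon` is listed in Cargo.toml

// Baseline: sequential word count across the corpus.
fn total_words_sequential(docs: &[String]) -> usize {
    docs.iter().map(|d| d.split_whitespace().count()).sum()
}

// Optimized: the same map-reduce expressed with Rayon's parallel iterators,
// which splits the work across all available cores.
fn total_words_parallel(docs: &[String]) -> usize {
    docs.par_iter().map(|d| d.split_whitespace().count()).sum()
}
```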
- Collection Pipelines: Filter/map/reduce operations
- String Building: Large string construction
Baseline vs Optimized:
- Multiple `.collect()` passes vs streaming
- No pre-allocation vs `with_capacity()` (see the sketch below)
- Expected improvement: 2-10x
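A minimal sketch of the string-building case (the capacity estimate is illustrative):

```rust
use std::fmt::Write;

// Baseline: repeated push_str with no pre-allocation forces the buffer to
// reallocate and copy as it grows, plus one temporary String per row.
fn build_report_naive(rows: &[u32]) -> String {
    let mut out = String::new();
    for row in rows {
        out.push_str(&format!("row={}\n", row));
    }
    out
}

// Optimized: reserve a rough capacity up front and stream values in,
// so the buffer is allocated once and no temporaries are needed.
fn build_report_prealloc(rows: &[u32]) -> String {
    let mut out = String::with_capacity(rows.len() * 16);
    for row in rows {
        let _ = write!(out, "row={}\n", row);
    }
    out
}
```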
- Sample Size: 100+ iterations per benchmark
- Confidence Level: 95% (p < 0.05)
- Effect Size: Cohen's d > 0.8 (large effect)
- Multiple Comparisons: Bonferroni correction applied
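These parameters map directly onto Criterion's configuration API. A sketch of how a benchmark group might be configured (the benchmark body is a placeholder; the Bonferroni correction is applied afterwards in the analysis scripts, not by Criterion itself):

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Placeholder benchmark body; the real workloads live under benches/.
fn bench_example(c: &mut Criterion) {
    c.bench_function("example_sum", |b| b.iter(|| (0..10_000u64).sum::<u64>()));
}

// 100+ samples per benchmark at a 95% confidence level (alpha = 0.05).
fn configured() -> Criterion {
    Criterion::default()
        .sample_size(100)
        .confidence_level(0.95)
        .significance_level(0.05)
}

criterion_group! {
    name = benches;
    config = configured();
    targets = bench_example
}
criterion_main!(benches);
```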
- CPU pinned to cores 0-3 (`taskset`)
- Performance governor enabled
- ASLR disabled for consistent memory layout
- tmpfs used for I/O benchmarks (eliminates disk variance)
- System caches cleared between runs
- ≥3 out of 5 categories show ≥2x improvement
- All improvements statistically significant (p < 0.05)
- Large effect sizes (Cohen's d > 0.8)
- Any single category shows ≥2x improvement with statistical significance
- Debug builds (`cargo build`)
- Direct unbuffered I/O
- Frequent allocations and `.clone()` calls
- Index-based loops with bounds checking
- No SIMD, parallelism, or profiling
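In code, those characteristics tend to look like the following (an illustrative fragment, not taken from the suite):

```rust
// Illustrative baseline style: indexed loop with a bounds check on every access,
// a clone per element instead of a borrow, and no pre-allocation of the output.
fn upper_case_all_naive(words: &Vec<String>) -> Vec<String> {
    let mut out = Vec::new();
    for i in 0..words.len() {
        let word = words[i].clone(); // extra allocation per element
        out.push(word.to_uppercase());
    }
    out
}
```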
- Release builds with LTO (`opt-level = 3`, `lto = "fat"`)
- Buffered I/O with appropriate buffer sizes
- Pre-allocated memory (`String::with_capacity()`)
- Iterator chains and zero-copy techniques
- SIMD operations where applicable
- Rayon parallelism for suitable workloads
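The optimized counterpart of the baseline fragment above uses an iterator chain with a pre-sized output buffer (the LTO and `opt-level` settings live in the `[profile.release]` section of `Cargo.toml`):

```rust
// Illustrative optimized style: iterator chain, pre-allocated output, no per-element clones.
fn upper_case_all_optimized(words: &[String]) -> Vec<String> {
    let mut out = Vec::with_capacity(words.len());
    out.extend(words.iter().map(|w| w.to_uppercase()));
    out
}
```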
- Criterion Reports: Open `target/criterion/index.html` in a browser
- Flamegraphs: SVG files in `results/run_*/flamegraphs/`
- Raw Data: JSON files in `results/run_*/`
```bash
# Generate statistical analysis (requires Python)
python3 scripts/analyze_results.py

# Create comprehensive report
python3 scripts/generate_report.py
```

- Permission denied: Run `./scripts/setup_environment.sh` with sudo access
- Out of memory: Reduce dataset sizes in `data/generate_data.rs`
- Slow execution: Check that the CPU governor is set to "performance"
- Inconsistent results: Ensure tmpfs is mounted and ASLR is disabled
```bash
# Check CPU governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Check ASLR status (should be 0)
cat /proc/sys/kernel/randomize_va_space

# Check tmpfs mount
mountpoint /tmp/benchmark_data

# Restore system settings
./scripts/restore_environment.sh
```

- PRD Specification: See `docs/prd.md` for complete requirements
- Criterion.rs: Statistical benchmarking framework
- Rayon: Data parallelism library
- SIMD: `wide` crate for portable SIMD operations
This benchmark suite follows strict scientific methodology. Any modifications must:
- Maintain statistical rigor (100+ samples, proper significance testing)
- Preserve realistic baseline implementations
- Document all optimization techniques used
- Include correctness validation tests
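One way such a correctness validation test can look, using hypothetical function names rather than the suite's actual API:

```rust
// Hypothetical correctness check: an optimized implementation only counts if it
// produces the same answer as the baseline it replaces.
#[cfg(test)]
mod validation {
    fn sum_naive(values: &[f64]) -> f64 {
        values.iter().sum()
    }

    fn sum_optimized(values: &[f64]) -> f64 {
        // stand-in for the SIMD/parallel version under test
        values.chunks(4).map(|c| c.iter().sum::<f64>()).sum()
    }

    #[test]
    fn optimized_matches_baseline() {
        let data: Vec<f64> = (0..10_000).map(|i| i as f64 * 0.5).collect();
        let (a, b) = (sum_naive(&data), sum_optimized(&data));
        // Allow for floating-point reordering between the two reductions.
        assert!((a - b).abs() < 1e-6 * a.abs().max(1.0));
    }
}
```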
This project is designed for scientific research and performance analysis.
Generated by the Rust Performance Benchmark Suite
Scientifically disproving the pitfall theory through rigorous measurement
