
Rust Performance Benchmark Suite


Scientific benchmarking to disprove the "pitfall theory"

🎯 Objective

This comprehensive benchmark suite scientifically tests the hypothesis that optimized Rust techniques provide negligible real-world performance benefits over naive implementations (the "pitfall theory").

Hypothesis Test

  • Null Hypothesis (H₀): Optimized Rust techniques provide <20% performance improvement
  • Alternative Hypothesis (H₁): Optimized Rust techniques provide ≥2x improvement in ≥3 categories

🚀 Quick Start

Prerequisites

  • Rust 1.70+ with cargo
  • Linux system with performance governor support
  • 16GB+ RAM recommended
  • sudo access for environment configuration

Setup & Execution

# 1. Set up the benchmark environment (requires sudo)
./scripts/setup_environment.sh

# 2. Generate test datasets (may take several minutes)
cargo run --bin generate_data

# 3. Run the complete benchmark suite (may take 1-2 hours)
./scripts/run_benchmarks.sh

# 4. View results
xdg-open target/criterion/index.html  # Criterion HTML reports

πŸ“ Project Structure

rust-benchmark-suite/
├── Cargo.toml                    # Dependencies and benchmark configuration
├── README.md                     # This file
├── docs/prd.md                   # Complete PRD specification
├── benches/
│   ├── baseline/                 # Naive implementations (debug builds)
│   │   ├── io_workloads.rs       # File I/O benchmarks
│   │   ├── parsing_workloads.rs  # Text/JSON parsing benchmarks
│   │   ├── compute_workloads.rs  # CPU-intensive benchmarks
│   │   ├── parallel_workloads.rs # Parallelism benchmarks
│   │   └── memory_workloads.rs   # Memory allocation benchmarks
│   └── optimized/                # Best-practice implementations (release builds)
│       ├── io_workloads.rs       # Buffered I/O, streaming
│       ├── parsing_workloads.rs  # Zero-copy parsing, efficient hashing
│       ├── compute_workloads.rs  # SIMD + optimized algorithms
│       ├── parallel_workloads.rs # Rayon parallelism
│       └── memory_workloads.rs   # Pre-allocation, streaming iterators
├── data/                         # Test datasets
│   ├── generate_data.rs          # Data generation binary
│   └── samples/                  # Small datasets for testing
├── scripts/
│   ├── setup_environment.sh      # System configuration for benchmarking
│   ├── run_benchmarks.sh         # Automated benchmark execution
│   └── restore_environment.sh    # Cleanup script
└── results/                      # Benchmark outputs
    ├── criterion_reports/        # Criterion HTML reports
    ├── flamegraphs/              # Performance profiles
    └── memory_profiles/          # Allocation analysis

🧪 Benchmark Categories

1. I/O Bound Workloads

  • Large File Processing: 1GB text file line-by-line processing
  • CSV Transformation: 100MB CSV with 1M records, transform and write

Baseline vs Optimized (see the sketch after this list):

  • Unbuffered vs buffered I/O (BufReader/BufWriter)
  • String concatenation vs pre-allocated buffers
  • Expected improvement: 3-5x
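
As an illustration of this contrast, a minimal std-only sketch; the function names and the 64 KiB buffer size are hypothetical and are not taken from the suite's benchmark code:

use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};

// Baseline: unbuffered reads issue one tiny syscall per byte.
fn count_lines_unbuffered(path: &str) -> io::Result<usize> {
    let mut file = File::open(path)?;
    let mut byte = [0u8; 1];
    let mut lines = 0;
    while file.read(&mut byte)? == 1 {
        if byte[0] == b'\n' {
            lines += 1;
        }
    }
    Ok(lines)
}

// Optimized: BufReader batches reads into a large in-memory buffer,
// so the kernel is hit far less often for the same work.
fn count_lines_buffered(path: &str) -> io::Result<usize> {
    let reader = BufReader::with_capacity(64 * 1024, File::open(path)?);
    let mut lines = 0;
    for line in reader.lines() {
        line?;
        lines += 1;
    }
    Ok(lines)
}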

2. Parsing Workloads

  • Text Tokenization: 50MB corpus, word frequency analysis
  • JSON Processing: 10M JSON records parsing and filtering

Baseline vs Optimized (see the sketch after this list):

  • .collect() heavy vs streaming iterators
  • HashMap vs AHashMap (faster hashing)
  • String allocations vs zero-copy &str
  • Expected improvement: 2-4x
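
A minimal sketch of the streaming/zero-copy contrast for the word-frequency workload; the function names are hypothetical, and the AHashMap swap assumes the ahash crate as a dependency:

use std::collections::HashMap;

// Baseline: allocate an owned String per token and build an
// intermediate Vec before counting anything.
fn word_counts_naive(text: &str) -> HashMap<String, usize> {
    let words: Vec<String> = text
        .split_whitespace()
        .map(|w| w.to_string())
        .collect();
    let mut counts = HashMap::new();
    for w in words {
        *counts.entry(w).or_insert(0) += 1;
    }
    counts
}

// Optimized: stream tokens straight into the map as zero-copy &str
// slices borrowed from the input; no intermediate Vec, no per-token
// allocation. Swapping HashMap for ahash::AHashMap is a drop-in change.
fn word_counts_streaming(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for w in text.split_whitespace() {
        *counts.entry(w).or_insert(0) += 1;
    }
    counts
}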

3. Computational Workloads

  • String Similarity: 1M Jaro-Winkler comparisons
  • Numeric Aggregation: 100M f64 values (sum, mean, percentiles)

Baseline vs Optimized (see the sketch after this list):

  • Naive loops vs SIMD operations (wide crate)
  • Character-by-character vs byte-level processing
  • Expected improvement: 4-8x
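
A rough std-only sketch of the aggregation contrast with hypothetical function names; the suite's optimized path uses explicit SIMD via the wide crate, which follows the same chunking idea shown here:

// Baseline: index-based loop with a single accumulator; bounds checks and
// a serial dependency chain limit what the compiler can do.
fn sum_naive(values: &[f64]) -> f64 {
    let mut total = 0.0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

// Optimized: process fixed-size chunks with independent accumulators so
// several additions can be in flight at once (auto-vectorization friendly).
// Note that reassociating floating-point additions can change the result
// in the last few bits.
fn sum_chunked(values: &[f64]) -> f64 {
    let mut acc = [0.0f64; 4];
    let mut chunks = values.chunks_exact(4);
    for chunk in &mut chunks {
        for lane in 0..4 {
            acc[lane] += chunk[lane];
        }
    }
    let tail: f64 = chunks.remainder().iter().sum();
    acc.iter().sum::<f64>() + tail
}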

4. Parallelism Workloads

  • Map-Reduce Word Count: Multi-file processing
  • Matrix Operations: 1000x1000 matrix multiplication

Baseline vs Optimized (see the sketch after this list):

  • Sequential vs Rayon parallel processing
  • Cache-unfriendly vs blocked algorithms
  • Expected improvement: 4-8x (multi-core scaling)
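
A minimal sketch of the sequential-to-parallel change for the word-count workload, assuming rayon as a dependency (function names are hypothetical):

use rayon::prelude::*;

// Baseline: sequential word count over many documents.
fn total_words_sequential(docs: &[String]) -> usize {
    docs.iter().map(|d| d.split_whitespace().count()).sum()
}

// Optimized: Rayon spreads the map-reduce over a thread pool; for an
// embarrassingly parallel workload the only change is iter -> par_iter.
fn total_words_parallel(docs: &[String]) -> usize {
    docs.par_iter().map(|d| d.split_whitespace().count()).sum()
}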

5. Memory Allocation Patterns

  • Collection Pipelines: Filter/map/reduce operations
  • String Building: Large string construction

Baseline vs Optimized (see the sketch after this list):

  • Multiple .collect() vs streaming
  • No pre-allocation vs with_capacity()
  • Expected improvement: 2-10x
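
A minimal sketch of the allocation contrast, with hypothetical function names and a rough capacity estimate:

use std::fmt::Write;

// Baseline: two intermediate Vec collections plus a String that regrows
// repeatedly as it is appended to.
fn render_naive(records: &[u64]) -> String {
    let squared: Vec<u64> = records.iter().map(|r| r * r).collect();
    let even: Vec<u64> = squared.into_iter().filter(|r| r % 2 == 0).collect();
    let mut out = String::new();
    for value in even {
        out += &format!("{value}\n");
    }
    out
}

// Optimized: one streaming pass with the output buffer sized up front;
// write! appends into the existing buffer without a temporary String.
fn render_streaming(records: &[u64]) -> String {
    let mut out = String::with_capacity(records.len() * 8); // rough estimate
    for value in records.iter().map(|r| r * r).filter(|r| r % 2 == 0) {
        let _ = writeln!(out, "{value}");
    }
    out
}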

📊 Statistical Methodology

Rigor Requirements

  • Sample Size: 100+ iterations per benchmark (see the configuration sketch after this list)
  • Confidence Level: 95% (p < 0.05)
  • Effect Size: Cohen's d > 0.8 (large effect)
  • Multiple Comparisons: Bonferroni correction applied
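
A minimal sketch of how the sample-size, confidence, and significance settings map onto a Criterion benchmark group; the group name and workload are placeholders, and Cohen's d and the Bonferroni correction are not Criterion features, so they would be computed separately (e.g., by the analysis scripts):

use std::time::Duration;

use criterion::{criterion_group, criterion_main, Criterion};

fn bench_example(c: &mut Criterion) {
    let mut group = c.benchmark_group("io_large_file");
    group.sample_size(100);                            // 100+ iterations per benchmark
    group.confidence_level(0.95);                      // 95% confidence intervals
    group.significance_level(0.05);                    // p < 0.05
    group.measurement_time(Duration::from_secs(30));
    group.bench_function("baseline", |b| b.iter(|| placeholder_workload()));
    group.finish();
}

// Stand-in for a real workload so the sketch compiles on its own.
fn placeholder_workload() -> u64 {
    (0..1_000u64).sum()
}

criterion_group!(benches, bench_example);
criterion_main!(benches);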

Environment Controls

  • CPU pinned to cores 0-3 (taskset)
  • Performance governor enabled
  • ASLR disabled for consistent memory layout
  • tmpfs used for I/O benchmarks (eliminates disk variance)
  • System caches cleared between runs

🎯 Success Criteria

Primary Success (Disproves Pitfall Theory)

  • ≥3 out of 5 categories show ≥2x improvement
  • All improvements statistically significant (p < 0.05)
  • Large effect sizes (Cohen's d > 0.8)

Minimum Viable Success

  • Any single category shows ≥2x improvement with statistical significance

🔧 Implementation Standards

Baseline (Naive) Code

  • Debug builds (cargo build)
  • Direct unbuffered I/O
  • Frequent allocations and .clone() calls
  • Index-based loops with bounds checking
  • No SIMD, parallelism, or profiling

Optimized Code

  • Release builds with LTO (opt-level=3, lto="fat")
  • Buffered I/O with appropriate buffer sizes
  • Pre-allocated memory (String::with_capacity())
  • Iterator chains and zero-copy techniques
  • SIMD operations where applicable
  • Rayon parallelism for suitable workloads

📈 Results & Analysis

Viewing Results

  • Criterion Reports: Open target/criterion/index.html in browser
  • Flamegraphs: SVG files in results/run_*/flamegraphs/
  • Raw Data: JSON files in results/run_*/

Analysis Scripts

# Generate statistical analysis (requires Python)
python3 scripts/analyze_results.py

# Create comprehensive report
python3 scripts/generate_report.py

πŸ› Troubleshooting

Common Issues

  • Permission denied: Run ./scripts/setup_environment.sh with sudo access
  • Out of memory: Reduce dataset sizes in data/generate_data.rs
  • Slow execution: Check CPU governor is set to "performance"
  • Inconsistent results: Ensure tmpfs is mounted and ASLR is disabled

Environment Validation

# Check CPU governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Check ASLR status (should be 0)
cat /proc/sys/kernel/randomize_va_space

# Check tmpfs mount
mountpoint /tmp/benchmark_data

Cleanup

# Restore system settings
./scripts/restore_environment.sh

📚 References

  • PRD Specification: See docs/prd.md for complete requirements
  • Criterion.rs: Statistical benchmarking framework
  • Rayon: Data parallelism library
  • SIMD: wide crate for portable SIMD operations

🤝 Contributing

This benchmark suite follows strict scientific methodology. Any modifications must:

  1. Maintain statistical rigor (100+ samples, proper significance testing)
  2. Preserve realistic baseline implementations
  3. Document all optimization techniques used
  4. Include correctness validation tests (see the sketch after this list)
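
For example, a correctness check might run a baseline/optimized pair on the same input and require matching output before any timing numbers are trusted; this sketch reuses the hypothetical sum_naive/sum_chunked functions from the computational sketch above:

#[cfg(test)]
mod correctness {
    use super::{sum_chunked, sum_naive};

    #[test]
    fn optimized_matches_baseline() {
        let values: Vec<f64> = (0..10_000).map(|i| i as f64 * 0.5).collect();
        let baseline = sum_naive(&values);
        let optimized = sum_chunked(&values);
        // Chunked summation reassociates additions, so allow a tiny
        // floating-point tolerance instead of exact equality.
        assert!((baseline - optimized).abs() < 1e-6 * baseline.abs().max(1.0));
    }
}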

📄 License

This project is designed for scientific research and performance analysis.


Generated by the Rust Performance Benchmark Suite
Scientifically disproving the pitfall theory through rigorous measurement