DataSON Benchmarks

Open source competitive benchmarking for DataSON serialization library

Status badges: Daily Benchmarks | PR Performance Check

🔗 Important Links

🎯 Overview

This repository provides transparent, reproducible benchmarks for DataSON using a dual benchmark system designed to serve different performance analysis needs. Using GitHub Actions for zero-cost infrastructure, we deliver accurate competitive analysis and deep optimization insights.

๐Ÿ—๏ธ Benchmark Architecture

We maintain two complementary benchmark systems that serve different purposes:

๐Ÿ† System A: Competitive Benchmarks

Daily/Weekly Market Position Analysis

Purpose: Compare DataSON against external serialization libraries
Used by: Daily benchmarks, weekly reports, market analysis
Script: run_benchmarks.py → CompetitiveBenchmarkSuite
Data Format: competitive → tiers → datasets

  • Competitors: orjson, ujson, json, pickle, jsonpickle, msgpack
  • DataSON Variants: All API levels tested as separate "competitors"
  • Fairness: Multi-tier testing (JSON-safe, object-enhanced, ML-complex)
  • Focus: External market position and competitiveness

🔧 System B: Optimization Benchmarks

Internal Performance Optimization & Regression Detection

Purpose: Validate DataSON optimizations and detect performance regressions
Used by: PR performance checks, optimization validation, baseline comparison
Script: pr_optimized_benchmark.py
Data Format: results_by_tier → tiers → datasets

  • API Tiers: Basic, API-optimized, Smart, ML-optimized, Compatibility
  • Profiling Integration: Detailed optimization validation
  • Focus: Internal optimization effectiveness and regression prevention
  • Baseline: Tracks performance improvements over time

💡 Why Two Systems? Each system is optimized for its specific purpose. Competitive benchmarks need fair external comparisons, while optimization benchmarks need detailed internal analysis. This separation provides cleaner insights and more focused reporting; the sketch below illustrates how the two result layouts differ.
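For orientation, here is a minimal sketch of the two result layouts. The nesting follows the formats named above; the tier, dataset, and leaf field names shown are placeholders, not the actual schema.

# Illustrative only: nesting mirrors the two formats described above;
# tier/dataset/leaf names are placeholders, not the exact schema.
competitive_result = {                      # System A: competitive -> tiers -> datasets
    "competitive": {
        "json_safe": {                      # tier
            "api_response": {               # dataset
                "datason": {"mean_ms": ...},
                "orjson": {"mean_ms": ...},
            },
        },
    },
}

optimization_result = {                     # System B: results_by_tier -> tiers -> datasets
    "results_by_tier": {
        "ml_optimized": {                   # API tier
            "ml_training": {"mean_ms": ...},  # dataset
        },
    },
}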

Key Features

  • ๐Ÿ† Competitive Analysis: Head-to-head comparison with 6-8 major serialization libraries
  • ๐Ÿ”ง Deep Optimization Analysis: DataSON API-level performance insights and regression detection
  • ๐Ÿ“Š Version Evolution Tracking: Performance analysis across DataSON versions
  • ๐Ÿค– Enhanced CI/CD Integration: Smart PR performance checks with dual benchmark validation
  • ๐ŸŽจ Phase 4 Enhanced Reports: NEW! Interactive reports with comprehensive performance tables, smart units (ฮผs/ms/s), ML compatibility matrix
  • ๐Ÿ“ˆ Interactive Reports: Beautiful charts and visualizations with GitHub Pages hosting
  • ๐Ÿš€ Community Friendly: Easy setup, contribution guidelines, free infrastructure

🔧 Workflow Management

This repository uses a Python-to-YAML workflow generation system for maintainable GitHub Actions workflows:

How to Change CI Workflows

✅ Recommended: Edit Python Models

# 1. Edit the workflow definitions
vim tools/gen_workflows.py

# 2. Generate updated YAML files  
make workflows

# 3. Commit both Python and generated YAML
git add tools/ .github/workflows/
git commit -m "Update CI workflows"

โŒ Don't: Edit YAML Files Directly

  • YAML files in .github/workflows/ are generated artifacts
  • Manual edits will be overwritten on next generation
  • Always edit the Python models instead
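As a rough illustration of the pattern, here is a hypothetical model in the spirit of tools/gen_workflows.py. The class and field names are invented for this sketch; the real models differ.

# Hypothetical sketch only; the actual models live in tools/gen_workflows.py.
from dataclasses import dataclass, field

import yaml  # PyYAML


@dataclass
class Workflow:
    name: str
    on: dict
    jobs: dict = field(default_factory=dict)

    def to_yaml(self) -> str:
        return yaml.safe_dump(
            {"name": self.name, "on": self.on, "jobs": self.jobs},
            sort_keys=False,
        )


daily = Workflow(
    name="Daily Benchmarks",
    on={"schedule": [{"cron": "0 6 * * *"}]},
    jobs={
        "benchmark": {
            "runs-on": "ubuntu-latest",
            "steps": [{"uses": "actions/checkout@v4"}],
        }
    },
)

print(daily.to_yaml())  # `make workflows` would write output like this into .github/workflows/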

Workflow Development

# Generate workflows from Python models
make workflows

# Validate generated workflows  
make validate-workflows

# Run workflow generator tests
make test-workflows

# Set up development environment (includes pre-commit hooks)
make setup

Benefits of This Approach

  • 🔒 Type Safety: Python models with full IDE support
  • 🧪 Testable: Unit tests for workflow logic
  • 📝 DRY: Reusable components and patterns
  • ✅ Validated: Schema validation and actionlint integration
  • 🤖 AI-Friendly: Edit structured code, not whitespace-sensitive YAML

🚀 Quick Start

Setup

# Clone the repository
git clone https://github.com/danielendler/datason-benchmarks.git
cd datason-benchmarks

# Install dependencies
pip install -r requirements.txt

๐Ÿ† Competitive Benchmarks (System A)

Compare DataSON against external libraries

# Quick competitive comparison (3-4 libraries, fast)
python scripts/run_benchmarks.py --quick --generate-report

# Full competitive analysis (all available libraries)
python scripts/run_benchmarks.py --competitive --generate-report

# Complete competitive suite with reports
python scripts/run_benchmarks.py --complete --generate-report

# DataSON version evolution analysis
python scripts/run_benchmarks.py --versioning --generate-report

🔧 Optimization Benchmarks (System B)

Validate DataSON optimizations and detect regressions

# PR optimization validation (fast, 5 datasets × 5 API tiers)
python scripts/pr_optimized_benchmark.py --output results/optimization_check.json

# Establish new performance baseline
python scripts/pr_optimized_benchmark.py --iterations 20 --output data/results/new_baseline.json

# Optimization-specific validation suite
python benchmarks/optimization_validation.py

🔬 Advanced Analysis

# Comprehensive API profiling across all DataSON APIs
python scripts/run_benchmarks.py --profile-apis

# DataSON configuration optimization testing
python scripts/run_benchmarks.py --configurations --generate-report

# Detailed profiling analysis (requires DATASON_PROFILE=1)
DATASON_PROFILE=1 python scripts/profile_stages.py --with-rust off --runs 5

Rust Core Benchmarks (experimental)

The scripts/bench_rust_core.py helper exercises datason.save_string and datason.load_basic with the optional Rust accelerator toggled on or off. Use it to measure fast-path speedups and fallback overhead.

# Run save_string with Rust enabled
python scripts/bench_rust_core.py save_string --with-rust on --sizes 10k --shapes flat --repeat 5 --output results_rust_on.json

# Run save_string with Rust disabled
python scripts/bench_rust_core.py save_string --with-rust off --sizes 10k --shapes flat --repeat 5 --output results_rust_off.json

Configuration notes:

  • --with-rust controls the DATASON_RUST environment variable (on, off, or auto to respect the existing value).
  • Ensure your DataSON wheel includes the Rust extension; otherwise the script skips --with-rust on runs.
  • Output files are JSON and can be merged or inspected directly.
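Under the hood the comparison amounts to timing the same calls with DATASON_RUST toggled, roughly as in this hedged sketch. The payload shape is assumed, and save_string/load_basic are assumed to round-trip a string; check scripts/bench_rust_core.py for the exact protocol and environment handling.

# Rough sketch of the Rust fast-path vs fallback measurement.
import os
import time

os.environ["DATASON_RUST"] = "off"    # "on" to exercise the Rust accelerator; assumed to be read by datason

import datason

payload = {"values": list(range(10_000)), "label": "flat-10k"}  # assumed "flat 10k" shape

start = time.perf_counter()
for _ in range(5):                    # mirrors --repeat 5
    encoded = datason.save_string(payload)
    datason.load_basic(encoded)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"DATASON_RUST={os.environ['DATASON_RUST']}: {elapsed_ms:.2f} ms total")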

🎯 NEW: Dagger-Based CI/CD Pipelines

Reliable, Testable, and Maintainable Automation

Latest Addition: Hybrid Dagger + GitHub Actions approach eliminates YAML complexity:

# Install Dagger CLI and Python SDK
curl -fsSL https://dl.dagger.io/dagger/install.sh | BIN_DIR=$HOME/.local/bin sh
pip install dagger-io

# Test pipelines locally (instant feedback vs 10+ minute CI cycles)
dagger call daily-benchmarks --source=. --focus-area=api_modes
dagger call weekly-benchmarks --source=. --benchmark-type=comprehensive
dagger call validate-system --source=.

# Run comprehensive test suite
dagger call test-pipeline --source=.

Benefits:

  • ✅ Zero YAML syntax errors - Complex logic moved to Python
  • ⚡ Local testing - 30-second iterations vs 10+ minute CI cycles
  • 🔧 IDE support - Full autocomplete, debugging, type safety
  • 📊 Same functionality - All benchmark features preserved
  • 🚀 Better reliability - Container-based execution
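For orientation, here is a stripped-down sketch of what a pipeline function in dagger/benchmark_pipeline.py might look like, assuming the dagger-io Python SDK's decorator style. The real module is considerably more elaborate; names and steps here are illustrative.

# Hedged sketch of a Dagger pipeline function; not the actual benchmark_pipeline.py.
import dagger
from dagger import dag, function, object_type


@object_type
class BenchmarkPipeline:
    @function
    async def daily_benchmarks(
        self, source: dagger.Directory, focus_area: str = "api_modes"
    ) -> str:
        """Run a quick benchmark inside a container and return its stdout."""
        return await (
            dag.container()
            .from_("python:3.11-slim")
            .with_directory("/src", source)
            .with_workdir("/src")
            .with_env_variable("FOCUS_AREA", focus_area)
            .with_exec(["pip", "install", "-r", "requirements.txt"])
            .with_exec(["python", "scripts/run_benchmarks.py", "--quick", "--generate-report"])
            .stdout()
        )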

Phase 4: Enhanced Reporting & Visualization 🎨

NEW: Interactive reports with comprehensive performance tables and smart unit formatting:

# Generate Phase 4 enhanced report from any benchmark result
python scripts/run_benchmarks.py --phase4-report phase2_complete_1750338755.json

# Get intelligent library recommendations by domain  
python scripts/run_benchmarks.py --phase4-decide web      # Web API recommendations
python scripts/run_benchmarks.py --phase4-decide ml       # ML framework recommendations
python scripts/run_benchmarks.py --phase4-decide finance  # Financial services recommendations

# Run trend analysis and regression detection
python scripts/run_benchmarks.py --phase4-trends

Phase 2: Automated Benchmarking 🤖

NEW: Full automation with synthetic data generation and regression detection:

# Generate realistic test data
python scripts/generate_data.py --scenario all

# Run regression analysis
python scripts/regression_detector.py current_results.json --baseline latest_baseline.json

# Analyze performance trends
python scripts/analyze_trends.py --input-dir data/results --lookback-weeks 12

View Latest Results

📊 Current Competitive Landscape

Tested Libraries

Library      Type             Why Tested                     Latest Status
DataSON      JSON + objects   Our library                    ✅ Active
orjson       JSON (Rust)      Speed benchmark standard       ✅ Available
ujson        JSON (C)         Popular drop-in replacement    ✅ Available
json         JSON (stdlib)    Baseline reference             ✅ Available
pickle       Binary objects   Python default for objects     ✅ Available
jsonpickle   JSON objects     Direct functional competitor   ✅ Available
msgpack      Binary compact   Cross-language efficiency      ✅ Available

Performance Summary

Latest benchmark results from automated daily runs

Results updated automatically by GitHub Actions with interactive charts. View latest reports for detailed visualizations.

🔧 Optimization Analysis

DataSON Configuration Deep Dive

Our enhanced benchmarking system now provides deep API analysis of DataSON's optimization configurations:

📋 View Complete API Performance Guide →

Available Optimization Configs

  • get_performance_config() - Speed-optimized settings (UNIX dates, VALUES orient, no type hints)
  • get_ml_config() - ML-optimized settings (UNIX_MS dates, type hints enabled, aggressive coercion)
  • get_strict_config() - Type preservation (ISO dates, strict coercion, complex/decimal preservation)
  • get_api_config() - API-compatible settings (ISO dates, ASCII encoding, string UUIDs)
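For instance, a hedged usage sketch: this assumes configs are passed to datason.serialize() via a config argument; consult the DataSON documentation for the exact signature.

# Hedged sketch: applying optimization configs (call signature assumed, not verified here).
from datetime import datetime, timezone

import datason

record = {"id": 42, "created_at": datetime.now(timezone.utc), "amount": 19.99}

perf_config = datason.get_performance_config()  # UNIX dates, VALUES orient, no type hints
strict_config = datason.get_strict_config()     # ISO dates, strict coercion, decimal preservation

fast_payload = datason.serialize(record, config=perf_config)
lossless_payload = datason.serialize(record, config=strict_config)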

New DataSON API Methods (Testing Needed)

  • dump_api() - Web API optimized serialization
  • dump_ml() - ML framework optimized serialization
  • dump_secure() - Security-focused with PII redaction
  • dump_fast() - Performance optimized serialization
  • load_smart() - Intelligent deserialization (80-90% success rate)
  • load_perfect() - 100% accurate reconstruction with templates
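A short hedged example of how these methods are expected to be used; argument and return types are assumptions based on the descriptions above.

# Hedged sketch of the newer dump_*/load_* API surface.
import numpy as np

import datason

ml_batch = {"weights": np.random.rand(4, 4), "epoch": 3}

api_payload = datason.dump_api({"status": "ok", "items": [1, 2, 3]})  # web-API friendly output
ml_payload = datason.dump_ml(ml_batch)                                # NumPy/Pandas aware

restored = datason.load_smart(ml_payload)  # best-effort reconstruction (80-90% success rate)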

Key Performance Insights

Dataset Type        Fastest Configuration   Performance   Version
Basic Types         Default                 0.009ms       v0.9.0
DateTime Heavy      Default                 0.028ms       v0.9.0
Decimal Precision   Default                 0.141ms       latest
Large Datasets      get_strict_config()     0.978ms       latest

⚠️ Critical Finding: ML Config Performance Regression

  • Latest version: get_ml_config() shows 7,800x slowdown on decimal data (1092ms vs 0.14ms)
  • v0.9.0: Normal performance across all configs
  • Investigation: Potential issue with ML config decimal handling in latest version

Configuration Parameters Analysis

Our system automatically discovers and analyzes optimization parameters:

# Example discovered differences between configs:
{
    "get_performance_config": {
        "date_format": "UNIX",           # vs ISO for strict/api
        "dataframe_orient": "VALUES",    # vs RECORDS for others  
        "include_type_hints": false,     # vs true for ml config
        "type_coercion": "SAFE"         # vs STRICT/AGGRESSIVE
    },
    "get_strict_config": {
        "preserve_complex": true,        # Enhanced preservation
        "preserve_decimals": true,       # Decimal accuracy
        "type_coercion": "STRICT"       # Strictest validation
    }
}

🎨 Phase 4: Enhanced Reporting & Visualization

Interactive Reports with Comprehensive Analysis

Phase 4 delivers beautiful, interactive HTML reports that transform raw benchmark data into actionable insights:

📊 Enhanced Features:

  • Performance Summary Tables: Real benchmark data with method comparison
  • Smart Unit Formatting: Automatic μs → ms → s conversion based on values
  • ML Framework Compatibility Matrix: Complete NumPy/Pandas support analysis
  • Security Features Analysis: PII redaction effectiveness and compliance insights
  • Interactive Charts: Chart.js visualizations with real performance data
  • Domain-Specific Recommendations: Optimized advice for Web API, ML, Finance, Data Engineering

🎯 Quick Examples:

# Generate enhanced report from any benchmark result
python scripts/run_benchmarks.py --phase4-report phase2_complete_1750338755.json

# Get intelligent recommendations for your use case
python scripts/run_benchmarks.py --phase4-decide ml       # โ†’ datason.dump_ml() for NumPy/Pandas
python scripts/run_benchmarks.py --phase4-decide finance  # โ†’ datason.dump_secure() for PII protection
python scripts/run_benchmarks.py --phase4-decide web      # โ†’ datason.dump_api() for JSON compatibility

# Historical trend analysis with regression detection
python scripts/run_benchmarks.py --phase4-trends

📈 Report Highlights:

  • Performance Table: Shows dump_secure() at 387.31ms vs serialize() at 0.32ms with use case guidance
  • ML Compatibility: Visual matrix showing 100% NumPy/Pandas support for DataSON ML methods
  • Security Analysis: Quantifies PII redaction effectiveness (90-95%) and performance cost (+930%)
  • Smart Units: Displays 53.0μs for fast operations, 387.31ms for complex ones, 2.5s for large datasets

๐ŸŒ Automated Integration:

Phase 4 reports are automatically generated by both the daily and weekly CI workflows, with enhanced reports published to the project's GitHub Pages site.

🤖 Enhanced CI/CD Integration

Smart PR Performance Checks

Our enhanced PR workflow now provides:

  • ⚡ Multi-layer Caching: Python deps + competitor libraries for 3-5x faster runs
  • 🎯 Regression Detection: Automated performance regression analysis with severity levels
  • 📊 Rich Reporting: Interactive charts with performance analysis
  • 💬 Smart Comments: Updates existing PR comments instead of creating duplicates
  • 🔍 Detailed Analysis: Environment info, test metadata, and performance guidance

Performance Severity Levels

  • 🚀 Excellent: <1.5x slower than fastest competitor
  • ✅ Good: 1.5-2x slower
  • ⚠️ Acceptable: 2-5x slower
  • ❌ Concerning: >5x slower (triggers investigation; see the classification sketch below)

GitHub Actions Workflows

🎯 Dagger-Based Pipelines (NEW - Recommended)

📊 Legacy Workflows (Maintained for compatibility)

  • PR Performance Check - Enhanced competitive check with regression analysis
  • Daily Benchmarks - Comprehensive competitive + optimization analysis
  • Weekly Benchmarks - 🆕 Phase 2: Complete automation with trend analysis
  • Manual Triggers - Run specific benchmark suites on demand

Migration Status: New Dagger workflows are production-ready and eliminate the YAML complexity issues of legacy workflows.

🆕 Phase 2 Automation Features

  • 🔄 Automated Data Generation: Fresh synthetic test data weekly
  • 🔍 Statistical Regression Detection: Blocks PRs with >25% performance degradation (see the sketch below)
  • 📈 Historical Trend Analysis: 12-week performance evolution tracking
  • 🤖 Self-Sustaining: Runs without manual intervention
  • 📊 Enhanced Reporting: Comprehensive trend analysis and insights
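The >25% gate amounts to a threshold check like the following sketch. Field names are illustrative; scripts/regression_detector.py performs a fuller statistical analysis across many datasets.

# Illustrative regression gate; the real detector compares many datasets statistically.
REGRESSION_THRESHOLD = 0.25  # block PRs that are more than 25% slower than baseline


def is_regression(current_ms: float, baseline_ms: float) -> bool:
    return (current_ms - baseline_ms) / baseline_ms > REGRESSION_THRESHOLD


if is_regression(current_ms=0.40, baseline_ms=0.30):   # ~33% slower
    raise SystemExit("Performance regression detected: failing the PR check")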

CI vs Local Results

  • 🔒 CI-Only Results: Only CI-generated results are committed (prevents local machine variance)
  • 📊 Interactive Reports: Auto-generated HTML reports with Plotly.js charts
  • 🌐 GitHub Pages: Public hosting of latest benchmark reports
  • ♻️ Smart Cleanup: Automatic 30-day artifact cleanup for storage efficiency

📈 Enhanced Methodology

Fair Competition Principles

  • Realistic Data: API responses, ML datasets, complex objects, datetime/decimal heavy scenarios
  • Multiple Metrics: Speed, memory usage, output size, success rate, configuration variance
  • Error Handling: Graceful handling of library limitations with detailed error tracking
  • Environment Consistency: Controlled GitHub Actions runners with caching optimization
  • Reproducible: Anyone can run the same benchmarks with identical results

Test Scenarios

🆕 Phase 2: Realistic Synthetic Data

Automated generation of 5 comprehensive scenarios with real-world data patterns:

  1. API Fast (api_fast) - REST API responses, user profiles, product catalogs

    {
      "id": "40b2da9f-1c54-4af7-b853-43ee3717a701",
      "username": "jane92", 
      "email": "gwilliams@example.net",
      "profile": {"bio": "Magazine perform foreign air.", "verified": true},
      "preferences": {"notifications": true, "theme": "dark"},
      "stats": {"login_count": 131, "last_active": "1993-01-04T03:19:33.872652"}
    }
  2. ML Training (ml_training) - ML model serialization, feature matrices, time series

    • NumPy arrays with realistic data distributions
    • Pandas DataFrames with time series patterns
    • Model parameters and training metadata
  3. Secure Storage (secure_storage) - Nested configurations, hierarchical data

    {
      "app_config": {
        "database": {"host": "61.225.172.203", "port": 2770, "ssl": true},
        "cache": {"enabled": false, "ttl": 2982, "size_limit": 268},
        "features": {"analytics": true, "debugging": false}
      }
    }
  4. Large Data (large_data) - Dataset handling, streaming data patterns

  5. Edge Cases (edge_cases) - Boundary conditions, Unicode stress tests

📊 Enhanced Reporting Features

  • Adaptive Unit Formatting - Automatically chooses the best units (ms, μs, ns) for readability (see the sketch below)
  • Sample Data Visualization - Shows exactly what data structures are being tested
  • Interactive Charts - Performance comparison charts with Plotly.js
  • Comprehensive Analysis - Competitive, configuration, and version comparison in one report
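The adaptive formatting behaves roughly like this sketch; the report generator's actual helper may differ.

# Illustrative duration formatter: picks ns, μs, ms, or s by magnitude.
def format_duration(seconds: float) -> str:
    if seconds < 1e-6:
        return f"{seconds * 1e9:.0f}ns"
    if seconds < 1e-3:
        return f"{seconds * 1e6:.1f}μs"
    if seconds < 1.0:
        return f"{seconds * 1e3:.2f}ms"
    return f"{seconds:.2f}s"


print(format_duration(53e-6))    # 53.0μs
print(format_duration(0.38731))  # 387.31ms
print(format_duration(2.5))      # 2.50s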

Classic Scenarios

  1. Basic Types - Core serialization speed testing
  2. DateTime Heavy - Real-world timestamp patterns with optimization config testing
  3. Decimal Precision - Financial/scientific precision handling
  4. Large Datasets - Memory and compression optimization testing
  5. Complex Structures - Nested objects with user profiles and preferences

Enhanced Metrics

  • Configuration Performance: Per-config benchmarking across DataSON versions
  • API Evolution: Feature availability and compatibility tracking
  • Optimization Variance: Performance difference analysis between configurations
  • Version Regression: Automated detection of performance changes across versions

📊 Interactive Reporting

New Visualization Features

  • 📈 Performance Evolution Charts: Line charts tracking DataSON performance across versions
  • 🔧 Configuration Comparison: Bar charts comparing optimization configs
  • 🏆 Competitive Analysis: Grouped bar charts with DataSON highlighting
  • 📋 API Details: Expandable sections with deep configuration parameter analysis

Report Types

  • Competitive Reports: Head-to-head library comparisons with interactive charts
  • Optimization Reports: DataSON configuration analysis with recommendations
  • Version Evolution: Historical performance tracking across DataSON versions
  • Combined Analysis: Complete benchmarking suite with all insights

🎉 Phase 2 Complete: Self-Sustaining Automation

Implementation Date: January 2025
Status: ✅ Complete

🚀 What's New in Phase 2

  • 🔄 Automated Data Generation: Realistic synthetic data generated weekly
  • 🔍 Advanced Regression Detection: Statistical analysis blocks problematic PRs
  • 📊 Weekly Comprehensive Benchmarks: Full automation with parallel execution
  • 📈 Historical Trend Analysis: 12-week performance evolution tracking
  • 🤖 Self-Sustaining System: <4 hours/week maintenance as designed

Core Phase 2 Components

  • scripts/generate_data.py - Synthetic data generation CLI
  • scripts/regression_detector.py - Statistical regression analysis
  • scripts/analyze_trends.py - Historical trend analysis
  • .github/workflows/weekly-benchmarks.yml - Comprehensive automation
  • Enhanced PR workflows - Advanced regression detection

Success Metrics Achieved

  • ✅ 95%+ automated execution with error handling
  • ✅ Zero-cost infrastructure using GitHub Actions free tier
  • ✅ Part-time maintainable, designed for <4 hours/week
  • ✅ Community transparent with public results and methodology
  • ✅ Regression prevention blocks PRs with >25% performance degradation

Ready for Phase 3

Phase 2 creates the foundation for Phase 3: Polish with:

  • Documentation improvements
  • Additional competitive libraries
  • Enhanced reporting with visualizations
  • Community contribution guidelines

📋 View Phase 2 Implementation Details →


๐Ÿ—๏ธ Architecture

Repository Structure

datason-benchmarks/
├── .github/workflows/          # Hybrid GitHub Actions + Dagger automation
│   ├── dagger-*.yml            # NEW: Minimal Dagger-based workflows (recommended)
│   └── *.yml                   # Legacy YAML workflows (maintained)
├── dagger/                     # NEW: Python-based CI/CD pipeline logic
│   ├── benchmark_pipeline.py   # Main pipeline functions (daily/weekly/test)
│   └── __init__.py             # Dagger module exports
├── benchmarks/                 # Core benchmark suites
│   ├── competitive/            # Competitor comparison tests
│   ├── configurations/         # DataSON config testing
│   ├── versioning/             # Version evolution analysis (NEW)
│   └── regression/             # Performance regression detection
├── competitors/                # Competitor library adapters
├── data/                       # Test datasets and results
│   ├── results/                # CI-only historical results
│   ├── synthetic/              # Generated test data
│   └── configs/                # Test configurations
├── scripts/                    # Enhanced automation and utilities
│   ├── run_benchmarks.py       # Main benchmark runner
│   ├── improved_*.py           # Enhanced benchmark & reporting system
│   └── generate_report.py      # Interactive report generator (ENHANCED)
├── docs/                       # Documentation and live reports
│   └── results/                # GitHub Pages hosted reports
├── dagger.json                 # Dagger project configuration
└── requirements-dagger.txt     # Dagger-specific dependencies

Enhanced Core Components

🎯 Dagger Pipeline Components (NEW)

  • BenchmarkPipeline - Python-based CI/CD automation with type safety
  • daily_benchmarks() - Focus area benchmarking (api_modes, competitive, versions)
  • weekly_benchmarks() - Comprehensive analysis with enhanced test data
  • validate_system() - End-to-end validation and testing

📊 Legacy Core Components (Maintained)

  • DataSONVersionManager - Version switching and API compatibility testing
  • OptimizationAnalyzer - Deep configuration parameter analysis
  • EnhancedReportGenerator - Interactive charts with Plotly.js
  • CIEnvironmentDetector - Smart CI vs local environment handling

🤝 Contributing

Adding New Optimization Tests

  1. Add test scenarios to create_optimization_test_data() in version suite
  2. Focus on realistic data patterns that benefit from specific configurations
  3. Test across multiple DataSON versions for evolution tracking

Enhancing Analysis

  1. Extend configuration parameter discovery in _discover_config_parameters()
  2. Add new visualization types to report generator
  3. Contribute performance optimization insights

Adding New Competitors

  1. Create adapter in competitors/ directory
  2. Implement CompetitorAdapter interface
  3. Add to CompetitorRegistry
  4. Test with python scripts/run_benchmarks.py --quick --generate-report
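As a rough starting point, here is a hedged sketch of a new adapter. The method names on CompetitorAdapter are assumptions; mirror an existing file in competitors/ for the real interface and registration mechanism.

# Hypothetical adapter sketch; not the actual CompetitorAdapter interface.
import cbor2  # example of a library you might want to benchmark


class Cbor2Adapter:
    """Assumed adapter shape: a name plus serialize/deserialize hooks."""

    name = "cbor2"

    def serialize(self, obj) -> bytes:
        return cbor2.dumps(obj)

    def deserialize(self, payload: bytes):
        return cbor2.loads(payload)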

📋 Requirements

System Requirements

  • Python 3.8+
  • Memory: 4GB+ recommended for optimization analysis
  • Time: 5-45 minutes depending on benchmark scope

Library Dependencies

Core dependencies automatically installed:

  • datason>=0.9.0 - The library being benchmarked
  • orjson, ujson, msgpack, jsonpickle - Competitive libraries
  • numpy, pandas - Realistic ML data generation
  • plotly - Interactive chart generation

🔮 Roadmap

Current Status: Phase 1 Complete ✅

  • Competitive Benchmarking: 7 major serialization libraries
  • GitHub Actions Automation: Daily runs with enhanced caching
  • Optimization Analysis: Deep DataSON configuration testing
  • Version Evolution: Performance tracking across DataSON versions
  • Enhanced PR Checks: Regression detection with smart caching
  • Interactive Reports: Beautiful visualizations with GitHub Pages
  • CI/Local Separation: Consistent results with local development support

Phase 2: Advanced Analysis 🚧

  • Memory Usage Profiling: Detailed memory consumption analysis
  • Cross-platform Testing: Windows, macOS, Linux consistency verification
  • Extended ML Integration: PyTorch, TensorFlow, scikit-learn model benchmarking
  • Real-world Datasets: Integration with common ML datasets and API schemas
  • Performance Alerts: Automated notifications for regressions
  • Competitor Version Tracking: Monitor competitive library updates

Phase 3: Community Growth 📈

  • User-contributed Scenarios: Community test case submissions
  • Conference Materials: Presentation templates and research papers
  • CI/CD Integrations: Plugins for popular CI systems
  • Academic Collaboration: Research partnership opportunities
  • Benchmarking Standards: Industry methodology contributions

📜 License

MIT License - See LICENSE for details.

This benchmarking methodology and infrastructure is open source and freely available for use by the serialization library community.

🙏 Acknowledgments

  • Community Contributors - Test scenarios, optimization insights, and improvements
  • Competitive Libraries - orjson, ujson, msgpack, jsonpickle teams for excellent tools
  • GitHub Actions - Free infrastructure enabling open source benchmarking with enhanced caching
  • DataSON Users - Real-world feedback and optimization requirements
  • Open Source Community - Plotly.js for interactive visualizations

Latest Update: Results are updated automatically by the Daily Benchmarks workflow, with interactive reports available on GitHub Pages.
