
paiml/copia


copia

Pure Rust rsync-style file synchronization library


Why copia?

  • Embeddable: Use rsync's delta-transfer algorithm as a library, not a subprocess
  • Pure Rust: 100% safe Rust, no unsafe code, fully auditable
  • Zero C Dependencies: No OpenSSL, no librsync, no external binaries
  • Async Support: First-class tokio integration for non-blocking I/O
  • Memory Safe: No buffer overflows, no use-after-free, guaranteed by Rust

Performance

┌────────────────────────────┬────────────┬────────────┬──────────────────┐
│ Scenario                   │ rsync (ms) │ copia (ms) │ Result           │
├────────────────────────────┼────────────┼────────────┼──────────────────┤
│ 1KB identical              │      43.55 │       0.05 │   Library wins   │
│ 100KB identical            │      43.23 │       0.12 │   Library wins   │
│ 1MB identical              │      43.40 │       0.33 │   Library wins   │
│ 1MB 5% changed             │      44.72 │       4.54 │   Library wins   │
│ 10MB identical             │      43.68 │       3.92 │   Library wins   │
│ 10MB 1% changed            │      46.91 │      43.05 │   Comparable     │
│ 10MB 100% different        │      52.84 │      43.88 │   Comparable     │
└────────────────────────────┴────────────┴────────────┴──────────────────┘

⚠️  IMPORTANT: rsync times include ~40ms process spawn overhead.
    This benchmark compares copia as a library vs rsync as a subprocess.
    For embedded/library use cases, copia avoids this overhead entirely.
    For CLI-to-CLI comparison, performance is comparable on large files.

When copia shines:

  • Embedded in applications (no process spawn overhead)
  • High-frequency sync operations (no per-call process startup)
  • Small file synchronization (where spawn overhead dominates the total time)
  • When you need async I/O or Rust integration

When rsync is fine:

  • One-off large file transfers (spawn overhead negligible)
  • Shell scripts and CLI workflows
  • When you need rsync's full feature set (permissions, links, etc.)

Installation

Add to your Cargo.toml:

[dependencies]
copia = "0.1"

For async support:

[dependencies]
copia = { version = "0.1", features = ["async"] }

CLI Installation

cargo install copia --features cli

Usage

Library Usage

use copia::{CopiaSync, Sync};
use std::io::Cursor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a sync engine with 2048-byte blocks
    let sync = CopiaSync::with_block_size(2048);

    // Generate a signature from the basis (old) file
    let basis = b"original file content here";
    let signature = sync.signature(Cursor::new(basis.as_slice()))?;

    // Compute the delta from the source (new) file
    let source = b"modified file content here";
    let delta = sync.delta(Cursor::new(source.as_slice()), &signature)?;

    // Apply the delta to the basis to reconstruct the new file
    let mut output = Vec::new();
    sync.patch(Cursor::new(basis.as_slice()), &delta, &mut output)?;

    assert_eq!(output, source);
    Ok(())
}

Async Usage

use copia::async_sync::AsyncCopiaSync;

// Requires tokio with the `macros` and `rt-multi-thread` features enabled
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let sync = AsyncCopiaSync::with_block_size(2048);

    // Sync source file to destination
    let result = sync.sync_files("source.txt", "dest.txt").await?;

    println!("Matched: {} bytes", result.bytes_matched);
    println!("Literal: {} bytes", result.bytes_literal);
    println!("Compression: {:.1}%", result.compression_ratio() * 100.0);

    Ok(())
}

CLI Usage

# Sync a file
copia sync source.txt dest.txt

# Generate signature
copia signature file.txt -o file.sig

# Compute delta
copia delta newfile.txt file.sig -o file.delta

# Apply patch
copia patch oldfile.txt file.delta -o newfile.txt

How It Works

Copia implements the rsync delta-transfer algorithm:

  1. Signature Generation: The basis file is divided into fixed-size blocks. For each block, a rolling checksum (Adler-32 variant) and strong hash (BLAKE3) are computed.

  2. Delta Computation: The source file is scanned with a sliding window. When the rolling checksum matches a known block, the strong hash verifies the match. Matching blocks become "copy" operations; non-matching data becomes "literal" operations.

  3. Patch Application: The delta is applied to the basis file, copying matched blocks and inserting literal data to reconstruct the source.

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Basis File │────▶│  Signature  │     │ Source File │
└─────────────┘     └──────┬──────┘     └──────┬──────┘
                           │                   │
                           ▼                   ▼
                      ┌──────────────────────────┐
                      │    Delta Computation     │
                      └────────────┬─────────────┘
                                   │
                                   ▼
                      ┌──────────────────────────┐
                      │ Delta: [Copy, Literal..] │
                      └────────────┬─────────────┘
                                   │
         ┌─────────────┐           │
         │  Basis File │───────────┤
         └─────────────┘           ▼
                      ┌──────────────────────────┐
                      │    Patch Application     │
                      └────────────┬─────────────┘
                                   │
                                   ▼
                      ┌──────────────────────────┐
                      │   Reconstructed Source   │
                      └──────────────────────────┘

Implementation Details

Component         Implementation
Rolling Checksum  Adler-32 variant with lazy modulo (normalize every 5000 rolls)
Strong Hash       BLAKE3 (32 bytes, cryptographic)
Hash Table        FxHashMap for fast u32 key lookups
Parallelism       Rayon for multi-core signature generation
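The "lazy modulo" idea can be illustrated with a minimal sketch. It assumes an rsync-style (a, b) checksum with modulus 65521; copia's exact variant and modulus may differ.

The hypothetical `LazyRolling` type keeps signed 64-bit accumulators and skips `%` on each roll. It reduces its stored state only every 5000 rolls and reduces on the fly when a digest is requested. `main` checks it against an eager recomputation at every offset.

```rust
// Assumed modulus (the Adler-32 prime) and the 5000-roll interval from the table.
const MOD: i64 = 65_521;
const NORMALIZE_EVERY: u32 = 5_000;

/// Hypothetical rolling checksum that defers `% MOD` between normalizations.
struct LazyRolling {
    a: i64,     // sum of window bytes (unreduced)
    b: i64,     // position-weighted sum (unreduced)
    len: i64,   // window length
    rolls: u32, // rolls since the last normalization
}

impl LazyRolling {
    fn new(window: &[u8]) -> Self {
        let len = window.len() as i64;
        let mut r = LazyRolling { a: 0, b: 0, len, rolls: 0 };
        for (i, &x) in window.iter().enumerate() {
            r.a += x as i64;
            r.b += (len - i as i64) * x as i64;
        }
        r.normalize();
        r
    }

    /// Slide the window one byte; no modulo here, so values may grow or go negative.
    fn roll(&mut self, out: u8, inp: u8) {
        self.a += inp as i64 - out as i64;
        self.b += self.a - self.len * out as i64;
        self.rolls += 1;
        if self.rolls == NORMALIZE_EVERY {
            self.normalize();
        }
    }

    /// Reduce both sums into [0, MOD); rem_euclid handles negative values.
    fn normalize(&mut self) {
        self.a = self.a.rem_euclid(MOD);
        self.b = self.b.rem_euclid(MOD);
        self.rolls = 0;
    }

    /// 32-bit digest: b in the high half, a in the low half.
    fn digest(&self) -> u32 {
        ((self.b.rem_euclid(MOD) << 16) | self.a.rem_euclid(MOD)) as u32
    }
}

/// Eager reference: recompute the reduced checksum from scratch.
fn reference(window: &[u8]) -> u32 {
    let len = window.len() as i64;
    let (mut a, mut b) = (0i64, 0i64);
    for (i, &x) in window.iter().enumerate() {
        a = (a + x as i64) % MOD;
        b = (b + (len - i as i64) * x as i64) % MOD;
    }
    ((b << 16) | a) as u32
}

fn main() {
    // Deterministic pseudo-random bytes from a simple LCG.
    let mut seed: u32 = 12_345;
    let data: Vec<u8> = (0..20_000)
        .map(|_| {
            seed = seed.wrapping_mul(1_103_515_245).wrapping_add(12_345);
            (seed >> 16) as u8
        })
        .collect();
    let win = 64;
    let mut r = LazyRolling::new(&data[..win]);
    for k in 1..=data.len() - win {
        r.roll(data[k - 1], data[k + win - 1]);
        assert_eq!(r.digest(), reference(&data[k..k + win]));
    }
    println!("lazy and eager digests agree over {} rolls", data.len() - win);
}
```

Deferring the reduction is safe because every update is linear, so the congruence with the eager value holds for unreduced sums. The normalization interval only has to be short enough that the accumulators cannot overflow.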

API Reference

Core Types

  • CopiaSync - Main synchronization engine
  • Signature - Block signatures for a file
  • Delta - Difference between two files
  • RollingChecksum - Adler-32 variant rolling checksum
  • StrongHash - BLAKE3 cryptographic hash

Async Types

  • AsyncCopiaSync - Async synchronization engine
  • SyncResult - Statistics from sync operation

Feature Flags

Feature  Description
async    Enable tokio async support
tracing  Enable structured tracing instrumentation
cli      Build the command-line interface (includes async + tracing)

Benchmarks

Run benchmarks yourself:

# Compare against rsync (note: includes process spawn overhead)
cargo bench --bench rsync_comparison --features async

# Run criterion benchmarks (algorithm-only, no spawn overhead)
cargo bench --bench benchmarks

Statistical Methodology

  • Sample size: 100 samples per benchmark (Criterion default; each sample runs many iterations)
  • Warm-up: 3 seconds per benchmark group
  • Confidence interval: 95% with automatic outlier detection
  • Effect size: Cohen's d reported for regressions
  • Outlier detection: Tukey's fences (k=1.5)
  • Reproducibility: rust-toolchain.toml pins compiler version
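To make the outlier rule concrete, here is a small sketch of Tukey's fences with k = 1.5. It is not Criterion's code. Quartiles use linear interpolation (one of several conventions), and the sample timings are made up for illustration.

```rust
/// Percentile of a sorted, non-empty slice using linear interpolation
/// (one common convention; Criterion's own quantile method may differ).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let (lo, hi) = (rank.floor() as usize, rank.ceil() as usize);
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Tukey's fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are outliers.
fn tukey_fences(samples: &[f64], k: f64) -> (f64, f64) {
    let mut s = samples.to_vec();
    s.sort_by(f64::total_cmp);
    let (q1, q3) = (percentile(&s, 25.0), percentile(&s, 75.0));
    let iqr = q3 - q1;
    (q1 - k * iqr, q3 + k * iqr)
}

fn main() {
    // Made-up timings in milliseconds; the last one is a deliberate outlier.
    let samples = [4.51, 4.49, 4.55, 4.53, 4.50, 4.52, 4.54, 4.48, 9.90];
    let (low, high) = tukey_fences(&samples, 1.5);
    let outliers: Vec<f64> = samples.iter().copied().filter(|&t| t < low || t > high).collect();
    println!("fences = [{low:.3}, {high:.3}], outliers = {outliers:?}");
}
```

Here Q1 = 4.50 ms and Q3 = 4.54 ms, so the fences land at roughly 4.44 ms and 4.60 ms and only the 9.90 ms sample is flagged.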

Comparison with rsync

Feature           copia       rsync
Language          Pure Rust   C
Memory Safety     Guaranteed  Manual
Use as Library    Native      Subprocess only
Async I/O         Native      No
Process Overhead  None        ~40ms spawn
Permissions/ACLs  Not yet     Yes
Symbolic Links    Not yet     Yes
Compression       Not yet     Yes (zlib)

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please read our contributing guidelines and submit PRs to the main branch.

Acknowledgments

  • rsync algorithm by Andrew Tridgell and Paul Mackerras
  • BLAKE3 team for the fast cryptographic hash
  • Rust community for excellent tooling
