Skip to content

Fast, pure Swift implementation of Google's Snappy compression algorithm - 64-128 MB/s compression, 203-261 MB/s decompression, 100% C++ compatible

License

Notifications You must be signed in to change notification settings

codelynx/snappy-swift

Repository files navigation

SnappySwift

A pure Swift implementation of Google's Snappy compression algorithm, providing fast compression and decompression with excellent compatibility.

Status

Production Ready - Full implementation complete

  • ✅ Package structure
  • ✅ Complete compression implementation
  • ✅ Complete decompression implementation
  • ✅ Comprehensive test suite (90 tests)
  • ✅ Performance optimizations
  • ✅ 100% C++ compatibility verified
  • ✅ Large payload support (tested up to 10MB)

Overview

Snappy is a compression library optimized for speed rather than maximum compression. It's designed for scenarios where compression/decompression speed is critical.

Key Features

  • Fast: Compression at 64-128 MB/s, decompression at 203-261 MB/s
  • Excellent compression: 20-21x for highly compressible data, 1.5-3x for text
  • 100% C++ compatible: Produces output within 0.03-4% of reference implementation
  • Zero dependencies: Pure Swift, no external libraries
  • Well tested: 90 tests including C++ roundtrip validation
  • Cross-platform: macOS, iOS, watchOS, tvOS (Linux support ready)

Use Cases

  • Database storage (LevelDB, Cassandra, MongoDB)
  • Network protocols (Protocol Buffers, Hadoop)
  • In-memory compression for caching
  • Real-time data pipelines
  • Mobile app data compression

Installation

Swift Package Manager

Add to your Package.swift:

dependencies: [
    .package(url: "https://github.com/codelynx/snappy-swift.git", from: "1.0.0")
]

Then add the dependency to your target:

.target(
    name: "YourTarget",
    dependencies: ["SnappySwift"]
)

Usage

Basic Compression/Decompression

import SnappySwift
import Foundation

// Compress data
let original = "Hello, World! This is a test of Snappy compression.".data(using: .utf8)!
let compressed = try original.snappyCompressed()

print("Original: \(original.count) bytes")
print("Compressed: \(compressed.count) bytes")
print("Ratio: \(Double(original.count) / Double(compressed.count))x")

// Decompress data
let decompressed = try compressed.snappyDecompressed()
assert(decompressed == original)

Working with Large Files

import SnappySwift
import Foundation

// Compress a file
let fileData = try Data(contentsOf: URL(fileURLWithPath: "large-file.txt"))
let compressed = try fileData.snappyCompressed()
try compressed.write(to: URL(fileURLWithPath: "large-file.txt.snappy"))

// Decompress a file
let compressedData = try Data(contentsOf: URL(fileURLWithPath: "large-file.txt.snappy"))
let decompressed = try compressedData.snappyDecompressed()
try decompressed.write(to: URL(fileURLWithPath: "large-file.txt"))

Low-Level Buffer API

For maximum performance with existing buffers:

import SnappySwift

// Using buffer-based API
let input: [UInt8] = [/* your data */]
let maxCompressedSize = Snappy.maxCompressedLength(input.count)
var output = [UInt8](repeating: 0, count: maxCompressedSize)

let compressedSize = try input.withUnsafeBufferPointer { inputBuf in
    try output.withUnsafeMutableBufferPointer { outputBuf in
        try Snappy.compress(inputBuf, to: outputBuf)
    }
}

print("Compressed \(input.count) bytes to \(compressedSize) bytes")

Validation

// Check if data is valid Snappy-compressed
if compressed.isValidSnappyCompressed() {
    print("Data is valid Snappy format")
}

// Get uncompressed length without decompressing
compressed.withUnsafeBytes { buffer in
    let bytes = buffer.bindMemory(to: UInt8.self)
    if let length = Snappy.getUncompressedLength(bytes) {
        print("Will decompress to \(length) bytes")
    }
}

Error Handling

import SnappySwift

do {
    let compressed = try data.snappyCompressed()
    // Use compressed data
} catch SnappyError.inputTooLarge {
    print("Input exceeds maximum size (4GB)")
} catch SnappyError.insufficientBuffer {
    print("Output buffer too small")
} catch SnappyError.invalidData {
    print("Data is not valid Snappy format")
} catch {
    print("Compression failed: \(error)")
}

Performance

Measured on Apple Silicon (M1/M2):

Compression Throughput

  • 1KB data: ~64 MB/s
  • 10KB data: ~102 MB/s
  • 100KB data: ~127 MB/s
  • 1MB data: ~128 MB/s

Decompression Throughput (2x faster)

  • 1KB data: ~203 MB/s
  • 10KB data: ~257 MB/s
  • 100KB data: ~261 MB/s
  • 1MB data: ~248 MB/s

Compression Ratios

Data Type Original Size Compressed Size Ratio
Highly compressible (repeated) 10,000 B 476 B 21.01x
Pattern (8-byte) 10,000 B 483 B 20.70x
Repeated text 9,000 B 469 B 19.19x
Incompressible (random) 10,000 B 10,005 B 1.00x

C++ Compatibility

Compression output compared to Google's C++ reference:

Test Case C++ Size Swift Size Difference
100KB 4,781 B 4,783 B +0.04%
1MB 96,750 B 100,754 B +4.1%
10MB 491,843 B 492,004 B +0.03%

All Swift-compressed data successfully decompresses with C++ implementation.

Architecture

SnappySwift/
├── SnappySwift.swift      # Public API
├── Data+Snappy.swift      # Foundation extensions
├── Internal.swift         # Format types (Tags, Varint)
├── Compression.swift      # LZ77-based compression with 64 KiB fragmentation
└── Decompression.swift    # Fast decompression with validation

Implementation Highlights

  • 64 KiB Fragmentation: Matches C++ reference, prevents UInt16 hash table overflow
  • Skip Heuristic: Adaptive skipping for incompressible data
  • 8-byte Match Finding: Optimized match length detection
  • Branchless Decompression: Fast copy operations
  • Comprehensive Validation: Prevents buffer overflows and invalid data

Development

Building

swift build

Testing

# Run all tests (90 tests)
swift test

# Run specific test suite
swift test --filter CompressionTests
swift test --filter DecompressionTests
swift test --filter CompatibilityTests
swift test --filter BenchmarkTests

Running Benchmarks

# Compression throughput
swift test --filter BenchmarkTests.testCompressionThroughput

# Decompression throughput
swift test --filter BenchmarkTests.testDecompressionThroughput

# Compression ratios
swift test --filter BenchmarkTests.testCompressionRatios

C++ Compatibility Testing

The project includes C++ validation tools for development and testing. The C++ reference implementation is not included in the repository to ensure clean SPM installation for end users.

For Contributors: If you need to regenerate test data or verify compatibility with the C++ reference implementation, see CONTRIBUTING.md for detailed setup instructions.

Most developers don't need this:

  • ❌ Using SnappySwift in your app (SPM handles everything)
  • ❌ Running the existing test suite (test data is pre-generated)
  • ❌ General bug fixes or feature additions

You only need C++ setup when:

  • ✅ Regenerating test data from C++ reference
  • ✅ Validating Swift-compressed output against C++
  • ✅ Updating to newer C++ Snappy versions
  • ✅ Contributing compatibility tests

Compatibility

This implementation is 100% compatible with Google's C++ Snappy implementation:

  • Format: Snappy 1.x format specification
  • Tested against: Google C++ Snappy v1.2.2
  • Interoperability: 100% compatible (reads and writes same format)
  • ✅ Decompresses all C++ Snappy output correctly
  • ✅ Swift-compressed data decompresses correctly with C++
  • ✅ Follows the same format specification
  • ✅ Tested against reference test data (13 test cases)
  • ✅ Large payload support (up to 10MB tested, supports up to 4GB)

Version Compatibility:

  • SnappySwift 1.0.0 implements the Snappy 1.x format
  • Compatible with all Snappy 1.x implementations (C++, Java, Python, etc.)
  • Binary format is stable and backward-compatible

Best Practices

When to Use Snappy

Good Use Cases:

  • High-throughput data pipelines
  • In-memory caching with compression
  • Network protocols requiring fast compression
  • Database storage (key-value stores)
  • Mobile apps (fast decompression, battery efficient)

Not Ideal For:

  • Maximum compression ratio (use zlib, brotli, zstd instead)
  • Very small data (<100 bytes, overhead may exceed savings)
  • Already compressed data (JPEG, PNG, MP4, etc.)

Performance Tips

  1. Reuse buffers when compressing multiple items:

    let maxSize = Snappy.maxCompressedLength(largestInput)
    var outputBuffer = [UInt8](repeating: 0, count: maxSize)
    
    for input in inputs {
        let size = try compress(input, to: &outputBuffer)
        // Use outputBuffer[0..<size]
    }
  2. Validate before decompression for untrusted data:

    if data.isValidSnappyCompressed() {
        let decompressed = try data.snappyDecompressed()
    }
  3. Use Data extensions for convenience, buffers for performance:

    // Convenient
    let compressed = try data.snappyCompressed()
    
    // Faster for repeated operations
    try data.withUnsafeBytes { input in
        try output.withUnsafeMutableBytes { output in
            try Snappy.compress(input, to: output)
        }
    }

Troubleshooting

Common Errors

SnappyError.inputTooLarge

  • Input exceeds 4GB limit (UInt32.max bytes)
  • Solution: Split large files into chunks

SnappyError.insufficientBuffer

  • Output buffer too small for compressed data
  • Solution: Use Snappy.maxCompressedLength() to allocate buffer

SnappyError.invalidData

  • Data is corrupted or not Snappy format
  • Solution: Validate with isValidSnappyCompressed() first

SnappyError.bufferOverflow

  • Malformed compressed data references out of bounds
  • Solution: Ensure data comes from trusted source

Roadmap

Completed ✅

  • Core compression/decompression
  • C++ compatibility verification
  • Performance optimizations
  • Comprehensive test suite
  • Benchmarking suite

Future Enhancements (Optional)

  • Streaming API for large files
  • Framed format support
  • SIMD optimizations
  • Hardware CRC32 on supported platforms
  • Linux platform validation

References

License

BSD 3-Clause License (same as Google Snappy)

See LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Guidelines

  • Maintain C++ compatibility
  • Add tests for new features
  • Run benchmarks to verify performance
  • Follow Swift naming conventions

Acknowledgments

This implementation is based on Google's Snappy C++ library. Special thanks to:

  • The Snappy team for creating an elegant and fast compression algorithm
  • The Swift community for excellent tooling and package ecosystem

About

Fast, pure Swift implementation of Google's Snappy compression algorithm - 64-128 MB/s compression, 203-261 MB/s decompression, 100% C++ compatible

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published