Skip to content

letsmakecakes/huffman

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Huffman Compression

A high-performance, production-ready implementation of Huffman coding compression algorithm in Go. This library provides lossless data compression with excellent compression ratios for text and repetitive data.

Features

  • Lossless Compression: Perfect data reconstruction guaranteed
  • High Performance: Optimized encoding with 82x performance improvements
  • Deterministic: Consistent compression output for the same input
  • Large File Support: Handles files up to 4GB
  • Full Byte Range: Supports all 256 possible byte values
  • Comprehensive Testing: 26+ unit and integration tests with 100% pass rate
  • Memory Efficient: Optimized data structures and algorithms

Installation

go get github.com/letsmakecakes/huffman

Quick Start

Building the Project

Build the executable:

go build -o huffman main.go

Or build for different platforms:

# Windows
go build -o huffman.exe main.go

# Linux/Mac
go build -o huffman main.go

Command Line Usage

Compress a file:

./huffman compress input.txt output.huff

Decompress a file:

./huffman decompress output.huff restored.txt

Or run directly without building:

go run main.go

Try with the included test file:

# Build the project
go build -o huffman main.go

# Compress the test file
./huffman compress data/test/test.txt data/test/test.txt.huff

# Decompress it back
./huffman decompress data/test/test.txt.huff data/test/test_restored.txt

# Verify the files match
diff data/test/test.txt data/test/test_restored.txt

Using as a Library

See main.go for a complete example. Basic usage:

package main

import (
    "log"
    "github.com/letsmakecakes/huffman/pkg/huffman"
)

func main() {
    // Compress a file
    if err := huffman.CompressFile("input.txt", "output.huff"); err != nil {
        log.Fatal(err)
    }
    
    // Decompress a file
    if err := huffman.DecompressFile("output.huff", "restored.txt"); err != nil {
        log.Fatal(err)
    }
}

Programmatic API

// Build frequency table
data := []byte("hello world")
freq := huffman.BuildFrequencyTable(data)

// Build Huffman tree
root := huffman.BuildHuffmanTree(freq)

// Generate code table
codes := huffman.GenerateCodeTable(root)

// Encode data
encoded := huffman.EncodeData(data, codes)

// Decode data
decoded, err := huffman.DecodeData(encoded, root, int64(len(data)), 0)

Performance

Benchmarked on AMD Ryzen 7 4800H:

Operation Time per Operation Throughput
Frequency Table ~105μs ~9.5MB/s
Tree Building ~13μs N/A
Encoding ~222μs ~4.5MB/s
Decoding ~200μs ~5MB/s

Compression Ratios

Real-world compression ratios achieved:

Data Type Compression Ratio Example
Highly Repetitive 13.5% Log files, DNA sequences
English Text 60% Documents, code files
High Entropy >100% Encrypted data, compressed files

How It Works

Huffman Coding Algorithm

Huffman coding is a lossless data compression algorithm that assigns variable-length codes to characters based on their frequency:

  1. Frequency Analysis: Count occurrences of each byte in the input
  2. Tree Construction: Build a binary tree where frequent bytes have shorter paths
  3. Code Generation: Assign binary codes based on tree paths (left=0, right=1)
  4. Encoding: Replace each byte with its variable-length code
  5. Decoding: Traverse the tree using bits to reconstruct original data

File Format

The compressed file uses a custom binary format:

[Magic:1][FileSize:4][Padding:1][TableSize:1][FreqTable:N×3][EncodedData:variable]
  • Magic Byte: 0x48 ('H') - File identifier
  • File Size: 4 bytes (uint32) - Original file size, max 4GB
  • Padding: 1 byte - Number of padding bits (0-7)
  • Table Size: 1 byte - Number of unique characters (0-255)
  • Frequency Table: 3 bytes per entry
    • Character: 1 byte (uint8)
    • Frequency: 2 bytes (uint16) - Supports frequencies up to 65,535
  • Encoded Data: Variable length - Huffman-encoded bits

Project Structure

huffman/
├── pkg/
│   └── huffman/
│       ├── huffman.go          # Core compression algorithm
│       ├── compress.go         # File operations
│       └── huffman_test.go     # Unit tests
├── test/
│   └── integration_test.go     # Integration tests
├── data/
│   └── test/
│       └── test.txt            # Test data
├── main.go                     # Example usage
├── go.mod                      # Go module file
└── README.md                   # This file

Testing

The project includes comprehensive test coverage:

Run All Tests

go test -v ./pkg/huffman ./test

Run Benchmarks

go test -bench=. ./pkg/huffman

Test Coverage

  • Unit Tests: 11 test cases covering core functionality

    • Frequency table building
    • Huffman tree construction
    • Code table generation
    • Encoding/decoding operations
    • Header serialization
  • Integration Tests: 15+ test cases covering real-world scenarios

    • Large file compression (57KB+)
    • Various data types (text, binary, unicode)
    • Edge cases (single byte, all same character)
    • Error handling (invalid files, corrupted data)
    • Compression ratio validation

Algorithm Optimizations

  1. Deterministic Tree Building: Characters are sorted alphabetically before tree construction to ensure consistent results
  2. Efficient String Building: Uses strings.Builder instead of string concatenation (82x performance improvement)
  3. Nil-Safe Traversal: Comprehensive pointer validation during tree navigation
  4. uint16 Frequencies: Supports character frequencies up to 65,535 for large files

Limitations

  • Maximum File Size: 4GB (uint32 limit for file size field)
  • High Entropy Data: Already-compressed or encrypted data may expand slightly due to header overhead
  • Unique Characters: Maximum 255 unique byte values (uint8 limit for table size)

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for your changes
  4. Ensure all tests pass (go test -v ./...)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Adwaith Rajeev

Acknowledgments

  • David A. Huffman for the Huffman coding algorithm (1952)
  • Go community for excellent standard library support

References


Note: This is a pure Go implementation with no external dependencies beyond the Go standard library.

Releases

No releases published

Packages

No packages published

Languages