A high-performance, production-ready implementation of Huffman coding compression algorithm in Go. This library provides lossless data compression with excellent compression ratios for text and repetitive data.
- ✅ Lossless Compression: Perfect data reconstruction guaranteed
- ✅ High Performance: Optimized encoding with 82x performance improvements
- ✅ Deterministic: Consistent compression output for the same input
- ✅ Large File Support: Handles files up to 4GB
- ✅ Full Byte Range: Supports all 256 possible byte values
- ✅ Comprehensive Testing: 26+ unit and integration tests with 100% pass rate
- ✅ Memory Efficient: Optimized data structures and algorithms
go get github.com/letsmakecakes/huffmanBuild the executable:
go build -o huffman main.goOr build for different platforms:
# Windows
go build -o huffman.exe main.go
# Linux/Mac
go build -o huffman main.goCompress a file:
./huffman compress input.txt output.huffDecompress a file:
./huffman decompress output.huff restored.txtOr run directly without building:
go run main.goTry with the included test file:
# Build the project
go build -o huffman main.go
# Compress the test file
./huffman compress data/test/test.txt data/test/test.txt.huff
# Decompress it back
./huffman decompress data/test/test.txt.huff data/test/test_restored.txt
# Verify the files match
diff data/test/test.txt data/test/test_restored.txtSee main.go for a complete example. Basic usage:
package main
import (
"log"
"github.com/letsmakecakes/huffman/pkg/huffman"
)
func main() {
// Compress a file
if err := huffman.CompressFile("input.txt", "output.huff"); err != nil {
log.Fatal(err)
}
// Decompress a file
if err := huffman.DecompressFile("output.huff", "restored.txt"); err != nil {
log.Fatal(err)
}
}// Build frequency table
data := []byte("hello world")
freq := huffman.BuildFrequencyTable(data)
// Build Huffman tree
root := huffman.BuildHuffmanTree(freq)
// Generate code table
codes := huffman.GenerateCodeTable(root)
// Encode data
encoded := huffman.EncodeData(data, codes)
// Decode data
decoded, err := huffman.DecodeData(encoded, root, int64(len(data)), 0)Benchmarked on AMD Ryzen 7 4800H:
| Operation | Time per Operation | Throughput |
|---|---|---|
| Frequency Table | ~105μs | ~9.5MB/s |
| Tree Building | ~13μs | N/A |
| Encoding | ~222μs | ~4.5MB/s |
| Decoding | ~200μs | ~5MB/s |
Real-world compression ratios achieved:
| Data Type | Compression Ratio | Example |
|---|---|---|
| Highly Repetitive | 13.5% | Log files, DNA sequences |
| English Text | 60% | Documents, code files |
| High Entropy | >100% | Encrypted data, compressed files |
Huffman coding is a lossless data compression algorithm that assigns variable-length codes to characters based on their frequency:
- Frequency Analysis: Count occurrences of each byte in the input
- Tree Construction: Build a binary tree where frequent bytes have shorter paths
- Code Generation: Assign binary codes based on tree paths (left=0, right=1)
- Encoding: Replace each byte with its variable-length code
- Decoding: Traverse the tree using bits to reconstruct original data
The compressed file uses a custom binary format:
[Magic:1][FileSize:4][Padding:1][TableSize:1][FreqTable:N×3][EncodedData:variable]
- Magic Byte:
0x48('H') - File identifier - File Size: 4 bytes (uint32) - Original file size, max 4GB
- Padding: 1 byte - Number of padding bits (0-7)
- Table Size: 1 byte - Number of unique characters (0-255)
- Frequency Table: 3 bytes per entry
- Character: 1 byte (uint8)
- Frequency: 2 bytes (uint16) - Supports frequencies up to 65,535
- Encoded Data: Variable length - Huffman-encoded bits
huffman/
├── pkg/
│ └── huffman/
│ ├── huffman.go # Core compression algorithm
│ ├── compress.go # File operations
│ └── huffman_test.go # Unit tests
├── test/
│ └── integration_test.go # Integration tests
├── data/
│ └── test/
│ └── test.txt # Test data
├── main.go # Example usage
├── go.mod # Go module file
└── README.md # This file
The project includes comprehensive test coverage:
go test -v ./pkg/huffman ./testgo test -bench=. ./pkg/huffman-
Unit Tests: 11 test cases covering core functionality
- Frequency table building
- Huffman tree construction
- Code table generation
- Encoding/decoding operations
- Header serialization
-
Integration Tests: 15+ test cases covering real-world scenarios
- Large file compression (57KB+)
- Various data types (text, binary, unicode)
- Edge cases (single byte, all same character)
- Error handling (invalid files, corrupted data)
- Compression ratio validation
- Deterministic Tree Building: Characters are sorted alphabetically before tree construction to ensure consistent results
- Efficient String Building: Uses
strings.Builderinstead of string concatenation (82x performance improvement) - Nil-Safe Traversal: Comprehensive pointer validation during tree navigation
- uint16 Frequencies: Supports character frequencies up to 65,535 for large files
- Maximum File Size: 4GB (uint32 limit for file size field)
- High Entropy Data: Already-compressed or encrypted data may expand slightly due to header overhead
- Unique Characters: Maximum 255 unique byte values (uint8 limit for table size)
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Write tests for your changes
- Ensure all tests pass (
go test -v ./...) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Adwaith Rajeev
- David A. Huffman for the Huffman coding algorithm (1952)
- Go community for excellent standard library support
Note: This is a pure Go implementation with no external dependencies beyond the Go standard library.