Releases: ZON-Format/ZON
v1.2.0
Full Changelog: v1.0.5...v1.2.0
[1.2.0] - 2024-12-09
Major Release: Enterprise Features & Production Readiness
This release brings major enhancements aligned with the TypeScript v1.3.0 implementation, focusing on adaptive encoding, binary format, versioning, developer tools, and production-ready features.
Added
Binary Format (ZON-B)
- MessagePack-Inspired Encoding: Compact binary format with magic header (ZNB\x01)
- 40-60% Space Savings: Significantly smaller than JSON while maintaining structure
- Full Type Support: Primitives, arrays, objects, nested structures
- APIs: encode_binary(), decode_binary() with round-trip validation
- Test Coverage: 27 tests for binary format
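As a rough illustration of the magic header mentioned above, the sketch below checks whether a byte string starts with ZNB\x01 before it is handed to a decoder. The helper names looks_like_zonb and strip_magic are hypothetical and are not part of the zon package. These notes do not describe the layout of the bytes after the header, so the sketch leaves them untouched.

```python
# Magic header as quoted in the release notes: the ASCII bytes "ZNB" plus 0x01.
ZONB_MAGIC = b"ZNB\x01"


def looks_like_zonb(payload: bytes) -> bool:
    """Return True if the payload starts with the ZON-B magic header."""
    return payload.startswith(ZONB_MAGIC)


def strip_magic(payload: bytes) -> bytes:
    """Return the bytes that follow the magic header (hypothetical helper).

    Raises ValueError when the header is missing, so text ZON or JSON
    input is rejected early instead of being misread as binary.
    """
    if not looks_like_zonb(payload):
        raise ValueError("not a ZON-B payload: missing magic header")
    return payload[len(ZONB_MAGIC):]
```

A caller could use a check like this to route input to either a text or a binary decoder.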
Document-Level Schema Versioning
- Version Embedding/Extraction: embed_version() and extract_version() for metadata management
- Migration Manager: ZonMigrationManager with BFS path-finding for schema evolution
- Backward/Forward Compatibility: Automatic migration between schema versions
- Utilities: compare_versions(), is_compatible(), strip_version()
- Test Coverage: 39 tests covering all versioning scenarios
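The BFS path-finding idea behind the migration manager can be sketched as follows. This is a minimal stand-alone example, not the ZonMigrationManager implementation: the function find_migration_path, the shape of the migrations mapping, and the version strings are all assumptions made for illustration.

```python
from collections import deque


def find_migration_path(edges, start, target):
    """Breadth-first search over single-step migrations.

    `edges` maps a version string to the versions reachable in one step.
    Returns the shortest list of versions from start to target, or None
    when no chain of migrations connects them.
    """
    if start == target:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in edges.get(path[-1], ()):
            if nxt in seen:
                continue
            if nxt == target:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


# Hypothetical registry of single-step migrations.
migrations = {
    "1.0.0": ["1.1.0"],
    "1.1.0": ["1.2.0", "2.0.0"],
    "1.2.0": ["2.0.0"],
}
path = find_migration_path(migrations, "1.0.0", "2.0.0")
```

Because BFS explores paths in order of length, the first path that reaches the target uses the fewest migration steps.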
Adaptive Encoding System
- 3 Encoding Modes: compact, readable, llm-optimized for optimal output
- Data Complexity Analyzer: Automatic analysis of nesting depth, irregularity, field count
- Mode Recommendation: recommend_mode() suggests optimal encoding based on data structure
- Intelligent Format Selection: encode_adaptive() with customizable options
- Readable Mode Enhancement: Pretty-printing with indentation and multi-line nested objects
- LLM Mode Enhancement: Long booleans (true/false) and integer type preservation
- Test Coverage: 17 tests for adaptive encoding functionality
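To make the complexity analysis above more concrete, here is a small sketch of two of the metrics it names (nesting depth and field count) and a toy mode picker. The function names and the depth threshold are assumptions for illustration only; the actual rules used by recommend_mode() are not stated in these notes.

```python
def nesting_depth(value) -> int:
    """Depth of nested containers: scalars are 0, a flat dict or list is 1."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


def field_count(value) -> int:
    """Total number of dict keys found anywhere in the structure."""
    if isinstance(value, dict):
        return len(value) + sum(field_count(v) for v in value.values())
    if isinstance(value, list):
        return sum(field_count(v) for v in value)
    return 0


def pick_mode(value, max_compact_depth: int = 2) -> str:
    """Toy heuristic: shallow data -> 'compact', deeper data -> 'readable'.

    The threshold is an illustrative assumption, not ZON's actual rule.
    """
    if nesting_depth(value) <= max_compact_depth:
        return "compact"
    return "readable"


sample = {"users": [{"id": 1, "name": "a"}]}
```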
Developer Tools
- Helper Utilities: size(), compare_formats(), analyze(), infer_schema(), compare(), is_safe()
- Enhanced Validator: ZonValidator with linting rules for depth, fields, performance
- Pretty Printer: expand_print() for readable mode with multi-line formatting and indentation
- Test Coverage: 37 tests for developer tools
Changed
- Version: Updated to 1.2.0 for feature parity with TypeScript package
- API: Expanded exports to include binary, versioning, and tools modules
- Documentation: Aligned with TypeScript v1.3.0 feature set
Performance
- Binary Format: 40-60% smaller than JSON
- ZON Text: Maintains 16-19% smaller than JSON
- Adaptive Selection: Automatically chooses best encoding for your data
- Test Suite: All 340 tests passing (up from 237)
v1.1.0
Full Changelog: v1.0.4...v1.1.0
[1.1.0] - 2024-12-01
Added
- Delta Encoding: Efficient encoding for numeric sequences (e.g., id:delta).
- Dictionary Compression: Compression for repetitive string columns.
- LLM Optimization: encode_llm for token-efficient prompts.
- Advanced Schema Validation:
  - Regex patterns (.regex())
  - UUID validation (.uuid())
  - DateTime validation (.datetime(), .date(), .time())
  - Literal values (.literal())
  - Union types (.union())
  - Default values (.default())
  - Custom refinements (.refine())
- CLI: Full implementation of convert, validate, stats, and format commands.
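The two column-level techniques named in this list, delta encoding and dictionary compression, can be sketched generically as below. These helpers are illustrative assumptions and do not reproduce ZON's on-the-wire syntax (such as the id:delta header); they only show why sequential numbers and repeated strings shrink well.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original sequence with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def dict_encode(column):
    """Replace repeated strings with indexes into a list of unique values."""
    table = []   # unique values in first-seen order
    index = {}   # value -> position in table
    codes = []   # one index per input item
    for item in column:
        if item not in index:
            index[item] = len(table)
            table.append(item)
        codes.append(index[item])
    return table, codes
```

For example, the ids 100, 101, 103, 106 become 100, 1, 2, 3, and a role column with only two distinct values becomes a two-entry table plus small integer codes.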
Changed
- Decoder: decode method now accepts type_coercion and strict keyword arguments.
- Performance: Improved table parsing and sparse field handling.
Fixed
- CLI: Fixed relative import issues and command execution.
- Schema: Fixed missing validation methods (min, max, email, etc.).
Fixed
- Package Exports: Fixed AttributeError by properly exporting encode and decode functions in zon/__init__.py.
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.
v1.0.4
What's Changed
- Update ZON Python package to v1.0.4 with schema validation in #3
Full Changelog: v1.0.3...v1.0.4
[1.0.4] - 2025-11-30
Added
- Colon-less Syntax: Objects and arrays in nested positions now use key{...} and key[...] syntax, removing redundant colons.
- Smart Flattening: Top-level nested objects are automatically flattened to dot notation (e.g., config.db{...}).
- Control Character Escaping: All control characters (ASCII 0-31) are now properly escaped to prevent binary file creation.
- Runtime Schema Validation: New zon builder and validate() function for LLM guardrails.
- Algorithmic Benchmark Generation: Replaced LLM-based question generation with a deterministic algorithm for consistent benchmarks.
- Expanded Dataset: Added "products" and "feed" data to unified dataset for real-world e-commerce scenarios.
- Tricky Questions: Introduced edge cases (non-existent fields, logic traps, case sensitivity) to stress-test LLM reasoning.
- Robust Benchmark Runner: Added exponential backoff and rate limiting to handle Azure OpenAI S0 tier constraints.
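The control-character escaping item in the list above can be illustrated with a short sketch. The escape spellings used here (JSON-style \n, \r, \t and \u00XX) are an assumption; the notes only state that ASCII 0-31 are escaped, not how. The function name escape_control_chars is hypothetical.

```python
# Short escapes for common characters; the backslash itself is escaped too,
# so that escaped output can be told apart from literal backslashes.
_SHORT_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def escape_control_chars(text: str) -> str:
    """Escape ASCII 0-31 so the encoded output stays plain text."""
    parts = []
    for ch in text:
        if ch in _SHORT_ESCAPES:
            parts.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 32:
            # Remaining control characters use a four-digit hex escape.
            parts.append(f"\\u{ord(ch):04x}")
        else:
            parts.append(ch)
    return "".join(parts)
```

With every byte below 32 rewritten this way, an encoded document contains no raw control characters, which is what keeps tools from treating it as a binary file.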
Changed
- Benchmark Formats: Refined tested formats to ZON, TOON, JSON, JSON (Minified), and CSV for focused analysis.
- Documentation: Updated README and API references with the latest benchmark results (GPT-5 Nano) and accurate token counts.
- Token Efficiency: Recalculated efficiency scores based on the expanded dataset, confirming ZON's leadership (1430.6 score).
Improved
- Token Efficiency: Achieved up to 23.8% reduction vs JSON (GPT-4o) thanks to syntax optimizations.
- Readability: Cleaner, block-like structure for nested data.
Fixed
- Critical Data Integrity: Fixed roundtrip failures for strings containing newlines, empty strings, and escaped characters.
- Decoder Logic: Fixed _split_by_delimiter to correctly handle nested arrays and objects within table cells (e.g., [10, 20]).
- Encoder Logic: Added mandatory quoting for empty strings and strings with newlines to prevent data loss.
- Rate Limiting: Resolved 429 errors during benchmarking with robust retry logic.
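The decoder fix above concerns splitting a table row on commas without breaking cells such as [10, 20]. The sketch below shows one common way to do that, by tracking bracket depth and quote state. It is an independent illustration under that assumption, not the library's _split_by_delimiter.

```python
def split_by_delimiter(line, delimiter=","):
    """Split on delimiter, ignoring delimiters inside [], {} or double quotes."""
    cells, current = [], []
    depth = 0                # nesting level of [] and {}
    in_quotes = False        # inside a double-quoted string
    prev_backslash = False   # previous char was an unescaped backslash
    for ch in line:
        if in_quotes:
            current.append(ch)
            if ch == '"' and not prev_backslash:
                in_quotes = False
            prev_backslash = (ch == "\\") and not prev_backslash
            continue
        if ch == '"':
            in_quotes = True
            prev_backslash = False
        elif ch in "[{":
            depth += 1
        elif ch in "]}":
            depth = max(depth - 1, 0)
        if ch == delimiter and depth == 0:
            cells.append("".join(current))
            current = []
        else:
            current.append(ch)
    cells.append("".join(current))
    return cells
```

A naive line.split(",") would return four cells for the row 1,[10, 20],x, whereas this version returns three.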
v1.0.3
[1.0.3] - 2025-11-28
Fixed
- Critical Data Integrity: Fixed roundtrip failures for strings containing newlines, empty strings, and escaped characters.
- Decoder Logic: Fixed _splitByDelimiter to correctly handle nested arrays and objects within table cells (e.g., [10, 20]).
- Encoder Logic: Added mandatory quoting for empty strings and strings with newlines to prevent data loss.
Documentation
- Updated SPEC.md and syntax-cheatsheet.md to explicitly require quoting for empty strings and escape sequences.
🎯 100% LLM Retrieval Accuracy Achieved
Major Achievement: ZON now achieves 100% LLM retrieval accuracy while maintaining superior token efficiency over TOON!
Changed
- Explicit Sequential Columns: Disabled automatic sequential column omission ([id] notation)
  - All columns now explicitly listed in table headers for better LLM comprehension
  - Example: users:@(5):active,id,lastLogin,name,role (was users:@(5)[id]:active,lastLogin,name,role)
  - Trade-off: +1.7% token increase for 100% LLM accuracy
Performance
- LLM Accuracy: 100% (24/24 questions) vs TOON 100%, JSON 91.7%
- Token Efficiency: 19,995 tokens (4.7% fewer than TOON's 20,988)
- Overall Savings vs TOON: 4.6% (Claude) to 17.6% (GPT-4o)
Quality
- ✅ All unit tests pass (28/28)
- ✅ All roundtrip tests pass (27/27 datasets)
- ✅ No data loss or corruption
- ✅ Production ready
ACHIEVEMENT: 8/8 Perfect Sweep vs All Competitors!
Breaking Changes:
- Compact header syntax: @count: instead of @data(count):
- Sequential ID auto-omission: [id] notation for 1..N sequences
- Adaptive format selection based on data complexity
Added
- Sparse Table Encoding: Automatically detects semi-uniform data and uses key:value notation for optional fields
- Irregularity Score Calculation: Jaccard similarity-based scoring to choose optimal table format
- Sequential Column Detection: Identifies and omits columns with sequential values (1, 2, 3, ..., N)
- Smart Date Detection: ISO 8601 dates output unquoted for token efficiency
- Context-Aware String Quoting: Only quotes strings when necessary to preserve type semantics
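A Jaccard-based irregularity score and a sequential-column check, both mentioned in the list above, might look like the following. The exact formula ZON uses is not given in these notes; this sketch assumes the score is one minus the mean pairwise Jaccard similarity of the rows' key sets, and all function names are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    union = a | b
    # Two empty key sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def irregularity_score(rows) -> float:
    """1 minus the mean pairwise Jaccard similarity of the rows' key sets.

    0.0 means every row has the same keys; values near 1.0 mean the rows
    share almost no keys. Pairwise comparison is quadratic in row count.
    """
    key_sets = [set(row) for row in rows]
    pairs = list(combinations(key_sets, 2))
    if not pairs:
        return 0.0
    mean_similarity = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


def is_sequential(values) -> bool:
    """True when the values are exactly 1, 2, 3, ..., N."""
    return list(values) == list(range(1, len(values) + 1))
```

A low score suggests a plain table with a shared header; a moderate score is where a sparse table with key:value cells for optional fields becomes attractive.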
Performance
- Total Tokens: 1,945 (down from 2,081 in v1.0.2)
- 136 tokens saved (6.5% improvement)
- 8/8 wins vs CSV (previously 4/8 tied)
- 8/8 wins vs TOON (24.4% fewer tokens)
- 57.2% fewer tokens than formatted JSON
- 27.0% fewer tokens than compact JSON
Benchmark Results (8 datasets)
- Employees: 132 tokens (CSV: 138) - ZON WINS -4.3%
- Time-Series: 245 tokens (CSV: 247) - ZON WINS -0.8%
- GitHub Repos: 148 tokens (CSV: 164) - ZON WINS -9.8%
- Event Logs: 220 tokens (CSV: 231) - ZON WINS -4.8% ← Sparse tables!
- E-commerce: 193 tokens (CSV: 313) - ZON WINS -38.3%
- Hike Data: 62 tokens (CSV: 85) - ZON WINS -27.1%
- Deep Config: 111 tokens (CSV: 182) - ZON WINS -39.0%
- Heavily Nested: 764 tokens (CSV: 1,044) - ZON WINS -26.8%
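The per-dataset percentages above follow from the token counts as (ZON - CSV) / CSV. The short check below recomputes them from the figures in the list; it only covers the per-dataset rows and makes no claim about the overall totals reported elsewhere.

```python
# (ZON tokens, CSV tokens) per dataset, copied from the list above.
results = {
    "Employees": (132, 138),
    "Time-Series": (245, 247),
    "GitHub Repos": (148, 164),
    "Event Logs": (220, 231),
    "E-commerce": (193, 313),
    "Hike Data": (62, 85),
    "Deep Config": (111, 182),
    "Heavily Nested": (764, 1044),
}


def percent_change(zon_tokens: int, csv_tokens: int) -> float:
    """Relative change of ZON versus CSV in percent, rounded to one decimal."""
    return round((zon_tokens - csv_tokens) / csv_tokens * 100, 1)


changes = {name: percent_change(z, c) for name, (z, c) in results.items()}
```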
Competitive Analysis
- vs CSV: -20.1% tokens overall
- vs TOON: -24.4% tokens overall (beats on ALL datasets)
- vs JSON: -57.2% formatted, -27.0% compact
- Real Cost Savings: $4,890/month vs CSV at 1M API calls (GPT-4)
Fixed
- Improved irregular schema detection to enable sparse tables for Event Logs
- Enhanced sparse encoding threshold to support up to 5 optional columns
- Better handling of undefined/null values in standard tables
Documentation
- Added comprehensive competitive analysis vs TOON, CSV, JSON, YAML, XML
- Documented sparse table encoding mechanism
- Added real-world cost savings calculations
- Updated benchmarks with CSV comparison
What's Changed
- Port ZON v1.0.3 features from TypeScript to Python by @Copilot in #2
New Contributors
- @Copilot made their first contribution in #2
Full Changelog: v1.0.2...v1.0.3
v1.0.2
import zon

# Your data
users = {
    "context": {
        "task": "Our favorite hikes together",
        "location": "Boulder",
        "season": "spring_2025"
    },
    "friends": [
        "ana",
        "luis",
        "sam"
    ],
    "hikes": [
        {
            "id": 1,
            "name": "Blue Lake Trail",
            "distanceKm": 7.5,
            "elevationGain": 320,
            "companion": "ana",
            "wasSunny": True
        },
        {
            "id": 2,
            "name": "Ridge Overlook",
            "distanceKm": 9.2,
            "elevationGain": 540,
            "companion": "luis",
            "wasSunny": False
        },
        {
            "id": 3,
            "name": "Wildflower Loop",
            "distanceKm": 5.1,
            "elevationGain": 180,
            "companion": "sam",
            "wasSunny": True
        }
    ]
}

# Encode (compress)
compressed = zon.encode(users)

# Decode (decompress)
original = zon.decode(compressed)
assert original == users  # ✓ Perfect reconstruction!

- ZON (96 tokens, 264 bytes):
context:"{task:Our favorite hikes together,location:Boulder,season:spring_2025}"
friends:"[ana,luis,sam]"
@hikes(3):companion,distanceKm,elevationGain,id,name,wasSunny
ana,7.5,320,1,Blue Lake Trail,T
luis,9.2,540,2,Ridge Overlook,F
sam,5.1,180,3,Wildflower Loop,T
Compression comparison vs TOON:
- TOON (104 tokens, 286 bytes):
context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true
Compression summary:
- JSON (compact) (139 tokens, 451 bytes)
- ZON (96 tokens, 264 bytes)
- TOON (104 tokens, 286 bytes)
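The table portion of the ZON sample above (the @hikes(3) header and its three rows) can be reproduced with a few lines of Python. This is a simplified sketch inferred from that sample only: it assumes alphabetically sorted columns and T/F booleans, skips quoting and escaping entirely, and the names format_cell and encode_table are hypothetical rather than part of the zon package.

```python
def format_cell(value) -> str:
    """Render booleans as T/F as in the sample; everything else via str()."""
    if isinstance(value, bool):  # check bool first, since bool is an int subclass
        return "T" if value else "F"
    return str(value)


def encode_table(name, rows):
    """Encode a list of uniform dicts as a header line plus one line per row."""
    columns = sorted(rows[0])  # the sample output lists columns alphabetically
    header = f"@{name}({len(rows)}):" + ",".join(columns)
    body = [",".join(format_cell(row[c]) for c in columns) for row in rows]
    return "\n".join([header] + body)


hikes = [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5,
     "elevationGain": 320, "companion": "ana", "wasSunny": True},
    {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2,
     "elevationGain": 540, "companion": "luis", "wasSunny": False},
    {"id": 3, "name": "Wildflower Loop", "distanceKm": 5.1,
     "elevationGain": 180, "companion": "sam", "wasSunny": True},
]
table = encode_table("hikes", hikes)
```

Listing the column names once in the header, instead of repeating them for every row as JSON does, is where most of the token savings in the comparison come from.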
ZON v1.0.0 - Entropy Engine (First Official Release)
🚀 ZON v1.0.0 - Entropy Engine
First Official Release - Production-ready data serialization format optimized for LLM token efficiency.
🎯 What is ZON?
Zero Overhead Notation (ZON) is a human-readable data format that achieves 24-40% better compression than TOON and 30-42% compression vs JSON on real-world data, while remaining 100% readable and debuggable.
✨ Key Features
- 🧠 Entropy Tournament: Automatically selects optimal compression strategy per column
- 📊 8 Compression Strategies: ENUM, VALUE, DELTA, GAS_INT, GAS_PAT, GAS_MULT, LIQUID, SOLID
- 👁️ Human Readable: Unlike binary formats, ZON is plain text
- ✅ 100% Lossless: Guaranteed perfect reconstruction with safety anchors
- ⚡ Zero Configuration: Works out of the box
📈 Performance Benchmarks
Standard Datasets
- employees.json: 63.1% compression vs JSON, +9.7% vs TOON
- complex_nested.json: 76.0% compression vs JSON, +76.6% vs TOON
- orders.json: 30.3% compression vs JSON, +2.7% vs TOON
Real-World API Data
- Random Users API: 42.4% compression, +40.4% vs TOON 🏆
- StackOverflow Q&A: 42.4% compression, +40.4% vs TOON 🏆
- GitHub Repos: 33.9% compression, +32.8% vs TOON
- Average: 30.5% compression, +24.1% vs TOON
🔧 Installation
pip install zon-format
💡 Quick Start
import zon

# Your data
data = [
    {"id": 1, "name": "Alice", "role": "Admin"},
    {"id": 2, "name": "Bob", "role": "User"}
]

# Compress
compressed = zon.encode(data)

# Decompress
original = zon.decode(compressed)

🤖 LLM Integration
Works seamlessly with:
- OpenAI (GPT-3.5, GPT-4)
- Anthropic Claude
- LangChain
- LlamaIndex
- Hugging Face Transformers
- Cost Savings: Reduces LLM API costs by 30-40% through fewer tokens!
📄 License
Proprietary - Free for Production Use
- ✅ Use in production (commercial/non-commercial)
- ✅ Integrate into applications
- ❌ Cannot redistribute source code
Copyright © 2025 Roni Bhakta. All Rights Reserved.
📚 Documentation
Acknowledgments
Inspired by TOON format. Tested on datasets from JSONPlaceholder, GitHub API, Random User Generator, and StackExchange API.
Ready for production use! 🎉
For licensing inquiries:
ronibhakta1@gmail.com