09 Dec 05:06

ed73a87

v1.2.0 Latest

Latest

Full Changelog: v1.0.5...v1.2.0

[1.2.0] - 2024-12-09

Major Release: Enterprise Features & Production Readiness

This release brings major enhancements aligned with the TypeScript v1.3.0 implementation, focusing on adaptive encoding, binary format, versioning, developer tools, and production-ready features.

Added

Binary Format (ZON-B)

MessagePack-Inspired Encoding: Compact binary format with magic header (ZNB\x01)
40-60% Space Savings: Significantly smaller than JSON while maintaining structure
Full Type Support: Primitives, arrays, objects, nested structures
APIs: encode_binary(), decode_binary() with round-trip validation
Test Coverage: 27 tests for binary format

Document-Level Schema Versioning

Version Embedding/Extraction: embed_version() and extract_version() for metadata management
Migration Manager: ZonMigrationManager with BFS path-finding for schema evolution
Backward/Forward Compatibility: Automatic migration between schema versions
Utilities: compare_versions(), is_compatible(), strip_version()
Test Coverage: 39 tests covering all versioning scenarios

Adaptive Encoding System

3 Encoding Modes: compact, readable, llm-optimized for optimal output
Data Complexity Analyzer: Automatic analysis of nesting depth, irregularity, field count
Mode Recommendation: recommend_mode() suggests optimal encoding based on data structure
Intelligent Format Selection: encode_adaptive() with customizable options
Readable Mode Enhancement: Pretty-printing with indentation and multi-line nested objects
LLM Mode Enhancement: Long booleans (true/false) and integer type preservation
Test Coverage: 17 tests for adaptive encoding functionality

Developer Tools

Helper Utilities: size(), compare_formats(), analyze(), infer_schema(), compare(), is_safe()
Enhanced Validator: ZonValidator with linting rules for depth, fields, performance
Pretty Printer: expand_print() for readable mode with multi-line formatting and indentation
Test Coverage: 37 tests for developer tools

Changed

Version: Updated to 1.2.0 for feature parity with TypeScript package
API: Expanded exports to include binary, versioning, and tools modules
Documentation: Aligned with TypeScript v1.3.0 feature set

Performance

Binary Format: 40-60% smaller than JSON
ZON Text: Maintains 16-19% smaller than JSON
Adaptive Selection: Automatically chooses best encoding for your data
Test Suite: All 340 tests passing (up from 237)

Assets 2

02 Dec 23:22

ronibhakta1

v1.1.0

058d3fe

v1.1.0

Full Changelog: v1.0.4...v1.1.0

[1.1.0] - 2024-12-01

Added

Delta Encoding: Efficient encoding for numeric sequences (e.g., id:delta).
Dictionary Compression: Compression for repetitive string columns.
LLM Optimization: encode_llm for token-efficient prompts.
Advanced Schema Validation:
- Regex patterns (.regex())
- UUID validation (.uuid())
- DateTime validation (.datetime(), .date(), .time())
- Literal values (.literal())
- Union types (.union())
- Default values (.default())
- Custom refinements (.refine())
CLI: Full implementation of convert, validate, stats, and format commands.

Changed

Decoder: decode method now accepts type_coercion and strict keyword arguments.
Performance: Improved table parsing and sparse field handling.

Fixed

CLI: Fixed relative import issues and command execution.
Schema: Fixed missing validation methods (min, max, email, etc.).

Fixed

Package Exports: Fixed AttributeError by properly exporting encode and decode functions in zon/__init__.py.

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.

Assets 2

30 Nov 10:40

ronibhakta1

v1.0.4

b8f4fa6

v1.0.4

What's Changed

Update ZON Python package to v1.0.4 with schema validation in #3

Full Changelog: v1.0.3...v1.0.4

[1.0.4] - 2025-11-30

Added

Colon-less Syntax: Objects and arrays in nested positions now use key{...} and key[...] syntax, removing redundant colons.
Smart Flattening: Top-level nested objects are automatically flattened to dot notation (e.g., config.db{...}).
Control Character Escaping: All control characters (ASCII 0-31) are now properly escaped to prevent binary file creation.
Runtime Schema Validation: New zon builder and validate() function for LLM guardrails.
Algorithmic Benchmark Generation: Replaced LLM-based question generation with deterministic algorithm for consistent benchmarks.
Expanded Dataset: Added "products" and "feed" data to unified dataset for real-world e-commerce scenarios.
Tricky Questions: Introduced edge cases (non-existent fields, logic traps, case sensitivity) to stress-test LLM reasoning.
Robust Benchmark Runner: Added exponential backoff and rate limiting to handle Azure OpenAI S0 tier constraints.

Changed

Benchmark Formats: Refined tested formats to ZON, TOON, JSON, JSON (Minified), and CSV for focused analysis.
Documentation: Updated README and API references with the latest benchmark results (GPT-5 Nano) and accurate token counts.
Token Efficiency: Recalculated efficiency scores based on the expanded dataset, confirming ZON's leadership (1430.6 score).

Improved

Token Efficiency: Achieved up to 23.8% reduction vs JSON (GPT-4o) thanks to syntax optimizations.
Readability: Cleaner, block-like structure for nested data.

Fixed

Critical Data Integrity: Fixed roundtrip failures for strings containing newlines, empty strings, and escaped characters.
Decoder Logic: Fixed _split_by_delimiter to correctly handle nested arrays and objects within table cells (e.g., [10, 20]).
Encoder Logic: Added mandatory quoting for empty strings and strings with newlines to prevent data loss.
Rate Limiting: Resolved 429 errors during benchmarking with robust retry logic.

Assets 2

29 Nov 00:22

ronibhakta1

v1.0.3

e3cc192

v1.0.3

[1.0.3] - 2025-11-28

Fixed

Critical Data Integrity: Fixed roundtrip failures for strings containing newlines, empty strings, and escaped characters.
Decoder Logic: Fixed _splitByDelimiter to correctly handle nested arrays and objects within table cells (e.g., [10, 20]).
Encoder Logic: Added mandatory quoting for empty strings and strings with newlines to prevent data loss.

Documentation

Updated SPEC.md and syntax-cheatsheet.md to explicitly require quoting for empty strings and escape sequences.

🎯 100% LLM Retrieval Accuracy Achieved

Major Achievement: ZON now achieves 100% LLM retrieval accuracy while maintaining superior token efficiency over TOON!

Changed

Explicit Sequential Columns: Disabled automatic sequential column omission ([id] notation)
- All columns now explicitly listed in table headers for better LLM comprehension
- Example: users:@(5):active,id,lastLogin,name,role (was users:@(5)[id]:active,lastLogin,name,role)
- Trade-off: +1.7% token increase for 100% LLM accuracy

Performance

LLM Accuracy: 100% (24/24 questions) vs TOON 100%, JSON 91.7%
Token Efficiency: 19,995 tokens (5.0% fewer than TOON's 20,988)
Overall Savings vs TOON: 4.6% (Claude) to 17.6% (GPT-4o)

Quality

✅ All unit tests pass (28/28)
✅ All roundtrip tests pass (27/27 datasets)
✅ No data loss or corruption
✅ Production ready

ACHIEVEMENT: 8/8 Perfect Sweep vs All Competitors!

Breaking Changes:

Compact header syntax: @count: instead of @data(count):
Sequential ID auto-omission: [id] notation for 1..N sequences
Adaptive format selection based on data complexity

Added

Sparse Table Encoding: Automatically detects semi-uniform data and uses key:value notation for optional fields
Irregularity Score Calculation: Jaccard similarity-based scoring to choose optimal table format
Sequential Column Detection: Identifies and omits columns with sequential values (1, 2, 3, ..., N)
Smart Date Detection: ISO 8601 dates output unquoted for token efficiency
Context-Aware String Quoting: Only quotes strings when necessary to preserve type semantics

Performance

Total Tokens: 1,945 (down from 2,081 in v1.0.2)
-136 tokens saved (-6.5% improvement)
8/8 wins vs CSV (previously 4/8 tied)
8/8 wins vs TOON (-24.4% better)
-57.2% better than JSON formatted
-27.0% better than JSON compact

Benchmark Results (8 datasets)

Employees: 132 tokens (CSV: 138) - ZON WINS -4.3%
Time-Series: 245 tokens (CSV: 247) - ZON WINS -0.8%
GitHub Repos: 148 tokens (CSV: 164) - ZON WINS -9.8%
Event Logs: 220 tokens (CSV: 231) - ZON WINS -4.8% ← Sparse tables!
E-commerce: 193 tokens (CSV: 313) - ZON WINS -38.3%
Hike Data: 62 tokens (CSV: 85) - ZON WINS -27.1%
Deep Config: 111 tokens (CSV: 182) - ZON WINS -39.0%
Heavily Nested: 764 tokens (CSV: 1,044) - ZON WINS -26.8%

Competitive Analysis

vs CSV: -20.1% tokens overall
vs TOON: -24.4% tokens overall (beats on ALL datasets)
vs JSON: -57.2% formatted, -27.0% compact
Real Cost Savings: $4,890/month vs CSV at 1M API calls (GPT-4)

Fixed

Improved irregular schema detection to enable sparse tables for Event Logs
Enhanced sparse encoding threshold to support up to 5 optional columns
Better handling of undefined/null values in standard tables

Documentation

Added comprehensive competitive analysis vs TOON, CSV, JSON, YAML, XML
Documented sparse table encoding mechanism
Added real-world cost savings calculations
Updated benchmarks with CSV comparison

What's Changed

Port ZON v1.0.3 features from TypeScript to Python by @Copilot in #2

New Contributors

@Copilot made their first contribution in #2

Full Changelog: v1.0.2...v1.0.3

Assets 2

26 Nov 02:25

ronibhakta1

v1.0.2

9e28490

v1.0.2

import zon

# Your data
users = {
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": [
    "ana",
    "luis",
    "sam"
  ],
  "hikes": [
    {
      "id": 1,
      "name": "Blue Lake Trail",
      "distanceKm": 7.5,
      "elevationGain": 320,
      "companion": "ana",
      "wasSunny": true
    },
    {
      "id": 2,
      "name": "Ridge Overlook",
      "distanceKm": 9.2,
      "elevationGain": 540,
      "companion": "luis",
      "wasSunny": false
    },
    {
      "id": 3,
      "name": "Wildflower Loop",
      "distanceKm": 5.1,
      "elevationGain": 180,
      "companion": "sam",
      "wasSunny": true
    }
  ]
}

# Encode (compress)
compressed = zon.encode(users)
# Decode (decompress)
original = zon.decode(compressed)
assert original == users  # ✓ Perfect reconstruction!

ZON (96 tokens, 264 bytes)

context:"{task:Our favorite hikes together,location:Boulder,season:spring_2025}"
friends:"[ana,luis,sam]"

@hikes(3):companion,distanceKm,elevationGain,id,name,wasSunny
ana,7.5,320,1,Blue Lake Trail,T
luis,9.2,540,2,Ridge Overlook,F
sam,5.1,180,3,Wildflower Loop,T

vs TOON Compression comparison:

TOON (104 tokens, 286 bytes):

context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true

Compression's:

JSON (compact) (139 tokens, 451 bytes)
ZON (96 tokens, 264 bytes)
TOON (104 tokens, 286 bytes)

Assets 2

23 Nov 13:16

ronibhakta1

v1.0.0

33ce21d

ZON v1.0.0 - Entropy Engine (First Official Release)

🚀 ZON v1.0.0 - Entropy Engine

First Official Release - Production-ready data serialization format optimized for LLM token efficiency.

🎯 What is ZON?

Zero Overhead Notation (ZON) is a human-readable data format that achieves 24-40% better compression than TOON and 30-42% compression vs JSON on real-world data, while remaining 100% readable and debuggable.

✨ Key Features

🧠 Entropy Tournament: Automatically selects optimal compression strategy per column
📊 8 Compression Strategies: ENUM, VALUE, DELTA, GAS_INT, GAS_PAT, GAS_MULT, LIQUID, SOLID
👁️ Human Readable: Unlike binary formats, ZON is plain text
✅ 100% Lossless: Guaranteed perfect reconstruction with safety anchors
⚡ Zero Configuration: Works out of the box

📈 Performance Benchmarks

Standard Datasets

employees.json: 63.1% compression vs JSON, +9.7% vs TOON
complex_nested.json: 76.0% compression vs JSON, +76.6% vs TOON
orders.json: 30.3% compression vs JSON, +2.7% vs TOON

Real-World API Data

Random Users API: 42.4% compression, +40.4% vs TOON 🏆
StackOverflow Q&A: 42.4% compression, +40.4% vs TOON 🏆
GitHub Repos: 33.9% compression, +32.8% vs TOON
Average: 30.5% compression, +24.1% vs TOON

🔧 Installation

pip install zon-format
💡 Quick Start
python
import zon

# Your data
data = [
    {"id": 1, "name": "Alice", "role": "Admin"},
    {"id": 2, "name": "Bob", "role": "User"}
]

# Compress
compressed = zon.encode(data)

# Decompress
original = zon.decode(compressed)

🤖 LLM Integration

Works seamlessly with:

OpenAI (GPT-3.5, GPT-4)
Anthropic Claude
LangChain
LlamaIndex
Hugging Face Transformers
Cost Savings: Reduces LLM API costs by 30-40% through fewer tokens!

📄 License

Proprietary - Free for Production Use

✅ Use in production (commercial/non-commercial)
✅ Integrate into applications
❌ Cannot redistribute source code

📚 Documentation

README - Complete guide
SPEC - Technical specification
Examples - Sample data

Acknowledgments

Inspired by TOON format. Tested on datasets from JSONPlaceholder, GitHub API, Random User Generator, and StackExchange API.

Ready for production use! 🎉

For licensing inquiries:
ronibhakta1@gmail.com

Pre-release checkbox

☐ Leave unchecked (this is a stable release)

Create a discussion for this release

☑ Check this to engage with community

Assets to attach (optional)

You can attach:

Source code (zip/tar.gz) - GitHub does this automatically
dist/ folder files if you built with python -m build (optional)

After publishing, you can:

Share on Twitter/LinkedIn
Post on Reddit (r/Python, r/MachineLearning)
Submit to awesome-python lists
Announce in LLM/AI communities

This creates a professional, comprehensive release that highlights ZON's unique value! 🚀

Assets 2

Releases: ZON-Format/ZON

v1.2.0

[1.2.0] - 2024-12-09

Major Release: Enterprise Features & Production Readiness

Added

Binary Format (ZON-B)

Document-Level Schema Versioning

Adaptive Encoding System

Developer Tools

Changed

Performance

Uh oh!

v1.1.0

[1.1.0] - 2024-12-01

Added

Changed

Fixed

Fixed

Uh oh!

v1.0.4

What's Changed

[1.0.4] - 2025-11-30

Added

Changed

Improved

Fixed

Uh oh!

v1.0.3

[1.0.3] - 2025-11-28

Fixed

Documentation

🎯 100% LLM Retrieval Accuracy Achieved

Changed

Performance

Quality

ACHIEVEMENT: 8/8 Perfect Sweep vs All Competitors!

Added

Performance

Benchmark Results (8 datasets)

Competitive Analysis

Fixed

Documentation

What's Changed

New Contributors

Uh oh!

v1.0.2

Uh oh!