toon-format/toon-python

TOON Format for Python

⚠️ Beta Status (v0.9.x): This library is in active development and is working towards spec compliance. A beta is published on PyPI, and the API may change before the 1.0.0 release.

Compact, human-readable serialization format for LLM contexts with 30-60% token reduction vs JSON. Combines YAML-like indentation with CSV-like tabular arrays. Working towards full compatibility with the official TOON specification.

Key Features: Minimal syntax • Tabular arrays for uniform data • Array length validation • Python 3.8+ • Comprehensive test coverage.

# Beta published to PyPI - install from source:
git clone https://github.com/toon-format/toon-python.git
cd toon-python
uv sync

# Or install directly from GitHub:
pip install git+https://github.com/toon-format/toon-python.git

Quick Start

from toon_format import encode, decode

# Simple object
encode({"name": "Alice", "age": 30})
# name: Alice
# age: 30

# Tabular array (uniform objects)
encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# [2,]{id,name}:
#   1,Alice
#   2,Bob

# Decode back to Python
decode("items[2]: apple,banana")
# {'items': ['apple', 'banana']}
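
To make the tabular form above more concrete, here is a minimal pure-Python sketch of how a uniform list of objects could be turned into a header line plus one row per object. The helpers `is_uniform` and `encode_tabular` are hypothetical names used only for illustration; they are not part of toon_format, and the sketch ignores quoting and the TOON spelling of booleans and null.

```python
def is_uniform(rows):
    """True when rows is a non-empty list of dicts with identical keys
    (in the same order) and only primitive values."""
    if not rows or not all(isinstance(r, dict) for r in rows):
        return False
    keys = list(rows[0])
    primitives = (str, int, float, bool, type(None))
    return all(
        list(r) == keys and all(isinstance(v, primitives) for v in r.values())
        for r in rows
    )


def encode_tabular(rows, delimiter=",", indent=2):
    """Render uniform rows as '[N,]{k1,k2}:' followed by indented rows.

    Assumption: the header shape is copied from the Quick Start output;
    values are written with str() and never quoted.
    """
    if not is_uniform(rows):
        raise ValueError("rows are not uniform objects")
    keys = list(rows[0])
    header = f"[{len(rows)}{delimiter}]{{{delimiter.join(keys)}}}:"
    pad = " " * indent
    lines = [pad + delimiter.join(str(r[k]) for k in keys) for r in rows]
    return "\n".join([header, *lines])


print(encode_tabular([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
# [2,]{id,name}:
#   1,Alice
#   2,Bob
```

The field names appear once in the header instead of once per object, which is where most of the saving over JSON comes from for uniform data.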

CLI Usage

# Auto-detect format by extension
toon input.json -o output.toon      # Encode
toon data.toon -o output.json       # Decode
echo '{"x": 1}' | toon -            # Stdin/stdout

# Options
toon data.json --encode --delimiter "\t" --length-marker
toon data.toon --decode --no-strict --indent 4

Options: -e/--encode -d/--decode -o/--output --delimiter --indent --length-marker --no-strict
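
The extension-based auto-detection described above can be pictured roughly as follows. This is only a sketch under stated assumptions: `detect_mode` is a hypothetical helper, not the CLI's actual code, and it does not model how the real tool treats stdin (`-`) or unknown extensions.

```python
from pathlib import Path


def detect_mode(path, encode=False, decode=False):
    """Pick 'encode' or 'decode' for a path; explicit flags win over the extension."""
    if encode and decode:
        raise ValueError("choose only one of encode/decode")
    if encode:
        return "encode"
    if decode:
        return "decode"
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return "encode"  # JSON input is converted to TOON
    if suffix == ".toon":
        return "decode"  # TOON input is converted back to JSON
    # Assumption: anything else needs an explicit flag in this sketch.
    raise ValueError(f"cannot infer mode from {path!r}")


print(detect_mode("input.json"))  # encode
print(detect_mode("data.toon"))   # decode
```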

API Reference

encode(value, options=None) → str

encode({"id": 123}, {"delimiter": "\t", "indent": 4, "lengthMarker": "#"})

Options:

  • delimiter: "," (default), "\t", "|"
  • indent: Spaces per level (default: 2)
  • lengthMarker: "" (default) or "#" to prefix array lengths

decode(input_str, options=None) → Any

decode("id: 123", {"indent": 2, "strict": True})

Options:

  • indent: Expected indent size (default: 2)
  • strict: Validate syntax, lengths, delimiters (default: True)
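
As an illustration of what length validation in strict mode means, the sketch below checks a one-line primitive array such as `items[2]: apple,banana` against its declared count. `check_declared_length` is a hypothetical helper, not the library's decoder; it assumes unquoted values and a simple `key[N]:` header.

```python
import re

# Hypothetical pattern for a one-line primitive array: key[N]: v1,v2,...
_HEADER = re.compile(r"^(?P<key>\w+)\[(?P<count>\d+)\]:\s*(?P<body>.*)$")


def check_declared_length(line, delimiter=","):
    """Return (key, items) if the item count matches the declared length."""
    match = _HEADER.match(line)
    if match is None:
        raise ValueError(f"not a primitive array line: {line!r}")
    declared = int(match.group("count"))
    body = match.group("body")
    # An empty body means zero items; otherwise split on the delimiter.
    items = body.split(delimiter) if body else []
    if len(items) != declared:
        raise ValueError(f"declared {declared} items, found {len(items)}")
    return match.group("key"), items


print(check_declared_length("items[2]: apple,banana"))
# ('items', ['apple', 'banana'])
```

A mismatch such as `items[3]: apple,banana` raises `ValueError` in this sketch, which mirrors the kind of truncated or padded input that length checks are meant to catch.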

Token Counting & Comparison

Measure token efficiency and compare formats:

from toon_format import encode, estimate_savings, compare_formats, count_tokens

# Measure savings
data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
result = estimate_savings(data)
print(f"Saves {result['savings_percent']:.1f}% tokens")  # Saves 42.3% tokens

# Visual comparison
print(compare_formats(data))
# Format Comparison
# ────────────────────────────────────────────────
# Format      Tokens    Size (chars)
# JSON            45             123
# TOON            28              85
# ────────────────────────────────────────────────
# Savings: 17 tokens (37.8%)

# Count tokens directly
toon_str = encode(data)
tokens = count_tokens(toon_str)  # Uses tiktoken (gpt5/gpt5-mini)

Requires tiktoken: uv add tiktoken (benchmark features are optional)
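
The savings figure in the comparison output is plain arithmetic on the two token counts. A minimal sketch, using a hypothetical `savings_percent` helper rather than the library's own function:

```python
def savings_percent(json_tokens, toon_tokens):
    """Percentage of tokens saved by TOON relative to the JSON baseline."""
    if json_tokens <= 0:
        raise ValueError("json_tokens must be positive")
    return (json_tokens - toon_tokens) / json_tokens * 100.0


# Token counts taken from the comparison table above.
saved = 45 - 28
print(f"Savings: {saved} tokens ({savings_percent(45, 28):.1f}%)")
# Savings: 17 tokens (37.8%)
```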

Format Specification

Type             Example Input                                     TOON Output
Object           {"name": "Alice", "age": 30}                      name: Alice
                                                                   age: 30
Primitive Array  [1, 2, 3]                                         [3]: 1,2,3
Tabular Array    [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]  [2,]{id,name}:
                                                                     1,A
                                                                     2,B
Mixed Array      [{"x": 1}, 42, "hi"]                              [3]:
                                                                     - x: 1
                                                                     - 42
                                                                     - hi

Quoting: Only when necessary (empty, keywords, numeric strings, whitespace, structural chars, delimiters)
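
The quoting rule can be read as a predicate over string values. The sketch below is one possible interpretation, not the library's implementation: `needs_quotes` is a hypothetical helper, the keyword list (`true`, `false`, `null`), the set of structural characters, and the reading of "whitespace" as leading or trailing whitespace are all assumptions.

```python
import re

_KEYWORDS = {"true", "false", "null"}  # assumed keyword list
_NUMERIC = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")
_STRUCTURAL = set(':[]{}"\\')  # assumed structural characters


def needs_quotes(text, delimiter=","):
    """Return True when a string would be ambiguous if written bare."""
    if text == "":
        return True  # empty string
    if text in _KEYWORDS:
        return True  # would be read back as a boolean or null
    if _NUMERIC.match(text):
        return True  # would be read back as a number
    if text != text.strip():
        return True  # leading or trailing whitespace
    if any(ch in _STRUCTURAL for ch in text):
        return True  # structural characters
    return delimiter in text  # active delimiter inside the value


print(needs_quotes("Alice"))  # False
print(needs_quotes("42"))     # True
print(needs_quotes("a,b"))    # True
```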

Type Normalization: Infinity/NaN/Functions → null • Decimal → float • datetime → ISO 8601 • -0 → 0
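
The normalization rules above can be sketched as a small recursive function. `normalize` is a hypothetical name for illustration; the real encoder may order or implement these steps differently, and Python's `None` stands in for TOON's null here.

```python
import math
from datetime import datetime
from decimal import Decimal


def normalize(value):
    """Apply the listed normalization rules to a Python value."""
    if callable(value):
        return None  # functions -> null
    if isinstance(value, Decimal):
        value = float(value)  # Decimal -> float
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None  # NaN / Infinity -> null
        if value == 0:
            return 0  # -0.0 (and 0.0) -> 0
        return value
    if isinstance(value, datetime):
        return value.isoformat()  # datetime -> ISO 8601 string
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value


print(normalize({"t": datetime(2024, 1, 2, 3, 4, 5), "x": float("inf")}))
# {'t': '2024-01-02T03:04:05', 'x': None}
```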

Development

# Setup (requires uv: https://docs.astral.sh/uv/)
git clone https://github.com/toon-format/toon-python.git
cd toon-python
uv sync

# Run tests (792 tests, 91% coverage, 85% enforced)
uv run pytest --cov=toon_format --cov-report=term

# Code quality
uv run ruff check src/ tests/        # Lint
uv run ruff format src/ tests/       # Format
uv run mypy src/                     # Type check

CI/CD: GitHub Actions • Python 3.8-3.14 • Coverage enforcement • PR coverage comments

Project Status & Roadmap

Following semantic versioning towards 1.0.0:

  • v0.8.x - Initial code set, tests, documentation ✅
  • v0.9.x - Serializer improvements, spec compliance testing, publishing setup (current)
  • v1.0.0-rc.x - Release candidates for production readiness
  • v1.0.0 - First stable release with full spec compliance

See CONTRIBUTING.md for detailed guidelines.

Documentation

License

MIT License - see LICENSE

About

🐍 Community-driven Python implementation of TOON
