A schema-driven data format built for the AI era where every token counts.

Epistates/tauq

Tauq - Token-Efficient Data Notation

44% fewer tokens than JSON overall. 11% more efficient than TOON. Verified with tiktoken.


What is Tauq?

Tauq (τq) is three things:

  1. Tauq Notation (.tqn): A schema-driven text format that achieves 44-54% fewer tokens than JSON (verified with tiktoken cl100k_base).
  2. Tauq Binary Format (TBF): A high-performance binary format achieving 83% size reduction vs JSON with schema-aware columnar encoding.
  3. Tauq Query (.tqq): A pre-processor with shell integration for data transformations.

Built for the AI era where every token counts.


Benchmarks

Token Efficiency (1000 Records)

Format            Tokens    vs JSON
JSON (minified)   24,005    baseline
TOON              12,002    -50.0%
Tauq (TQN)        11,012    -54.1%

All counts were measured with tiktoken's cl100k_base encoding (the GPT-4 tokenizer). Other tokenizers, including Claude's, will produce different counts.

Binary Size (1000 Records)

Format            Size     vs JSON
JSON (minified)   92 KB    baseline
Tauq (TQN)        43 KB    -53%
Tauq (TBF)        16 KB    -83%

Overall (10 datasets, 55,647 tokens): Tauq saves 44.2% vs JSON, 10.8% vs TOON. See benchmarks/ for full results.
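
The percentages in the two tables above follow directly from the raw counts. As a quick check, the small sketch below recomputes them; the helper name `reduction` is only illustrative and is not part of Tauq.

```python
def reduction(baseline: float, value: float) -> float:
    """Return the percentage saved relative to the baseline."""
    return (1 - value / baseline) * 100


# Token counts from the 1000-record benchmark table.
tokens = {"JSON (minified)": 24_005, "TOON": 12_002, "Tauq (TQN)": 11_012}
# Sizes in KB from the binary size table.
sizes_kb = {"JSON (minified)": 92, "Tauq (TQN)": 43, "Tauq (TBF)": 16}

for table, unit in ((tokens, "tokens"), (sizes_kb, "KB")):
    baseline = table["JSON (minified)"]
    for name, count in table.items():
        print(f"{name}: {count} {unit}, {reduction(baseline, count):.1f}% saved")
```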

Quick Example

JSON:

[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

Tauq:

!def User id name
1 Alice
2 Bob

Features

Token-Optimal (TQN)

  • 44-54% fewer tokens than JSON (verified benchmarks)
  • 11% more efficient than TOON overall
  • Space delimiters tokenize better than commas

Binary Format (TBF)

  • Up to 83% smaller than JSON (with schema-aware encoding)
  • Generic serde encoder: ~44-56% reduction (CLI default)
  • Schema-aware encoder: ~83% reduction (Rust API + type hints)
  • Adaptive integer and dictionary compression
  • Apache Iceberg integration for data lakes

True Streaming

  • StreamingParser iterator API
  • Process records one at a time
  • No count required (unlike TOON's [N])
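
To illustrate why no record count is needed, here is a minimal Python sketch of line-at-a-time parsing. It is not the StreamingParser API: `stream_records` is a hypothetical helper, it handles only flat `!def` / `!use` rows, and its value handling (digits become integers, everything else stays a string) is a simplification.

```python
import shlex
from typing import Dict, Iterable, Iterator, List


def stream_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per data row as soon as that row has been read."""
    schemas: Dict[str, List[str]] = {}
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("!def "):
            # "!def User id name" defines a schema and makes it current.
            _, name, *fields = line.split()
            schemas[name] = fields
        elif line.startswith("!use "):
            # "!use User" switches back to a schema defined earlier.
            fields = schemas[line.split()[1]]
        else:
            # shlex keeps quoted strings such as "123 Main" together.
            # Simplification: any all-digit token becomes an int.
            values = [int(v) if v.isdigit() else v for v in shlex.split(line)]
            yield dict(zip(fields, values))


source = iter(["!def User id name\n", "1 Alice\n", "2 Bob\n"])
records = stream_records(source)
first = next(records)  # available before the rest of the input is read
```

Because the generator only ever looks at the current line, the input can be a file, a socket, or a pipe of unknown length.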

Schema-Driven

  • Define data shapes with !def
  • Switch schemas with !use
  • Nested types and typed arrays
  • Type hints for binary encoding optimization

Programmable

  • Tauq Query for data transformations
  • Unix pipe model
  • Polyglot support (Python, Rhai, JavaScript)

Production-Ready CLI

  • tauq build - Smart build (TQN→JSON, TQQ→Tauq, supports TBF output)
  • tauq format - JSON → Tauq
  • tauq query - Filter/transform with Rhai expressions
  • tauq exec - Run Tauq Query pipelines
  • tauq minify - Compress to one line
  • tauq prettify - Format to readable Tauq
  • tauq validate - Check syntax

Quick Start

Installation

CLI Tool:

cargo install tauq

Language Bindings

Rust:

[dependencies]
tauq = "0.1"

Python:

pip install tauq

JavaScript/TypeScript:

npm install tauq

Go:

go get github.com/epistates/tauq

Other languages: Java, C#, Swift - see Language Bindings

Hello World

Create config.tqn:

app_name "MyService"
version "1.0.0"
port 8080
debug true
features [api websockets metrics]

Parse to JSON:

$ tauq build config.tqn --pretty
{
  "app_name": "MyService",
  "version": "1.0.0",
  "port": 8080,
  "debug": true,
  "features": ["api", "websockets", "metrics"]
}

Syntax Guide

Simple Values

name "Alice"
age 30
active true
score 99.5
missing null
role admin  # Barewords don't need quotes

Arrays

tags [web api backend]
ids [1 2 3 4 5]
mixed [1 "two" true null]
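
As a rough guide to how the values above map to JSON types, the following Python sketch parses single `key value` lines. It is an illustration, not the Tauq parser: the token rules, the number patterns, and the helper names (`tokenize`, `coerce`, `parse_value`, `parse_line`) are all assumptions inferred from the examples in this section.

```python
import re

# Assumed token shapes: a double-quoted string, a bracket, or a run of other characters.
TOKEN = re.compile(r'"([^"]*)"|(\[|\]|[^\s\[\]]+)')
LITERALS = {"true": True, "false": False, "null": None}


def tokenize(line):
    """Return (quoted, bare) pairs; exactly one element of each pair is not None."""
    tokens = []
    for m in TOKEN.finditer(line):
        quoted, bare = m.group(1), m.group(2)
        if bare is not None and bare.startswith("#"):
            break  # the rest of the line is a comment
        tokens.append((quoted, bare))
    return tokens


def coerce(bare):
    """Map an unquoted token to a literal, a number, or a bareword string."""
    if bare in LITERALS:
        return LITERALS[bare]
    if re.fullmatch(r"-?\d+", bare):
        return int(bare)
    if re.fullmatch(r"-?\d+\.\d+", bare):
        return float(bare)
    return bare


def parse_value(tokens, pos):
    """Parse one value starting at tokens[pos]; return (value, next position)."""
    quoted, bare = tokens[pos]
    if quoted is not None:
        return quoted, pos + 1
    if bare == "[":
        items, pos = [], pos + 1
        while tokens[pos][1] != "]":
            item, pos = parse_value(tokens, pos)
            items.append(item)
        return items, pos + 1  # skip the closing bracket
    return coerce(bare), pos + 1


def parse_line(line):
    """Parse 'key value' into a (key, value) pair."""
    tokens = tokenize(line)
    value, _ = parse_value(tokens, 1)
    return tokens[0][1], value
```

Quoted tokens are never coerced, so `"30"` stays a string while a bare `30` becomes a number.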

Tabular Data (The Killer Feature)

!def User id name email role

1 Alice "alice@example.com" admin
2 Bob "bob@example.com" user
3 Carol "carol@example.com" user

Schema Block

Define schemas upfront with --- to separate from data:

!def User id name role
---
users [
  !use User
  1 Alice admin
  2 Bob user
]

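The --- separator ends the schema block and clears the implicit schema scope, so the lines after it are read as key-value data rather than as rows. Inside an array, !use selects the schema that the following rows belong to.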

Nested Types

!def Address street city
!def User id name addr:Address

1 Alice { "123 Main" "NYC" }
2 Bob { "456 Oak" "LA" }

Lists of Objects

!def Employee name role
!def Department name budget employees:[Employee]

Engineering 1000000 [
    Alice "Principal Engineer"
    Bob "Senior Engineer"
]
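
The two sections above rely on type annotations (`addr:Address`, `employees:[Employee]`) to turn positional values into nested objects. The Python sketch below shows that mapping under a simplifying assumption: rows are already tokenized into nested lists. `parse_def` and `expand` are hypothetical helpers, not Tauq functions.

```python
from typing import Any, Dict, List, Optional, Tuple

Field = Tuple[str, Optional[str]]  # (field name, type annotation or None)


def parse_def(line: str) -> Tuple[str, List[Field]]:
    """Split a '!def' line into the schema name and its (field, type) pairs."""
    _, name, *specs = line.split()
    fields: List[Field] = []
    for spec in specs:
        field, _, type_ = spec.partition(":")
        fields.append((field, type_ or None))
    return name, fields


def expand(schemas: Dict[str, List[Field]], type_name: str, values: List[Any]) -> dict:
    """Pair positional values with field names, recursing into typed fields."""
    record = {}
    for (field, type_), value in zip(schemas[type_name], values):
        if type_ is None:
            record[field] = value                      # plain scalar
        elif type_.startswith("["):
            inner = type_[1:-1]                        # "[Employee]" -> "Employee"
            record[field] = [expand(schemas, inner, v) for v in value]
        else:
            record[field] = expand(schemas, type_, value)
    return record


schemas = dict(parse_def(line) for line in [
    "!def Address street city",
    "!def User id name addr:Address",
    "!def Employee name role",
    "!def Department name budget employees:[Employee]",
])
user = expand(schemas, "User", [1, "Alice", ["123 Main", "NYC"]])
```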

Minified Syntax

!def U id name; 1 Alice; 2 Bob

All on one line for maximum compression!
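
For flat tabular data like the example above, minifying amounts to replacing line breaks with `; `. The Python sketch below shows only that case; `minify_flat` is a hypothetical helper, and it makes no attempt to reproduce what `tauq minify` does for arrays, nested blocks, or comments.

```python
def minify_flat(text: str) -> str:
    """Join the non-empty lines of a flat Tauq document with '; '."""
    return "; ".join(line.strip() for line in text.splitlines() if line.strip())


multi_line = """\
!def U id name
1 Alice
2 Bob
"""
one_line = minify_flat(multi_line)
```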


Examples

The examples/ directory contains examples for each of the following areas:

  • Basics: Simple configuration and primitive types.
  • Schemas: Typed schemas and nested types.
  • Modularity: Multi-file imports and modular configurations.
  • Real World: Production configurations like Kubernetes deployments.
  • Queries: ETL pipelines and data generation with Tauq Query.
  • Minified: Compact single-line syntax examples.

CLI Usage

Build: Tauq → JSON

# To stdout
tauq build data.tqn

# To file with pretty formatting
tauq build data.tqn -o data.json --pretty

# From stdin
cat data.tqn | tauq build -

Format: JSON → Tauq

The formatter detects arrays of uniform objects and generates a schema for each one automatically:

# Convert JSON to Tauq (auto-generates schemas for nested arrays)
tauq format data.json -o data.tqn

# From stdin
echo '{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}' | tauq format -
# Output:
# !def User id name
# ---
# users [
#   !use User
#   1 Alice
#   2 Bob
# ]
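
The schema detection described above can be pictured with a small Python sketch. It reproduces the output shown for this one input, but everything in it is an assumption made for illustration: the uniformity test, the bareword rule, and the way the schema name is derived from the key (`users` becomes `User`) are guesses, not the behaviour of `tauq format`.

```python
import json
import re

BAREWORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # assumed bareword rule
RESERVED = {"true", "false", "null"}


def fmt(value) -> str:
    """Render a scalar; strings are quoted unless they are safe barewords."""
    if isinstance(value, str) and BAREWORD.fullmatch(value) and value not in RESERVED:
        return value
    return json.dumps(value)


def is_uniform(items) -> bool:
    """True for a non-empty list of dicts sharing the same keys and holding scalars."""
    return (
        isinstance(items, list)
        and len(items) > 0
        and all(isinstance(i, dict) and list(i) == list(items[0]) for i in items)
        and all(not isinstance(v, (dict, list)) for i in items for v in i.values())
    )


def schema_name(key: str) -> str:
    """Assumption: drop a trailing 's' and upper-case the first letter."""
    name = key[:-1] if key.endswith("s") else key
    return name[:1].upper() + name[1:]


def format_doc(doc: dict) -> str:
    defs, body = [], []
    for key, value in doc.items():
        if is_uniform(value):
            name = schema_name(key)
            defs.append(" ".join(["!def", name, *value[0]]))
            body += [f"{key} [", f"  !use {name}"]
            body += ["  " + " ".join(fmt(v) for v in item.values()) for item in value]
            body.append("]")
        elif not isinstance(value, (dict, list)):
            body.append(f"{key} {fmt(value)}")
        else:
            raise NotImplementedError("only scalars and uniform arrays are sketched")
    return "\n".join(defs + ["---"] + body if defs else body)


doc = json.loads('{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}')
text = format_doc(doc)
```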

Execute Tauq Query

# Run data transformations
tauq exec pipeline.tqq -o output.json

# Run in SAFE MODE (disable shell execution)
tauq exec pipeline.tqq --safe

Minify

# Compress to single line
tauq minify data.tqn -o data.min.tqn

Binary Format (TBF)

For high-performance scenarios where tokens don't matter but size and speed do:

use tauq::tbf::{TableSchemaBuilder, FieldEncoding, TableEncode};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, TableEncode)]
struct Employee {
    #[tauq(encoding = "u16")]
    id: u32,
    name: String,
    #[tauq(encoding = "u8", offset = 18)]  // Age 18-273 as 0-255
    age: u32,
}

let employees = vec![/* ... */];
let bytes = employees.encode_tbf();  // 83% smaller than JSON
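
The `offset = 18` hint above stores each age as its distance from 18, so values from 18 to 273 fit in a single byte. The Python sketch below shows that idea in isolation; `encode_offset_u8` and `decode_offset_u8` are hypothetical helpers and say nothing about the actual TBF byte layout.

```python
import json
from typing import Iterable, List


def encode_offset_u8(values: Iterable[int], offset: int) -> bytes:
    """Store each value as (value - offset) in one unsigned byte."""
    out = bytearray()
    for value in values:
        delta = value - offset
        if not 0 <= delta <= 255:
            raise ValueError(f"{value} is outside {offset}..{offset + 255}")
        out.append(delta)
    return bytes(out)


def decode_offset_u8(data: bytes, offset: int) -> List[int]:
    """Reverse encode_offset_u8."""
    return [byte + offset for byte in data]


ages = [18, 30, 45, 273]
packed = encode_offset_u8(ages, offset=18)

# One byte per value, compared with the JSON text for the same column.
print(len(packed), "bytes packed vs", len(json.dumps(ages)), "bytes as JSON")
```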

Apache Iceberg Integration

Enable the iceberg feature for data lake integration:

[dependencies]
tauq = { version = "0.1", features = ["iceberg"] }

use tauq::tbf_iceberg::{TbfFileWriterBuilder, ArrowToTbf};

// Write Arrow RecordBatches as TBF
let mut writer = TbfFileWriterBuilder::new()
    .with_iceberg_schema(&iceberg_schema)
    .build();

writer.write(&record_batch);
let tbf_data = writer.finish();

Contributing

Tauq is in active development. Contributions welcome!

Areas of interest:

  • Parser optimizations
  • Error message improvements
  • Language bindings (Python, JS, Go)
  • Documentation
  • Real-world use cases

License

MIT


Tauq (τq) - Stop wasting tokens on JSON. Start using the future. 🚀
