Tauq - Token-Efficient Data Notation

44% fewer tokens than JSON overall. 11% more efficient than TOON. Verified with tiktoken.

What is Tauq?

Tauq (τq) is three things:

Tauq Notation (.tqn): A schema-driven text format that achieves 44-54% fewer tokens than JSON (verified with tiktoken cl100k_base).
Tauq Binary Format (TBF): A high-performance binary format that is up to 83% smaller than JSON when schema-aware columnar encoding is used.
Tauq Query (.tqq): A pre-processor with shell integration for data transformations.

Built for the AI era where every token counts.

Benchmarks

Token Efficiency (1000 Records)

Format	Tokens	vs JSON
JSON (minified)	24,005	baseline
TOON	12,002	-50.0%
Tauq (TQN)	11,012	-54.1%

All counts verified with tiktoken cl100k_base (the GPT-4 tokenizer).

Binary Size (1000 Records)

Format	Size	vs JSON
JSON (minified)	92 KB	baseline
Tauq (TQN)	43 KB	-53%
Tauq (TBF)	16 KB	-83%

Overall (10 datasets, 55,647 tokens): Tauq saves 44.2% vs JSON, 10.8% vs TOON. See benchmarks/ for full results.
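The percentages in the two tables above follow directly from the raw counts. As a quick sanity check, here is a small Python sketch that recomputes them; the helper name reduction_pct is ours and is not part of Tauq.

```python
def reduction_pct(baseline: float, value: float) -> float:
    """Percentage by which `value` is smaller than `baseline`."""
    return (baseline - value) / baseline * 100


# Token counts from the 1000-record token table above.
json_tokens, toon_tokens, tqn_tokens = 24_005, 12_002, 11_012
print(f"TOON vs JSON: -{reduction_pct(json_tokens, toon_tokens):.1f}%")
print(f"TQN  vs JSON: -{reduction_pct(json_tokens, tqn_tokens):.1f}%")

# Sizes in KB from the 1000-record size table above.
json_kb, tqn_kb, tbf_kb = 92, 43, 16
print(f"TQN vs JSON: -{reduction_pct(json_kb, tqn_kb):.0f}%")
print(f"TBF vs JSON: -{reduction_pct(json_kb, tbf_kb):.0f}%")
```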

Quick Example

JSON:

[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

Tauq:

!def User id name
1 Alice
2 Bob
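To make the mapping concrete, here is a minimal Python sketch that turns a list of flat, uniform JSON objects into the tabular form shown above. It does not use the tauq package. The function names and the bareword rule (identifier-like strings stay unquoted, everything else is double-quoted) are assumptions for illustration, and Tauq's actual grammar may differ.

```python
import json
import re

# Assumption: a bareword is an identifier-like string that cannot be
# mistaken for a number, boolean, or null. Tauq's real rule may differ.
_BAREWORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_RESERVED = {"true", "false", "null"}


def format_value(value) -> str:
    """Render one scalar as a Tauq token (scalars only in this sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: True is also an int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        if _BAREWORD.fullmatch(value) and value not in _RESERVED:
            return value
        return json.dumps(value, ensure_ascii=False)  # double-quoted
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def to_tauq_table(schema_name: str, records: list) -> str:
    """Emit a !def header, then one space-delimited row per record.

    Assumes `records` is non-empty and every record has the same keys.
    """
    fields = list(records[0].keys())
    lines = [f"!def {schema_name} {' '.join(fields)}"]
    for record in records:
        lines.append(" ".join(format_value(record[f]) for f in fields))
    return "\n".join(lines)


records = json.loads('[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]')
print(to_tauq_table("User", records))
```

The field names appear once in the header instead of once per record, which is where most of the token savings come from.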

Features

Token-Optimal (TQN)

44-54% fewer tokens than JSON (verified benchmarks)
11% more efficient than TOON overall
Space delimiters tokenize better than commas

Binary Format (TBF)

Up to 83% smaller than JSON (with schema-aware encoding)
Generic serde encoder: ~44-56% reduction (CLI default)
Schema-aware encoder: ~83% reduction (Rust API + type hints)
Adaptive integer and dictionary compression
Apache Iceberg integration for data lakes

True Streaming

StreamingParser iterator API
Process records one at a time
No count required (unlike TOON's [N])
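The idea behind record-at-a-time processing can be sketched in a few lines of Python. This is not the StreamingParser API. It is a hypothetical generator (stream_records and parse_token are our names) that handles only flat rows under a single !def, with no comments, nesting, or !use. Because each row is self-delimiting, the consumer never needs to know the record count in advance.

```python
import json
import re
from typing import Iterable, Iterator

# A token is a double-quoted string or a run of non-space characters.
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\S+')
_INT = re.compile(r"-?\d+")
_FLOAT = re.compile(r"-?\d+\.\d+(?:[eE][+-]?\d+)?")
_LITERALS = {"true": True, "false": False, "null": None}


def parse_token(token: str):
    """Convert one token to a Python value (assumed scalar rules)."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return json.loads(token)  # assumes JSON-style escapes
    if token in _LITERALS:
        return _LITERALS[token]
    if _INT.fullmatch(token):
        return int(token)
    if _FLOAT.fullmatch(token):
        return float(token)
    return token  # bareword


def stream_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per data row as soon as its line has been read."""
    fields = None
    for line in lines:
        tokens = _TOKEN.findall(line)
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "!def":
            fields = tokens[2:]  # tokens[1] is the schema name
            continue
        if fields is None:
            raise ValueError("data row seen before any !def")
        if len(tokens) != len(fields):
            raise ValueError(f"expected {len(fields)} values: {line!r}")
        yield dict(zip(fields, map(parse_token, tokens)))


def line_source():
    # Stands in for a file or socket that produces lines lazily.
    yield "!def User id name email role"
    yield '1 Alice "alice@example.com" admin'
    yield '2 Bob "bob@example.com" user'


for record in stream_records(line_source()):
    print(record)
```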

Schema-Driven

Define data shapes with !def
Switch schemas with !use
Nested types and typed arrays
Type hints for binary encoding optimization

Programmable

Tauq Query for data transformations
Unix pipe model
Polyglot support (Python, Rhai, JavaScript)

Production-Ready CLI

tauq build - Smart build (TQN→JSON, TQQ→Tauq, supports TBF output)
tauq format - JSON → Tauq
tauq query - Filter/transform with Rhai expressions
tauq exec - Run Tauq Query pipelines
tauq minify - Compress to one line
tauq prettify - Format to readable Tauq
tauq validate - Check syntax

Quick Start

Installation

CLI Tool:

cargo install tauq

Language Bindings

Rust:

[dependencies]
tauq = "0.1"

Python:

pip install tauq

JavaScript/TypeScript:

npm install tauq

Go:

go get github.com/epistates/tauq

Other languages: Java, C#, Swift - see Language Bindings

Hello World

Create config.tqn:

app_name "MyService"
version "1.0.0"
port 8080
debug true
features [api websockets metrics]

Parse to JSON:

$ tauq build config.tqn --pretty
{
  "app_name": "MyService",
  "version": "1.0.0",
  "port": 8080,
  "debug": true,
  "features": ["api", "websockets", "metrics"]
}

Syntax Guide

Simple Values

name "Alice"
age 30
active true
score 99.5
missing null
role admin  # Barewords don't need quotes

Arrays

tags [web api backend]
ids [1 2 3 4 5]
mixed [1 "two" true null]

Tabular Data (The Killer Feature)

!def User id name email role

1 Alice "alice@example.com" admin
2 Bob "bob@example.com" user
3 Carol "carol@example.com" user

Schema Block

Define schemas up front and use --- to separate them from the data:

!def User id name role
---
users [
  !use User
  1 Alice admin
  2 Bob user
]

The --- separator clears the implicit schema scope, allowing structured key-value data that uses !use inside arrays.

Nested Types

!def Address street city
!def User id name addr:Address

1 Alice { "123 Main" "NYC" }
2 Bob { "456 Oak" "LA" }
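As an illustration of how a field:Type annotation could map onto nested objects, here is a Python sketch. It assumes a row has already been tokenized into nested lists and that a typed field resolves to an object built from the referenced schema. parse_def and build_object are hypothetical helpers, not part of the tauq API, and typed arrays such as employees:[Employee] are not handled.

```python
from typing import Optional


def parse_def(line: str):
    """Split '!def User id name addr:Address' into (name, [(field, type)])."""
    _, name, *specs = line.split()
    fields = []
    for spec in specs:
        field, _, type_name = spec.partition(":")
        fields.append((field, type_name or None))
    return name, fields


def build_object(schema_name: str, values: list, schemas: dict) -> dict:
    """Pair positional values with field names, recursing into typed fields."""
    obj = {}
    for (field, type_name), value in zip(schemas[schema_name], values):
        type_name: Optional[str]
        obj[field] = build_object(type_name, value, schemas) if type_name else value
    return obj


schemas = dict(
    parse_def(line)
    for line in ["!def Address street city", "!def User id name addr:Address"]
)

# '1 Alice { "123 Main" "NYC" }' after tokenizing (tokenizer not shown).
row = [1, "Alice", ["123 Main", "NYC"]]
print(build_object("User", row, schemas))
# {'id': 1, 'name': 'Alice', 'addr': {'street': '123 Main', 'city': 'NYC'}}
```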

Lists of Objects

!def Employee name role
!def Department name budget employees:[Employee]

Engineering 1000000 [
    Alice "Principal Engineer"
    Bob "Senior Engineer"
]

Minified Syntax

!def U id name; 1 Alice; 2 Bob

All on one line for maximum compression!

Examples

The examples/ directory contains examples in the following categories:

Basics: Simple configuration and primitive types.
Schemas: Typed schemas and nested types.
Modularity: Multi-file imports and modular configurations.
Real World: Production configurations like Kubernetes deployments.
Queries: ETL pipelines and data generation with Tauq Query.
Minified: Compact single-line syntax examples.

CLI Usage

Build: Tauq → JSON

# To stdout
tauq build data.tqn

# To file with pretty formatting
tauq build data.tqn -o data.json --pretty

# From stdin
cat data.tqn | tauq build -

Format: JSON → Tauq

The formatter detects arrays of uniform objects and generates schemas for them automatically:

# Convert JSON to Tauq (auto-generates schemas for nested arrays)
tauq format data.json -o data.tqn

# From stdin
echo '{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}' | tauq format -
# Output:
# !def User id name
# ---
# users [
#   !use User
#   1 Alice
#   2 Bob
# ]
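One plausible reading of "uniform" is that every element is an object with the same keys in the same order. The Python sketch below checks for that. The function name uniform_fields and the exact rule are assumptions, and the real formatter may be more lenient (for example, about key order or missing fields).

```python
from typing import Optional


def uniform_fields(items: list) -> Optional[list]:
    """Return the shared field list if `items` is a non-empty list of
    dicts that all have the same keys in the same order, else None."""
    if not items or not all(isinstance(item, dict) for item in items):
        return None
    first = list(items[0].keys())
    if all(list(item.keys()) == first for item in items[1:]):
        return first
    return None


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(uniform_fields(users))           # ['id', 'name']
print(uniform_fields([1, "two", True]))  # None
```

An array that passes this check is a candidate for a generated !def header. Anything else would stay a plain array.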

Execute Tauq Query

# Run data transformations
tauq exec pipeline.tqq -o output.json

# Run in SAFE MODE (disable shell execution)
tauq exec pipeline.tqq --safe

Minify

# Compress to single line
tauq minify data.tqn -o data.min.tqn

Binary Format (TBF)

For high-performance scenarios where tokens don't matter but size and speed do:

use tauq::tbf::{TableSchemaBuilder, FieldEncoding, TableEncode};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, TableEncode)]
struct Employee {
    #[tauq(encoding = "u16")]
    id: u32,
    name: String,
    #[tauq(encoding = "u8", offset = 18)]  // Age 18-273 as 0-255
    age: u32,
}

let employees: Vec<Employee> = vec![/* ... */];
let bytes = employees.encode_tbf();  // Up to ~83% smaller than JSON (see Benchmarks)
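The offset = 18 hint on the age field shifts values so that they fit a smaller integer type. The arithmetic can be sketched in Python as follows. This is a generic illustration of offset encoding, not TBF's actual wire format, and the function names are ours.

```python
def encode_offset_u8(value: int, offset: int) -> bytes:
    """Store `value - offset` in a single unsigned byte."""
    stored = value - offset
    if not 0 <= stored <= 0xFF:
        raise ValueError(
            f"{value} is outside the range {offset}..{offset + 0xFF}"
        )
    return bytes([stored])


def decode_offset_u8(data: bytes, offset: int) -> int:
    """Recover the original value from the stored byte."""
    return data[0] + offset


# With offset 18, ages 18..273 map onto 0..255.
print(encode_offset_u8(18, 18))                          # b'\x00'
print(encode_offset_u8(273, 18))                         # b'\xff'
print(decode_offset_u8(encode_offset_u8(42, 18), 18))    # 42
```

A value outside the shifted range cannot be represented, so an encoder has to reject it or fall back to a wider type.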

Apache Iceberg Integration

Enable the iceberg feature for data lake integration:

[dependencies]
tauq = { version = "0.1", features = ["iceberg"] }

use tauq::tbf_iceberg::{TbfFileWriterBuilder, ArrowToTbf};

// Write Arrow RecordBatches as TBF
let mut writer = TbfFileWriterBuilder::new()
    .with_iceberg_schema(&iceberg_schema)
    .build();

writer.write(&record_batch);
let tbf_data = writer.finish();

Contributing

Tauq is in active development. Contributions welcome!

Areas of interest:

Parser optimizations
Error message improvements
Language bindings (Python, JS, Go)
Documentation
Real-world use cases

License

MIT

Tauq (τq) - Stop wasting tokens on JSON. Start using the future. 🚀
