44% fewer tokens than JSON overall. 11% more efficient than TOON. Verified with tiktoken.
Tauq (τq) is three things:

- Tauq Notation (`.tqn`): A schema-driven text format that achieves 44-54% fewer tokens than JSON (verified with tiktoken cl100k_base).
- Tauq Binary Format (TBF): A high-performance binary format achieving 83% size reduction vs JSON with schema-aware columnar encoding.
- Tauq Query (`.tqq`): A pre-processor with shell integration for data transformations.
Built for the AI era where every token counts.
| Format | Tokens | vs JSON |
|---|---|---|
| JSON (minified) | 24,005 | baseline |
| TOON | 12,002 | -50.0% |
| Tauq (TQN) | 11,012 | -54.1% |
All counts verified with tiktoken cl100k_base (GPT-4/Claude tokenizer).
| Format | Size | vs JSON |
|---|---|---|
| JSON (minified) | 92 KB | baseline |
| Tauq (TQN) | 43 KB | -53% |
| Tauq (TBF) | 16 KB | -83% |
Overall (10 datasets, 55,647 tokens): Tauq saves 44.2% vs JSON and 10.8% vs TOON. See `benchmarks/` for full results.
JSON:

```json
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
```

Tauq:

```
!def User id name
1 Alice
2 Bob
```
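The idea can be sketched in a few lines of plain Python. `to_tauq_table` is a hypothetical helper (not part of the Tauq tooling) that assumes flat, uniform records with bareword-safe values:

```python
def to_tauq_table(name, records):
    # Illustrative only: assumes every record has the same flat keys and
    # values that need no quoting (no spaces or special characters).
    fields = list(records[0])
    lines = ["!def %s %s" % (name, " ".join(fields))]
    for rec in records:
        lines.append(" ".join(str(rec[f]) for f in fields))
    return "\n".join(lines)

data = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(to_tauq_table("User", data))
# !def User id name
# 1 Alice
# 2 Bob
```

The keys appear once in the `!def` header instead of once per record, which is where the token savings come from.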
- 44-54% fewer tokens than JSON (verified benchmarks)
- 11% more efficient than TOON overall
- Space delimiters tokenize better than commas
- Up to 83% smaller than JSON (with schema-aware encoding)
- Generic serde encoder: ~44-56% reduction (CLI default)
- Schema-aware encoder: ~83% reduction (Rust API + type hints)
- Adaptive integer and dictionary compression
- Apache Iceberg integration for data lakes
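The intuition behind the schema-aware savings can be sketched with Python's `struct` module. This is illustrative only, not TBF's actual wire format: once a column is declared as `u16`, each value costs exactly 2 bytes instead of repeated keys and decimal digits:

```python
import json
import struct

ids = list(range(1000))

# Generic text encoding: every record repeats the key and spells the
# number out in decimal digits.
as_json = json.dumps([{"id": i} for i in ids]).encode()

# Schema-aware idea: a declared u16 column is a flat array of
# fixed-width values, 2 bytes each.
as_u16 = struct.pack("<1000H", *ids)

print(len(as_json), len(as_u16))  # the fixed-width column is far smaller
```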
- `StreamingParser` iterator API: process records one at a time
- No count required (unlike TOON's `[N]`)
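The streaming model can be sketched in Python. `stream_records` is a hypothetical helper, not the real `StreamingParser` API, and handles only flat `!def` tables:

```python
def stream_records(lines):
    # Sketch of the idea: yield each record as soon as its line arrives;
    # no upfront element count (TOON-style [N]) is required.
    fields = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("!def"):
            fields = line.split()[2:]  # drop "!def" and the schema name
            continue
        yield dict(zip(fields, line.split()))

doc = "!def User id name\n1 Alice\n2 Bob"
print(list(stream_records(doc.splitlines())))
# [{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]
```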
- Define data shapes with `!def`
- Switch schemas with `!use`
- Nested types and typed arrays
- Type hints for binary encoding optimization
- Tauq Query for data transformations
- Unix pipe model
- Polyglot support (Python, Rhai, JavaScript)
- `tauq build`: Smart build (TQN→JSON, TQQ→Tauq, supports TBF output)
- `tauq format`: JSON → Tauq
- `tauq query`: Filter/transform with Rhai expressions
- `tauq exec`: Run Tauq Query pipelines
- `tauq minify`: Compress to one line
- `tauq prettify`: Format to readable Tauq
- `tauq validate`: Check syntax
CLI Tool:

```shell
cargo install tauq
```

Rust:

```toml
[dependencies]
tauq = "0.1"
```

Python:

```shell
pip install tauq
```

JavaScript/TypeScript:

```shell
npm install tauq
```

Go:

```shell
go get github.com/epistates/tauq
```

Other languages: Java, C#, Swift - see Language Bindings.
Create `config.tqn`:

```
app_name "MyService"
version "1.0.0"
port 8080
debug true
features [api websockets metrics]
```
Parse to JSON:

```shell
$ tauq build config.tqn --pretty
{
  "app_name": "MyService",
  "version": "1.0.0",
  "port": 8080,
  "debug": true,
  "features": ["api", "websockets", "metrics"]
}
```

```
name "Alice"
age 30
active true
score 99.5
missing null
role admin  # Barewords don't need quotes
tags [web api backend]
ids [1 2 3 4 5]
mixed [1 "two" true null]
```
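These scalars map onto JSON values roughly as follows. `parse_scalar` is a hypothetical helper, simplified for illustration: real Tauq quoting and escaping rules are richer:

```python
def parse_scalar(tok):
    # Keywords first, then quoted strings, then numbers, then barewords.
    if tok == "true":
        return True
    if tok == "false":
        return False
    if tok == "null":
        return None
    if tok.startswith('"') and tok.endswith('"'):
        return tok[1:-1]
    try:
        return int(tok)
    except ValueError:
        pass
    try:
        return float(tok)
    except ValueError:
        return tok  # bareword string, e.g. admin

print([parse_scalar(t) for t in ['1', '"two"', 'true', 'null', 'admin', '99.5']])
# [1, 'two', True, None, 'admin', 99.5]
```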
```
!def User id name email role
1 Alice "alice@example.com" admin
2 Bob "bob@example.com" user
3 Carol "carol@example.com" user
```
Define schemas upfront, using `---` to separate them from the data:

```
!def User id name role
---
users [
!use User
1 Alice admin
2 Bob user
]
```

The `---` separator clears the implicit schema scope, allowing structured key-value data that uses `!use` inside arrays.
```
!def Address street city
!def User id name addr:Address
1 Alice { "123 Main" "NYC" }
2 Bob { "456 Oak" "LA" }
```

```
!def Employee name role
!def Department name budget employees:[Employee]
Engineering 1000000 [
Alice "Principal Engineer"
Bob "Senior Engineer"
]
```
```
!def U id name; 1 Alice; 2 Bob
```

All on one line for maximum compression!
We have provided a comprehensive set of examples in the `examples/` directory:
- Basics: Simple configuration and primitive types.
- Schemas: Typed schemas and nested types.
- Modularity: Multi-file imports and modular configurations.
- Real World: Production configurations like Kubernetes deployments.
- Queries: ETL pipelines and data generation with Tauq Query.
- Minified: Compact single-line syntax examples.
```shell
# To stdout
tauq build data.tqn

# To file with pretty formatting
tauq build data.tqn -o data.json --pretty

# From stdin
cat data.tqn | tauq build -
```

The formatter intelligently detects arrays of uniform objects and creates schemas automatically:
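A rough sketch of that uniform-array detection in Python (`auto_schema` is a hypothetical helper with naive singularization, not the actual formatter):

```python
def auto_schema(key, records):
    # If every item is a dict with the same keys, emit a !def table;
    # otherwise signal that no schema applies.
    if records and all(isinstance(r, dict) and list(r) == list(records[0])
                       for r in records):
        name = key.rstrip("s").capitalize()  # naive: "users" -> "User"
        fields = list(records[0])
        body = "\n".join(" ".join(str(r[f]) for f in fields) for r in records)
        return f"!def {name} {' '.join(fields)}\n---\n{key} [\n!use {name}\n{body}\n]"
    return None

print(auto_schema("users", [{"id": 1, "name": "Alice"},
                            {"id": 2, "name": "Bob"}]))
# !def User id name
# ---
# users [
# !use User
# 1 Alice
# 2 Bob
# ]
```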
```shell
# Convert JSON to Tauq (auto-generates schemas for nested arrays)
tauq format data.json -o data.tqn

# From stdin
echo '{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}' | tauq format -
# Output:
# !def User id name
# ---
# users [
# !use User
# 1 Alice
# 2 Bob
# ]
```

```shell
# Run data transformations
tauq exec pipeline.tqq -o output.json

# Run in SAFE MODE (disable shell execution)
tauq exec pipeline.tqq --safe
```

```shell
# Compress to a single line
tauq minify data.tqn -o data.min.tqn
```

For high-performance scenarios where tokens don't matter but size and speed do:
```rust
use tauq::tbf::{TableSchemaBuilder, FieldEncoding, TableEncode};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, TableEncode)]
struct Employee {
    #[tauq(encoding = "u16")]
    id: u32,
    name: String,
    #[tauq(encoding = "u8", offset = 18)] // Age 18-273 stored as 0-255
    age: u32,
}

let employees = vec![/* ... */];
let bytes = employees.encode_tbf(); // 83% smaller than JSON
```

Enable the `iceberg` feature for data lake integration:
```toml
[dependencies]
tauq = { version = "0.1", features = ["iceberg"] }
```

```rust
use tauq::tbf_iceberg::{TbfFileWriterBuilder, ArrowToTbf};

// Write Arrow RecordBatches as TBF
let mut writer = TbfFileWriterBuilder::new()
    .with_iceberg_schema(&iceberg_schema)
    .build();
writer.write(&record_batch);
let tbf_data = writer.finish();
```

Tauq is in active development. Contributions welcome!
Areas of interest:
- Parser optimizations
- Error message improvements
- Language bindings (Python, JS, Go)
- Documentation
- Real-world use cases
MIT
Tauq (τq) - Stop wasting tokens on JSON. Start using the future. 🚀