LATTICE

LLM Transport & Efficiency Layer
Make every LLM call cheaper, faster, and safe — without changing your model.


LATTICE sits between your application and any LLM provider. It compresses prompts, caches responses, manages concurrency (TACC), supports a native binary protocol, and routes coding agents through one self-hosted proxy. Your app sends standard OpenAI-format requests; LATTICE makes them smaller, faster, and cache-friendlier.

It is not a router. LATTICE never changes your model, never falls back between providers, never guesses. One provider per request. LATTICE optimizes transport and execution.

Installation

pip install lattice-transport

Optional extras:

pip install "lattice-transport[redis]"   # Multi-process session store
pip install "lattice-transport[mcp]"     # MCP tool support
pip install "lattice-transport[all]"     # Everything

Requirements: Python 3.10+. No external services required for single-process mode.

Quick Start

# Start the proxy
lattice proxy run --port 8787

# Point any OpenAI SDK at it
export OPENAI_BASE_URL=http://localhost:8787/v1

# Or route an agent through it
lattice lace claude

from lattice import LatticeClient

client = LatticeClient()
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Explain transport protocols"}],
)
print(response.choices[0].message.content)

Every request is automatically compressed, cached, and optimized in proxy mode — zero application code changes.
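
Because the proxy accepts standard OpenAI-format requests, a client needs nothing beyond an HTTP library. The following is a minimal sketch using only the Python standard library. It assumes the proxy from the Quick Start is listening on localhost:8787; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of LATTICE, and whether the proxy forwards the `Authorization` header to the provider is an assumption.

```python
import json
import urllib.request

# Assumption: the proxy started with `lattice proxy run --port 8787`.
PROXY_BASE_URL = "http://localhost:8787/v1"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-format chat completion request aimed at the proxy."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{PROXY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the provider key travels as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text.

    Needs a running proxy and valid provider credentials.
    """
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

Calling `send_chat_request(build_chat_request("openai/gpt-4o", "Explain transport protocols", key))` would issue the same request as the SDK example above, with `key` standing in for your provider credential.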


Architecture

Application
   │ OpenAI / Anthropic API format
   ▼
LATTICE Proxy :8787
   │
   ├── state/     Session, segments, SemanticCache
   ├── planner/   RequestClassifier → UnifiedPlanner → ExecutionPlan
   ├── pipeline/  Pipeline.compress() — IR-native transforms + gates
   ├── telemetry/ Metrics, downgrade, cost, agent stats
   └── providers/ adapters/ (17) + transport/ (HTTP pool, TACC, streaming)
            │
            ▼
       LLM Provider (exactly one per request)

Request flow

  1. Client sends POST /v1/chat/completions (or Anthropic /v1/messages).
  2. SessionManager creates or loads a session (CAS versioning).
  3. content_profiler builds a semantic profile; UnifiedPlanner produces an ExecutionPlan.
  4. Pipeline.compress() runs transforms with policy, budget, risk, and MILV gates.
  5. SemanticCache checks exact hash, then approximate fingerprint.
  6. On miss, the provider adapter sends via HTTP/2 pool; TACC manages admission.
  7. The response goes through the reverse pass, and LatticeHeaderMiddleware adds the x-lattice-* headers.
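
Step 5 describes a two-tier lookup: an exact hash first, then an approximate fingerprint. The sketch below illustrates that general idea only. The class `TwoTierCache`, both key functions, and the normalization rule (lowercase, strip punctuation, collapse whitespace) are hypothetical stand-ins, not the SemanticCache implementation, and message content is assumed to be a plain string.

```python
import hashlib
import json
import re


def exact_key(messages):
    """Hash the canonical JSON form of the messages (byte-for-byte match)."""
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approximate_key(messages):
    """Hypothetical fingerprint that ignores case, punctuation, and spacing."""
    parts = []
    for message in messages:
        text = re.sub(r"[^\w\s]", "", message["content"].lower())
        parts.append(message["role"] + ":" + " ".join(text.split()))
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


class TwoTierCache:
    """Illustrative cache: exact tier first, approximate tier second."""

    def __init__(self):
        self._exact = {}
        self._approx = {}

    def put(self, messages, response):
        self._exact[exact_key(messages)] = response
        self._approx[approximate_key(messages)] = response

    def get(self, messages):
        """Return (response, tier), where tier is 'exact', 'approximate', or 'miss'."""
        hit = self._exact.get(exact_key(messages))
        if hit is not None:
            return hit, "exact"
        hit = self._approx.get(approximate_key(messages))
        if hit is not None:
            return hit, "approximate"
        return None, "miss"
```

Checking the cheap exact tier first means the looser fingerprint is only consulted when a byte-identical request has not been seen.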

Runtime architecture


Novel Technology

LATTICE applies classical systems techniques to LLM workloads — transport and execution, not model features.

| Capability | Summary | Deep dive |
|---|---|---|
| TACC | Token-aware AIMD congestion control | docs/novel/tacc.md |
| Binary framing | 15-byte headers, 17 frame types, CRC32 | docs/novel/binary-framing.md |
| Delta encoding | Turn 2+ sends deltas only; CAS sessions | docs/novel/delta-encoding.md |
| Streaming | Stall detection, resume tokens, multiplex | docs/novel/streaming.md |
| Batching | Groups compatible requests | docs/novel/batching-speculation.md |
| Speculation | Rule-based next-turn precompute | docs/novel/batching-speculation.md |

Batching overhead reduction and long-conversation dedup savings are measured in the canonical benchmark suite; see Claim traceability [1][2].
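
To make "token-aware AIMD" concrete, here is a minimal sketch of additive-increase/multiplicative-decrease admission where the window is measured in tokens rather than requests. It shows the classical technique only; the class name `TokenAimdWindow`, every default value, and the idle-admit rule are assumptions for illustration and are not taken from TACC (see docs/novel/tacc.md for the actual design).

```python
from dataclasses import dataclass


@dataclass
class TokenAimdWindow:
    """Hypothetical AIMD window sized in tokens (all defaults are made up)."""

    window_tokens: float = 8_000.0
    min_tokens: float = 1_000.0
    max_tokens: float = 200_000.0
    additive_step: float = 500.0
    decrease_factor: float = 0.5
    in_flight_tokens: int = 0

    def try_admit(self, request_tokens: int) -> bool:
        """Admit the request if its tokens fit in the current window."""
        fits = self.in_flight_tokens + request_tokens <= self.window_tokens
        # Always admit when idle so an oversized request cannot starve.
        if not fits and self.in_flight_tokens > 0:
            return False
        self.in_flight_tokens += request_tokens
        return True

    def on_success(self, request_tokens: int) -> None:
        """Additive increase: grow the window by a fixed step."""
        self.in_flight_tokens -= request_tokens
        self.window_tokens = min(self.max_tokens, self.window_tokens + self.additive_step)

    def on_congestion(self, request_tokens: int) -> None:
        """Multiplicative decrease, e.g. after a rate-limit response or timeout."""
        self.in_flight_tokens -= request_tokens
        self.window_tokens = max(self.min_tokens, self.window_tokens * self.decrease_factor)
```

Counting tokens instead of requests lets one large prompt consume as much of the window as many small ones, which is the property the "token-aware" label suggests.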


Compression Pipeline

LATTICE ships 20 transforms (list_transform_names() in lattice.transforms.registry). Six run in the default pipeline; three are execution-only (batching, speculative, delta); the rest are planner-selected or off by default. Every transform is safety-classified and risk-gated.

| P | Transform | Safety | What it does | Default |
|---|---|---|---|---|
| 1 | content_profiler | SAFE | Classifies content, computes semantic risk score | yes |
| 2 | runtime_contract | SAFE | Per-transform budget and timeout | yes |
| 2 | speculative | SAFE | Speculative token generation | exec |
| 3 | batching | SAFE | Request batching for multi-turn workloads | exec |
| 5 | delta_encoder | SAFE | Session-based delta encoding | exec |
| 9 | cache_arbitrage | SAFE | KV-cache alignment reorder | yes |
| 9 | causal_chain | SAFE | Causal chain extraction | no |
| 15 | message_dedup | CONDITIONAL | Exact/near-duplicate turn removal | no |
| 17 | diagnostic_rle | SAFE | Diagnostic repetition RLE | no |
| 18 | context_selector | SAFE | Submodular context selection | no |
| 19 | columnar_pack | SAFE | Columnar table packing | no |
| 20 | reference_sub | CONDITIONAL | UUID/URL/hash → short refs | yes |
| 21 | json_shape | SAFE | JSON shape factoring | no |
| 22 | extractive_compress | SAFE | Extractive compression | no |
| 22 | rate_distortion | CONDITIONAL | Rate-distortion semantic compression | no |
| 23 | path_prefix | SAFE | Filesystem path prefix compression | no |
| 25 | format_conversion | CONDITIONAL | Table/JSON format conversion | no |
| 29 | tool_projection | SAFE | Query-aware tool field projection | no |
| 30 | tool_filter | SAFE | Tool output filtering | yes |
| 40 | output_cleanup | SAFE | Response-side whitespace/JSON cleanup | yes |

exec = execution-only transform (outside default pipeline list).
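
The table can be read as data. The sketch below copies a few of its rows and derives the default pipeline from them. It assumes that the P column is a priority and that lower values run first, which the table does not state; `TransformSpec` and `default_pipeline` are illustrative names, not LATTICE APIs.

```python
from typing import NamedTuple


class TransformSpec(NamedTuple):
    priority: int   # the "P" column (assumed: lower runs earlier)
    name: str
    safety: str     # SAFE or CONDITIONAL in the rows below
    default: str    # "yes", "no", or "exec"


# A subset of rows copied from the transform table.
TRANSFORMS = [
    TransformSpec(40, "output_cleanup", "SAFE", "yes"),
    TransformSpec(30, "tool_filter", "SAFE", "yes"),
    TransformSpec(20, "reference_sub", "CONDITIONAL", "yes"),
    TransformSpec(15, "message_dedup", "CONDITIONAL", "no"),
    TransformSpec(9, "cache_arbitrage", "SAFE", "yes"),
    TransformSpec(5, "delta_encoder", "SAFE", "exec"),
    TransformSpec(2, "runtime_contract", "SAFE", "yes"),
    TransformSpec(1, "content_profiler", "SAFE", "yes"),
]


def default_pipeline(specs):
    """Return names of default-on transforms, ordered by ascending priority."""
    selected = [spec for spec in specs if spec.default == "yes"]
    return [spec.name for spec in sorted(selected, key=lambda spec: spec.priority)]
```

Under these assumptions, `default_pipeline(TRANSFORMS)` yields the six default transforms, starting with content_profiler and ending with output_cleanup.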

Headline compression on the canonical feature suite (ollama-cloud / kimi-k2.6:cloud): 40.3% average reduction [3]. Pipeline latency ~36 ms [4].

Transform reference · Claim traceability · Feature parity checklist (61 rows)


Safety

Transforms are classified SAFE, CONDITIONAL, or DANGEROUS. A 0–100 semantic risk score gates lossy transforms; expansion guards cap token growth.
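
One way to picture the gating described above is a pair of small predicates: one decides whether a transform may run given its class and the request's risk score, the other rejects output that grew in tokens. This is a sketch under stated assumptions; the threshold of 40, the rule that DANGEROUS transforms need an explicit opt-in, and the function names are hypothetical and not taken from the Safety guide.

```python
from enum import Enum


class Safety(Enum):
    SAFE = "SAFE"
    CONDITIONAL = "CONDITIONAL"
    DANGEROUS = "DANGEROUS"


def is_allowed(safety: Safety, risk_score: int,
               conditional_max_risk: int = 40,
               allow_dangerous: bool = False) -> bool:
    """Decide whether a transform may run for a request with this risk score.

    The threshold and the opt-in flag are illustrative assumptions.
    """
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be within 0-100")
    if safety is Safety.SAFE:
        return True
    if safety is Safety.CONDITIONAL:
        return risk_score <= conditional_max_risk
    # DANGEROUS: only when explicitly enabled, and still risk-gated.
    return allow_dangerous and risk_score <= conditional_max_risk


def within_expansion_budget(tokens_before: int, tokens_after: int,
                            max_growth: float = 0.0) -> bool:
    """Expansion guard: reject output that grew beyond the allowed ratio."""
    return tokens_after <= tokens_before * (1.0 + max_growth)
```

With these defaults a CONDITIONAL transform runs at risk 30 but is skipped at risk 70, and any transform whose output is longer than its input is discarded.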

Safety guide · SIG · RATS · PSG · MILV


Observability

curl http://localhost:8787/stats | jq
curl http://localhost:8787/metrics

  • /stats: transforms, sessions, pools, TACC, maintenance, downgrades
  • /metrics: Prometheus counters and histograms
  • Response headers: x-lattice-compression, x-lattice-session-id, x-lattice-delta, x-lattice-cost-usd, x-lattice-provider, x-lattice-transforms-applied
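
On the client side, the per-response headers can be collected with a small filter. The helper below is an illustrative sketch, not part of LATTICE. It works on any mapping of header names to values and leaves the values as strings, since their exact formats are not specified here.

```python
def lattice_headers(headers):
    """Return only the x-lattice-* headers, with lowercased names.

    HTTP header names are case-insensitive, so match on the lowercased form.
    """
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-lattice-")
    }
```

For example, passing `dict(response.headers)` from a standard-library `urllib` response would keep entries such as x-lattice-provider and drop Content-Type.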

Observability


Supported Providers

17 direct adapters. No routing — one provider per request.

| Provider | Prefix | HTTP/2 | Streaming |
|---|---|---|---|
| OpenAI | openai/ | yes | SSE |
| Anthropic | anthropic/, claude- | yes | SSE |
| Azure | azure/ | yes | SSE |
| Bedrock | bedrock/ | yes | SSE |
| Gemini | gemini/, google/ | yes | SSE |
| Vertex AI | vertex/ | yes | SSE |
| Groq | groq/ | yes | SSE |
| DeepSeek | deepseek/ | yes | SSE |
| Mistral | mistral/ | yes | SSE |
| Cohere | cohere/ | yes | SSE |
| Ollama | ollama/ | | SSE |
| Ollama Cloud | ollama-cloud/ | yes | SSE |
| OpenRouter | openrouter/ | yes | SSE |
| Fireworks | fireworks/ | yes | SSE |
| Together | together/ | yes | SSE |
| Perplexity | perplexity/ | yes | SSE |
| AI21 | ai21/ | yes | SSE |
Provider details


CLI Reference

lattice proxy run --port 8787
lattice proxy start|stop|restart|status
lattice init
lattice lace|unlace <agent>
lattice info|config|status|health|doctor
lattice benchmark --suite feature   # wraps benchmarks/evals/cli.py

CLI reference


Agent Integration

lattice lace claude    # Claude Code
lattice lace codex     # OpenAI Codex
lattice lace cursor    # Cursor
lattice lace opencode  # OpenCode
lattice lace copilot   # GitHub Copilot

Running lattice doctor with no arguments checks all five agents. lattice init applies durable config, whereas lattice lace sets up transient routing plus a tunnel sidecar.

Integrations


Development

git clone https://github.com/Harsh-Daga/lattice
cd lattice
uv sync
uv run pytest tests/ -q              # 2039 collected, 1824 passed (215 skipped)
uv run pytest tests/contract/ -q
uv run ruff check src/ tests/ benchmarks/
uv run ruff format --check src/ tests/ benchmarks/
uv run mypy src/lattice/

uv run python benchmarks/evals/cli.py --suite all \
  --providers ollama-cloud \
  --provider-model ollama-cloud=kimi-k2.6:cloud \
  --iterations 3 --warmup 1

AGENTS.md for AI agent contributors


Migrating from v0.x

Internal Python imports changed in 1.0.0. The CLI and HTTP interfaces are stable.

Migration guide · CHANGELOG


Documentation

| Section | Documents |
|---|---|
| Getting Started | Quick Start · Installation · CLI |
| Architecture | Runtime · Safety · SDK |
| Novel Tech | TACC · Binary framing · Delta · Streaming |
| Compression | Transforms · Caching |
| Providers | 17 providers |
| Operations | Agent integrations |

Full index


Claim footnotes


License

MIT © Harsh Daga

GitHub · Issues · PyPI · Changelog

Footnotes

  1. benchmarks/results/CLAIMS.md: batching overhead row
  2. benchmarks/results/CLAIMS.md: message_dedup row
  3. benchmarks/results/CLAIMS.md: feature suite avg_reduction_ratio from v1.0.0.json
  4. benchmarks/results/CLAIMS.md: avg_pipeline_latency_ms from v1.0.0.json