Stop prompting. Start decomposing.
Deterministic text classification for AI agents. Decompose turns any text into classified, structured semantic units — instantly. No LLM. No setup. One function call.
Input:

```text
The contractor shall provide all materials per ASTM C150-20. Maximum load
shall not exceed 500 psf per ASCE 7-22. Notice to proceed within 14 calendar
days of contract execution. Retainage of 10% applies to all payments.
For general background, the project is located in Denver, CO...
```

Output:

```json
[
  {
    "text": "The contractor shall provide all materials per ASTM C150-20.",
    "authority": "mandatory",
    "risk": "compliance",
    "type": "requirement",
    "irreducible": true,
    "attention": 8.0,
    "entities": ["ASTM C150-20"]
  },
  {
    "text": "Maximum load shall not exceed 500 psf per ASCE 7-22.",
    "authority": "prohibitive",
    "risk": "safety_critical",
    "type": "constraint",
    "irreducible": true,
    "attention": 10.0,
    "entities": ["ASCE 7-22"]
  }
]
```

Every unit classified. Every standard extracted. Every risk scored. Your agent knows what matters.
```bash
pip install decompose-mcp
```

Add to your agent's MCP config (Claude Code, Cursor, Windsurf, etc.):
```json
{
  "mcpServers": {
    "decompose": {
      "command": "uvx",
      "args": ["decompose-mcp", "--serve"]
    }
  }
}
```

Your agent gets two tools:

- `decompose_text` — decompose any text
- `decompose_url` — fetch a URL and decompose its content
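Under the hood, these are ordinary MCP `tools/call` requests. As a rough sketch, a call to `decompose_text` over the wire might look like this (the `text` argument name is an assumption; check the tool's actual schema via `tools/list`):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "decompose_text",
    "arguments": { "text": "Maximum load shall not exceed 500 psf per ASCE 7-22." }
  }
}
```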
Install the skill from ClawdHub or configure directly:
```json
{
  "mcpServers": {
    "decompose": {
      "command": "python3",
      "args": ["-m", "decompose", "--serve"]
    }
  }
}
```

Or install the skill: `clawdhub install decompose-mcp`
```bash
# Pipe text
cat spec.txt | decompose --pretty

# Inline
decompose --text "The contractor shall provide all materials per ASTM C150-20."

# Compact output (smaller JSON)
cat document.md | decompose --compact
```

```python
from decompose import decompose_text, filter_for_llm

result = decompose_text("The contractor shall provide all materials per ASTM C150-20.")
for unit in result["units"]:
    print(f"[{unit['authority']}] [{unit['risk']}] {unit['text'][:60]}...")
# Pre-filter for LLM context — keep only high-value units
filtered = filter_for_llm(result, max_tokens=4000)
print(f"{filtered['meta']['reduction_pct']}% token reduction")
llm_input = filtered["text"]  # Ready for your LLM
```

| Field | Values | What It Tells Your Agent |
|---|---|---|
| `authority` | `mandatory`, `prohibitive`, `directive`, `permissive`, `conditional`, `informational` | Is this a hard requirement or background? |
| `risk` | `safety_critical`, `security`, `compliance`, `financial`, `contractual`, `advisory`, `informational` | How much does this matter? |
| `type` | `requirement`, `definition`, `reference`, `constraint`, `narrative`, `data` | What kind of content is this? |
| `irreducible` | `true` / `false` | Must this be preserved verbatim? |
| `attention` | `0.0` to `10.0` | How much compute should the agent spend here? |
| `entities` | standards, codes, regulations | What formal references are cited? |
| `actionable` | `true` / `false` | Does someone need to do something? |
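If you want static typing over these fields, here is a minimal sketch of the unit schema as a Python `TypedDict`. The field names mirror the JSON output above; the class itself is illustrative and not part of the library's public API:

```python
from typing import List, TypedDict

class Unit(TypedDict):
    """One classified semantic unit. Illustrative only; the
    library returns plain dicts, not instances of this type."""
    text: str            # the verbatim span of source text
    authority: str       # e.g. "mandatory", "prohibitive", "informational"
    risk: str            # e.g. "safety_critical", "compliance", "advisory"
    type: str            # e.g. "requirement", "constraint", "narrative"
    irreducible: bool    # must this be preserved verbatim?
    attention: float     # 0.0 to 10.0, how much compute to spend here
    entities: List[str]  # formal references, e.g. ["ASTM C150-20"]
    actionable: bool     # does someone need to act on this?
```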
Decompose is not the destination. It's the step before the LLM that most developers skip — not because it's hard, but because nobody showed them it exists. Documents have structure. That structure is classifiable. And classification should happen before reasoning.
- **Without:** document → chunk → embed → retrieve → LLM → answer (100% of tokens)
- **With:** document → decompose → filter/route → LLM → answer (20-40% of tokens)
filter_for_llm() keeps mandatory, safety-critical, financial, and compliance units — drops boilerplate before it reaches your LLM or vector store.
```python
from decompose import decompose_text, filter_for_llm

result = decompose_text(open("contract.md").read())
filtered = filter_for_llm(result, max_tokens=4000)

# filtered["text"] = high-value units only, ready for LLM
# filtered["meta"]["reduction_pct"] = how much was dropped (typically 60-80%)

# Or use the units directly for embedding
for unit in filtered["units"]:
    embed_and_store(unit["text"], metadata={
        "authority": unit["authority"],
        "risk": unit["risk"],
        "attention": unit["attention"],
    })
```

Safety-critical content goes to one chain. Financial content goes to another. Boilerplate gets skipped.
```python
from decompose import decompose_text

result = decompose_text(spec_text)
for unit in result["units"]:
    if unit["risk"] == "safety_critical":
        safety_chain.process(unit)   # Full analysis + human review
    elif unit["risk"] == "financial":
        audit_chain.process(unit)    # Flag for finance team
    elif unit["attention"] < 0.5:
        pass                         # Skip boilerplate
    else:
        general_chain.process(unit)  # Standard LLM analysis
```

```python
from decompose import decompose_text

result = decompose_text(spec_text)
total = len(result["units"])
high = [u for u in result["units"] if u["attention"] >= 1.0]
print(f"{len(high)}/{total} units need LLM analysis")
print(f"{100 - len(high) * 100 // total}% token reduction")See examples/ for runnable scripts.
Decompose runs on pure regex and heuristics. No Ollama, no API key, no GPU, no inference cost.
This is intentional:
- Fast: <500ms for a 50-page spec
- Deterministic: Same input always produces same output
- Offline: Works air-gapped, on a plane, on CI
- Composable: Your agent's LLM reasons over the structured output — decompose handles the preprocessing
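To make "pure regex and heuristics" concrete, here is a deliberately simplified sketch of how authority classification can work. The patterns below are illustrative stand-ins, not Decompose's actual rules:

```python
import re

# Illustrative pattern table. Order matters: "shall not" must be
# tested before "shall", or prohibitions would be labeled mandatory.
AUTHORITY_PATTERNS = [
    ("prohibitive", re.compile(r"\bshall not\b|\bmust not\b|\bprohibited\b", re.I)),
    ("mandatory",   re.compile(r"\bshall\b|\bmust\b|\brequired\b", re.I)),
    ("directive",   re.compile(r"\bshould\b|\brecommended\b", re.I)),
    ("permissive",  re.compile(r"\bmay\b|\bpermitted\b", re.I)),
]

def classify_authority(sentence: str) -> str:
    """Return the first matching authority label, else 'informational'."""
    for label, pattern in AUTHORITY_PATTERNS:
        if pattern.search(sentence):
            return label
    return "informational"

assert classify_authority("Maximum load shall not exceed 500 psf.") == "prohibitive"
assert classify_authority("The project is located in Denver, CO.") == "informational"
```

Because every label comes from a fixed pattern table, the same input always yields the same output, which is what makes results cacheable and testable without mocking a model.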
The LLM is what your agent uses. Decompose makes whatever model you're running work better.
Decompose is built by Echology and extracted from AECai, a document intelligence platform for Architecture, Engineering, and Construction firms. The classification patterns, entity extraction, and irreducibility detection are battle-tested against thousands of real AEC documents — specs, contracts, RFIs, inspection reports, pay applications.
Decompose earned its independence — it started as AECai's text classification module, proved general enough to work across domains (insurance, trading, regulatory), and was released standalone. Free, MIT-licensed.
- When Regex Beats an LLM — Decompose classifies the MCP spec in 3.78ms
- Why Your Agent Needs a Cognitive Primitive — attention scoring, irreducibility, and routing
- What "Simulation-Aware" Actually Means — the architecture behind AECai
License: MIT — Copyright (c) 2025-2026 Echology, Inc.