Build optimal LLM context windows. Priority-based assembly, token budgeting, smart truncation.
Zero mandatory dependencies. Drop into any RAG pipeline or agent in 3 lines.
You have a system prompt, retrieved chunks, conversation history, and tool results. They don't all fit. Which do you drop? How do you truncate? In what order?
context-forge answers this. You assign priorities. It fits as much as possible, in the right order, with the right truncation strategy.
```bash
pip install context-forge               # zero dependencies
pip install 'context-forge[tiktoken]'   # + precise token counting
```

```python
from context_forge import ContextAssembler, Priority, TruncationStrategy

assembler = ContextAssembler(token_budget=4000)

# System prompt: always included, never dropped
assembler.add(
    "You are a helpful assistant. Answer only from the provided context.",
    Priority.REQUIRED,
    label="system_prompt",
)

# Retrieved docs: ordered by relevance score within the same priority
assembler.add(chunk_1, Priority.HIGH, relevance_score=0.95, label="doc_1", category="retrieved")
assembler.add(chunk_2, Priority.HIGH, relevance_score=0.82, label="doc_2", category="retrieved")
assembler.add(chunk_3, Priority.HIGH, relevance_score=0.61, label="doc_3", category="retrieved")

# Conversation history: keep the most recent messages (TAIL truncation)
assembler.add(
    chat_history,
    Priority.MEDIUM,
    label="history",
    truncation=TruncationStrategy.TAIL,
)

# Background context: dropped first if the budget is tight
assembler.add(background_info, Priority.LOW, label="background")

# Assemble
result = assembler.assemble()

print(result.text)            # assembled context, ready to send
print(result.utilisation)     # 0.91 (91% of budget used)
print(result.items_excluded)  # items that didn't fit
print(result.tokens_used)     # 3641
```

| Priority | Behaviour |
|---|---|
| `REQUIRED` | Always included. Never dropped. Truncated only if it alone exceeds the budget. |
| `HIGH` | Included before `MEDIUM` and `LOW`. |
| `MEDIUM` | Included if budget allows. |
| `LOW` | Dropped first when budget is tight. |
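Conceptually, priority-based assembly is a greedy fill: order items by priority, break ties by relevance score, and add items until the budget runs out. The sketch below is a simplified illustration of that idea (hypothetical helper names, word counts standing in for tokens, and no truncation), not context-forge's actual internals:

```python
# Conceptual sketch of greedy priority-based assembly.
# NOT context-forge internals: names and token counting are illustrative only.
REQUIRED, HIGH, MEDIUM, LOW = 0, 1, 2, 3

def assemble(items, budget):
    """items: list of (text, priority, score). Returns the texts that fit."""
    # Sort by priority (REQUIRED first), then by descending relevance score.
    ordered = sorted(items, key=lambda it: (it[1], -it[2]))
    included, used = [], 0
    for text, priority, score in ordered:
        cost = len(text.split())  # stand-in for a real token count
        # REQUIRED items are never dropped, even over budget
        # (the real library would truncate them instead).
        if used + cost <= budget or priority == REQUIRED:
            included.append(text)
            used += cost
    return included

items = [
    ("be helpful", REQUIRED, 0.5),
    ("doc about cats and dogs", HIGH, 0.9),
    ("doc about fish", HIGH, 0.6),
    ("old chat history here", MEDIUM, 0.5),
]
assemble(items, 8)  # REQUIRED + the highest-scoring HIGH doc fit; the rest don't
```
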
| Strategy | Keeps | Best for |
|---|---|---|
| `HEAD` (default) | Beginning of text | Documents, articles |
| `TAIL` | End of text | Conversation history (keep recent) |
| `MIDDLE` | Beginning + end | Long docs with intro and conclusion |
| `NONE` | Either fully included or fully excluded | Items that must not be cut |
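The four strategies reduce to simple slicing once text is split into tokens. A minimal conceptual sketch (operating on a pre-tokenized list, not the library's actual implementation):

```python
def truncate(tokens, budget, strategy="head"):
    """Conceptual sketch of HEAD/TAIL/MIDDLE/NONE truncation over a token list."""
    if len(tokens) <= budget:
        return tokens                # already fits: no truncation needed
    if strategy == "head":
        return tokens[:budget]       # keep the beginning
    if strategy == "tail":
        return tokens[-budget:]      # keep the end (most recent)
    if strategy == "middle":
        half = budget // 2           # keep beginning + end, drop the middle
        return tokens[:half] + tokens[-(budget - half):]
    if strategy == "none":
        return []                    # all or nothing: over budget means excluded
    raise ValueError(f"unknown strategy: {strategy}")

words = "one two three four five six".split()
truncate(words, 4, "tail")    # ['three', 'four', 'five', 'six']
truncate(words, 4, "middle")  # ['one', 'two', 'five', 'six']
```
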
```python
assembler.add(
    chat_history,
    Priority.MEDIUM,
    truncation=TruncationStrategy.TAIL,  # keep most recent messages
)

assembler.add(
    tool_result,
    Priority.HIGH,
    truncation=TruncationStrategy.NONE,  # include fully or not at all
)
```

```python
from context_forge import count_tokens, truncate_to_tokens

# Count tokens (approximation by default, precise with tiktoken installed)
tokens = count_tokens("Your text here")

# Truncate to a budget
truncated = truncate_to_tokens(long_text, max_tokens=500, strategy="tail")
```

With tiktoken installed (`pip install 'context-forge[tiktoken]'`), counting uses the actual tokenizer for your model. Without it, an approximation accurate to within ~10% is used.
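A common approximation for English text is roughly 4 characters per token; this sketch assumes that heuristic purely for illustration and is not necessarily the formula context-forge uses:

```python
def approx_token_count(text: str) -> int:
    """Rough token estimate assuming ~4 characters per token for English text.
    A common heuristic for illustration; NOT necessarily context-forge's formula."""
    return max(1, round(len(text) / 4))

approx_token_count("Hello, world!")  # 13 chars -> about 3 tokens
```

Heuristics like this drift on code, non-English text, and unusual punctuation, which is why installing tiktoken for model-accurate counts matters near a tight budget.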
```python
result = assembler.assemble()

result.text              # the assembled context string
result.tokens_used       # tokens consumed
result.tokens_budget     # your original budget
result.tokens_remaining  # budget left over
result.utilisation       # fraction used (0.0-1.0)
result.items_included    # fully included ContextItems
result.items_truncated   # partially included (truncated) ContextItems
result.items_excluded    # items that didn't fit at all
result.items_dropped     # count of items not included at all
```

```python
print(assembler.preview())
```
```
# Context Forge Preview — budget: 4000 tokens
# ───────────────────────────────────────────────────────
# PRIORITY   LABEL           TOKENS   SCORE
# ───────────────────────────────────────────────────────
# REQUIRED   system_prompt      120    0.50   ✓
# HIGH       doc_1              850    0.95   ✓
# HIGH       doc_2              720    0.82   ✓
# HIGH       doc_3              680    0.61   ✓
# MEDIUM     history           1200    0.50   ✓
# LOW        background         850    0.50   ✗
# ───────────────────────────────────────────────────────
# Total                        4420 / 4000
# ⚠ Over budget by 420 tokens — 420 tokens will be dropped/truncated
```

```python
result = (
    ContextAssembler(token_budget=8000, reserve_for_output=1000)
    .add(system_prompt, Priority.REQUIRED, label="system")
    .add(doc_1, Priority.HIGH, relevance_score=0.95)
    .add(doc_2, Priority.HIGH, relevance_score=0.78)
    .add(history, Priority.MEDIUM, truncation=TruncationStrategy.TAIL)
    .assemble()
)
```

```bash
# Count tokens in a file or string
context-forge count "Your text here"
context-forge count --file document.txt

# Truncate to a token budget
context-forge truncate --file long_doc.txt --tokens 500 --strategy tail

# Preview assembly before building
context-forge preview --budget 4000 --file system.txt --file doc1.txt --file doc2.txt
```

| | context-forge | headroom | context-engineering-toolkit |
|---|---|---|---|
| pip install | ✓ | ✓ | ✗ |
| Zero mandatory dependencies | ✓ | ✗ | ✗ |
| No server/proxy required | ✓ | ✗ | ✓ |
| Priority-based assembly | ✓ | ✗ | ✓ |
| Truncation strategies | ✓ | ✗ | ✓ |
| AssemblyResult metadata | ✓ | ✗ | partial |
| Relevance score ordering | ✓ | ✗ | ✗ |