context-forge

Build optimal LLM context windows. Priority-based assembly, token budgeting, smart truncation.

Zero mandatory dependencies. Drop into any RAG pipeline or agent in 3 lines.



The Problem

You have a system prompt, retrieved chunks, conversation history, and tool results. They don't all fit. Which do you drop? How do you truncate? In what order?

context-forge answers this. You assign priorities. It fits as much as possible, in the right order, with the right truncation strategy.
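The core idea can be sketched in a few lines of plain Python (this is an illustrative greedy loop, not the library's actual internals): sort items by priority, then include each one only if it still fits the budget.

```python
from dataclasses import dataclass

@dataclass
class Item:
    text: str
    priority: int   # lower number = more important (0 = REQUIRED)
    tokens: int

def assemble(items, budget):
    """Greedily include items in priority order until the budget runs out."""
    included, excluded, used = [], [], 0
    for item in sorted(items, key=lambda i: i.priority):
        if used + item.tokens <= budget:
            included.append(item)
            used += item.tokens
        else:
            excluded.append(item)
    return included, excluded, used

items = [
    Item("system prompt", 0, 100),
    Item("retrieved doc", 1, 300),
    Item("background", 3, 250),
]
included, excluded, used = assemble(items, budget=450)
# The background item (250 tokens) no longer fits and is excluded.
```

The real library layers truncation strategies and relevance ordering on top of this basic shape.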


Install

pip install context-forge                  # zero dependencies
pip install 'context-forge[tiktoken]'      # + precise token counting

Quick Start

from context_forge import ContextAssembler, Priority, TruncationStrategy

assembler = ContextAssembler(token_budget=4000)

# System prompt — always included, never dropped
assembler.add(
    "You are a helpful assistant. Answer only from the provided context.",
    Priority.REQUIRED,
    label="system_prompt",
)

# Retrieved docs — ordered by relevance score within same priority
assembler.add(chunk_1, Priority.HIGH, relevance_score=0.95, label="doc_1", category="retrieved")
assembler.add(chunk_2, Priority.HIGH, relevance_score=0.82, label="doc_2", category="retrieved")
assembler.add(chunk_3, Priority.HIGH, relevance_score=0.61, label="doc_3", category="retrieved")

# Conversation history — keep the most recent (TAIL truncation)
assembler.add(
    chat_history,
    Priority.MEDIUM,
    label="history",
    truncation=TruncationStrategy.TAIL,
)

# Background context — dropped first if budget is tight
assembler.add(background_info, Priority.LOW, label="background")

# Assemble
result = assembler.assemble()

print(result.text)              # assembled context, ready to send
print(result.utilisation)       # 0.91 — 91% of budget used
print(result.items_excluded)    # items that didn't fit
print(result.tokens_used)       # 3641

Priority Levels

| Priority | Behaviour |
|----------|-----------|
| `REQUIRED` | Always included. Never dropped. Truncated only if it alone exceeds the budget. |
| `HIGH` | Included before `MEDIUM` and `LOW`. |
| `MEDIUM` | Included if budget allows. |
| `LOW` | Dropped first when budget is tight. |
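Within the same priority tier, items are ordered by `relevance_score`, highest first. The ordering rule amounts to a two-part sort key (a sketch with hypothetical dict fields, not the library's internal representation):

```python
items = [
    {"label": "doc_3", "priority": 1, "score": 0.61},
    {"label": "doc_1", "priority": 1, "score": 0.95},
    {"label": "history", "priority": 2, "score": 0.50},
]

# Sort by priority ascending, then relevance score descending
ordered = sorted(items, key=lambda i: (i["priority"], -i["score"]))
```

After sorting, `doc_1` (score 0.95) precedes `doc_3` (score 0.61) even though both are the same priority.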

Truncation Strategies

| Strategy | Keeps | Best for |
|----------|-------|----------|
| `HEAD` (default) | Beginning of text | Documents, articles |
| `TAIL` | End of text | Conversation history (keep recent) |
| `MIDDLE` | Beginning + end | Long docs with intro and conclusion |
| `NONE` | Either fully included or fully excluded | Items that must not be cut |

assembler.add(
    chat_history,
    Priority.MEDIUM,
    truncation=TruncationStrategy.TAIL,   # keep most recent messages
)
assembler.add(
    tool_result,
    Priority.HIGH,
    truncation=TruncationStrategy.NONE,   # include fully or not at all
)
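The three cutting strategies differ only in which slice of the text survives. A rough sketch (using whitespace tokens for illustration; the library operates on real token counts):

```python
def truncate(tokens, max_tokens, strategy="head"):
    """Keep at most max_tokens, choosing which end(s) to preserve."""
    if len(tokens) <= max_tokens:
        return tokens
    if strategy == "head":      # keep the beginning
        return tokens[:max_tokens]
    if strategy == "tail":      # keep the end (e.g. recent messages)
        return tokens[-max_tokens:]
    if strategy == "middle":    # keep beginning + end, drop the middle
        half = max_tokens // 2
        return tokens[:half] + tokens[len(tokens) - (max_tokens - half):]
    raise ValueError(f"unknown strategy: {strategy}")

words = [f"w{i}" for i in range(10)]
```

For example, `truncate(words, 4, "middle")` keeps the first two and last two tokens.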

Token Counting

from context_forge import count_tokens, truncate_to_tokens

# Count tokens (approximation by default, precise with tiktoken installed)
tokens = count_tokens("Your text here")

# Truncate to a budget
truncated = truncate_to_tokens(long_text, max_tokens=500, strategy="tail")

With tiktoken installed (pip install 'context-forge[tiktoken]'), counting uses the actual tokenizer for your model. Without it, an approximation accurate to within ~10% is used.
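A common rule of thumb behind such approximations is roughly 4 characters per token for English text. A sketch of that fallback pattern (the library's exact heuristic and function signatures may differ):

```python
def approx_count_tokens(text: str) -> int:
    # ~4 characters per token is a common rule of thumb for English text
    return max(1, len(text) // 4)

def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Use tiktoken's real tokenizer when available, else approximate."""
    try:
        import tiktoken  # optional dependency
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except ImportError:
        return approx_count_tokens(text)
```

The try/except import keeps tiktoken strictly optional, matching the zero-mandatory-dependencies design.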


AssemblyResult

result = assembler.assemble()

result.text              # the assembled context string
result.tokens_used       # tokens consumed
result.tokens_budget     # your original budget
result.tokens_remaining  # budget left over
result.utilisation       # fraction used (0.0–1.0)
result.items_included    # fully included ContextItems
result.items_truncated   # partially included (truncated) ContextItems
result.items_excluded    # items that didn't fit at all
result.items_dropped     # count of items not included at all
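The numeric fields are linked by simple invariants: `tokens_remaining` is the budget minus usage, and `utilisation` is their ratio. A hypothetical sketch (the library's actual class may be structured differently):

```python
from dataclasses import dataclass

@dataclass
class AssemblyResult:
    text: str
    tokens_used: int
    tokens_budget: int

    @property
    def tokens_remaining(self) -> int:
        return self.tokens_budget - self.tokens_used

    @property
    def utilisation(self) -> float:
        return self.tokens_used / self.tokens_budget

result = AssemblyResult(text="...", tokens_used=3641, tokens_budget=4000)
```

With the Quick Start numbers (3641 used out of 4000), utilisation comes out at about 0.91.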

Preview Without Assembling

print(assembler.preview())

# Context Forge Preview — budget: 4000 tokens
# ───────────────────────────────────────────────────────
# PRIORITY     LABEL                TOKENS  SCORE
# ───────────────────────────────────────────────────────
# REQUIRED     system_prompt           120   0.50 ✓
# HIGH         doc_1                   850   0.95 ✓
# HIGH         doc_2                   720   0.82 ✓
# HIGH         doc_3                   680   0.61 ✓
# MEDIUM       history                1200   0.50 ✓
# LOW          background              850   0.50 ✗
# ───────────────────────────────────────────────────────
# Total                               4420 / 4000
#   ⚠  Over budget by 420 tokens — 420 tokens will be dropped/truncated

Chaining API

result = (
    ContextAssembler(token_budget=8000, reserve_for_output=1000)
    .add(system_prompt, Priority.REQUIRED, label="system")
    .add(doc_1, Priority.HIGH, relevance_score=0.95)
    .add(doc_2, Priority.HIGH, relevance_score=0.78)
    .add(history, Priority.MEDIUM, truncation=TruncationStrategy.TAIL)
    .assemble()
)
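Judging by its name, `reserve_for_output` holds back part of the budget for the model's response, so the context itself is assembled against the remainder (an assumption about the parameter's semantics, not confirmed by the source):

```python
token_budget = 8000
reserve_for_output = 1000

# Context items are presumably fitted into the budget left after the reserve
effective_budget = token_budget - reserve_for_output
```

This leaves room for generation without the prompt crowding out the completion.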

CLI

# Count tokens in a file or string
context-forge count "Your text here"
context-forge count --file document.txt

# Truncate to a token budget
context-forge truncate --file long_doc.txt --tokens 500 --strategy tail

# Preview assembly before building
context-forge preview --budget 4000 --file system.txt --file doc1.txt --file doc2.txt

Compared to Alternatives

Relative to headroom and context-engineering-toolkit, context-forge offers:

- Plain `pip install`, with zero mandatory dependencies
- No server or proxy required
- Priority-based assembly
- Truncation strategies
- `AssemblyResult` metadata
- Relevance-score ordering

Linda Oraegbunam | LinkedIn | Twitter
