ContextForge

ContextForge is a local-first Retrieval-Augmented Generation (RAG) system designed for deep document understanding, structured retrieval, and long-context conversations.

Most RAG systems rely purely on vector search, which can struggle with exact terminology, structured documents, and long-running conversations. ContextForge explores a hybrid approach that combines dense retrieval, sparse retrieval, and hierarchical conversational memory to improve answer grounding and contextual continuity.

The system is designed for developers and researchers experimenting with reliable long-context AI systems, especially in environments where privacy, transparency, and local inference are important.

Key Architecture

ContextForge utilizes a modular architecture with two distinct, selectable retrieval strategies and a multi-layered memory system.

1. Dual-Mode Retrieval Engine

The system supports two mutually exclusive retrieval modes, selectable at runtime (CLI) or via configuration (API).

Hybrid Mode (Default)
- Mechanism: Combines results from Dense Vector Search (mediated by ChromaDB) and Sparse Keyword Search (BM25).
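- Fusion: Uses Reciprocal Rank Fusion (RRF) to merge the ranked lists from both retrievers. RRF scores each candidate by its rank position in each list, so the raw dense and BM25 scores do not need to be normalized against each other.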
- Refinement: Top candidates are passed through a Cross-Encoder Reranker (using sentence-transformers) to maximize relevance before generation.
- Use Case: Best for general-purpose queries requiring both semantic understanding and exact keyword matching.
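
The fusion step of Hybrid Mode can be sketched as follows. This is a minimal illustration of standard RRF, not ContextForge's actual code: the function name, the chunk IDs, and the constant `k=60` (a commonly used default for RRF) are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only rank positions matter, not raw scores.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

dense_hits = ["c3", "c1", "c7"]   # hypothetical chunk IDs from vector search
sparse_hits = ["c1", "c9", "c3"]  # hypothetical chunk IDs from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that appear in both lists (`c1`, `c3`) rise to the top, and the fused list is what would then be handed to the reranker.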

SPLADE Mode (Experimental)
- Mechanism: Uses SPLADE (Sparse Lexical and Expansion Model) to generate learned sparse representations.
- Efficiency: Performs retrieval in a single pass using inverted indices, capturing both semantic meaning and specific terms without requiring a separate heavy reranking step.
- Use Case: High-efficiency environments or domains where learned sparse representations outperform traditional dense vectors.
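
To illustrate the single-pass retrieval described for SPLADE Mode, the sketch below scores documents through an inverted index using sparse term-weight vectors. The weights are hand-written stand-ins for the output of a SPLADE encoder, and all function names are hypothetical; the real system may store and score postings differently.

```python
from collections import defaultdict

def build_inverted_index(doc_vectors):
    """Map each term to a list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            index[term].append((doc_id, weight))
    return index

def sparse_search(index, query_vector, top_k=3):
    """Score documents by the dot product of sparse query and document vectors."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        # Only documents that share a term with the query are ever touched.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Hand-written weights standing in for learned sparse representations.
docs = {
    "c1": {"safety": 1.5, "protocol": 1.0, "hazard": 0.5},
    "c2": {"budget": 2.0, "cost": 1.0},
}
# "hazard" plays the role of an expansion term added by the model.
query = {"safety": 1.0, "hazard": 1.0}

index = build_inverted_index(docs)
results = sparse_search(index, query)
print(results)  # [('c1', 2.0)]
```

Because expansion terms carry weight in the same vector as the literal terms, one lookup pass covers both exact matches and related vocabulary.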

2. Hierarchical Memory System

ContextForge maintains state across three levels to support long-running, context-heavy conversations:

- Short-Term Memory: Buffers the immediate conversation history (recent turns) for direct context.
- Episodic Memory: Archives past conversation turns in SQLite. Retrieves relevant historical interactions using FTS5 (Full-Text Search) based on the current query.
- Semantic Profile: Extracts and persists enduring user facts and preferences to personalize responses over time.
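
The episodic layer can be approximated with a few lines of `sqlite3`. This is a sketch under stated assumptions: the table name `episodes`, its columns, and the helper names are hypothetical, and it requires a Python build whose SQLite includes the FTS5 extension.

```python
import re
import sqlite3

def open_episodic_store(path=":memory:"):
    """Create (or open) an FTS5 table that holds archived conversation turns."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS episodes "
        "USING fts5(session_id UNINDEXED, role UNINDEXED, content)"
    )
    return conn

def archive_turn(conn, session_id, role, content):
    """Store one conversation turn."""
    conn.execute(
        "INSERT INTO episodes (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()

def recall(conn, query, limit=3):
    """Return the archived turns that best match the current query."""
    terms = re.findall(r"\w+", query)
    if not terms:
        return []
    # Quote each term so user punctuation is not parsed as FTS5 syntax.
    match_expr = " OR ".join(f'"{term}"' for term in terms)
    return conn.execute(
        "SELECT role, content FROM episodes WHERE episodes MATCH ? "
        "ORDER BY bm25(episodes) LIMIT ?",
        (match_expr, limit),
    ).fetchall()

conn = open_episodic_store()
archive_turn(conn, "s1", "user", "How does the reranker affect latency?")
archive_turn(conn, "s1", "assistant", "Chapter 4 covers the safety protocols.")
hits = recall(conn, "reranker latency")
```

Here `hits` contains only the first turn, since it is the only one sharing terms with the query. Ordering by `bm25(episodes)` places the best match first because FTS5 assigns lower values to better matches.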

3. Smart Document Processing

- Logical Page Mapping: Automatically detects physical-to-logical page offsets in PDFs (e.g., matching the printed "Page 1" to physical page 12) for accurate citations.
- Chapter Awareness: Scans and indexes document structures to support scoped queries like "summarize chapter 4" or "what is in the conclusion?".
- Chunk Registry: Tracks the linear relationship between text chunks, allowing the system to expand context windows dynamically (retrieving previous/next chunks) during generation.
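
Two of these ideas, neighbour expansion through the chunk registry and physical-to-logical page mapping, are simple enough to sketch. The `Chunk` class and both helper functions are hypothetical, and the sketch assumes each chunk's ID equals its position in the document's chunk list.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: int       # position in the document's linear chunk order
    physical_page: int  # 1-based page index inside the PDF file
    text: str

def expand_context(chunks, hit_id, window=1):
    """Return the hit chunk plus up to `window` neighbours on each side."""
    start = max(0, hit_id - window)
    end = min(len(chunks), hit_id + window + 1)
    return chunks[start:end]

def logical_page(physical_page, offset):
    """Convert a physical PDF page to its printed page number.

    Returns None for front matter that precedes printed page 1.
    """
    page = physical_page - offset
    return page if page >= 1 else None

chunks = [Chunk(i, 12 + i, f"chunk {i}") for i in range(5)]
neighbours = expand_context(chunks, hit_id=2, window=1)
print([c.chunk_id for c in neighbours])  # [1, 2, 3]
print(logical_page(12, offset=11))       # 1
```

With printed "Page 1" on physical page 12, the offset is 11, so a citation for a chunk found on physical page 14 would read "page 3".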

Installation

Prerequisites

- Python 3.10+
- Ollama (for local LLM inference) or a Groq API Key (for cloud inference).

Setup

Clone the Repository

```
git clone https://github.com/vijayabhaskar78/-ContextForge.git contextforge-main
cd contextforge-main
```

Install Dependencies
```
pip install -r final/requirements.txt
```
Configure Environment

Copy the example configuration and edit it:
```
cp .env.example .env
```
To use cloud models, set `USE_API_LLM=True` and provide `GROQ_API_KEY`.

Configuration

The system is configured via environment variables or the .env file.

| Variable | Default | Description |
| --- | --- | --- |
| Retrieval | | |
| `RETRIEVAL_MODE` | `hybrid` | Selects the retrieval engine: `hybrid` or `splade`. |
| `USE_SEMANTIC_CHUNKING` | `False` | Toggles experimental semantic chunking (requires `langchain_experimental`). |
| `CHUNK_SIZE` | `1000` | Size of each document chunk, in tokens. |
| Inference | | |
| `USE_API_LLM` | `False` | `True` for Groq API, `False` for local Ollama. |
| `LOCAL_MODEL_NAME` | `granite3.3:2b` | Model tag for local Ollama instance. |
| `API_MODEL_NAME` | `gemma2-9b-it` | Model ID for Groq API. |
| System | | |
| `DB_PATH` | `./data/vector_store_db` | Storage path for ChromaDB. |
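
For readers wiring these variables into their own scripts, the sketch below shows one way to read them with the documented defaults. The `Settings` class and `load_settings` function are hypothetical, not ContextForge's own loader, and the sketch reads only the process environment (it does not parse the `.env` file itself).

```python
import os
from dataclasses import dataclass

def _as_bool(value):
    """Interpret common truthy spellings such as 'True' or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}

@dataclass(frozen=True)
class Settings:
    retrieval_mode: str
    chunk_size: int
    use_api_llm: bool
    local_model_name: str
    api_model_name: str
    db_path: str

def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    mode = env.get("RETRIEVAL_MODE", "hybrid").strip().lower()
    if mode not in {"hybrid", "splade"}:
        raise ValueError(f"RETRIEVAL_MODE must be 'hybrid' or 'splade', got {mode!r}")
    return Settings(
        retrieval_mode=mode,
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        use_api_llm=_as_bool(env.get("USE_API_LLM", "False")),
        local_model_name=env.get("LOCAL_MODEL_NAME", "granite3.3:2b"),
        api_model_name=env.get("API_MODEL_NAME", "gemma2-9b-it"),
        db_path=env.get("DB_PATH", "./data/vector_store_db"),
    )

defaults = load_settings({})
print(defaults.retrieval_mode, defaults.chunk_size)  # hybrid 1000
```

Validating `RETRIEVAL_MODE` up front means a typo fails at startup instead of silently selecting the wrong engine.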

Usage

CLI Application

The CLI provides an interactive shell for document ingestion and Q&A.

Start the Application:
```
python final/src/contextforge/app.py
```
Workflow:
- Upload: Use option 1 to ingest PDF or text files.
- Chat: Use option 3 to start a session.
- Select Mode: You will be prompted to choose between Standard Hybrid or Experimental SPLADE mode for the session.

API Server

The FastAPI server exposes the RAG pipeline for external integrations.

Start the Server:
```
uvicorn final.src.contextforge.api_server:app --host 0.0.0.0 --port 8000
```
Note: The server uses the `RETRIEVAL_MODE` environment variable to determine the engine.

Endpoints:
- POST /query: Submit a question.
```
{
  "query": "Summarize the safety protocols",
  "session_id": "optional-client-session-id"
}
```
- GET /metrics: Retrieve system performance telemetry (latency, token usage, cost).
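
A client for the `POST /query` endpoint might look like the sketch below, which uses only the standard library. The helper names are hypothetical, and the response schema is not documented here, so the sketch returns whatever JSON the server sends back without assuming any fields.

```python
import json
import urllib.request

def build_query_request(base_url, query, session_id=None):
    """Build the POST /query request without sending it."""
    payload = {"query": query}
    if session_id is not None:
        payload["session_id"] = session_id  # optional, per the request body above
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_query(base_url, query, session_id=None, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_query_request(base_url, query, session_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_query_request(
    "http://localhost:8000", "Summarize the safety protocols", "demo-session"
)
print(request.get_method(), request.full_url)  # POST http://localhost:8000/query
```

With the server running, `send_query("http://localhost:8000", "Summarize the safety protocols")` performs the actual call. Reusing the same `session_id` across calls is presumably what ties requests to one conversation.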

Docker Deployment

Run the complete system in a containerized environment.

Build Image:

```
docker build -t contextforge-api -f final/Dockerfile .
```

Run Container:

```
docker run -d -p 8000:8000 --env-file .env contextforge-api
```

Observability

The system logs comprehensive metrics to logs/metrics.jsonl, including:

- Latency: Request processing time (ms).
- Token Usage: Input and output token counts.
- Cost Estimation: USD cost based on configured model pricing.
- Error Rates: Tracking of failed requests.

These logs can be ingested by monitoring tools or analyzed directly for performance tuning.
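
As an example of direct analysis, the sketch below aggregates a JSON Lines metrics file. The field names `latency_ms` and `error` are assumptions made for illustration; check the actual keys in `logs/metrics.jsonl` before relying on them.

```python
import json
from statistics import mean

def summarize_metrics(lines):
    """Aggregate request count, error rate, and mean latency from JSONL lines."""
    latencies = []
    errors = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        total += 1
        if record.get("error"):            # assumed field name
            errors += 1
        if "latency_ms" in record:         # assumed field name
            latencies.append(float(record["latency_ms"]))
    return {
        "requests": total,
        "error_rate": errors / total if total else 0.0,
        "mean_latency_ms": mean(latencies) if latencies else None,
    }

sample = [
    '{"latency_ms": 100}',
    '{"latency_ms": 200}',
    '{"latency_ms": 300}',
    '{"error": "timeout"}',
]
summary = summarize_metrics(sample)
print(summary)  # {'requests': 4, 'error_rate': 0.25, 'mean_latency_ms': 200.0}
```

Because the function accepts any iterable of lines, an open file handle works as well: `with open("logs/metrics.jsonl") as f: summarize_metrics(f)`.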

Research Direction

ContextForge serves as an experimental platform for studying retrieval strategies and long-context reasoning systems. Future work explores:

- structured memory for long-running AI agents
- hybrid sparse-dense retrieval strategies
- document-aware reasoning pipelines
- autonomous agents operating over large document collections
