A comprehensive system for managing enterprise requirements, generating test cases, and tracking changes in banking and financial services projects. Built with privacy and compliance in mind, this system operates entirely on-premises and can work with local LLMs, Anthropic Claude, Google Gemini, or OpenAI.
# Clone the repository
git clone https://github.com/ArshanBhanage/TraceQ.git
cd TraceQA
# Start with Docker Compose
docker-compose up -d
# Or run locally
cd backend
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000Enterprise banking projects face several challenges:
- Multiple teams and companies working on different environments (Dev, SIT, UAT)
- 500+ test cases need to be generated from confidential client requirements (FSDs)
- Frequent requirement changes via addendums, annexures, and email communications
- Time-consuming bug triage due to difficulty tracking requirement changes
- Compliance requirements for government banks that don't trust cloud/AI solutions
- Document Processing Service - Extracts text from PDF, DOCX, and plain text files
- RAG (Retrieval-Augmented Generation) Service - Chunks, embeds, and searches requirements
- Requirements Versioning - Tracks timeline of changes with semantic diffing
- Test Generation - Creates test cases using retrieved context and LLM
- Background Processing - Handles large operations asynchronously
- Provider Abstraction - Switchable between local (Ollama), Anthropic Claude, Google Gemini, and OpenAI
graph TB
%% Input Layer
subgraph "π Document Input"
A[PDF/DOCX Files] --> B[Document Upload API]
B --> C[Text Extraction Service]
end
%% Processing Layer
subgraph "π Processing Pipeline"
C --> D[Document Chunking]
D --> E[Embedding Generation]
E --> F[Vector Database Storage]
F --> G[Version Control System]
end
%% AI Layer
subgraph "π€ AI Services"
H[LLM Provider<br/>Claude/Gemini/OpenAI/Ollama]
I[RAG Service]
J[Test Generation Engine]
K[Change Analysis Engine]
end
%% Storage Layer
subgraph "πΎ Data Storage"
F --> L[Pinecone Vector DB]
G --> M[Timeline Storage]
N[Document Metadata]
O[Test Case Repository]
end
%% Output Layer
subgraph "π Output & Analytics"
P[Generated Test Cases]
Q[Change Impact Reports]
R[Fact-Checking Results]
S[Timeline Visualization]
end
%% User Interface
subgraph "π₯ User Interface"
T[Web Dashboard]
U[API Endpoints]
V[Background Jobs]
end
%% Connections
B --> T
I --> F
I --> H
J --> H
J --> I
K --> H
K --> I
J --> P
K --> Q
I --> R
G --> S
T --> U
U --> V
%% Styling
classDef inputStyle fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef processStyle fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef aiStyle fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef storageStyle fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
classDef outputStyle fill:#fce4ec,stroke:#880e4f,stroke-width:2px
classDef uiStyle fill:#f1f8e9,stroke:#33691e,stroke-width:2px
class A,B,C inputStyle
class D,E,F,G processStyle
class H,I,J,K aiStyle
class L,M,N,O storageStyle
class P,Q,R,S outputStyle
class T,U,V uiStyle
The system follows a modern microservices architecture with clear separation of concerns:
- π Document Ingestion: Multi-format document processing with automatic text extraction
- π Processing Pipeline: Intelligent chunking, embedding generation, and vector storage
- π€ AI Services: Flexible LLM integration with RAG-powered context retrieval
- πΎ Data Storage: Scalable vector database with version control and metadata tracking
- π Output & Analytics: Comprehensive reporting and visualization capabilities
- π₯ User Interface: RESTful APIs with real-time dashboard and background processing
- Supported Formats: PDF, DOCX, Plain Text
- Automatic Text Extraction: Uses python-magic for format detection
- Journey-based Organization: Group requirements by business process (e.g., "Point of Settlement")
- Metadata Tracking: Source type, effective date, notes, auto-generated summaries
- Semantic Search: Find relevant requirements using natural language queries
- RAG-powered: Combines vector similarity with LLM reasoning
- Context-aware: Retrieves chunks with relevance scoring and categorization
- Version Timeline: Complete history of requirement changes
- Semantic Diffing: LLM-powered analysis of functional vs. cosmetic changes
- Impact Assessment: Automatic evaluation of how changes affect existing tests
- Recommendations: Actionable guidance for test updates
- Context-aware: Uses retrieved requirements to generate relevant tests
- Batch Processing: Handle 500+ test cases with background jobs
- Multiple Scenarios: Positive, negative, boundary, and compliance cases
- Traceability: Link tests back to source requirements
- Evidence Retrieval: Find supporting documentation for claims
- Confidence Scoring: Assess strength of evidence
- Source Attribution: Track document origins and timestamps
- Triage Acceleration: Reduce time spent on bug investigation
- Python 3.8+
- Ollama (for local LLM support)
- FastAPI + Uvicorn
- Clone and Install Dependencies
git clone <repository>
cd enterprise-requirements-ai/backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt- Environment Configuration
# LLM Provider (claude, gemini, ollama, or openai)
export LLM_PROVIDER=claude
# For Claude (default)
export ANTHROPIC_API_KEY=your_anthropic_api_key_here
# For Gemini (optional)
export GEMINI_API_KEY=your_gemini_api_key_here
# For OpenAI (optional)
export OPENAI_API_KEY=your_openai_api_key_here
export EMBED_MODEL=text-embedding-3-small
# For Ollama (local)
export LLM_PROVIDER=ollama
export EMBED_MODEL=nomic-embed-text
# Storage paths (customizable)
export OBJECT_STORE=/path/to/document/storage
export RAG_INDEX_DIR=/path/to/rag/index
export REQ_VERSIONS_DIR=/path/to/versions- Start Ollama (if using local LLMs)
ollama serve
ollama pull llama3.1:8b-instruct
ollama pull nomic-embed-text- Run the Application
cd backend
source venv/bin/activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000GET /api/requirements/provider-infoPOST /api/requirements/test-claudePOST /api/requirements/ingest
{
"journey": "Point of Settlement",
"document_uri": "/path/to/fsd.pdf",
"source_type": "fsd",
"effective_date": "2024-01-15",
"notes": "Initial FSD for POS journey"
}POST /api/requirements/search
{
"journey": "Point of Settlement",
"query": "settlement failure scenarios",
"top_k": 10
}POST /api/requirements/analyze-changes
{
"journey": "Point of Settlement",
"from_version": "20240115T100000Z-fsd",
"to_version": "20240120T140000Z-addendum"
}GET /api/requirements/timeline/Point of SettlementPOST /api/requirements/fact-check
{
"journey": "Point of Settlement",
"claim": "Settlement must complete within 2 hours",
"top_k": 10
}POST /api/tests/generate
{
"journey": "Point of Settlement",
"max_cases": 100,
"context_top_k": 20,
"provider": "gemini"
}POST /api/background/batch-test-generation?journey=Point of Settlement&max_cases=500GET /api/background/tasks/{task_id}GET /api/background/tasksDELETE /api/background/tasks/{task_id}- No Cloud Dependencies: All processing happens locally (unless using Claude/Gemini/OpenAI)
- File-based Storage: Documents stored on local filesystem
- Air-gapped Capable: Can operate without internet access when using Ollama
- Audit Trail: Complete logging of all operations
- Local Embeddings: Vector embeddings generated and stored locally
- Provider Selection: Choose between local (Ollama), Claude, Gemini, or OpenAI
- Document Isolation: Each journey's data is separately indexed
- Access Control: File system permissions control document access
- Version Control: Complete audit trail of requirement changes
- Source Attribution: Track origin of every requirement
- Change Impact: Assess effect of modifications on existing tests
- Evidence Chain: Link tests back to source requirements
# 1. Upload FSD document
curl -X POST "http://localhost:8000/api/upload" \
-F "file=@fsd_point_of_settlement.pdf"
# 2. Ingest the requirement
curl -X POST "http://localhost:8000/api/requirements/ingest" \
-H "Content-Type: application/json" \
-d '{
"journey": "Point of Settlement",
"document_uri": "/path/to/uploaded/fsd.pdf",
"source_type": "fsd",
"effective_date": "2024-01-15"
}'
# 3. Generate initial test cases
curl -X POST "http://localhost:8000/api/tests/generate" \
-H "Content-Type: application/json" \
-d '{
"journey": "Point of Settlement",
"max_cases": 100
}'# Check which provider is active
curl -X GET "http://localhost:8000/api/requirements/provider-info"
# Test Claude with a simple prompt
curl -X POST "http://localhost:8000/api/requirements/test-claude"# 1. Ingest addendum
curl -X POST "http://localhost:8000/api/requirements/ingest" \
-H "Content-Type: application/json" \
-d '{
"journey": "Point of Settlement",
"document_uri": "/path/to/addendum.pdf",
"source_type": "addendum",
"effective_date": "2024-01-20"
}'
# 2. Analyze impact of changes
curl -X POST "http://localhost:8000/api/requirements/analyze-changes" \
-H "Content-Type: application/json" \
-d '{
"journey": "Point of Settlement",
"from_version": "20240115T100000Z-fsd",
"to_version": "20240120T140000Z-addendum"
}'
# 3. Regenerate affected tests
curl -X POST "http://localhost:8000/api/tests/generate" \
-H "Content-Type: application/json" \
-d '{
"journey": "Point of Settlement",
"max_cases": 50,
"context_top_k": 30
}'# 1. Fact-check a claim during triage
curl -X POST "http://localhost:8000/api/requirements/fact-check" \
-H "Content-Type: application/json" \
-d '{
"journey": "Point of Settlement",
"claim": "Settlement timeout is configurable",
"top_k": 15
}'
# 2. Get timeline to understand context
curl -X GET "http://localhost:8000/api/requirements/timeline/Point of Settlement"# Use Anthropic Claude (default)
export LLM_PROVIDER=claude
export ANTHROPIC_API_KEY=your_api_key_here
# Use local Ollama models
export LLM_PROVIDER=ollama
export EMBED_MODEL=nomic-embed-text
# Use OpenAI (requires internet)
export LLM_PROVIDER=openai
export OPENAI_API_KEY=your_key
export EMBED_MODEL=text-embedding-3-small# Customize storage locations
export OBJECT_STORE=/enterprise/docs
export RAG_INDEX_DIR=/enterprise/rag
export REQ_VERSIONS_DIR=/enterprise/versions# Background processing workers
export MAX_WORKERS=8
# RAG chunk sizes
export MAX_CHUNK_TOKENS=800
export CHUNK_OVERLAP=120cd backend
source venv/bin/activate
python -m pytest tests/python -m pytest --cov=app tests/GET /api/healthGET /api/background/tasks
GET /api/requirements/supported-formats
GET /api/requirements/provider-info# Clean up old background tasks
curl -X POST "http://localhost:8000/api/background/cleanup-completed?max_age_hours=24"
# Clean up old document versions
curl -X POST "http://localhost:8000/api/background/document-cleanup?journey=Point of Settlement&older_than_days=90"- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.