encryptedtouhid/threat-intel-graph-rag

Threat Intelligence Graph RAG

A fully containerized Graph RAG application for cybersecurity threat intelligence, powered by local LLMs via Ollama.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Docker Network                                 │
│                                                                             │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌───────────┐  │
│  │   Frontend  │     │   Backend   │     │    Neo4j    │     │  Ollama   │  │
│  │   (Nginx)   │────▶│  (FastAPI)  │────▶│  (Graph DB) │     │  (LLM)    │  │
│  │   Port 8501 │     │  Port 8000  │     │  Port 7474  │     │ Port 11434│  │
│  └─────────────┘     └──────┬──────┘     └─────────────┘     └───────────┘  │
│                             │                                       ▲       │
│                             └───────────────────────────────────────┘       │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                        Data Ingestion (Job)                         │    │
│  │              Loads MITRE ATT&CK data into Neo4j on startup          │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘

Components

1. Neo4j (Graph Database)

  • Image: neo4j:5.15-community
  • Purpose: Stores threat intelligence as a knowledge graph
  • Ports:
    • 7474 - Browser UI
    • 7687 - Bolt protocol
  • Data: Persisted via Docker volume

2. Ollama (Local LLM)

  • Image: ollama/ollama:latest
  • Model: mistral:7b
  • Purpose:
    • LLM for natural language understanding and generation
    • Embedding generation via nomic-embed-text
  • Port: 11434
  • Note: Runs on CPU only (no GPU available) with an 8GB memory limit
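For reference, here is a minimal standard-library sketch of how a backend could call Ollama's REST API. It uses `/api/generate` for answers and `/api/embeddings` for vectors. The host and model names come from the Configuration section. The helper names (`generate_request`, `embedding_request`, `send`) are hypothetical and are not taken from `ollama_service.py`.

```python
import json
import urllib.request

# Defaults mirror the Environment Variables section of this README.
# Hypothetical helper names; not taken from the project's ollama_service.py.
OLLAMA_HOST = "http://ollama:11434"
CHAT_MODEL = "mistral:7b"
EMBED_MODEL = "nomic-embed-text"


def _post_json(url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_request(prompt: str, host: str = OLLAMA_HOST,
                     model: str = CHAT_MODEL) -> urllib.request.Request:
    # stream=False asks Ollama for one JSON object instead of streamed chunks
    return _post_json(f"{host}/api/generate",
                      {"model": model, "prompt": prompt, "stream": False})


def embedding_request(text: str, host: str = OLLAMA_HOST,
                      model: str = EMBED_MODEL) -> urllib.request.Request:
    return _post_json(f"{host}/api/embeddings", {"model": model, "prompt": text})


def send(req: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON reply (long timeout for CPU inference)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the containers running, `send(generate_request("Summarize T1566"))["response"]` returns the generated text. `send(embedding_request("spearphishing"))["embedding"]` returns the vector. The generous timeout is deliberate, because a 7B model on CPU can take a while to answer.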

3. Backend API (FastAPI)

  • Image: Custom Python image
  • Purpose:
    • REST API for frontend
    • RAG pipeline orchestration
    • Cypher query generation from natural language
    • Graph traversal and context retrieval
  • Port: 8000
  • Endpoints:
    POST /query                              - Natural language query
    GET  /graph/stats                        - Graph statistics
    GET  /graph/actors                       - List threat actors
    GET  /graph/techniques                   - List techniques
    GET  /graph/actors/{name}/techniques     - Get actor's techniques
    GET  /graph/actors/{name}/attack-path    - Get actor's kill chain
    GET  /graph/techniques/{id}/mitigations  - Get technique mitigations
    GET  /graph/search?q=                    - Search across all entities
    GET  /graph/visualize                    - Get graph data for visualization
    GET  /health                             - Health check
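Here is a small client-side sketch for building URLs against these endpoints. It assumes the backend is reachable at `http://localhost:8000`, as in the Quick Start. Actor names such as "Lazarus Group" contain spaces, so path segments are percent-encoded. The function names are illustrative only.

```python
from urllib.parse import quote, urlencode

# Hypothetical client helpers; the base URL follows the Quick Start section.
API_BASE = "http://localhost:8000"


def _segment(value: str) -> str:
    # safe="" also encodes "/", so a name can never split the URL path
    return quote(value, safe="")


def actor_techniques_url(name: str, base: str = API_BASE) -> str:
    return f"{base}/graph/actors/{_segment(name)}/techniques"


def attack_path_url(name: str, base: str = API_BASE) -> str:
    return f"{base}/graph/actors/{_segment(name)}/attack-path"


def mitigations_url(technique_id: str, base: str = API_BASE) -> str:
    return f"{base}/graph/techniques/{_segment(technique_id)}/mitigations"


def search_url(q: str, base: str = API_BASE) -> str:
    # urlencode escapes the query string (spaces become "+")
    return f"{base}/graph/search?{urlencode({'q': q})}"
```

For example, `actor_techniques_url("Lazarus Group")` yields `http://localhost:8000/graph/actors/Lazarus%20Group/techniques`, and `search_url("spear phishing")` yields `http://localhost:8000/graph/search?q=spear+phishing`.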
    

4. Frontend (Nginx + Static Web App)

  • Image: Nginx Alpine
  • Purpose: Modern web UI for querying threat intelligence
  • Port: 8501 (mapped from internal port 80)
  • Tech Stack:
    • HTML5/CSS3/JavaScript
    • jQuery for AJAX requests
    • Chart.js for statistics visualization
    • vis-network for interactive graph visualization
    • marked.js for markdown rendering
  • Features:
    • Query Page: Natural language queries with example suggestions
    • Explore Page: Browse threat actors, techniques, and search
    • Graph Map: Interactive network visualization with filtering
    • Statistics: Charts showing node/relationship distribution
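vis-network expects nodes with `id` and `label` (and optionally `group`) and edges with `from` and `to`. The sketch below shows one way a backend could reshape graph records into that form before serving `/graph/visualize`. The input record shapes are assumptions, because the actual response schema lives in `schemas.py` and is not described here.

```python
from typing import Iterable


def to_vis_network(nodes: Iterable[dict], rels: Iterable[dict]) -> dict:
    """Reshape graph records into vis-network's {nodes, edges} data format.

    Hypothetical input shapes (not the project's actual schema):
      node: {"id": "G0016", "name": "APT29", "label": "ThreatActor"}
      rel:  {"source": "G0016", "target": "T1566", "type": "USES"}
    """
    vis_nodes, seen = [], set()
    for n in nodes:
        # vis-network data sets need unique ids, so skip repeated nodes
        if n["id"] in seen:
            continue
        seen.add(n["id"])
        vis_nodes.append({
            "id": n["id"],
            "label": n.get("name", n["id"]),
            # group by graph label so the UI can color and filter by node type
            "group": n.get("label", "Unknown"),
        })
    vis_edges = [
        {"from": r["source"], "to": r["target"], "label": r["type"], "arrows": "to"}
        for r in rels
    ]
    return {"nodes": vis_nodes, "edges": vis_edges}
```

Serialized as JSON, the result matches the `{nodes, edges}` object that vis-network's `Network` constructor accepts. Using the node label as `group` is one way to support filtering by entity type on the Graph Map page.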

5. Data Ingestion (Init Job)

  • Image: Custom Python image
  • Purpose: One-time job to load MITRE ATT&CK data
  • Data Sources:
    • MITRE ATT&CK Enterprise (STIX format)
    • Relationships: Actors → Techniques, Malware, Tools; Techniques → Tactics and Mitigations
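Here is a minimal parser sketch showing how objects in the ATT&CK STIX bundle could map onto the graph schema. It relies on the standard ATT&CK STIX conventions:

- `intrusion-set`, `attack-pattern`, `x-mitre-tactic`, `malware`, `tool` and `course-of-action` objects
- `uses` and `mitigates` relationships
- `kill_chain_phases` for linking techniques to tactics

It is not the project's `parsers/mitre_attack.py`, and the output shapes are hypothetical.

```python
from collections import defaultdict

# STIX object type -> node label in this README's graph schema.
# Hypothetical sketch; not the project's parsers/mitre_attack.py.
LABELS = {
    "intrusion-set": "ThreatActor",
    "attack-pattern": "Technique",
    "x-mitre-tactic": "Tactic",
    "malware": "Malware",
    "tool": "Tool",
    "course-of-action": "Mitigation",
}


def attack_id(obj: dict) -> str:
    """Return the ATT&CK ID (e.g. T1566, G0016), falling back to the STIX id."""
    for ref in obj.get("external_references", []):
        if ref.get("source_name") == "mitre-attack" and "external_id" in ref:
            return ref["external_id"]
    return obj["id"]


def parse_bundle(bundle: dict) -> tuple[dict, list]:
    """Return ({label: [node, ...]}, [(src_id, REL_TYPE, dst_id), ...])."""
    objects = [o for o in bundle["objects"]
               if not o.get("revoked") and not o.get("x_mitre_deprecated")]
    nodes = defaultdict(list)
    by_stix_id = {}       # STIX id -> (label, ATT&CK id)
    tactic_by_short = {}  # e.g. "initial-access" -> "TA0001"
    for o in objects:
        label = LABELS.get(o["type"])
        if label is None:
            continue
        aid = attack_id(o)
        by_stix_id[o["id"]] = (label, aid)
        nodes[label].append({"id": aid, "name": o.get("name", "")})
        if label == "Tactic":
            tactic_by_short[o.get("x_mitre_shortname")] = aid

    edges = []
    for o in objects:
        if o["type"] == "attack-pattern":
            # Technique -> Tactic comes from kill_chain_phases, not relationship objects
            for phase in o.get("kill_chain_phases", []):
                tactic = tactic_by_short.get(phase.get("phase_name"))
                if tactic:
                    edges.append((attack_id(o), "BELONGS_TO", tactic))
        elif o["type"] == "relationship":
            src = by_stix_id.get(o.get("source_ref"))
            dst = by_stix_id.get(o.get("target_ref"))
            if src is None or dst is None:
                continue  # endpoint was filtered out or has an unmapped type
            rtype = o.get("relationship_type")
            if rtype == "uses" and src[0] == "ThreatActor" and dst[0] in ("Technique", "Malware", "Tool"):
                edges.append((src[1], "USES", dst[1]))
            elif rtype == "uses" and src[0] in ("Malware", "Tool") and dst[0] == "Technique":
                edges.append((src[1], "EMPLOYS", dst[1]))
            elif rtype == "mitigates" and src[0] == "Mitigation" and dst[0] == "Technique":
                # STIX points Mitigation -> Technique; the schema stores the reverse
                edges.append((dst[1], "MITIGATED_BY", src[1]))
    return dict(nodes), edges
```

The sketch skips deprecated and revoked objects. It also flips the STIX `mitigates` direction (mitigation to technique) to match `MITIGATED_BY`.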

Graph Schema

Nodes

Label        Properties                                   Description
ThreatActor  id, name, description, aliases, country      APT groups, criminal orgs
Technique    id, name, description, platforms, detection  ATT&CK techniques
Tactic       id, name, description, shortname             ATT&CK tactics (kill chain phases)
Malware      id, name, description, platforms             Malware families
Tool         id, name, description                        Legitimate tools used maliciously
Mitigation   id, name, description                        Defensive measures

Relationships

(:ThreatActor)-[:USES]->(:Technique)
(:ThreatActor)-[:USES]->(:Malware)
(:ThreatActor)-[:USES]->(:Tool)
(:Technique)-[:BELONGS_TO]->(:Tactic)
(:Technique)-[:MITIGATED_BY]->(:Mitigation)
(:Malware)-[:EMPLOYS]->(:Technique)
(:Tool)-[:EMPLOYS]->(:Technique)
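Cypher does not accept labels or relationship types as query parameters. One common approach is shown here with a hypothetical `merge_relationship` helper:

- Validate the label and relationship type against a whitelist built from the patterns above.
- Pass only the node ids as parameters.
- Use `MERGE` so that repeated ingestion runs stay idempotent.

```python
# Allowed (source label, relationship type, target label) triples from the schema above.
# Hypothetical helper; not taken from the project's ingestion code.
ALLOWED = {
    ("ThreatActor", "USES", "Technique"),
    ("ThreatActor", "USES", "Malware"),
    ("ThreatActor", "USES", "Tool"),
    ("Technique", "BELONGS_TO", "Tactic"),
    ("Technique", "MITIGATED_BY", "Mitigation"),
    ("Malware", "EMPLOYS", "Technique"),
    ("Tool", "EMPLOYS", "Technique"),
}


def merge_relationship(src_label: str, rel: str, dst_label: str,
                       src_id: str, dst_id: str) -> tuple[str, dict]:
    """Build an idempotent MERGE for one edge plus its parameters.

    Labels and relationship types are interpolated into the query text only
    after the whitelist check; node ids remain real $parameters.
    """
    if (src_label, rel, dst_label) not in ALLOWED:
        raise ValueError(f"not in schema: ({src_label})-[:{rel}]->({dst_label})")
    query = (
        f"MATCH (a:{src_label} {{id: $src_id}}), (b:{dst_label} {{id: $dst_id}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )
    return query, {"src_id": src_id, "dst_id": dst_id}
```

The returned pair can be passed to the Neo4j Python driver as `session.run(query, params)`.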

RAG Pipeline

User Query
    │
    ▼
┌─────────────────────┐
│ 1. Query Analysis   │  ← Ollama extracts intent & entities
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 2. Graph Retrieval  │  ← Cypher query against Neo4j
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 3. Context Building │  ← Combine graph results + embeddings
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 4. Response Gen     │  ← Ollama generates final answer
└─────────────────────┘
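The four stages can be sketched as a single function that takes the LLM and graph calls as injected callables. That keeps the orchestration testable without Ollama or Neo4j running. This is not the project's `rag_pipeline.py`. The `Snippet` type, the analysis dict and the prompt wording are assumptions. Stage 3 is interpreted here as ranking retrieved graph facts by cosine similarity to the question embedding.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Snippet:
    """Hypothetical unit of context: one graph fact rendered as text."""
    text: str                     # e.g. "APT29 USES T1566"
    embedding: Sequence[float]    # e.g. from nomic-embed-text


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def answer(question: str,
           analyze: Callable[[str], dict],
           retrieve: Callable[[dict], list[Snippet]],
           embed: Callable[[str], Sequence[float]],
           generate: Callable[[str], str],
           top_k: int = 5) -> str:
    # 1. Query analysis: the LLM extracts intent and entities
    analysis = analyze(question)
    # 2. Graph retrieval: Cypher against Neo4j, returned as text snippets
    snippets = retrieve(analysis)
    # 3. Context building: keep the snippets most similar to the question
    q_vec = embed(question)
    ranked = sorted(snippets, key=lambda s: cosine(q_vec, s.embedding), reverse=True)
    context = "\n".join(s.text for s in ranked[:top_k])
    # 4. Response generation: the LLM answers from the retrieved context
    prompt = (
        "Answer using only the threat-intelligence context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```

Injecting the four stages as functions means the Ollama and Neo4j clients can be swapped for stubs in tests, or for other backends, without touching the control flow.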

Query Examples

Natural Language Query                 Graph Retrieval
"What techniques does APT29 use?"      Match path from actor to techniques
"How do I defend against phishing?"    Find mitigations for T1566
"Which actors target healthcare?"      Filter actors by target industry
"Show the kill chain for Lazarus"      Traverse actor → techniques → tactics

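One way to turn an extracted intent and its entities into a retrieval query is a table of parameterized Cypher templates over the graph schema. The intent names and the `build_query` helper below are hypothetical. The healthcare example is left out because the schema lists no target-industry property.

```python
# Hypothetical intent -> Cypher templates, written against this README's schema.
TEMPLATES = {
    "actor_techniques": (
        "MATCH (a:ThreatActor {name: $actor})-[:USES]->(t:Technique) "
        "RETURN t.id AS technique_id, t.name AS name ORDER BY technique_id"
    ),
    "technique_mitigations": (
        "MATCH (t:Technique {id: $technique})-[:MITIGATED_BY]->(m:Mitigation) "
        "RETURN m.id AS mitigation_id, m.name AS name ORDER BY mitigation_id"
    ),
    "kill_chain": (
        "MATCH (a:ThreatActor {name: $actor})-[:USES]->(t:Technique)"
        "-[:BELONGS_TO]->(ta:Tactic) "
        "RETURN ta.name AS tactic, collect(t.id) AS techniques"
    ),
}


def build_query(intent: str, entities: dict) -> tuple[str, dict]:
    """Pick a template and keep only the entities it references as $parameters."""
    cypher = TEMPLATES[intent]
    # simple substring check for each "$name" placeholder
    params = {k: v for k, v in entities.items() if f"${k}" in cypher}
    return cypher, params
```

These templates match on the exact `name` property. ATT&CK names the group "Lazarus Group", so a question that says "Lazarus" would first need the entity normalized, for instance through the `aliases` property.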
Project Structure

graph-rag/
├── docker-compose.yml
├── .env.example
├── Makefile                     # Useful commands
├── README.md
│
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/
│       ├── main.py              # FastAPI app
│       ├── config.py            # Settings
│       ├── routers/
│       │   ├── query.py         # Query endpoints
│       │   └── graph.py         # Graph endpoints
│       ├── services/
│       │   ├── neo4j_service.py # Graph operations
│       │   ├── ollama_service.py # LLM operations
│       │   └── rag_pipeline.py  # RAG orchestration
│       └── models/
│           └── schemas.py       # Pydantic models
│
├── frontend/
│   ├── Dockerfile
│   ├── nginx.conf               # Nginx configuration
│   ├── index.html               # Main HTML page
│   ├── css/
│   │   └── style.css            # Styles
│   └── js/
│       └── app.js               # JavaScript application
│
├── ingestion/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── ingest.py                # Main ingestion script
│   └── parsers/
│       └── mitre_attack.py      # MITRE ATT&CK parser
│
└── data/
    └── .gitkeep                 # Downloaded data stored here

Configuration

Environment Variables

# Neo4j
NEO4J_URI=bolt://neo4j:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=threatintel123

# Ollama
OLLAMA_HOST=http://ollama:11434
OLLAMA_MODEL=mistral:7b
OLLAMA_EMBED_MODEL=nomic-embed-text

# Backend
LOG_LEVEL=INFO
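Here is a standard-library sketch of loading these variables. It assumes defaults equal to the values listed above, except for the Neo4j password. The password is required, so a shared default credential is never used silently. The project's actual `config.py` may differ; for example, it may use Pydantic settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; the real config.py may differ."""
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    ollama_host: str
    ollama_model: str
    ollama_embed_model: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (defaults match the list above)."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://neo4j:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        # No default: raise KeyError instead of silently using a shared password
        neo4j_password=env["NEO4J_PASSWORD"],
        ollama_host=env.get("OLLAMA_HOST", "http://ollama:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "mistral:7b"),
        ollama_embed_model=env.get("OLLAMA_EMBED_MODEL", "nomic-embed-text"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.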

Deployment

Prerequisites

  • Docker & Docker Compose installed on target machine
  • At least 16GB RAM (for Ollama + Neo4j)
  • ~10GB disk space

Quick Start

# Clone repository
git clone https://github.com/encryptedtouhid/threat-intel-graph-rag.git graph-rag
cd graph-rag

# Copy environment file
cp .env.example .env

# Start all services
docker-compose up -d

# Watch logs
docker-compose logs -f

# Access services
# - Frontend: http://localhost:8501
# - Backend API: http://localhost:8000/docs
# - Neo4j Browser: http://localhost:7474

First Run

  1. Ollama init container will auto-pull mistral:7b and nomic-embed-text models
  2. Ingestion job loads MITRE ATT&CK data into Neo4j
  3. System ready when all health checks pass
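The first start includes pulling the models, so readiness can take a while on a CPU-only host. Below is a sketch of a script that polls the backend health endpoint until it responds. It assumes `GET /health` returns HTTP 200 once the backend is up. The helper names are hypothetical, and the 15-minute default timeout is an arbitrary choice.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """True if GET url returns HTTP 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


def wait_until_ready(url: str = "http://localhost:8000/health",
                     check: Callable[[str], bool] = http_ok,
                     timeout: float = 900.0, interval: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until check(url) succeeds; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        if check(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Both `check` and `sleep` are parameters so the loop can be tested without a running stack. In normal use, `wait_until_ready()` with its defaults is enough.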

Makefile Commands

make help          # Show all available commands
make build         # Build all Docker images
make up            # Start all services in background
make up-logs       # Start all services with logs
make down          # Stop all services
make logs          # View logs from all services
make logs-backend  # View backend logs only
make status        # Show status of all services
make restart       # Restart all services
make clean         # Stop and remove containers, volumes, images
make rebuild       # Clean rebuild and start
make shell-backend # Open shell in backend container
make shell-neo4j   # Open cypher-shell in Neo4j
make reset-db      # Clear database and re-run ingestion

Future Enhancements

  • Add IOC ingestion (AlienVault OTX)
  • Add CVE/NVD data
  • Implement semantic search with vector index
  • Add query caching
  • Add authentication
  • Kubernetes deployment manifests
  • GPU support for Ollama
