A Streamlit application that transforms risk assessment documents into interactive knowledge graphs using Neo4j. Choose between rule-based extraction (free) and LLM-powered extraction (more accurate).
- Dual Extraction Methods:
  - Rule-Based: Fast, free pattern matching
  - LLM-Based: Intelligent extraction with GPT-4 or Claude (optional)
- PDF Processing: Extract and analyze risk documents
- Entity Recognition: Identify risks, controls, assets, and stakeholders
- Relationship Discovery: Map connections between entities
- Interactive Visualization: 3D graph with PyVis
- Quality Metrics: Evaluate extraction quality
- Export Options: JSON export and Neo4j queries
- Docker and Docker Compose installed
- At least 4GB of available RAM
- Ports 8501, 7474, 7687, and 4566 available
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd Risk-Assessment-KG-Streamlit
   ```

2. Set the Neo4j password (optional; defaults to a secure password)

   ```bash
   export NEO4J_PASSWORD=YourSecurePasswordHere
   ```

3. Start the system

   ```bash
   ./start.sh
   ```

   Or manually:

   ```bash
   docker compose up -d
   # Wait for services to be healthy, then:
   docker exec risk-kg-localstack sh /docker-entrypoint-initaws.d/01-create-resources.sh
   ```

4. Access the application

   - Streamlit App: http://localhost:8501
   - Neo4j Browser: http://localhost:7474
   - LocalStack: http://localhost:4566
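Once the containers are up, a small standard-library probe like the one below can confirm that all three endpoints answer (a sketch; the URLs are the ones listed above, and any HTTP response, even an error status, counts as the service being up):

```python
# Probe the local service endpoints listed above.
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers any HTTP response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server responded, just with an error status
    except OSError:
        return False  # connection refused, DNS failure, timeout, ...


if __name__ == "__main__":
    for name, url in [("Streamlit App", "http://localhost:8501"),
                      ("Neo4j Browser", "http://localhost:7474"),
                      ("LocalStack", "http://localhost:4566")]:
        print(f"{name}: {'up' if is_reachable(url) else 'down'}")
```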
- Free: No API costs
- Fast: Processes documents in seconds
- Basic: Pattern matching only
- Perfect for quick analysis and testing
- Intelligent: Understands context
- Accurate: 85-95% accuracy
- Reasoning: Explains why entities were extracted
- Costs: ~$0.01-0.05 per page
```bash
# Set your API key before starting
export OPENAI_API_KEY="your-key-here"
# OR
export ANTHROPIC_API_KEY="your-key-here"

docker compose up -d
```
- Start the app normally
- Select "LLM-Based" in the sidebar
- Enter your API key in the text field
- OpenAI: https://platform.openai.com/api-keys
- Anthropic: https://console.anthropic.com/
- Upload PDF: Click "Choose a PDF file" in the sidebar
- Select Method:
  - Rule-Based for free, quick analysis
  - LLM-Based for detailed, accurate extraction
- Process: Click "Process Document"
- Explore:
  - View the interactive graph
  - Browse entities and relationships
  - Check quality metrics
  - Export results
```
Risk-Assessment-KG-Streamlit/
├── app.py                      # Main Streamlit application
├── src/
│   ├── document_processor.py   # PDF text extraction
│   ├── graph_generator.py      # Rule-based extraction
│   ├── llm_graph_generator.py  # LLM-based extraction
│   ├── neo4j_service.py        # Graph database operations
│   ├── visualizer.py           # Graph visualization
│   └── graph_evaluator.py      # Quality metrics
├── data/                       # Place PDF files here
├── docker-compose.yml          # Docker configuration
├── Dockerfile                  # Container definition
└── requirements.txt            # Python dependencies
```
Rule-based extraction produces simple entities:

```json
{
  "label": "high risk",
  "type": "RISK",
  "confidence": 0.8
}
```

LLM-based extraction adds reasoning and supporting evidence:

```json
{
  "label": "Supply Chain Disruption Risk",
  "type": "RISK",
  "confidence": 0.92,
  "reasoning": "Identified as critical operational risk affecting procurement",
  "evidence": "The supply chain disruption risk has increased by 40%..."
}
```
Entities:
- RISK: Threats, vulnerabilities, hazards
- CONTROL: Mitigations, safeguards
- ASSET: Systems, data, resources
- STAKEHOLDER: People, teams
- IMPACT: Consequences, effects
- COMPLIANCE: Standards, regulations
Relationships:
- MITIGATES: Control reduces risk
- AFFECTS: Risk impacts asset
- OWNS: Stakeholder owns asset/control
- REQUIRES: Dependencies
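As an illustration, extracted entities can be checked against this schema in a few lines of Python. This is a hypothetical validator, not the app's internal code; the type names come from the schema above and the field names from the example JSON outputs:

```python
# Hypothetical schema check for extracted entities. ENTITY_TYPES and
# RELATIONSHIP_TYPES mirror the schema listed above.
ENTITY_TYPES = {"RISK", "CONTROL", "ASSET", "STAKEHOLDER", "IMPACT", "COMPLIANCE"}
RELATIONSHIP_TYPES = {"MITIGATES", "AFFECTS", "OWNS", "REQUIRES"}


def validate_entity(entity: dict) -> list:
    """Return a list of problems; an empty list means the entity looks valid."""
    problems = []
    if not entity.get("label"):
        problems.append("missing label")
    if entity.get("type") not in ENTITY_TYPES:
        problems.append("unknown type: %r" % entity.get("type"))
    confidence = entity.get("confidence", 0.0)
    if not 0.0 <= confidence <= 1.0:
        problems.append("confidence out of range: %r" % confidence)
    return problems


print(validate_entity({"label": "Supply Chain Disruption Risk",
                       "type": "RISK", "confidence": 0.92}))  # []
```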
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install openai anthropic  # For LLM support

# Download the spaCy model
python -m spacy download en_core_web_sm

# Set the Neo4j connection
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=password123

# Run the app
streamlit run app.py
```
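A tiny pre-flight check can catch missing connection settings before the app starts. This is a sketch, not part of the app; the variable names match the exports above:

```python
# Check that the Neo4j connection variables exported above are present.
import os

REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")


def missing_neo4j_vars(env=None) -> list:
    """Return the names of required Neo4j variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_neo4j_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```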
```bash
# Build and start
docker compose up --build -d

# View logs
docker compose logs -f

# Stop
docker compose down

# Reset everything (removes volumes and data)
docker compose down -v
```
- Start with Rule-Based: Test your document first with the free extraction
- Use LLM for Important Docs: When accuracy matters, switch to LLM extraction
- Cost Control: LLM mode processes only the first 5 chunks by default
- Privacy: LLM mode sends document content to the OpenAI/Anthropic APIs
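The cost-control tip above can be pictured roughly like this (an illustrative sketch; the real chunking logic and the 5-chunk default live in `app.py` and may differ):

```python
# Illustrative sketch of the chunk-limit idea: split the extracted document
# text into fixed-size chunks and, in LLM mode, send only the first few
# chunks to the API to bound cost on long documents.
def chunk_text(text: str, chunk_size: int = 1000) -> list:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


MAX_LLM_CHUNKS = 5  # assumed default, adjustable per the tip above

document = "risk " * 2000             # stand-in for extracted PDF text
chunks = chunk_text(document)
to_process = chunks[:MAX_LLM_CHUNKS]  # only these would be sent to the LLM
print(len(chunks), len(to_process))   # 10 5
```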
Neo4j Connection Issues:

```bash
# Check whether Neo4j is running
docker compose ps

# View Neo4j logs
docker compose logs neo4j
```
API Key Issues:
- Ensure your API key has available credits
- Check usage in the OpenAI Dashboard or Anthropic Console
Performance:
- Increase chunk limit in app.py for longer documents
- Use GPU for faster spaCy processing
- Why Two Methods? Rule-based is great for quick analysis and when you can't use external APIs. LLM-based provides superior accuracy when you need it.
- Data Privacy: Rule-based processing is 100% local. LLM-based sends data to API providers.
- Customization: Edit patterns in `graph_generator.py` or prompts in `llm_graph_generator.py`
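For example, a rule-based pattern for risks might look like this. It is purely illustrative; the actual patterns defined in `src/graph_generator.py` will differ:

```python
# Illustrative regex in the spirit of rule-based risk extraction: capture a
# short noun phrase ending in a risk-related keyword.
import re

RISK_PATTERN = re.compile(
    r"\b(\w+(?:\s\w+){0,4}\s(?:risk|threat|vulnerability))\b",
    re.IGNORECASE,
)

text = "The supply chain disruption risk has increased by 40%."
matches = [m.group(1) for m in RISK_PATTERN.finditer(text)]
print(matches)  # ['The supply chain disruption risk']
```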
- Hybrid extraction (rules + LLM)
- Custom entity types
- Batch processing
- Fine-tuned models
- Export to other formats
This project is for educational/portfolio purposes.
Built for AI Engineers and Risk Management Professionals